Cloud-based Big Data Storage (Buhler, Erl, Khattak)
How can large amounts of data be stored without investing in any Big Data storage infrastructure and only paying for the used storage space?
A distributed file system or a NoSQL database deployed in the cloud is used for data storage. Applying this pattern requires the IT team to have cloud skills, such as knowledge of cloud provider-specific APIs, in order to import data into the cloud and manipulate it there. The Cloud-based Big Data Storage pattern is generally applied together with the Cloud-based Big Data Processing pattern. If the datasets are not already in the cloud, data processing may be delayed by the time required to import large datasets into the cloud.
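The import delay mentioned above can be estimated with simple arithmetic before committing to the pattern. The following is a minimal sketch, not part of the pattern itself: the function name `estimate_import_hours`, the `efficiency` factor (the share of nominal bandwidth assumed to be actually usable) and the example figures are all illustrative assumptions.

```python
def estimate_import_hours(dataset_gb: float, uplink_mbps: float,
                          efficiency: float = 0.8) -> float:
    """Rough number of hours needed to upload a dataset to cloud storage.

    dataset_gb  -- dataset size in decimal gigabytes (1 GB = 1000 MB)
    uplink_mbps -- nominal uplink bandwidth in megabits per second
    efficiency  -- assumed fraction of the nominal bandwidth that is usable
    """
    if dataset_gb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < efficiency <= 1")
    dataset_megabits = dataset_gb * 1000 * 8      # GB -> MB -> megabits
    seconds = dataset_megabits / (uplink_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 10 TB over a 1 Gbps link at 80% usable bandwidth.
    hours = estimate_import_hours(dataset_gb=10_000, uplink_mbps=1_000)
    print(f"Estimated import time: {hours:.1f} hours")
```

Under these assumed figures the import takes more than a day, which is the kind of delay that needs to be planned for when the data does not originate in the cloud.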
Data is stored in the cloud to take advantage of its pay-per-use and elastic nature. The cloud can also provide more scalability than an in-house cluster, potentially because of the larger infrastructure backing the cloud provider.
In the diagram, importing large amounts of structured, semi-structured and unstructured data requires a cluster-based storage infrastructure. The enterprise opts for cloud storage to save the data. The resulting costs stay within the enterprise's allocated IT budget because the enterprise pays only for the storage space it uses.
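The budget argument can be illustrated with a small calculation. This is a sketch under stated assumptions: the flat per-GB monthly price, the usage figures and the budget are hypothetical, and real provider pricing typically also includes tiering, request and data transfer charges that are not modeled here.

```python
def pay_per_use_costs(stored_gb_by_month, price_per_gb_month):
    """Monthly storage cost when only the space actually used is billed."""
    return [gb * price_per_gb_month for gb in stored_gb_by_month]


def within_budget(monthly_costs, monthly_budget):
    """True if every monthly cost stays at or below the allocated budget."""
    return all(cost <= monthly_budget for cost in monthly_costs)


if __name__ == "__main__":
    # Hypothetical figures: stored data grows over four months.
    usage_gb = [1_000, 2_000, 4_000, 8_000]
    price = 0.02          # assumed price per GB per month
    budget = 200.0        # assumed monthly IT budget for storage

    costs = pay_per_use_costs(usage_gb, price)
    for month, (gb, cost) in enumerate(zip(usage_gb, costs), start=1):
        print(f"month {month}: {gb} GB stored, cost {cost:.2f}")
    print("within budget:", within_budget(costs, budget))
```

The cost tracks the stored volume month by month, so there is no up-front payment for capacity that is not yet needed.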