Cloud-based Big Data Processing (Buhler, Erl, Khattak)
How can large amounts of data be processed without investing in any Big Data processing infrastructure, while paying only for the time the processing resources are actually in use?
A processing engine deployed in a cloud environment is used. Instead of an in-house cluster, the processing engine makes use of a cloud-provided cluster. Apart from requiring the IT team to have cloud skills, applying this pattern further requires that datasets be available from cloud-based storage device(s). Hence, the Cloud-based Big Data Processing pattern is applied together with the Cloud-based Big Data Storage pattern.
Cloud processing resources are used to process large amounts of data while paying only for the duration during which the processing resources are in use. The elastic nature of the cloud can further be utilized to scale out or scale in instantly in response to the processing load. This also enables running Big Data projects independently of in-house systems, for example for ad hoc data analysis or for setting up a proof-of-concept Big Data solution environment.
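As a minimal sketch of how this pattern might be applied, the following builds a request for a transient, auto-terminating cloud cluster whose input dataset resides in cloud-based storage. It assumes a provider API in the style of AWS EMR's `run_job_flow`; the bucket, script, and instance names are hypothetical and the request is only constructed, not submitted.

```python
# Sketch only: the request shape follows AWS EMR's run_job_flow API,
# but the bucket names, script, and instance types are hypothetical.

def build_transient_cluster_request(dataset_uri, script_uri, workers=4):
    """Build a run_job_flow-style request for a short-lived cluster."""
    return {
        "Name": "end-of-day-processing",
        "ReleaseLabel": "emr-6.15.0",
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": workers},
            ],
            # Release the cluster back to the resource pool as soon as
            # the last step finishes, so charges accrue only while the
            # processing actually runs.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "process-dataset",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Dataset is read from cloud-based storage, per the
                # companion Cloud-based Big Data Storage pattern.
                "Args": ["spark-submit", script_uri, dataset_uri],
            },
        }],
    }

request = build_transient_cluster_request(
    "s3://example-bucket/daily-dataset/",   # hypothetical storage location
    "s3://example-bucket/jobs/process.py",  # hypothetical processing script
)
```

A real deployment would submit this with `boto3.client("emr").run_job_flow(**request)`; the key design choice is `KeepJobFlowAliveWhenNoSteps: False`, which is what makes the cluster transient rather than always-on.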
- A large dataset needs to be processed towards the end of the day using a cloud-based cluster.
- The cluster remains in use for thirty minutes.
- Once the processing is complete, the processing resources are returned to the pool of resources.
- The enterprise only incurs a thirty-minute usage charge each day.
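The pay-per-use economics of the example above can be sketched with simple arithmetic; the hourly cluster rate used here is a hypothetical figure for illustration only.

```python
# Hypothetical cluster cost per hour, in dollars (for illustration only).
HOURLY_RATE = 4.80

def daily_charge(minutes_in_use, hourly_rate=HOURLY_RATE):
    """Charge for a cluster billed only for the time it is in use."""
    return hourly_rate * minutes_in_use / 60

transient = daily_charge(30)        # thirty minutes of processing per day
always_on = daily_charge(24 * 60)   # an equivalent cluster kept running 24/7
print(f"transient: ${transient:.2f}/day vs always-on: ${always_on:.2f}/day")
```

Under this assumed rate the transient cluster incurs $2.40 per day against $115.20 for an always-on cluster, which is the cost advantage the pattern targets.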