
Cloud-based Big Data Processing (Buhler, Erl, Khattak)

How can large amounts of data be processed without investing in any Big Data processing infrastructure and only paying for the amount of time the processing resources are actually used?

Problem

Building a cluster large enough to process high-volume data not only requires an upfront investment but also suffers from underutilization, resulting in waste.

Solution

Instead of creating an in-house cluster, cloud processing resources are utilized for processing large datasets as a cost-saving measure.

Application

A data processing engine deployed on a cluster of machines in the cloud is used to process data on a pay-per-use basis.

A processing engine deployed in a cloud environment is used in place of the in-house cluster; the engine makes use of a cloud-provided cluster. Apart from requiring the IT team to have cloud skills, applying this pattern further requires the datasets to be available from cloud-based storage device(s). Hence, the Cloud-based Big Data Processing pattern is applied together with the Cloud-based Big Data Storage pattern.
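The interaction described above can be sketched as follows. This is a minimal, illustrative simulation only: `CloudStorage` and `CloudCluster` are hypothetical stand-ins for a cloud-based storage device and a cloud-provided cluster, not a real SDK.

```python
# Hedged sketch of the pattern's moving parts: an ephemeral cloud cluster
# processes a dataset held in cloud storage, then releases its resources.
# CloudStorage and CloudCluster are illustrative stand-ins, not a real SDK.

class CloudStorage:
    """Stands in for a cloud-based storage device holding the dataset."""
    def __init__(self, records):
        self._records = records

    def read(self):
        return list(self._records)

class CloudCluster:
    """Stands in for a cloud-provided cluster billed only while provisioned."""
    def __init__(self):
        self.provisioned = False

    def __enter__(self):                     # provision on entry
        self.provisioned = True
        return self

    def __exit__(self, *exc):                # return resources on exit
        self.provisioned = False

    def process(self, records):
        return [r.upper() for r in records]  # placeholder transformation

storage = CloudStorage(["a", "b", "c"])
with CloudCluster() as cluster:              # pay only while inside this block
    results = cluster.process(storage.read())
print(results)
```

The context manager mirrors the pattern's cost model: the cluster exists, and is billed, only for the duration of the processing run.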

Cloud-based Big Data Processing: Cloud processing resources are used to process large amounts of data while paying only for the duration during which the processing resources are in use. The elastic nature of the cloud can further be utilized to scale out or scale in instantly as per the processing load. This also enables running Big Data projects independently of the in-house systems, such as for ad hoc data analysis or setting up a proof-of-concept Big Data solution environment.

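The scale-out/scale-in behavior can be illustrated with a small sketch in which the worker count tracks the processing load rather than being fixed at the peak size of an in-house cluster. The function name, thresholds, and limits are illustrative assumptions.

```python
# Minimal sketch of elastic scaling: the worker count follows the
# processing load instead of being fixed to a peak-sized in-house cluster.
# tasks_per_worker, min_workers, and max_workers are illustrative values.

def workers_needed(pending_tasks, tasks_per_worker=10,
                   min_workers=0, max_workers=100):
    """Return the worker count for the current load (scale out/scale in)."""
    if pending_tasks == 0:
        return min_workers                            # scale in: no idle charges
    needed = -(-pending_tasks // tasks_per_worker)    # ceiling division
    return max(min_workers, min(needed, max_workers))

# As the load rises and falls, the cluster size follows it:
for load in (0, 35, 400, 2000, 0):
    print(load, "->", workers_needed(load))
```

With `min_workers=0`, an idle cluster shrinks to nothing, which is the property that makes the pay-per-use cost model attractive.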

  1. A large dataset needs to be processed toward the end of each day using a cloud-based cluster.
  2. The cluster remains in use for thirty minutes.
  3. Once processing is complete, the processing resources are returned to the pool of resources.
  4. The enterprise only incurs a thirty-minute usage charge each day.
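The charge in the scenario above can be computed directly. The hourly rate and node count below are illustrative assumptions, not figures from the pattern description.

```python
# Hedged sketch of the pay-per-use charge for an ephemeral cluster
# billed only while it runs. Rate and node count are assumed values.

def usage_charge(minutes_used, nodes, hourly_rate_per_node):
    """Charge for cloud processing resources billed per node-hour."""
    return (minutes_used / 60) * nodes * hourly_rate_per_node

# Thirty minutes a day on a 10-node cluster at an assumed $0.50/node-hour:
daily = usage_charge(30, nodes=10, hourly_rate_per_node=0.50)
print(f"daily charge: ${daily:.2f}")
```

An always-on in-house cluster of the same size would incur the full 24-hour cost regardless of utilization, which is the waste the pattern avoids.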

This pattern is covered in BDSCP Module 11: Advanced Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

Big Data Fundamentals

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.