Big Data Patterns | Design Patterns | Automated Processing Metadata Insertion

Automated Processing Metadata Insertion (Buhler, Erl, Khattak)

How can confidence be instilled in results whose computation involves applying a series of processing steps in a Big Data environment?

Problem

When analytics results are produced by a series of processing steps and there is no record of how they were computed, users may doubt the validity and accuracy of those results.

Solution

Machine-readable information about each processing step is automatically added to the output of each processing step.

Application

Code is added to the processing routines so that details about each processing step are automatically appended to the output as metadata.

Mechanisms

Productivity Portal, Coordination Engine, Processing Engine, Query Engine, Resource Manager, Storage Device

A standard data structure for the metadata is agreed upon first. Details about the operations applied during each processing run are then recorded as metadata in that structure. The metadata is appended automatically by code inserted within the processing routines of the processing engine.
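The idea of a standardized structure plus inserted code can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names StepMetadata, Dataset, records_metadata and drop_below, as well as the fields recorded, are hypothetical choices made for the example.

```python
import functools
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class StepMetadata:
    """Hypothetical standardized structure describing one processing step."""
    step: str
    parameters: dict
    input_count: int
    output_count: int
    finished_at: str


@dataclass
class Dataset:
    """Records plus the metadata accumulated by earlier processing steps."""
    records: list
    metadata: list = field(default_factory=list)


def records_metadata(func):
    """Wrap a processing routine so its output carries a new metadata entry."""
    @functools.wraps(func)
    def wrapper(dataset, **parameters):
        out_records = func(dataset.records, **parameters)
        entry = StepMetadata(
            step=func.__name__,
            parameters=parameters,
            input_count=len(dataset.records),
            output_count=len(out_records),
            finished_at=datetime.now(timezone.utc).isoformat(),
        )
        # Carry earlier entries forward so the full history travels with the data.
        return Dataset(out_records, dataset.metadata + [asdict(entry)])
    return wrapper


@records_metadata
def drop_below(records, threshold=0):
    """Example routine: keep only values at or above the threshold."""
    return [r for r in records if r >= threshold]


result = drop_below(Dataset([5, -1, 3]), threshold=0)
print(result.records)
print(result.metadata)
```

Because the decorator does the recording, the routine itself contains no metadata logic, which is one way to make the insertion automatic rather than dependent on each developer remembering to do it.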

If the data is manipulated via the query engine, support for accessing metadata may need to be added to the query engine, depending on the functionality it already provides. Because the metadata is stored in machine-readable form, programs can interpret it directly without human involvement.
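As a rough illustration of what machine-readable means in practice, the sketch below serializes an output together with its metadata as JSON and then filters the metadata programmatically. The layout of the output document, the field names and the helper steps_with_operation are assumptions made for this example and do not correspond to any particular query engine.

```python
import json

# Hypothetical output of a processing run: the data plus its metadata trail.
output = {
    "data": [2.0, 4.0],
    "metadata": [
        {"step": "cleanse", "operation": "drop_nulls", "removed": 1},
        {"step": "transform", "operation": "scale", "factor": 2},
    ],
}

# Stored form: plain JSON text that any tool can parse.
serialized = json.dumps(output)


def steps_with_operation(serialized_output, operation):
    """Return the metadata entries that recorded the given operation."""
    document = json.loads(serialized_output)
    return [m for m in document["metadata"] if m.get("operation") == operation]


matches = steps_with_operation(serialized, "scale")
print(matches)
```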

This pattern can also be applied in association with the Complex Logic Decomposition pattern or Intermediate Results Storage pattern to provide details about intermediate processing steps.

Details about the operation(s) applied to the data during each processing step are automatically added in a machine-readable format to the output of the respective processing step as metadata. A form of interface, textual or graphical, is provided for the user to view the metadata. The application of the Automated Processing Metadata Insertion pattern also facilitates testing, debugging and code management.

In the diagram, a statistic, x, needs to be computed from a dataset. The computation involves multiple processing steps: data cleansing, data transformation and the application of an algorithm. At the end of each step, metadata is added to the output with details about the operations performed on the input. When users view the statistic, they can have high confidence in its validity because the metadata tells them how it was calculated.
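The scenario above can be sketched as a small Python pipeline. This is an illustrative example only: the source does not say which statistic x is, so the arithmetic mean is assumed here, and the function names, the metadata fields and the sample input are all hypothetical.

```python
from statistics import mean


def cleanse(values):
    """Data cleansing: drop missing values and describe what was done."""
    kept = [v for v in values if v is not None]
    return kept, {"step": "cleanse", "operation": "drop missing values",
                  "rows_in": len(values), "rows_out": len(kept)}


def transform(values):
    """Data transformation: convert every value to float."""
    converted = [float(v) for v in values]
    return converted, {"step": "transform", "operation": "convert to float",
                       "rows_in": len(values), "rows_out": len(converted)}


def compute_mean(values):
    """Algorithm: compute the statistic x (assumed here to be the mean)."""
    return mean(values), {"step": "algorithm", "operation": "arithmetic mean",
                          "rows_in": len(values), "rows_out": 1}


def run_pipeline(raw):
    """Run the three steps in order, collecting one metadata entry per step."""
    trail = []
    data = raw
    for step in (cleanse, transform, compute_mean):
        data, metadata = step(data)
        trail.append(metadata)
    return data, trail


def describe(trail):
    """Textual view of the metadata for the user."""
    return "\n".join(
        f"{i}. {m['step']}: {m['operation']} "
        f"({m['rows_in']} in, {m['rows_out']} out)"
        for i, m in enumerate(trail, start=1)
    )


x, trail = run_pipeline([4, None, "6", 8])
print(x)
print(describe(trail))
```

The describe function stands in for the textual interface through which the user views the metadata; a graphical view could be built from the same trail.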

This pattern is covered in BDSCP Module 11: Advanced Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.