Big Data Patterns | Design Patterns | High Volume Binary Storage


High Volume Binary Storage (Buhler, Erl, Khattak)

How can a variety of unstructured data be stored in a scalable manner such that it can be randomly accessed based on a unique identifier?

Problem

Storing very large amounts of unstructured data in traditional database technologies not only incurs a performance penalty but also runs into scalability problems as the amount of data grows.

Solution

Unstructured data is stored using a simple cluster-based storage technique in which data units are accessed via keys.

Application

A NoSQL-based Big Data storage technology is used that treats each data unit as binary data and provides access to it via a unique key, so that each data unit can be retrieved, replaced or deleted individually.

Mechanisms

Serialization Engine, Storage Device

A key-value NoSQL database is introduced within the Big Data platform. Such a database generally provides API-based access for inserting, selecting and deleting data, without any support for partial updates, because the database has no knowledge of the internal structure of the data it stores. It is well suited to storing large amounts of data in raw form because every data unit is stored as a binary object. A key-value NoSQL database can also be used where the use case involves high-speed read and write operations.
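The access model described above can be illustrated with a minimal sketch. It assumes a toy in-memory store; the class name `BinaryKeyValueStore` and its `put`, `get` and `delete` methods are hypothetical and do not correspond to the API of any particular NoSQL product. It shows that the store only ever handles whole values, so a "partial update" has to be done by the client as read, modify, rewrite.

```python
from typing import Dict


class BinaryKeyValueStore:
    """Toy in-memory stand-in for a key-value NoSQL database (hypothetical API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Insert or replace the whole value; the store never looks inside it.
        self._blobs[key] = bytes(value)

    def get(self, key: str) -> bytes:
        # Return the stored bytes unchanged; raises KeyError for an unknown key.
        return self._blobs[key]

    def delete(self, key: str) -> None:
        # Remove the value if present; deleting a missing key is a no-op here.
        self._blobs.pop(key, None)


store = BinaryKeyValueStore()
store.put("doc-1", b"hello world")

# No partial update is available: to change part of a value, the client
# reads the whole value, modifies it, and writes the whole value back.
current = store.get("doc-1")
store.put("doc-1", current.replace(b"world", b"there"))
updated = store.get("doc-1")

store.delete("doc-1")
```

A real database would add distribution, persistence and replication behind the same three operations; those concerns are deliberately left out of the sketch.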

Apart from a generic disk-based key-value NoSQL database, a memory-based storage device, such as a memory grid that provides key-value storage, can also be used to obtain the same functionality with the added benefit of low-latency data access.
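The point that disk-based and memory-based devices offer the same functionality can be sketched as two interchangeable backends behind one interface. This is an illustrative assumption, not a description of any real memory grid: `BlobStore`, `DiskBlobStore`, `MemoryBlobStore` and `round_trip` are hypothetical names, and the disk variant simply writes one file per key in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface shared by both storage devices."""

    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class DiskBlobStore:
    """Stores each value as one file under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def _path(self, key: str) -> Path:
        # Hash the key so that any key string maps to a safe file name.
        return self._root / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def put(self, key: str, value: bytes) -> None:
        self._path(key).write_bytes(value)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


class MemoryBlobStore:
    """Keeps every value in process memory for low-latency access."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = bytes(value)

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def round_trip(store: BlobStore, key: str, value: bytes) -> bytes:
    # Client code is identical regardless of which device backs the store.
    store.put(key, value)
    return store.get(key)


with tempfile.TemporaryDirectory() as tmp:
    from_disk = round_trip(DiskBlobStore(Path(tmp)), "img/1", b"\x89PNG")
from_memory = round_trip(MemoryBlobStore(), "img/1", b"\x89PNG")
```

Because the client depends only on the key-based interface, the choice between the two devices becomes a latency and capacity trade-off rather than a change in application logic.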

Note that applying the High Volume Binary Storage pattern delegates the responsibility for interpreting the data (serialization and deserialization) to the client. A client can therefore read the data successfully only if it knows the nature and format of what was stored. Also, because access is possible only via the key, a logical key-naming convention may need to be implemented so that the required data units can be located quickly.
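Both client-side responsibilities mentioned above, interpreting the bytes and naming the keys, can be sketched briefly. The example assumes JSON as the serialization format and a `<entity>:<id>:<part>` key convention; both choices, along with the helper names `make_key`, `serialize` and `deserialize`, are illustrative assumptions rather than part of the pattern. A plain dictionary stands in for the database.

```python
import json
from typing import Dict

# Stand-in for the key-value database: it maps keys to opaque bytes.
store: Dict[str, bytes] = {}


def make_key(entity: str, entity_id: str, part: str) -> str:
    # Hypothetical naming convention: <entity>:<id>:<part>.
    return f"{entity}:{entity_id}:{part}"


def serialize(record: dict) -> bytes:
    # The writer decides the format (JSON encoded as UTF-8 in this sketch).
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(blob: bytes) -> dict:
    # The reader must know the same format; the store cannot tell it.
    return json.loads(blob.decode("utf-8"))


key = make_key("customer", "1001", "profile")
store[key] = serialize({"name": "Ada", "tier": "gold"})

raw = store[key]            # what the database returns: just bytes
profile = deserialize(raw)  # meaning is recovered only on the client
```

With a predictable convention, a client that knows the customer identifier can build the key directly instead of needing a search facility that the store does not offer.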

High Volume Binary Storage: A contemporary database solution is implemented that supports scaling out and stores data as a binary large object (BLOB) that can be accessed based on an identifier.

A user tries to import a very large binary file into a key-value NoSQL database.
The operation succeeds and the database assigns a key to the stored file.
The user later asks the database for the data stored under that key.
The previously stored, very large binary file is returned to the user in its original format.
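The four steps above can be sketched as a round trip against a toy store. The sketch makes several assumptions: `KeyAssigningStore` and its `insert` and `get` methods are hypothetical, the key is a random UUID generated by the store, and a 1 MiB random payload stands in for the very large binary file.

```python
import hashlib
import os
import uuid
from typing import Dict


class KeyAssigningStore:
    """Toy store that assigns its own key on insert (hypothetical API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def insert(self, value: bytes) -> str:
        # The store generates the key and returns it to the caller.
        key = uuid.uuid4().hex
        self._blobs[key] = bytes(value)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


# Step 1: the user imports a binary file (1 MiB of random bytes here).
payload = os.urandom(1024 * 1024)
db = KeyAssigningStore()

# Step 2: the operation succeeds and the database assigns a key.
key = db.insert(payload)

# Step 3: the user later requests the data stored under that key.
returned = db.get(key)

# Step 4: the data comes back byte-for-byte identical to what was stored.
same = hashlib.sha256(returned).digest() == hashlib.sha256(payload).digest()
```

Comparing digests rather than the raw bytes mirrors how a client might verify integrity when the stored object is too large to keep a second copy around.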

This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.