Big Data Patterns | Design Patterns | High Volume Tabular Storage

High Volume Tabular Storage (Buhler, Erl, Khattak)

How can large amounts of non-relational data be stored in a table-like form where each record may consist of a very large number of fields or related groups of fields?

Problem

Traditional database technologies do not support storing related groups of columns as a single column and suffer from performance issues when rows with a very large number of columns are stored.

Solution

The data is stored in a cluster-based storage technology that supports table-like storage with the ability to group related columns together inside a parent column.

Application

A NoSQL-based Big Data storage technology is used that provides a row/column abstraction, allows multiple key-value pairs to be stored in and retrieved from a single column, and exposes an SQL-like or API-based interface for create, read, update and delete (CRUD) operations.
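The row/column abstraction described here can be illustrated with a minimal in-memory sketch. It is not the API of any particular product: the class name ColumnFamilyTable, its method names, and the sample row keys are all hypothetical, and a real implementation would distribute rows across a cluster rather than hold them in one dictionary.

```python
class ColumnFamilyTable:
    """Toy table (hypothetical): row key -> parent column -> key -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, qualifier, value):
        """Create or update one key-value pair inside a parent column."""
        row = self._rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier=None):
        """Read one value, or the whole group if no qualifier is given."""
        group = self._rows.get(row_key, {}).get(family, {})
        if qualifier is None:
            return dict(group)  # copy, so callers cannot mutate storage
        return group.get(qualifier)

    def delete(self, row_key, family, qualifier):
        """Remove one key-value pair; return True if it existed."""
        group = self._rows.get(row_key, {}).get(family, {})
        if qualifier in group:
            del group[qualifier]
            return True
        return False


table = ColumnFamilyTable()
# Related fields are grouped under the parent column "address".
table.put("user-1", "address", "city", "Lisbon")
table.put("user-1", "address", "zip", "1000")
# A second row in the same table carries a different parent column.
table.put("user-2", "orders", "2024-01", 3)

print(table.get("user-1", "address"))
print(table.get("user-2", "orders", "2024-01"))
```

Nesting a dictionary of key-value pairs under each parent column is what lets a record grow to a very large number of fields without every row having to declare all of them.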

Mechanisms

Serialization Engine, Storage Device

A column-family NoSQL database is used to enable the High Volume Tabular Storage pattern. Such a database normally allows adding multiple key-value pairs under a column and further allows rows within the same table to have different columns. Some level of schema conformance can be achieved by specifying a table schema before the table is populated. Some column-family implementations may support generic data types such as integer, float and double, while others may persist data within columns in binary form, in which case some serialization may be required before data is stored and deserialization when data is retrieved. Such databases may provide SQL-like or API-based access.
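The two ideas in this paragraph, declaring a schema up front and serializing values when columns are persisted in binary form, can be sketched as follows. This is an illustration under stated assumptions only: the one-byte type tags, the serialize and deserialize helpers, and the BinaryTable class are hypothetical, and real column-family databases define their own encodings and schema rules.

```python
import struct

# Hypothetical type tags for this sketch: one leading byte names the type.
_ENCODERS = {
    int: (b"i", lambda v: struct.pack(">q", v)),    # 64-bit signed integer
    float: (b"d", lambda v: struct.pack(">d", v)),  # 64-bit double
    str: (b"s", lambda v: v.encode("utf-8")),
}
_DECODERS = {
    b"i": lambda b: struct.unpack(">q", b)[0],
    b"d": lambda b: struct.unpack(">d", b)[0],
    b"s": lambda b: b.decode("utf-8"),
}


def serialize(value):
    """Turn a typed value into bytes before it is stored."""
    try:
        tag, encode = _ENCODERS[type(value)]
    except KeyError:
        raise TypeError(f"unsupported type: {type(value).__name__}") from None
    return tag + encode(value)


def deserialize(blob):
    """Recover the typed value when the bytes are read back."""
    return _DECODERS[blob[:1]](blob[1:])


class BinaryTable:
    """Toy table whose parent columns are fixed before any data is loaded."""

    def __init__(self, families):
        self._families = frozenset(families)
        self._rows = {}

    def put(self, row_key, family, qualifier, value):
        # Schema conformance: only declared parent columns are accepted,
        # while the keys inside each parent column remain free-form.
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = serialize(value)

    def get(self, row_key, family, qualifier):
        blob = self._rows.get(row_key, {}).get(family, {}).get(qualifier)
        return None if blob is None else deserialize(blob)


table = BinaryTable(["profile", "metrics"])
table.put("sensor-7", "profile", "site", "north")
table.put("sensor-7", "metrics", "temp_c", 21.5)
print(table.get("sensor-7", "metrics", "temp_c"))
```

Tagging each stored value with its type is one simple way to make binary columns self-describing; a database with native data types would not need this step.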

This pattern is also applicable when a relational database needs replacing with a highly scalable alternative, provided that ACID support is not required.

High Volume Tabular Storage: A database based on NoSQL technology is used that is capable of storing data in a hierarchical format and understanding the internal structure of the data. Saving data based on a nested structure further enables relational-like storage such that the related child table records can be embedded inside the parent table record.

A dataset consists of rows in which each record has one million attributes.
The user imports the dataset into a column-family NoSQL database.
The import succeeds because the database can store more than a billion attributes.

This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.