Big Data Patterns, Mechanisms > Storage Patterns > High Volume Tabular Storage
High Volume Tabular Storage (Buhler, Erl, Khattak)
How can large amounts of non-relational data be stored in a table-like form where each record may consist of a very large number of fields or related groups of fields?
Problem
Solution
Application
Mechanisms
A column-family NoSQL database is used to enable the High Volume Tabular Storage pattern. Such a database normally allows adding multiple key-value pairs under a column and further allows rows within the same table to have different columns. Some level of schema conformance can be achieved by specifying a table schema before the table is populated. Some column-family implementations may support generic data types such as integer, float and double, while others may persist data within columns in binary form, in which case some serialization may be required before data is stored and deserialization when data is retrieved. Such databases may provide SQL-like or API-based access.
This pattern is also applicable when a relational database needs replacing with a highly scalable alternative, provided that ACID support is not required.
A database based on NoSQL technology is used that is capable of storing data in a hierarchical format and understanding the internal structure of the data. Saving data based on a nested structure further enables relational-like storage such that the related child table records can be embedded inside the parent table record.
- A dataset consists of rows such that each record consists of one million attributes.
- The user uses a column-family NoSQL database to import the dataset.
- The import is a success as the database can store more than billion attributes.