Big Data Patterns | Design Patterns | High Volume Linked Storage

High Volume Linked Storage (Buhler, Erl, Khattak)

How can very large datasets comprising entities that are connected together be stored in a way that enables efficient analysis of such connected entities?

Problem

Big Data storage technologies generally employ non-relational, aggregate-based data storage strategies. However, such technologies cannot efficiently store and analyze data that consists of very large groups of entities linked to one another by logical connections.

Solution

A specialized cluster-based storage technology is used that allows specifying connections between entities.

Application

A NoSQL-based Big Data storage technology is used that stores each data unit as a node (vertex) and each logical connection between two vertices as an edge. It further enables querying the vertices based on the existence of edges between them.
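
As a rough illustration of this vertex-and-edge model, the following Python sketch keeps vertices and edges in memory and answers queries based on whether an edge exists. The `GraphStore` class, its method names, and the sample data are hypothetical; they do not correspond to any particular graph database product, and a real deployment would be cluster-based rather than in-memory.

```python
from collections import defaultdict


class GraphStore:
    """Toy in-memory stand-in for a graph store (hypothetical, not a product API)."""

    def __init__(self):
        self.vertices = set()
        self.out_edges = defaultdict(set)  # vertex id -> ids it links to

    def add_vertex(self, vertex_id):
        self.vertices.add(vertex_id)

    def add_edge(self, source, target):
        # Edges are declared explicitly; both endpoints must already be stored.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be stored before linking them")
        self.out_edges[source].add(target)

    def has_edge(self, source, target):
        """True if an edge from `source` to `target` has been defined."""
        return target in self.out_edges.get(source, set())

    def linked_from(self, source):
        """Return, in sorted order, every vertex one edge away from `source`."""
        return sorted(self.out_edges.get(source, set()))


store = GraphStore()
for name in ("alice", "bob", "carol"):
    store.add_vertex(name)
store.add_edge("alice", "bob")
store.add_edge("alice", "carol")

print(store.linked_from("alice"))      # ['bob', 'carol']
print(store.has_edge("bob", "alice"))  # False: edges here are directed
```

Treating edges as directed is a choice made for this sketch; an undirected store would record each link in both directions.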

Mechanisms

Serialization Engine, Storage Device

A graph NoSQL database is used to store linked data. Entities with attributes as key-value pairs are stored as vertices, while the connections between the vertices are stored as edges. Each edge can also contain key-value attributes that can be used to fine-tune the query criteria. To service link-based queries, the database requires that the edges be explicitly defined between the vertices.
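
To make the role of edge attributes concrete, here is a minimal Python sketch in which both vertices and edges carry key-value attributes and a query is narrowed by an edge attribute. The data, the attribute names (`label`, `since`), and the helper `sources_linked_to` are all assumptions made for illustration, not part of any specific database.

```python
# Vertices: id -> key-value attributes (sample data is made up).
vertices = {
    "p1": {"type": "person", "name": "Ana"},
    "p2": {"type": "person", "name": "Ben"},
    "c1": {"type": "company", "name": "Acme"},
}

# Edges are defined explicitly as (source id, target id, key-value attributes).
edges = [
    ("p1", "p2", {"label": "knows", "since": 2015}),
    ("p1", "c1", {"label": "works_at", "since": 2019}),
    ("p2", "c1", {"label": "works_at", "since": 2012}),
]


def sources_linked_to(target, predicate=lambda attrs: True):
    """Return ids of vertices that have an edge to `target` whose attributes satisfy `predicate`."""
    return [src for src, dst, attrs in edges if dst == target and predicate(attrs)]


# Link-based query: who has a "works_at" edge to the company?
employees = sources_linked_to("c1", lambda a: a["label"] == "works_at")

# Same query, fine-tuned with a second edge attribute.
recent = sources_linked_to(
    "c1", lambda a: a["label"] == "works_at" and a["since"] >= 2015
)

print([vertices[v]["name"] for v in employees])  # ['Ana', 'Ben']
print([vertices[v]["name"] for v in recent])     # ['Ana']
```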

High Volume Linked Storage: A link-aware storage device is used that not only supports storing a very large number of entities (records) but also provides a means for adding links between the entities. Such a storage device enables finding entities based on a direct or indirect connection between them.

A dataset consists of details about a road network where intersections are connected with other intersections via road segments.
A user uses a graph NoSQL database to store this dataset.
The operation succeeds because the graph database stores each intersection as an entity (record) while also allowing the addition of links between them as road segments.
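
The road network scenario above can be sketched in Python as follows. Intersections are stored as vertices and road segments as edges, and a breadth-first search then finds intersections that are connected directly or indirectly. The intersection names, the two-way treatment of segments, and the helper `hops_between` are assumptions made for this example; a graph database would expose comparable traversals through its own query language.

```python
from collections import deque

# Intersections are the entities (vertices); road segments are the links (edges).
intersections = {"A", "B", "C", "D", "E"}
road_segments = [("A", "B"), ("B", "C"), ("C", "D")]  # "E" has no segments

adjacency = {name: set() for name in intersections}
for start, end in road_segments:
    # Each segment is treated as two-way, an assumption of this sketch.
    adjacency[start].add(end)
    adjacency[end].add(start)


def hops_between(start, goal):
    """Breadth-first search: fewest road segments from start to goal, or None if unconnected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


print(hops_between("A", "B"))  # 1: direct connection
print(hops_between("A", "D"))  # 3: indirect connection via B and C
print(hops_between("A", "E"))  # None: no connection
```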

This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.