High Volume Linked Storage (Buhler, Erl, Khattak)
How can very large datasets comprising entities that are connected together be stored in a way that enables efficient analysis of such connected entities?
A graph NoSQL database is used to store linked data. Entities with attributes as key-value pairs are stored as vertices, while the connections between the vertices are stored as edges. Each edge can also contain key-value attributes that can be used to fine-tune the query criteria. To service link-based queries, the database requires edges to be explicitly defined between the vertices.
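The vertex and edge model described above can be illustrated with a minimal in-memory sketch. It is not the API of any particular graph database: the `PropertyGraph` class, its method names, and the sample people and attributes are all hypothetical, chosen only to show vertices and edges carrying key-value attributes and an edge attribute narrowing a query.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """An entity whose attributes are stored as key-value pairs."""
    vertex_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    """An explicitly defined connection between two vertices."""
    source: str
    target: str
    attributes: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory stand-in for a graph NoSQL database."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **attributes):
        self.vertices[vertex_id] = Vertex(vertex_id, attributes)

    def add_edge(self, source, target, **attributes):
        # Edges are defined explicitly, and only between known vertices.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before they are linked")
        self.edges.append(Edge(source, target, attributes))

    def neighbors(self, vertex_id, **criteria):
        """Return targets of outgoing edges whose attributes match all criteria."""
        return [
            edge.target
            for edge in self.edges
            if edge.source == vertex_id
            and all(edge.attributes.get(k) == v for k, v in criteria.items())
        ]


# Sample data (illustrative only).
graph = PropertyGraph()
graph.add_vertex("alice", name="Alice", city="Toronto")
graph.add_vertex("bob", name="Bob", city="Vancouver")
graph.add_vertex("carol", name="Carol", city="Toronto")
graph.add_edge("alice", "bob", relation="colleague", since=2019)
graph.add_edge("alice", "carol", relation="friend", since=2021)

all_links = graph.neighbors("alice")                  # every outgoing link
friends = graph.neighbors("alice", relation="friend")  # narrowed by edge attribute
```

Without the explicit `add_edge` calls, `neighbors` has nothing to traverse, which mirrors the requirement that edges be defined before link-based queries can be serviced.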
A link-aware storage device is used that not only supports storing a very large number of entities (records) but also provides a means for adding links between the entities. Such a storage device enables finding entities based on a direct or indirect connection between them.
- A dataset consists of details about a road network where intersections are connected with other intersections via road segments.
- A user stores this dataset in a graph NoSQL database.
- The operation succeeds because the graph database stores each intersection as an entity (record) while also allowing the addition of links between them as road segments.
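The road network example above can be sketched in a few lines, assuming a small made-up set of intersections and treating every road segment as two-way (both are assumptions for illustration, not part of the pattern). Intersections are stored as records, road segments as links, and a breadth-first search answers whether two intersections are connected directly or indirectly. The identifiers `intersections`, `segments`, `directly_connected`, and `hops_between` are hypothetical.

```python
from collections import deque

# Intersections stored as entities (records) with key-value attributes.
intersections = {
    "A": {"name": "Main St & 1st Ave"},
    "B": {"name": "Main St & 2nd Ave"},
    "C": {"name": "Main St & 3rd Ave"},
    "D": {"name": "Oak St & 3rd Ave"},
    "E": {"name": "Island Rd & Pier St"},
    "F": {"name": "Island Rd & Bay St"},
}

# Road segments stored as links between intersections, with their own attributes.
segments = [
    ("A", "B", {"length_km": 0.4}),
    ("B", "C", {"length_km": 0.5}),
    ("C", "D", {"length_km": 0.3}),
    ("E", "F", {"length_km": 1.2}),  # not linked to the A-D part of the network
]

# Build an adjacency index; each segment is assumed to be two-way.
adjacency = {intersection_id: set() for intersection_id in intersections}
for source, target, _attributes in segments:
    adjacency[source].add(target)
    adjacency[target].add(source)


def directly_connected(a, b):
    """True if a single road segment links the two intersections."""
    return b in adjacency[a]


def hops_between(start, goal):
    """Breadth-first search; returns the fewest segments between two
    intersections, or None if no direct or indirect connection exists."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


direct = directly_connected("A", "B")   # linked by one segment
indirect = hops_between("A", "D")       # linked through B and C
unreachable = hops_between("A", "E")    # no path between the two parts
```

A real graph database would perform the traversal inside its own query engine over persisted data; the sketch only shows why storing the segments as explicit links makes this kind of query possible.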