
Relational Sink (Buhler, Erl, Khattak)

How can large amounts of processed data be ported from a Big Data platform directly to a relational database?

Problem

Exporting processed data as a delimited file first and then importing it into the required relational database is not only time-consuming but also inefficient.

Solution

A direct connection is established from within the Big Data platform to the backend relational database, and the relational data is exported over this connection.

Application

A data transfer engine is used that employs different connectors to connect directly to different relational databases and execute SQL statements that insert or update data in the required table.

A relational data transfer engine component is added to the Big Data platform. Internally, the engine uses different drivers and connectors to connect to different types of relational databases. The user specifies the connection string of the relational database and the table to which the data needs to be exported. Depending upon its capabilities, the relational data transfer engine may internally make use of a processing engine that parallelizes the export by executing multiple SQL commands (INSERT/UPDATE) in parallel. Based on the availability of suitable connectors, the Relational Sink pattern can also be applied to populate data warehouses.
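
The sketch below illustrates one way such a parallel export can look, under the assumption that the Big Data platform exposes Apache Spark as the processing engine and that a JDBC connector for the target database is available; the storage path, connection string, table name, and credentials are placeholders rather than part of the pattern.

    # Minimal sketch: parallel export of processed results to a relational table.
    # Assumes Apache Spark as the processing engine and a JDBC connector for the
    # target database; all paths and connection details are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("relational-sink-export").getOrCreate()

    # Read the processed results from the platform's storage device.
    results = spark.read.parquet("hdfs:///results/daily_aggregates")

    # Export over a direct JDBC connection; each partition is written through
    # its own connection, so repartitioning controls the number of parallel
    # INSERT streams.
    (results
        .repartition(8)
        .write
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/reporting")  # placeholder
        .option("dbtable", "daily_aggregates")                     # target table
        .option("user", "export_user")                             # placeholder
        .option("password", "***")
        .option("batchsize", 10000)    # rows sent per batched INSERT
        .mode("append")                # or "overwrite" to replace existing rows
        .save())

Spark's JDBC writer opens one connection per partition, which corresponds to the parallel execution of INSERT commands described above.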

The application of the Relational Sink pattern may be impeded if a database-specific connector is not available. In such a situation, a generic connector can generally be used instead; however, data export performance may suffer.
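
With only a generic connector, the export typically falls back to batches of parameterized INSERT statements issued through a standard driver interface, bypassing the database's native bulk-load path. The sketch below assumes Python's DB-API 2.0 interface; sqlite3 is used purely to keep the example self-contained, and the connection string, table, and column names are placeholders.

    # Generic-connector export sketch using the DB-API 2.0 interface.
    # sqlite3 stands in for whichever driver matches the target database;
    # connection string, table and column names are placeholders.
    import sqlite3

    def export_rows(rows, conn_str="reporting.db",
                    table="daily_aggregates", batch_size=5000):
        """Insert rows in batches of parameterized INSERT statements.

        A generic connector cannot use the database's native bulk-load
        facilities, which is why export performance may suffer.
        """
        conn = sqlite3.connect(conn_str)
        try:
            cur = conn.cursor()
            sql = f"INSERT INTO {table} (region, total) VALUES (?, ?)"
            for start in range(0, len(rows), batch_size):
                cur.executemany(sql, rows[start:start + batch_size])
                conn.commit()
        finally:
            conn.close()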

Relational Sink: The Big Data platform is enabled to make a direct connection to the relational database, and data is transferred as a batch from the storage device. The export process can further be scheduled to automatically update the relational database whenever fresh computational results are available.

  1. The user configures the relational data transfer engine to extract the required data from the storage device.
  2. The relational data transfer engine mechanism automatically extracts the required data from the storage device.
  3. The relational data transfer engine then automatically inserts the data into the relational database without requiring any human intervention.
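
As a rough illustration of how these steps can be automated so that the relational database is refreshed whenever fresh computational results are available, the following sketch polls the storage device for a completion marker and then launches the export job; the paths, the marker-file convention, and the job invocation are assumptions rather than part of the pattern, and in practice cron or a workflow scheduler would normally drive this.

    # Scheduling sketch: trigger the export whenever fresh results appear.
    # Paths, the _SUCCESS marker convention and the spark-submit invocation
    # are assumptions; cron or a workflow scheduler would normally drive this.
    import os
    import subprocess
    import time

    # Step 1: the user-supplied export configuration (placeholders).
    RESULTS_DIR = "/data/results/daily_aggregates"
    MARKER = os.path.join(RESULTS_DIR, "_SUCCESS")     # "fresh results" marker
    EXPORT_JOB = ["spark-submit", "relational_sink_export.py", RESULTS_DIR]

    last_export = 0.0
    while True:
        if os.path.exists(MARKER) and os.path.getmtime(MARKER) > last_export:
            # Steps 2 and 3: extract the results and insert them into the
            # relational database by launching the export job, with no
            # human intervention required.
            subprocess.run(EXPORT_JOB, check=True)
            last_export = time.time()
        time.sleep(300)    # poll every five minutes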

This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.