Indirect Data Access (Buhler, Erl, Khattak)
How can traditional BI tools access data stored in Big Data storage technologies without having to make separate connections to these technologies?
Problem
Solution
An indirect approach is adopted in which the data warehouse acts as an intermediary. The required data is exported from the Big Data platform to the data warehouse. Because the BI tool can already connect to the data warehouse, it can then access the exported data there.
Application
A relational data transfer engine exports the data from the storage device to the data warehouse. Before the export, some formatting or restructuring is usually required, because the storage device holds the data in a non-relational form. The data also needs to be normalized if the Data Denormalization pattern was applied previously.
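The restructuring step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the storage device holds denormalized order documents with an embedded customer and embedded line items; the `orders` data and the `normalize` helper are hypothetical and not part of the pattern itself.

```python
# Hypothetical denormalized documents, as a NoSQL store might hold them:
# each order embeds its customer and its line items.
orders = [
    {"order_id": 1, "customer": {"id": 10, "name": "Ada"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"order_id": 2, "customer": {"id": 10, "name": "Ada"},
     "items": [{"sku": "A1", "qty": 5}]},
]


def normalize(docs):
    """Split nested documents into three sets of relational rows."""
    customers = {}   # keyed by customer id so each customer appears once
    order_rows = []  # (order_id, customer_id)
    item_rows = []   # (order_id, sku, qty)
    for doc in docs:
        cust = doc["customer"]
        customers[cust["id"]] = (cust["id"], cust["name"])
        order_rows.append((doc["order_id"], cust["id"]))
        for item in doc["items"]:
            item_rows.append((doc["order_id"], item["sku"], item["qty"]))
    return list(customers.values()), order_rows, item_rows


customers, order_rows, item_rows = normalize(orders)
```

Each returned list corresponds to one relational table, so the customer that was repeated in every document is stored only once before the rows are loaded into the warehouse.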
- A Big Data platform holds a large dataset of processed data in a NoSQL database.
- A data warehouse holds enterprise-wide data from multiple OLTP systems.
- An analyst uses a BI tool to query the data warehouse and generate reports.
- The analyst needs to create a report based on the processed data in the Big Data platform.
- A relational data transfer engine is used to export the dataset to the data warehouse.
- The analyst uses the BI tool to query the newly imported processed dataset in the warehouse and retrieves the required data to generate the report.
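The scenario above can be sketched end to end. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data warehouse, a list of dictionaries stands in for the processed dataset in the NoSQL database, and the `processed_sales` table, the `export_to_warehouse` function, and the sample records are all hypothetical.

```python
import sqlite3

# Stand-in for processed records held in the NoSQL database (hypothetical).
# Documents may carry extra fields that the relational schema does not need.
processed_docs = [
    {"region": "north", "product": "widget", "revenue": 120.0},
    {"region": "north", "product": "gadget", "revenue": 80.0},
    {"region": "south", "product": "widget", "revenue": 50.0, "note": "promo"},
]

# Fixed relational schema that the export targets.
COLUMNS = ("region", "product", "revenue")


def export_to_warehouse(docs, conn):
    """Play the role of the data transfer engine: documents in, rows out."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_sales "
        "(region TEXT, product TEXT, revenue REAL)"
    )
    # Project each document onto the fixed columns; missing fields become NULL.
    rows = [tuple(doc.get(col) for col in COLUMNS) for doc in docs]
    conn.executemany("INSERT INTO processed_sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# The "warehouse" the BI tool is already able to connect to.
warehouse = sqlite3.connect(":memory:")
exported = export_to_warehouse(processed_docs, warehouse)

# The BI tool issues ordinary SQL against the warehouse, never against NoSQL.
report = warehouse.execute(
    "SELECT region, SUM(revenue) FROM processed_sales "
    "GROUP BY region ORDER BY region"
).fetchall()
```

The point of the sketch is that the reporting query touches only the warehouse connection, so the BI tool needs no separate connection to the Big Data storage technology.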