File-based Sink (Buhler, Erl, Khattak)
How can processed data be ported from a Big Data platform to systems that use proprietary, non-relational storage technologies?
A file data transfer engine is configured to copy data from the storage device to a target location, such as a directory or a URI. Internally, the engine may use polling or file-watcher functionality to detect and copy files from the source location. The file to be copied may not be in the format or model that the target system requires, so some processing may be needed to convert it first.
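As a rough illustration of the polling variant, the sketch below scans a source directory, converts each new file to the format the target is assumed to expect, and writes the result to the target location. The function names (`poll_once`, `to_comma_delimited`), the `.tsv`/`.csv` naming, and the tab-to-comma conversion are assumptions made for this example; they are not part of the pattern or of any particular product.

```python
import csv
from pathlib import Path


def to_comma_delimited(src: Path, dst: Path) -> None:
    """Rewrite a tab-delimited file as a comma-delimited file (assumed conversion)."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin, delimiter="\t"):
            writer.writerow(row)


def poll_once(source_dir: Path, target_dir: Path, seen: set[str]) -> list[Path]:
    """Convert and deliver every source file that has not been seen before."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.glob("*.tsv")):
        if src.name in seen:
            continue  # already handled in an earlier polling pass
        dst = target_dir / (src.stem + ".csv")
        to_comma_delimited(src, dst)
        seen.add(src.name)
        copied.append(dst)
    return copied
```

An engine built this way would call `poll_once` repeatedly on a timer; a file-watcher variant would instead react to file-system notifications, which is not shown here.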
Processed data is exported in a common textual format, such as a delimited or hierarchical file format, and automatically copied to the target system’s configured location. A scheduling system is also used to export files at regular intervals. Applying this pattern helps integrate the Big Data platform with legacy and other proprietary systems.
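To make the two kinds of textual format concrete, the following sketch writes the same processed records once as a delimited file and once as a hierarchical file (JSON is used here as one possible hierarchical format). The helpers `export_delimited` and `export_hierarchical`, the record fields, and the `"records"` wrapper key are hypothetical and chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def export_delimited(records: list[dict], path: Path, delimiter: str = ",") -> None:
    """Write flat records as a delimited text file with a header row."""
    if not records:
        path.write_text("")  # nothing to export; leave an empty file
        return
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()),
                                delimiter=delimiter)
        writer.writeheader()
        writer.writerows(records)


def export_hierarchical(records: list[dict], path: Path) -> None:
    """Write records as a JSON document, which can preserve nested structure."""
    path.write_text(json.dumps({"records": records}, indent=2))
```

A delimited file suits targets that expect flat rows, while a hierarchical file is the safer choice when the records contain nested values that a single row cannot represent.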
- The user configures the file data transfer engine mechanism to export textual data from the Big Data platform to the specified location.
- Delimited files containing textual data are automatically copied from the storage device by the file data transfer engine.
- The file data transfer engine then automatically inserts the delimited files into the configured location.
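The three steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: `TransferConfig`, its field names, `run_transfer`, and `run_on_schedule` are hypothetical, both locations are assumed to be local directories, and the temporary `.part` name followed by a rename is a common delivery convention rather than something the pattern prescribes.

```python
import os
import shutil
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TransferConfig:
    """Step 1: what the user configures (all field names are hypothetical)."""
    source_dir: Path
    target_dir: Path
    pattern: str = "*.csv"
    interval_seconds: float = 60.0


def run_transfer(cfg: TransferConfig) -> list[str]:
    """Steps 2 and 3: copy matching files and place them in the target location."""
    cfg.target_dir.mkdir(parents=True, exist_ok=True)
    delivered = []
    for src in sorted(cfg.source_dir.glob(cfg.pattern)):
        final = cfg.target_dir / src.name
        if final.exists():
            continue  # already delivered in an earlier run
        partial = cfg.target_dir / (src.name + ".part")
        shutil.copy2(src, partial)  # step 2: copy from the storage device
        os.replace(partial, final)  # step 3: expose the file under its final name
        delivered.append(src.name)
    return delivered


def run_on_schedule(cfg: TransferConfig, runs: int, sleep=time.sleep) -> int:
    """Invoke run_transfer a fixed number of times, pausing between runs."""
    total = 0
    for i in range(runs):
        total += len(run_transfer(cfg))
        if i < runs - 1:
            sleep(cfg.interval_seconds)
    return total
```

Writing to a temporary name and renaming afterwards means the target system should not pick up a half-written file, provided both names are on the same file system. In practice the fixed-count loop would likely be replaced by whatever scheduling system the platform already uses.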