File-based Sink (Buhler, Erl, Khattak)
Problem

How can processed data be ported from a Big Data platform to systems that use proprietary, non-relational storage technologies?

![File-based Sink](https://patterns.arcitura.com/wp-content/uploads/2018/09/file_based_sink.png)

Solution

Processed data is exported in a common textual format, such as a delimited file format or a hierarchical file format, and automatically copied to the target system’s configured location. A scheduling system is further used to export files at regular intervals. Applying this pattern helps integrate the Big Data platform with legacy and other proprietary systems.

Application

A file data transfer engine is configured to copy data from the storage device to a target location, such as a directory or a URI. The engine may internally use polling or file watcher-based functionality to copy files from the source location. Note that the file to be copied to the target system’s location may not be in the required format or model, so some processing may be needed to convert it before delivery.

Mechanisms

File Data Transfer Engine, Storage Device

![File-based Sink](https://patterns.arcitura.com/wp-content/uploads/2018/09/fig1-161.png)
- The user configures the file data transfer engine mechanism to export textual data from the Big Data platform to the specified location.
- Delimited files containing textual data are automatically copied from the storage device by the file data transfer engine.
- The file data transfer engine then automatically places the delimited files in the configured target location (simplified sketches of this workflow appear below).
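To make the export side of the solution concrete, the following is a minimal Python sketch of a scheduled delimited export. The names fetch_processed_records, TARGET_DIR, and EXPORT_INTERVAL_SECONDS are illustrative assumptions rather than part of the pattern; in practice the export would normally be driven by the platform's own export utilities and a scheduling system rather than a bare loop.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical settings standing in for the target system's configured
# location and the export schedule; adjust for the actual environment.
TARGET_DIR = Path("/data/target_system/inbox")
EXPORT_INTERVAL_SECONDS = 3600  # export once per hour


def fetch_processed_records():
    """Hypothetical placeholder for reading processed results from the
    Big Data platform (e.g., the output of a completed batch job)."""
    return [
        {"customer_id": "C001", "total_spend": "1520.75"},
        {"customer_id": "C002", "total_spend": "310.40"},
    ]


def export_delimited(records, target_dir: Path) -> Path:
    """Write the records to a timestamped, comma-delimited file in the
    target system's configured location. Assumes all records share the
    same set of fields."""
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = target_dir / f"export_{stamp}.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return out_path


if __name__ == "__main__":
    # Simple fixed-interval schedule; a production setup would normally
    # rely on a scheduling system (cron, a workflow engine, etc.).
    while True:
        records = fetch_processed_records()
        if records:
            print("Exported", export_delimited(records, TARGET_DIR))
        time.sleep(EXPORT_INTERVAL_SECONDS)
```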
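The transfer side described under Application can likewise be sketched as a simple polling loop that watches a source directory and copies new delimited files to the target system's location, with a hook where any required format or model conversion could be applied. SOURCE_DIR, TARGET_DIR, POLL_INTERVAL_SECONDS, and convert_if_needed are hypothetical placeholders; a real file data transfer engine exposes this behavior through configuration (including file-watcher options) rather than custom code.

```python
import shutil
import time
from pathlib import Path

# Illustrative locations and polling interval; a real file data transfer
# engine would expose these as configuration settings.
SOURCE_DIR = Path("/data/bigdata_platform/exports")
TARGET_DIR = Path("/data/target_system/inbox")
POLL_INTERVAL_SECONDS = 60


def convert_if_needed(path: Path) -> Path:
    """Hook for putting the file into the format or model the target system
    expects (e.g., re-delimiting or flattening a hierarchical file).
    This sketch passes the file through unchanged."""
    return path


def transfer_new_files(seen: set) -> None:
    """Copy any not-yet-transferred delimited files to the target location."""
    TARGET_DIR.mkdir(parents=True, exist_ok=True)
    for path in sorted(SOURCE_DIR.glob("*.csv")):
        if path.name in seen:
            continue
        prepared = convert_if_needed(path)
        shutil.copy2(prepared, TARGET_DIR / prepared.name)
        seen.add(path.name)
        print("Transferred", path.name)


if __name__ == "__main__":
    already_transferred: set = set()
    while True:  # polling; a file-watcher API could be used instead
        transfer_new_files(already_transferred)
        time.sleep(POLL_INTERVAL_SECONDS)
```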
This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.
For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum, visit www.arcitura.com/bdscp.
The official textbook for the BDSCP curriculum is:
Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)
Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.