Data Transfer Engine
Data needs to be imported before it can be processed by the Big Data solution. Similarly, processed data may need to be exported to other systems before it can be used outside of the Big Data solution.
A data transfer engine enables data to be moved in or out of Big Data solution storage devices. Unlike other data processing systems, where input data conforms to a schema and is mostly structured, data sources for a Big Data solution tend to include a mix of structured and unstructured data.
A given data transfer engine may support either data ingress or egress functions, in which case it can be further qualified as follows:
- data transfer ingress engine
- data transfer egress engine
Data transfer ingress and egress functionality can be further grouped into the following categories:
- event (ingress only)
- file (ingress and egress)
- relational (ingress and egress)
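The category list above can be read as a small lookup table. The following is only an illustrative sketch: the names `Direction`, `SUPPORTED`, and `supports` are hypothetical and do not belong to any particular data transfer product.

```python
from enum import Enum


class Direction(Enum):
    """Direction of a transfer relative to the Big Data solution."""
    INGRESS = "ingress"
    EGRESS = "egress"


# Hypothetical lookup table mirroring the three categories listed above.
SUPPORTED = {
    "event": {Direction.INGRESS},
    "file": {Direction.INGRESS, Direction.EGRESS},
    "relational": {Direction.INGRESS, Direction.EGRESS},
}


def supports(category, direction):
    """Return True if the given category is listed for the given direction."""
    return direction in SUPPORTED.get(category, set())
```

For example, `supports("event", Direction.EGRESS)` is false, because event-based transfer is listed as ingress only.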
A data transfer engine generally provides only one of the listed functions. It is common for multiple different data transfer engines to be part of a Big Data solution to facilitate a range of import and export requirements for different types of data.
Event-based data transfer ingress engines generally use a publish-subscribe model backed by a queue to ensure high reliability and availability. These engines may provide agent-based processing of in-flight data, which enables data cleansing and transformation activities to be performed in realtime, before the data reaches the storage device.
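The queue-and-agent arrangement can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions, not how any specific engine is implemented: an in-memory `queue.Queue` stands in for the publish-subscribe broker, a list stands in for the storage device, and `cleanse`, `run_agent`, and `ingest` are hypothetical names.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def cleanse(event):
    """Hypothetical in-flight transformation: normalize keys, drop empty values."""
    return {k.strip().lower(): v for k, v in event.items() if v not in ("", None)}


def run_agent(inbox, storage):
    """Agent: take events off the queue, cleanse them, and persist them."""
    while True:
        event = inbox.get()
        if event is SENTINEL:
            break
        storage.append(cleanse(event))


def ingest(events):
    """Publish events to the queue while an agent processes them concurrently."""
    inbox = queue.Queue()
    storage = []  # stands in for the Big Data storage device
    agent = threading.Thread(target=run_agent, args=(inbox, storage))
    agent.start()
    for event in events:
        inbox.put(event)  # publisher side
    inbox.put(SENTINEL)
    agent.join()
    return storage
```

The queue decouples the publisher from the agent, so a slow transformation delays delivery rather than losing events. A production engine would typically add durable queues and multiple subscribers, which this sketch omits.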
Data transfer engines enable the import of data that is distributed across a range of sources residing in multiple systems outside of the Big Data solution. A data transfer engine may internally use a processing engine to process multiple large datasets in parallel, which allows large amounts of data to be imported or exported within a short period of time. A workflow engine may integrate with a data transfer engine to automate the import and export of data. Figure 1 provides an example of a data transfer engine.
Figure 1 – A data transfer engine imports data from two different databases (1a,1b). The import jobs themselves are run by the processing engine (2), which then persists the imported data to the storage device (3).
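The flow in Figure 1 can be approximated as follows. This is a hedged sketch rather than a reference implementation: two temporary SQLite files stand in for the source databases (1a,1b), a thread pool stands in for the processing engine (2), and a Python list stands in for the storage device (3). The `orders` table and all function names are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_source(path, rows):
    """Create a hypothetical source database containing an 'orders' table."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.close()


def import_job(path):
    """One import job: read every row from a single source database."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
    finally:
        conn.close()


def transfer(source_paths):
    """Run one import job per source in parallel, then persist the results."""
    storage = []  # stands in for the storage device (3)
    with ThreadPoolExecutor(max_workers=len(source_paths)) as pool:
        # map() yields results in the order of source_paths
        for rows in pool.map(import_job, source_paths):
            storage.extend(rows)
    return storage


def demo():
    """Build two small sources (1a, 1b) and import both."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.db")
        b = os.path.join(tmp, "b.db")
        make_source(a, [(1, 9.5), (2, 20.0)])
        make_source(b, [(3, 5.0)])
        return transfer([a, b])
```

Each job opens its own connection inside the worker thread, so the jobs share no state until their results are persisted. In a real deployment the processing engine would distribute jobs across a cluster rather than across threads.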
Related Patterns:
- Automated Dataset Execution
- Canonical Data Format
- Cloud-based Big Data Storage
- Data Size Reduction
- Fan-in Ingress
- Fan-out Ingress
- File-based Sink
- File-based Source
- High Velocity Realtime Processing
- Indirect Data Access
- Large-Scale Batch Processing
- Large-Scale Graph Processing
- Realtime Access Storage
- Relational Sink
- Relational Source
- Streaming Egress
- Streaming Source