Data Transfer Engine (Erl, Khattak)
Data needs to be imported before it can be processed by a machine learning system. Similarly, processed data may need to be exported to other applications before it can be used outside of the machine learning environment.
A data transfer engine enables data to be moved into or out of machine learning system storage devices. Unlike other data processing systems, where input data conforms to a schema and is mostly structured, data sources for a machine learning system can include a mix of structured and unstructured data.
Data transfer engines enable the import of data that is distributed across a range of sources residing in multiple systems. A data transfer engine may internally use a processing engine to process multiple large datasets in parallel, which allows large amounts of data to be imported or exported within a short period of time. A workflow engine may integrate with a data transfer engine to automate the import and export of data.
A data transfer engine imports data from two different databases (1a, 1b). The import jobs themselves are run by the processing engine (2), which executes them and then persists the imported data to the storage device (3).
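The flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation of the mechanism: two small SQLite files stand in for the source databases (1a, 1b), a thread pool stands in for the processing engine (2), and a directory of JSON Lines files stands in for the storage device (3). All names (`make_source`, `import_job`, `transfer`, `demo`, the `records` table) are hypothetical and chosen for the example.

```python
import json
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_source(path: Path, rows: list[tuple[int, str]]) -> None:
    """Create a small SQLite database standing in for a source system."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, label TEXT)")
        conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def import_job(source: Path, storage_dir: Path) -> int:
    """One import job: read every row from a source and persist it as JSON Lines."""
    conn = sqlite3.connect(source)  # each job opens its own connection
    try:
        rows = conn.execute("SELECT id, label FROM records ORDER BY id").fetchall()
    finally:
        conn.close()
    target = storage_dir / f"{source.stem}.jsonl"
    with target.open("w", encoding="utf-8") as out:
        for row_id, label in rows:
            out.write(json.dumps({"id": row_id, "label": label}) + "\n")
    return len(rows)


def transfer(sources: list[Path], storage_dir: Path) -> dict[str, int]:
    """Hand the import jobs to a thread pool (the stand-in processing engine)."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Jobs run concurrently; map() returns results in source order.
        counts = pool.map(lambda src: import_job(src, storage_dir), sources)
        return {src.stem: n for src, n in zip(sources, counts)}


def demo() -> dict[str, int]:
    """Build two toy sources, import both, and report rows imported per source."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        db_a, db_b = root / "source_a.db", root / "source_b.db"
        make_source(db_a, [(1, "cat"), (2, "dog")])
        make_source(db_b, [(1, "red"), (2, "green"), (3, "blue")])
        return transfer([db_a, db_b], root / "storage")
```

Calling `demo()` returns the number of rows imported from each source. In a real deployment the thread pool would typically be replaced by a distributed processing engine, and a workflow engine could schedule `transfer` to run automatically.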