A distributed Big Data solution that runs on multiple servers relies on a coordination engine to ensure operational consistency across all participating servers. Coordination engines make it possible to develop highly reliable, highly available distributed Big Data solutions that can be deployed in a cluster.
The processing engine will often use the coordination engine to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic.
The coordination engine can also be used for the following purposes, as shown in Figure 1:
- to support distributed locks
- to support distributed queues
- to establish a highly available registry for obtaining configuration information
- to provide reliable asynchronous communication between processes running on different servers
Figure 1 – Two nodes in a cluster need to write to a shared queue as part of executing a job, and both send a write request at the same time (1a, 1b). The coordination engine coordinates the write requests by serializing them: one request is sent to the queue (2), and only then is the other request sent (3).
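The scenario in Figure 1 can be sketched in a few lines of Python. This is only a minimal in-process illustration, not how any particular coordination engine is implemented: the `CoordinationEngine` class, its `submit_write` method, and the node names are hypothetical, threads stand in for cluster nodes, and a local lock stands in for the engine's distributed coordination.

```python
import threading
from collections import deque


class CoordinationEngine:
    """Toy stand-in for a coordination engine (hypothetical, single process)."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for distributed coordination
        self._next_seq = 0             # order in which writes are admitted

    def submit_write(self, shared_queue, node_id, payload):
        # Only one write request is admitted at a time, so concurrent
        # requests reach the queue one after the other (steps 2 and 3).
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
            shared_queue.append((seq, node_id, payload))
            return seq


def run_job():
    engine = CoordinationEngine()
    shared_queue = deque()
    start_together = threading.Barrier(2)  # both nodes send at the same time

    def node(node_id):
        start_together.wait()  # steps 1a and 1b
        engine.submit_write(shared_queue, node_id, f"result from {node_id}")

    threads = [threading.Thread(target=node, args=(name,))
               for name in ("node-a", "node-b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(shared_queue)


if __name__ == "__main__":
    for seq, node_id, payload in run_job():
        print(seq, node_id, payload)
```

Which node is admitted first can differ between runs, but the two entries always land in the queue one after the other with consecutive sequence numbers. The nodes themselves contain no locking logic, which mirrors the point that the processing engine does not need its own coordination logic.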
- Automated Dataset Execution
- Automated Processing Metadata Insertion
- Canonical Data Format
- Cloud-based Big Data Processing
- Complex Logic Decomposition
- Dataset Denormalization
- Direct Data Access
- High Velocity Realtime Processing
- Intermediate Results Storage
- Large-Scale Batch Processing
- Large-Scale Graph Processing
- Processing Abstraction
- Realtime Access Storage
- Streaming Egress