Serialization is the process of transforming objects or data entities into bytes for persistence (in memory or on disk) or transportation from one machine to another over a network. The opposite transformation process from bytes to objects or data entities is called deserialization.
A serialization engine provides the ability to serialize and deserialize data in a Big Data platform. In such platforms, serialization is required both for communication between machines, which exchange data as messages, and for persisting data.
The serialized bytes can be encoded using either a binary format or a plain-text format. Different serialization engines may provide different levels of speed, extensibility and interoperability.
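The difference between the two kinds of encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record and its fields are hypothetical, JSON stands in for the plain-text format, and a fixed layout built with the standard `struct` module stands in for a binary format.

```python
import json
import struct

# Hypothetical record, used only for illustration.
record = {"id": 1024, "balance": 2500.75, "active": True}

# Plain-text encoding: JSON rendered as UTF-8 bytes. The result is
# human-readable and carries its own field names.
text_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: a fixed layout of an unsigned 32-bit integer, a 64-bit
# float and a boolean. "<" selects little-endian byte order with no padding.
LAYOUT = "<Id?"
binary_bytes = struct.pack(
    LAYOUT, record["id"], record["balance"], record["active"]
)

# Decoding the binary form requires knowing the layout in advance,
# because the bytes carry no field names.
decoded_id, decoded_balance, decoded_active = struct.unpack(LAYOUT, binary_bytes)

print(len(text_bytes), len(binary_bytes))
```

In this sketch the binary form is more compact, while the plain-text form is self-describing and readable by any JSON parser, which is the kind of trade-off between speed and interoperability that the text describes.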
Ideally, a serialization engine should serialize and deserialize data quickly, be amenable to future changes and work with a variety of data producers and consumers. These goals are achieved in part by serializing and deserializing data into and out of non-proprietary formats, such as XML, JSON and BSON.
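One way to read "amenable to future changes" is that a consumer should keep working when producers add or omit fields. The following is a minimal sketch of that idea, assuming JSON as the format; the `Customer` type, its fields and the `deserialize_customer` helper are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Customer:
    id: int
    name: str
    # Field assumed to be added in a later version; the default keeps
    # data written by older producers readable.
    email: str = ""


def deserialize_customer(payload: bytes) -> Customer:
    """Build a Customer, ignoring fields this consumer does not know."""
    data = json.loads(payload.decode("utf-8"))
    known = {f.name for f in fields(Customer)}
    return Customer(**{k: v for k, v in data.items() if k in known})


# Payload from an older producer: no "email" field.
old_payload = b'{"id": 7, "name": "Ada"}'
# Payload from a newer producer: carries an extra "tier" field.
new_payload = b'{"id": 7, "name": "Ada", "email": "ada@example.com", "tier": "gold"}'

print(deserialize_customer(old_payload))
print(deserialize_customer(new_payload))
```

Both payloads deserialize without error: the missing field falls back to its default and the unknown field is dropped, so producers and consumers can evolve independently.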
Figure 1 – A customer object needs to be persisted to disk. A serialization engine (1) serializes the customer object using a JSON plain-text format (2), which is then persisted to a NoSQL database (3).
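The flow in Figure 1 can be sketched as follows. This is an illustration, not a definitive implementation: `Customer`, `JsonSerializationEngine` and `FileDocumentStore` are hypothetical names, and a directory of JSON files stands in for the NoSQL database.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Customer:
    id: int
    name: str
    city: str


class JsonSerializationEngine:
    """(1) Converts customer objects to and from JSON bytes."""

    def serialize(self, customer: Customer) -> bytes:
        return json.dumps(asdict(customer)).encode("utf-8")

    def deserialize(self, payload: bytes) -> Customer:
        return Customer(**json.loads(payload.decode("utf-8")))


class FileDocumentStore:
    """(3) Stand-in for a NoSQL database: one file per document key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, document: bytes) -> None:
        (self.root / f"{key}.json").write_bytes(document)

    def get(self, key: str) -> bytes:
        return (self.root / f"{key}.json").read_bytes()


engine = JsonSerializationEngine()
store = FileDocumentStore(Path(tempfile.mkdtemp()))

customer = Customer(id=42, name="Jane Doe", city="Toronto")
document = engine.serialize(customer)       # (2) JSON plain-text bytes
store.put(str(customer.id), document)       # (3) persisted to disk

# Reading the document back and deserializing restores an equal object.
restored = engine.deserialize(store.get(str(customer.id)))
print(restored == customer)
```

Keeping the engine separate from the store means the same serialized bytes could equally be sent over a network, which matches the two uses of serialization described earlier.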