The processing engine enables data to be queried and manipulated in other ways, but implementing this type of functionality requires custom programming. Analysts working with Big Data solutions are not expected to know how to program processing engines.
A query engine abstracts the processing engine from end-users by providing a front-end user interface that can be used to query underlying data, along with features for creating query execution plans.
Languages that are more familiar and easier to work with, such as SQL, can be used by non-technical users to perform ETL tasks and run ad-hoc queries for data analysis activities. Common processing functions performed by a query engine include sum, average, group by, join, and sort.
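As a rough illustration of the functions listed above, the following sketch runs one SQL query that combines a join, a group by, a sum, an average, and a sort. It uses Python's built-in sqlite3 module as a stand-in for a Big Data query engine, which is an assumption made for convenience. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical tables and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES
        (1, 1, 10.0), (2, 1, 30.0), (3, 2, 50.0), (4, 3, 20.0);
""")

# One declarative statement covers join, group by, sum, average, and sort.
query = """
    SELECT c.region, SUM(o.amount) AS total, AVG(o.amount) AS average
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, total, average in rows:
    print(region, total, average)
conn.close()
```

The user states only what result is wanted. How the rows are scanned, joined, and aggregated is left to the engine.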
Under the hood, the query engine transforms user queries into low-level code that the processing engine can execute. The use of query engines can reduce development time and enable the manipulation of large datasets without the need to write complex programming logic. Figure 1 provides an example of a query engine.
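To give a sense of the programming logic a query engine spares the user, the sketch below hand-codes a single "sum per group" aggregation in a map, shuffle, and reduce style. It is a simplified, single-machine illustration under assumed names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and hypothetical input records. It is not the code that any particular query engine generates.

```python
from collections import defaultdict

# Hypothetical input records: (region, amount) pairs.
records = [("north", 10.0), ("south", 50.0), ("north", 30.0), ("north", 20.0)]

def map_phase(record):
    """Emit a (key, value) pair for one input record."""
    region, amount = record
    return (region, amount)

def shuffle_phase(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse the values for one key into a single total."""
    return (key, sum(values))

def run_job(data):
    """Run the three phases in order and return totals sorted by key."""
    mapped = [map_phase(record) for record in data]
    grouped = shuffle_phase(mapped)
    return sorted(reduce_phase(key, values) for key, values in grouped.items())

totals = run_job(records)
print(totals)
```

In SQL, the same result could be requested with a single GROUP BY statement, which is the kind of saving in development effort described above.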
Figure 1 – A client performs a simple aggregation query on the data persisted in the storage device (1). The query engine creates a query execution plan and generates the jobs that need to be executed on the processing engine (2). The processing engine retrieves the required data from the storage device (3) and then executes the required jobs. The results are then forwarded to the query engine (4), which sends the results back to the client after further processing (5).
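The five steps in Figure 1 can be traced with a toy model. The sketch below is a minimal simulation that assumes the query arrives as a small dictionary rather than parsed SQL. All class and field names (`StorageDevice`, `ProcessingEngine`, `QueryEngine`, `Job`) are hypothetical and do not correspond to any real product's API.

```python
from dataclasses import dataclass

@dataclass
class Job:
    """One unit of work produced by the (toy) execution plan."""
    group_by: str   # column to group on
    aggregate: str  # column to sum

class StorageDevice:
    def __init__(self, rows):
        self._rows = rows

    def read(self):
        return list(self._rows)

class ProcessingEngine:
    def __init__(self, storage):
        self.storage = storage

    def execute(self, job):
        rows = self.storage.read()  # (3) retrieve data from storage
        totals = {}
        for row in rows:            # run the job: sum per group
            key = row[job.group_by]
            totals[key] = totals.get(key, 0) + row[job.aggregate]
        return totals               # (4) forward results to the query engine

class QueryEngine:
    def __init__(self, engine):
        self.engine = engine

    def plan(self, request):
        # (2) a trivial "execution plan": one job per request
        return [Job(group_by=request["group_by"], aggregate=request["sum"])]

    def query(self, request):
        jobs = self.plan(request)
        raw = self.engine.execute(jobs[0])
        # (5) further processing before replying: sort by total, descending
        return sorted(raw.items(), key=lambda item: item[1], reverse=True)

storage = StorageDevice([
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 50},
    {"region": "north", "amount": 30},
    {"region": "north", "amount": 20},
])
query_engine = QueryEngine(ProcessingEngine(storage))

# (1) the client submits a simple aggregation query
result = query_engine.query({"group_by": "region", "sum": "amount"})
print(result)
```

The client interacts only with `QueryEngine`. The processing engine and the storage device remain hidden behind it, which mirrors the abstraction shown in the figure.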