Lambda architecture and batch processing
The Lambda architecture had to be involved in the discussion of batch processing since batch processing is one of the layers in the Lambda architecture, and we are building a batch processing layer for a data-intensive application.
So, the following screenshot shows how to quickly review a Lambda architecture:
We will not go in to the details of the Lambda architecture again but instead focus on point 2 in the preceding diagram, labelled the batch layer.
The batch layer in Lambda mainly performs two functions:
- Manages the master dataset. This master dataset is advised to be defined as an immutable, append-only set of raw data.
- Precomputes the batch view that feeds into the serving layer being used by the underlying query system.
It is the second function that we will discuss in this chapter. Managing the master dataset does not usually require a lot of development time or effort. Point 1 is mainly the responsibility of the operations team.
Now, if you...