Chapter 7: Data Curation Stage – The Silver Layer
The journey of data has now reached a critical stage. Here, the driver (the data engineer) must carefully plan and maneuver the vehicle (the data pipeline) around several roadblocks so that the sanity, durability, and security of the data are preserved.
In the previous chapter, we took a deep dive into Delta Lake. Understanding Delta Lake functionality is a critical skill, as it enables the data engineer to design and develop the silver layer of the lakehouse. In this chapter, we will advance our understanding of how to cleanse raw data. We will start by looking at why data curation is needed and then build a data curation pipeline that can perform the cleaning work consistently and regularly.
In this chapter, we will cover the following topics:
- The need for curating raw data
- The process of curating raw data
- Developing a data curation pipeline
- Running the pipeline for the...