Data Engineering with Azure Databricks
Cloud object storage is cheap and scalable, but it comes with several problems. Raw files have no concept of transactions, versioning, or schema control. A failed write can leave a table in a broken state. A schema change can silently corrupt downstream systems. And once bad data lands, recovering from it is painful and manual.
Delta Lake solves these problems by adding a transaction log on top of standard cloud storage. Data is still stored physically as Parquet files, and the transaction log acts as a control layer that records which files belong to the table at each version. This chapter covers what that means in practice: how Delta Lake protects data quality, tracks every change to a table, and makes it possible to capture and propagate changes across a pipeline.
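To make the idea of a log acting as a control layer concrete, here is a toy model in plain Python. It is a simplified sketch, not Delta Lake's actual implementation: the helpers `commit` and `live_files` and the file names are hypothetical, and real commits carry far more metadata. It only illustrates the principle that the set of data files making up a table is derived by replaying numbered commits.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Write one commit as a numbered JSON-lines file (hypothetical helper)."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" fails if the file already exists, so two writers
    # cannot both claim the same version number.
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(log_dir, as_of=None):
    """Replay commits in order to find the data files in the table."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        version = int(name.split(".")[0])
        if as_of is not None and version > as_of:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
commit(log_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
# A rewrite swaps one file for another in a single commit, so readers
# see either the old file or the new one, never a mix.
commit(log_dir, 2, [
    {"remove": {"path": "part-0000.parquet"}},
    {"add": {"path": "part-0002.parquet"}},
])

current = live_files(log_dir)             # table as of the latest commit
version_1 = live_files(log_dir, as_of=1)  # table as it was at version 1
```

In this sketch, a write that fails before its commit file is created leaves the table unchanged, because readers only trust files that the log lists. Replaying the log up to an earlier version is also what makes reading an older snapshot possible.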
In this chapter, we will cover the following topics: