Building Modern Data Applications Using Databricks Lakehouse
In this chapter, we looked at how DLT simplifies our data pipelines by abstracting away many of the low-level details of processing data in Spark. We saw how Databricks Auto Loader solves the scalability problem of stream processing files from cloud storage: with just a few lines of code, we deployed a scalable backend that efficiently reads new files as soon as they land in a cloud storage location. When it came to applying data changes to downstream datasets within our pipeline, the DLT framework once again simplified data reconciliation when change events arrived late or out of order, and we saw how to implement slowly changing dimensions with just a few parameter changes in the apply_changes() API. Finally, we uncovered the details of data pipeline settings, tuning the pipeline compute to match the computational requirements and the DLT feature set our pipeline needed. We also saw how DLT can automatically handle pipeline failures for us...
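To make the Auto Loader pattern concrete, the following is a minimal sketch of a DLT table that ingests newly arriving files from a cloud storage location. The table name, landing path, and file format here are hypothetical placeholders, not values from the chapter:

```python
import dlt

# Hypothetical landing location; replace with your own storage path.
LANDING_PATH = "/Volumes/main/raw/orders_landing"

@dlt.table(
    comment="Raw orders ingested incrementally with Databricks Auto Loader."
)
def raw_orders():
    # format("cloudFiles") turns the read into an Auto Loader stream that
    # discovers and processes only files it has not seen before.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(LANDING_PATH)
    )
```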
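The change-handling behavior summarized above maps to the apply_changes() API. The sketch below assumes a CDC feed named customers_cdc_raw keyed on customer_id and ordered by an event_timestamp column (all hypothetical names); the stored_as_scd_type parameter is what switches the target between SCD Type 1 and Type 2:

```python
import dlt
from pyspark.sql.functions import col

# Target streaming table that apply_changes() will keep up to date for us.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",          # table to maintain
    source="customers_cdc_raw",         # upstream CDC feed (hypothetical name)
    keys=["customer_id"],               # key used to match change records
    sequence_by=col("event_timestamp"), # resolves late or out-of-order events
    stored_as_scd_type=2,               # 1 = overwrite in place, 2 = keep history
)
```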
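Pipeline-level settings live outside the notebook, in the pipeline's JSON configuration. The snippet below is an illustrative sketch with made-up names and sizing values; the product edition and the autoscaling range are the main knobs for matching compute cost to the DLT features and throughput a pipeline needs:

```json
{
  "name": "orders_dlt_pipeline",
  "edition": "ADVANCED",
  "continuous": false,
  "photon": true,
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 4,
        "mode": "ENHANCED"
      }
    }
  ]
}
```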