Building Modern Data Applications Using Databricks Lakehouse
In this chapter, we examined how and why the data industry has settled on the lakehouse architecture, which merges the scalability of ETL processing with the fast query performance that data warehouses provide for BI workloads under a single, unified architecture. We learned that real-time data processing is essential to uncovering value from the latest data as soon as it arrives, but that real-time data pipelines can stall the productivity of data engineering teams as their complexity grows over time. Finally, we covered the core concepts of the Delta Live Tables framework and how, with just a few lines of PySpark code and function decorators, we can declare a real-time data pipeline capable of incrementally processing data with high throughput and low latency.
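To make that last point concrete, the following is a minimal sketch of what such a declarative pipeline can look like in PySpark. The source path, table names, column, and expectation are illustrative assumptions, not examples from this chapter, and the `spark` session is assumed to be provided by the Delta Live Tables runtime.

```python
# A minimal Delta Live Tables pipeline sketch (names and paths are hypothetical).
import dlt
from pyspark.sql.functions import col


@dlt.table(comment="Raw trip events ingested incrementally with Auto Loader.")
def raw_trips():
    # Stream new JSON files from cloud storage as they arrive.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/trips/")
    )


@dlt.table(comment="Cleaned trips with invalid fares dropped.")
@dlt.expect_or_drop("valid_fare", "fare_amount > 0")
def cleaned_trips():
    # Read the upstream table incrementally and apply a simple transformation.
    return dlt.read_stream("raw_trips").withColumn(
        "fare_amount", col("fare_amount").cast("double")
    )
```

Each decorated function declares a dataset rather than imperatively running a job; the framework infers the dependency between `raw_trips` and `cleaned_trips` and handles incremental processing for us.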
In the next chapter, we’ll take a deep dive into the advanced settings of Delta Live Tables pipelines and how the framework will optimize the underlying datasets for us. Then, we’ll look at more advanced data transformations...