Building Modern Data Applications Using Databricks Lakehouse
In this chapter, we looked at various methods for scaling our data pipelines so that they handle large volumes of data and perform well during periods of high and unpredictable processing demand. We looked at two aspects of scaling our DLT pipelines: compute and data layout. We examined the enhanced autoscaling feature of the Databricks Data Intelligence Platform, which automatically scales the computational resources that the data pipelines execute on. We also looked at optimizing how the underlying table data is stored by clustering related data within table files, which leads to faster table queries and shorter pipeline processing times. Finally, we looked at regular maintenance activities that keep table queries performing well and prevent cloud storage costs from ballooning due to obsolete data files.
Data security is of the utmost importance, yet it is often overlooked until the end of a lakehouse implementation. However, this could mean the difference between a...