Data Engineering with Azure Databricks
In this chapter, we moved from simply running data workflows to running them efficiently and cost-effectively. You learned how to analyze and optimize Delta Lake tables with OPTIMIZE, Z-ordering (ZORDER BY), and the newer Liquid Clustering approach so that your data is laid out efficiently on storage. We explored how to diagnose and resolve query bottlenecks by interpreting Spark UI metrics and identifying issues such as data skew, so that you can apply advanced performance techniques such as join optimization and salting for skewed joins.
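As a reminder of why salting helps with skewed joins, here is a minimal sketch in plain Python rather than Spark. It only simulates how rows would be spread across shuffle partitions. The partition count, salt count, key names, and the CRC32-based partitioner are all assumptions made for illustration; they are not taken from the chapter or from Spark's actual partitioner.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions
NUM_SALTS = 8       # hypothetical number of salt values


def partition_of(key: str) -> int:
    """Stand-in for a hash partitioner (deterministic CRC32, not Spark's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def salted(key: str, salt: int) -> str:
    """Append a salt suffix to a join key."""
    return f"{key}_{salt}"


rng = random.Random(42)

# Skewed fact side: 9,000 rows share the hot key "c0"; 1,000 other keys appear once.
fact_keys = ["c0"] * 9000 + [f"c{i}" for i in range(1, 1001)]

# Without salting, every "c0" row hashes to the same partition.
unsalted_load = Counter(partition_of(k) for k in fact_keys)

# With salting, each fact row gets a random salt, spreading the hot key out.
salted_fact_keys = [salted(k, rng.randrange(NUM_SALTS)) for k in fact_keys]
salted_load = Counter(partition_of(k) for k in salted_fact_keys)

# The small side is replicated once per salt so every salted key still finds a match.
dim = {"c0": "hot customer"}
dim_salted = {salted(k, s): v for k, v in dim.items() for s in range(NUM_SALTS)}

print("largest partition without salting:", max(unsalted_load.values()))
print("largest partition with salting:   ", max(salted_load.values()))
```

The trade-off is that the smaller side of the join grows by a factor equal to the number of salts, so salting is usually worth it only for keys that are actually hot.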
We also saw how to use Adaptive Query Execution (AQE) in Spark and the Databricks Photon engine to accelerate your workloads automatically, and how to make informed caching decisions by choosing between the automatic Disk Cache and manual Spark caching based on your workload patterns. Finally, we covered how to configure and manage cluster costs effectively by right-sizing clusters, implementing autoscaling, using spot instances...