Building Modern Data Applications Using Databricks Lakehouse
A data pipeline may already be deployed in production when the business requirements change significantly, requiring its datasets to be recomputed from scratch. In such cases, recomputing the historical data for those datasets can be cost prohibitive.
Enzyme, a new optimization layer available only for serverless DLT pipelines, aims to reduce ETL costs by dynamically calculating a cost model for keeping the materialized results of a dataset up to date. Much like the cost model in Spark query planning, Enzyme compares the cost of several ETL techniques, ranging from a traditional materialized view in DLT, to one Delta streaming table feeding another Delta streaming table, to a manual ETL technique. For example, the Enzyme engine might model the cost to refresh a dataset using a materialization technique, translating to 10 Spark jobs, each...
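To make the idea of cost-based selection concrete, here is a minimal toy sketch in Python. It is not Enzyme's implementation, which runs inside the serverless platform and is not exposed as an API. The `RefreshPlan` class, the plan names, and the per-job cost figures are all hypothetical, chosen only to show how an engine could estimate the cost of each candidate refresh technique and pick the cheapest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshPlan:
    """Hypothetical candidate technique for refreshing a dataset."""
    name: str
    spark_jobs: int      # number of Spark jobs the plan translates to
    cost_per_job: float  # assumed cost units per job (illustrative only)

    @property
    def estimated_cost(self) -> float:
        # Toy cost model: total cost is jobs multiplied by cost per job.
        return self.spark_jobs * self.cost_per_job


def cheapest_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=lambda plan: plan.estimated_cost)


# Illustrative candidates; the numbers are made up for this sketch.
candidates = [
    RefreshPlan("full_recompute", spark_jobs=10, cost_per_job=4.0),
    RefreshPlan("incremental_refresh", spark_jobs=3, cost_per_job=5.0),
    RefreshPlan("manual_etl", spark_jobs=6, cost_per_job=4.5),
]

best = cheapest_plan(candidates)
print(f"{best.name}: {best.estimated_cost}")
```

In this sketch the incremental plan wins even though each of its jobs is assumed to cost more, because it needs far fewer jobs overall. A real cost model would weigh many more factors than a job count.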