Engineering Lakehouses with Open Table Formats
As data lakehouses grow in scale and complexity, efficient query performance becomes increasingly important. Query optimization techniques play a pivotal role in reducing data I/O, improving scan efficiency, and accelerating data retrieval. In modern data lakehouses, advanced metadata, indexing strategies, vectorized execution, cost-based optimization (CBO), intelligent caching, and unified materialized views have become essential for achieving high performance. These techniques are particularly effective for time-series workloads, where efficient data pruning can yield orders-of-magnitude performance improvements.
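To make the idea of data pruning concrete, the following is a minimal sketch of how per-file minimum and maximum values for a timestamp column could be used to skip files before any data is read. It does not reproduce the API of any particular table format or engine: the `DataFile` class, the `prune_files` function, and the file paths are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    """Hypothetical metadata entry: a file path plus min/max of a date column."""
    path: str
    min_day: date
    max_day: date


def prune_files(files, query_start, query_end):
    """Keep only files whose [min_day, max_day] range overlaps the query range."""
    return [
        f for f in files
        if f.max_day >= query_start and f.min_day <= query_end
    ]


# Illustrative (made-up) file listing, one file per month.
files = [
    DataFile("events/2024-01.parquet", date(2024, 1, 1), date(2024, 1, 31)),
    DataFile("events/2024-02.parquet", date(2024, 2, 1), date(2024, 2, 29)),
    DataFile("events/2024-03.parquet", date(2024, 3, 1), date(2024, 3, 31)),
]

# A query over ten days in February only needs to open one of the three files.
selected = prune_files(files, date(2024, 2, 10), date(2024, 2, 20))
print([f.path for f in selected])  # ['events/2024-02.parquet']
```

Because time-series data is usually written in roughly chronological order, the per-file ranges overlap very little, which is why a simple range check like this can eliminate most files for a narrow time filter.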
Some of the examples in this section reference public talks or engineering blog posts from companies such as Netflix, Microsoft, Uber, Pinterest, and Databricks. These are intended to illustrate how these techniques have been applied in practice rather than to prescribe specific performance expectations...