Engineering Lakehouses with Open Table Formats
Performance optimization is critical for managing and analyzing large-scale datasets in modern data systems. As data volumes continue to grow and different types of analytical workloads (BI, machine learning, streaming, and so on) emerge, ensuring that systems can provide low-latency queries, efficient storage, and scalability becomes a core challenge. In big data ecosystems, performance bottlenecks typically stem from two areas: unoptimized storage and query execution overhead. To address these challenges, data systems have evolved to incorporate techniques that minimize data scanning, optimize resource usage, and improve data organization. In this chapter, we will explore how widely used storage and query engine optimization techniques from the traditional database world are applied to lakehouse table formats such as Apache Hudi, Apache Iceberg, and Delta Lake, and learn how they are...