Engineering Lakehouses with Open Table Formats
Processing large volumes of structured and unstructured data is essential for generating insights and making informed decisions. Over the past few decades, the growing need for both real-time transactional processing and large-scale analytical capabilities has influenced the design of data management systems. Initially, organizations relied on two distinct systems: online transaction processing (OLTP) for high-throughput transactional workloads, and online analytical processing (OLAP) for historical trend analysis and complex querying.
However, as data volumes grew beyond the capacity of traditional data warehouses, particularly due to the rise of semi-structured and unstructured data, a new architectural model emerged: the data lake. Built on low-cost cloud or distributed storage, data lakes decoupled compute from storage and allowed organizations to ingest and store raw data of all types at scale. While this architecture addressed scalability and schema flexibility, it lacked the transactional guarantees and governance features necessary for reliable analytics.
These limitations eventually led to the rise of the lakehouse architecture, a unification of the best features of both data warehouses and data lakes. Lakehouses offer the scalability and openness of data lakes with the data reliability and query performance traditionally associated with data warehouses.
In this chapter, we will cover the following topics:
- The evolution of data management, from OLTP and OLAP systems to data lakes
- The rise of the lakehouse architecture
- The core components and key attributes of an open data lakehouse
By the end of this chapter, you'll have a clear understanding of how data management has evolved from OLTP and OLAP systems to the lakehouse architecture, along with insight into the core components and key attributes that make an open data lakehouse a powerful solution for modern data needs.