Performance Optimization and Tuning in a Lakehouse | Engineering Lakehouses with Open Table Formats

Engineering Lakehouses with Open Table Formats

By: Dipankar Mazumdar, Vinoth Govindarajan

Overview of this book

Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake. You’ll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You’ll also get hands-on with each table format through exercises using popular compute engines such as Apache Spark, Apache Flink, Trino, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you’ll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize them. By the end of this book, you’ll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, so that you can make informed decisions when selecting the right architecture for your organization’s data needs.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Free Benefits with Your Book

Open Data Lakehouse: A New Architectural Paradigm

Free Benefits with Your Book

The evolution of data systems

The emergence of the lakehouse architecture

Attributes of an open data lakehouse

Summary

Questions

Answers

Transactional Capabilities of the Lakehouse

Understanding transactions and ACID properties

Discovering conflict resolution mechanisms

Understanding the storage engine

Summary

Questions

Answers

Apache Iceberg Deep Dive

Apache Iceberg architecture

Apache Iceberg features

Hands-on with Apache Iceberg and Apache Spark

Installation requirements

Hands-on with Apache Iceberg and Apache Flink

Summary

Apache Hudi Deep Dive

Technical requirements

Architecture

Apache Hudi features

Hands-on with Apache Hudi and Apache Spark

Hands-on with Apache Hudi and Apache Flink

Hudi table services

Summary

Questions

Answers

Delta Lake Deep Dive

Technical requirements

Delta Lake architecture

Transaction log protocol

Delta Lake features

Unique capabilities

Hands-on with Delta Lake and Apache Spark

Hands-on with Delta Lake and Apache Flink

Summary

Questions

Answers

Catalog and Metadata Management

Technical requirements

The importance of catalogs in a lakehouse architecture

Iceberg REST catalog specification

Popular catalog options for lakehouses

Summary

Questions

Answers

Interoperability in Lakehouses

Need for interoperability

Apache XTable (incubating)

How to run translation with Apache XTable and Apache Spark

Delta UniForm

Use cases for interoperability

Summary

Performance Optimization and Tuning in a Lakehouse

Performance optimization

Optimization techniques in open table formats

Query optimization techniques

Summary

Questions

Answers

Data Governance and Security in Lakehouses

Understanding data quality and lineage

Data lifecycle management

Security and access control

Compliance and regulations

Summary

Questions

Answers

Evaluating and Selecting Open Table Formats

Key decision factors in table format evaluation

Table evolution and versioning

Platform tools and operational features

Feature-level strengths and practical selection criteria

Support within the open ecosystem

Real-world case studies

Summary

References

Real-World Applications and Learnings

Scenario overview: Acme Manufacturing’s journey to an Iceberg-based lakehouse

Scenario overview: GlobalMart’s real-time analytics with Apache Hudi CDC

Scenario overview: Visionary Telecom’s machine learning workflow modernization with Delta Lake

Summary

Questions

Answers

Unlock Your Exclusive Benefits

Unlock this Book’s Free Benefits in 3 Easy Steps

Other Books You May Enjoy

Index
