The Definitive Guide to Data Integration

By: Pierre-Yves BONNEFOY, Emeric CHAIZE, Raphaël MANSUY, Mehdi TAZI

Overview of this book

The Definitive Guide to Data Integration is an indispensable resource for navigating the complexities of modern data integration. Focusing on the latest tools, techniques, and best practices, this guide helps you master data integration and unleash the full potential of your data. This comprehensive guide begins by examining the challenges and key concepts of data integration, such as managing huge volumes of data and dealing with different data types. You’ll gain a deep understanding of the modern data stack and its architecture, as well as the pivotal role of open-source technologies in shaping the data landscape. Delving into the layers of the modern data stack, you’ll cover data sources, types, storage, integration techniques, transformation, and processing. The book also offers insights into data exposition and APIs, ingestion and storage strategies, data preparation and analysis, workflow management, monitoring, data quality, and governance. Packed with practical use cases, real-world examples, and a glimpse into the future of data integration, The Definitive Guide to Data Integration is an essential resource for data eclectics. By the end of this book, you’ll have gained the knowledge and skills needed to optimize your data usage and excel in the ever-evolving world of data.
Table of Contents (19 chapters)

Different types of data transformation

Data transformation is a critical component of any data integration process, and understanding the various types of transformations is essential for effective data management. This section provides a friendly introduction to the types of transformations you might encounter on your data journey.

We begin with batch processing, which applies transformations to data in chunks or groups. This method is typically used when processing many data points together is more efficient, or when the data does not require immediate analysis. Common use cases include generating daily sales reports and updating a recommendation system overnight. We then turn to event processing and stream processing, two closely related methods that each play a crucial role in handling real-time data. Event processing focuses on immediately handling...
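
To make the batch idea concrete, here is a minimal Python sketch of the daily sales report use case mentioned above. It is an illustration only, not code from the book: the `batched` and `daily_sales_report` helpers, the `(day, amount)` record layout, and the sample figures are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists holding at most batch_size records (hypothetical helper)."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def daily_sales_report(records, batch_size=2):
    """Sum sale amounts per day, consuming the input one batch at a time."""
    totals = defaultdict(float)
    for batch in batched(records, batch_size):
        # Each batch is transformed as a group rather than record by record on arrival.
        for day, amount in batch:
            totals[day] += amount
    return dict(totals)


# Made-up sample data: (day, sale amount) pairs.
sales = [
    ("2024-01-01", 10.0),
    ("2024-01-01", 5.5),
    ("2024-01-02", 7.0),
    ("2024-01-02", 3.0),
    ("2024-01-02", 1.5),
]

report = daily_sales_report(sales, batch_size=2)
```

In a real pipeline, the batch size and the schedule (for example, one run per night) would depend on data volume and on how fresh the report needs to be.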