The Definitive Guide to Data Integration

By: Pierre-Yves BONNEFOY, Emeric CHAIZE, Raphaël MANSUY, Mehdi TAZI

Overview of this book

The Definitive Guide to Data Integration is an indispensable resource for navigating the complexities of modern data integration. Focusing on the latest tools, techniques, and best practices, this guide helps you master data integration and unleash the full potential of your data. It begins by examining the challenges and key concepts of data integration, such as managing huge volumes of data and dealing with different data types. You'll gain a deep understanding of the modern data stack and its architecture, as well as the pivotal role of open-source technologies in shaping the data landscape. Delving into the layers of the modern data stack, you'll cover data sources, types, storage, integration techniques, transformation, and processing. The book also offers insights into data exposition and APIs, ingestion and storage strategies, data preparation and analysis, workflow management, monitoring, data quality, and governance. Packed with practical use cases, real-world examples, and a glimpse into the future of data integration, The Definitive Guide to Data Integration is an essential resource for data practitioners. By the end of this book, you'll have gained the knowledge and skills needed to optimize your data usage and excel in the ever-evolving world of data.
Table of Contents (19 chapters)

Influential open-source data technologies

Several open-source solutions deserve close study in the context of data integration. Examples include Hadoop, Spark, and Kafka, as well as query engines such as Presto. These solutions make it possible to handle large data volumes and thus meet the challenges that today's companies face.

Hadoop and the Hadoop ecosystem

Hadoop leverages the principles of distributed file systems, allowing storage capacity and computing power to be scaled by distributing data and processing across multiple nodes in a cluster.
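To make the idea of distributing both data and processing concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API: the "nodes" are simulated, and the round-robin partitioning and the helper names `partition`, `process_locally`, and `run_cluster` are illustrative assumptions. Each node counts words only in its own partition, and the partial results are merged at the end, which is the same divide-and-combine pattern a cluster applies at much larger scale.

```python
from collections import Counter
from typing import Dict, List


def partition(records: List[str], num_nodes: int) -> Dict[int, List[str]]:
    """Assign each record to a simulated node, round-robin (a simplifying assumption)."""
    nodes: Dict[int, List[str]] = {n: [] for n in range(num_nodes)}
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def process_locally(local_records: List[str]) -> Counter:
    """Work done on one node: count words in its own partition only."""
    counts: Counter = Counter()
    for line in local_records:
        counts.update(line.split())
    return counts


def run_cluster(records: List[str], num_nodes: int) -> Counter:
    """Distribute the data, process each partition, then merge the partial results.

    Partitions are processed one after another here; a real cluster
    would process them in parallel on separate machines.
    """
    partial_results = [
        process_locally(part) for part in partition(records, num_nodes).values()
    ]
    total: Counter = Counter()
    for partial in partial_results:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    lines = ["spark kafka hadoop", "hadoop hdfs", "kafka hadoop"]
    print(run_cluster(lines, num_nodes=2)["hadoop"])  # 3
```

The merged result does not depend on how many nodes are used; adding nodes only shrinks each partition, which is what allows storage and compute to scale out horizontally.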

The Hadoop ecosystem comprises numerous tools and components that enhance and extend its capabilities. These components assist in data ingestion, storage, processing, and analysis. Some of the key components in the Hadoop ecosystem are as follows:

  • HDFS: This is Hadoop’s primary storage layer and offers a fault-tolerant, distributed storage system designed for large datasets. It splits files...
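HDFS stores each file as a sequence of fixed-size blocks and keeps several copies of every block on different DataNodes, which is where its fault tolerance comes from. The following is a simplified sketch, not HDFS code: it uses the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), but the round-robin replica placement and the DataNode names are hypothetical. Real HDFS uses a rack-aware placement policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    if file_size == 0:
        return []
    num_blocks = math.ceil(file_size / block_size)
    sizes = [block_size] * (num_blocks - 1)
    sizes.append(file_size - block_size * (num_blocks - 1))
    return sizes


def place_replicas(
    num_blocks: int, datanodes: List[str], replication: int = REPLICATION
) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Hypothetical, simplified policy: real HDFS is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Consecutive indices modulo the node count are distinct while
        # replication <= len(datanodes), so no node holds two copies of a block.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)      # a 300 MB file
    print([s // (1024 * 1024) for s in sizes])        # [128, 128, 44]
    print(place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"]))
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on three different DataNodes in this sketch, losing any single node still leaves at least two copies of each block available.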