Data Ingestion with Python Cookbook

By: Gláucia Esppenchutz

Overview of this book

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open-source tools on the market to answer commonly asked questions and overcome challenges. You’ll learn to design pipelines that work with or without data schemas, and to create monitored pipelines using Airflow and data observability principles, all while following industry best practices. The book also addresses the challenges of reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for later consultation. By the end of the book, you’ll have a fully automated setup that lets you start ingesting and monitoring your data pipeline with little effort, and that integrates smoothly with the subsequent stages of the ETL process.
Table of Contents (17 chapters)
Part 1: Fundamentals of Data Ingestion
Part 2: Structuring the Ingestion Pipeline

Applying reverse ETL

As the name suggests, reverse ETL takes data from a data warehouse and inserts it into a business application such as HubSpot or Salesforce. The goal is to make the data more operational: business tools can draw further insights from data that is already in an analysis-ready format.
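To make the idea concrete, here is a minimal sketch of a reverse ETL sync. It is an illustration under stated assumptions, not how any specific tool works: an in-memory SQLite database stands in for the data warehouse, and `FakeCrmClient`, `upsert_contact`, the `user_activity` table, and its columns are all hypothetical names invented for this example. No real HubSpot or Salesforce API is called.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory SQLite database as a stand-in for a data warehouse (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_activity (email TEXT PRIMARY KEY, courses_completed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO user_activity VALUES (?, ?)",
        [("ana@example.com", 3), ("bob@example.com", 0)],
    )
    conn.commit()
    return conn


class FakeCrmClient:
    """Hypothetical business-application client; it only keeps records in memory."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        # Insert or update keyed by email, so repeated syncs do not create duplicates.
        self.contacts.setdefault(email, {}).update(fields)


def reverse_etl_sync(conn, crm):
    """Read analysis-ready rows from the warehouse and push them to the CRM."""
    rows = conn.execute(
        "SELECT email, courses_completed FROM user_activity ORDER BY email"
    ).fetchall()
    for email, courses_completed in rows:
        crm.upsert_contact(email, {"courses_completed": courses_completed})
    return len(rows)


warehouse = build_demo_warehouse()
crm = FakeCrmClient()
synced = reverse_etl_sync(warehouse, crm)
print(f"Synced {synced} contacts")
```

The upsert keyed by email is a design choice for this sketch: reverse ETL jobs usually run on a schedule, so writing the same row twice should update the existing record rather than create a second one.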

This recipe will teach us how to architect a reverse ETL pipeline and introduce the commonly used tools.

Getting ready

There are no technical requirements for this recipe. However, we encourage you to use a whiteboard or a notepad to take notes.

Here, we will work with a scenario where we are ingesting data from an e-learning platform. Imagine we received a request from the marketing department to better understand user actions on the platform using the Salesforce system.

The objective here will be to create a diagram showing the data flow process from a source of data to the Salesforce platform.
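If you prefer to draft the diagram as text before moving to a whiteboard, the flow can be sketched as an ordered list of stages. This is only one possible layout: the text names the e-learning platform as the source, the data warehouse as where reverse ETL reads from, and Salesforce as the destination. The intermediate stage names and the `Stage`, `FLOW`, and `render_flow` identifiers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    role: str


# Hypothetical stages; only the source, the warehouse, and Salesforce come from the scenario.
FLOW = [
    Stage("E-learning platform", "source of user-action data"),
    Stage("Ingestion pipeline", "extracts and loads the raw data"),
    Stage("Data warehouse", "holds analysis-ready data"),
    Stage("Reverse ETL job", "reads from the warehouse and syncs outward"),
    Stage("Salesforce", "business application used by marketing"),
]


def render_flow(stages):
    """Join stage names with arrows to form a one-line text diagram."""
    return " -> ".join(stage.name for stage in stages)


diagram = render_flow(FLOW)
print(diagram)
for stage in FLOW:
    print(f"- {stage.name}: {stage.role}")
```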

How to do it…

To make this...