Data Ingestion with Python Cookbook

By: Gláucia Esppenchutz

Overview of this book

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open-source tools on the market to answer commonly asked questions and overcome challenges. You’ll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation. By the end of the book, you’ll have a fully automated setup that enables you to start ingesting and monitoring your data pipeline effortlessly, facilitating seamless integration with subsequent stages of the ETL process.

Table of Contents (17 chapters)

Part 1: Fundamentals of Data Ingestion

Part 2: Structuring the Ingestion Pipeline

Accessing databases and data warehouses

Databases are the foundation of any system or application, regardless of its architecture. They are typically needed to store logs, user activity, user information, and other system data.

From a broader perspective, data warehouses serve the same purpose but hold analytical data. After ingesting and transforming data, we need to load it somewhere that makes it easy to retrieve analytical information for dashboards, reports, and similar uses.

Today, there are several types of databases (both SQL and NoSQL) and data warehouse architectures. This recipe, however, focuses on how access control is usually handled for both kinds of relational structures: databases and data warehouses. The goal is to understand how access levels are defined, even when using a generic scenario.
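
To make the idea of access levels concrete before we start, here is a minimal sketch in Python that maps a few access levels to MySQL privileges and builds the matching GRANT statements. This is not the recipe's own code: the level names (read_only, read_write, admin), the build_grant helper, and the sales database and user names are assumptions made purely for illustration.

```python
import re

# Hypothetical access levels and the MySQL privileges each one carries.
# These names and groupings are illustrative assumptions, not a standard.
ACCESS_LEVELS = {
    "read_only": ["SELECT"],
    "read_write": ["SELECT", "INSERT", "UPDATE", "DELETE"],
    "admin": ["ALL PRIVILEGES"],
}

# Conservative patterns so that only simple names end up in the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")
_HOST = re.compile(r"^[A-Za-z0-9_.%-]+$")


def build_grant(level: str, database: str, user: str, host: str = "localhost") -> str:
    """Return a MySQL GRANT statement for one access level on one database."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"Unknown access level: {level!r}")
    if not _IDENTIFIER.match(database) or not _IDENTIFIER.match(user):
        raise ValueError("Database and user names must be alphanumeric or underscore")
    if not _HOST.match(host):
        raise ValueError(f"Unexpected characters in host: {host!r}")
    privileges = ", ".join(ACCESS_LEVELS[level])
    return f"GRANT {privileges} ON `{database}`.* TO '{user}'@'{host}';"


if __name__ == "__main__":
    # Print one statement per hypothetical user so the levels can be compared.
    for level, user in [("read_only", "analyst"), ("read_write", "etl_job"), ("admin", "dba")]:
        print(build_grant(level, "sales", user))
```

The sketch only produces SQL text; it does not connect to a server. In practice, the users must already exist (for example, created with CREATE USER) before the statements are executed by an account that is allowed to grant privileges.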

Getting ready

For this recipe, we will use MySQL. You can install it by following the instructions on the official MySQL page: https://dev.mysql.com/downloads/installer/.

You...