Data Ingestion with Python Cookbook

By: Gláucia Esppenchutz

Overview of this book

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open source tools on the market to answer commonly asked questions and overcome challenges. You’ll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation. By the end of the book, you’ll have a fully automated setup that enables you to start ingesting and monitoring your data pipeline effortlessly, facilitating seamless integration with subsequent stages of the ETL process.
Table of Contents (17 chapters)

Part 1: Fundamentals of Data Ingestion
Part 2: Structuring the Ingestion Pipeline

Connecting OpenMetadata to our database

Now that we have configured our data discovery tool, let’s create a sample connection to our local database instance. We’ll use PostgreSQL, since it integrates easily and gives us practice with another database engine.

Getting ready

First, ensure the application is running properly by accessing http://localhost:8585/my-data.
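You can also perform this check programmatically. The following is a minimal sketch using only the standard library; `is_openmetadata_up` is a hypothetical helper (not part of OpenMetadata), and the URL assumes the default local setup:

```python
import urllib.error
import urllib.request


def is_openmetadata_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the OpenMetadata UI answers with a successful HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the app is not reachable.
        return False
```

Calling `is_openmetadata_up("http://localhost:8585/my-data")` should return `True` once the containers are healthy.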

Note

In OpenMetadata, a user must have the Data Steward or Admin role to create connections. You can switch to the admin user with the credentials we used earlier.

You can check the Docker status here:

Figure 3.16 – Active containers are shown in the Docker desktop application

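If you prefer the command line to Docker Desktop, you can list the running containers from Python. This is a sketch around the standard `docker ps` command; it assumes the Docker CLI is on your `PATH` and simply returns an empty list when it is not:

```python
import shutil
import subprocess


def running_containers() -> list[str]:
    """Return the names of running containers, or [] if Docker is unavailable."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [name for name in result.stdout.splitlines() if name]
```

With the OpenMetadata stack up, the returned list should include its containers (server, database, and search).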

We will use PostgreSQL for testing. Since we already have a Google Cloud project ready, let’s create a Cloud SQL instance using the PostgreSQL engine.
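Before registering the instance in OpenMetadata, it helps to confirm that Python can reach it. The following is a minimal connectivity sketch; it assumes the third-party `psycopg2` driver is installed, and the host and credentials shown are placeholders, not values from this book:

```python
from typing import Optional

try:
    import psycopg2  # third-party driver: pip install psycopg2-binary
except ImportError:
    psycopg2 = None


def check_postgres(host: str, user: str, password: str,
                   dbname: str = "postgres") -> Optional[str]:
    """Return the server version string, or None if the driver or server is unavailable."""
    if psycopg2 is None:
        return None
    try:
        with psycopg2.connect(host=host, user=user, password=password,
                              dbname=dbname, connect_timeout=3) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT version();")
                return cur.fetchone()[0]
    except psycopg2.Error:
        # Wrong credentials, unreachable host, or refused connection.
        return None
```

A non-`None` result means the instance is reachable and ready to be added as an OpenMetadata database service.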

Since we kept the queries that create the database and tables from Chapter 2, we can rebuild them in PostgreSQL. The queries can also be found...