Data Ingestion with Python Cookbook

By: Gláucia Esppenchutz

Overview of this book

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples built with the most widely recognized open source tools on the market, answering commonly asked questions and showing how to overcome common challenges. You’ll learn to design pipelines that work with or without data schemas, and to create monitored pipelines using Airflow and data observability principles, all while following industry best practices. The book also addresses the challenges of reading from different data sources and data formats. As you progress, you’ll gain a broader understanding of error-logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for later consultation. By the end of the book, you’ll have a fully automated setup that lets you start ingesting and monitoring your data pipeline with minimal effort, ready for seamless integration with the subsequent stages of the ETL process.
Table of Contents (17 chapters)
Part 1: Fundamentals of Data Ingestion
Part 2: Structuring the Ingestion Pipeline

Ingesting Data from Structured and Unstructured Databases

Nowadays, we can store and retrieve data from many sources, and the best storage method depends on the type of information being processed. For example, many APIs make data available in unstructured formats, because these formats allow data of many kinds (for example, audio, video, and images) to be shared, and such data can be stored at low cost in data lakes. However, if we want to make quantitative data available to several tools that support analysis, then structured data might be the most reliable option.
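To make the contrast concrete, here is a minimal sketch using only the Python standard library. The sample records, table name, and file paths are hypothetical, and an in-memory SQLite database stands in for any relational store. Fixed-schema rows can be aggregated directly with SQL, while schemaless JSON records (strictly speaking semi-structured rather than fully unstructured) each carry their own set of fields.

```python
import json
import sqlite3

# Hypothetical structured data: every row has the same columns and types.
sales_rows = [
    ("2023-01-01", "north", 120.0),
    ("2023-01-02", "south", 80.5),
    ("2023-01-02", "north", 99.5),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a relational database
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_rows)

# The fixed schema lets any SQL-aware tool aggregate the data directly.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
)
conn.close()

# Hypothetical schemaless records, e.g. metadata for files kept in a data lake.
raw_records = [
    '{"id": 1, "type": "image", "path": "lake/raw/cat.png", "width": 640}',
    '{"id": 2, "type": "audio", "path": "lake/raw/song.mp3", "duration_s": 212}',
]
documents = [json.loads(record) for record in raw_records]

# There is no shared schema, so consumers must discover which keys exist.
all_keys = sorted({key for doc in documents for key in doc})

print(totals)    # {'north': 219.5, 'south': 80.5}
print(all_keys)  # ['duration_s', 'id', 'path', 'type', 'width']
```

The structured half answers a quantitative question in one query; the schemaless half needs extra code just to learn what fields are present, which is the trade-off the paragraph describes.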

Ultimately, whether you are a data analyst, scientist, or engineer, it is essential to understand how to manage both structured and unstructured data.

In this chapter, we will cover the following recipes:

  • Configuring a JDBC connection
  • Ingesting data from a JDBC database using SQL
  • Connecting to a NoSQL database (MongoDB)
  • Creating our NoSQL table in MongoDB
  • Ingesting data from MongoDB...
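The first two recipes in this list involve connecting to a relational database and pulling rows out with SQL. The sketch below shows the general shape of that step under stated assumptions: it uses Python's DB-API 2.0 interface (PEP 249), with the standard-library `sqlite3` module standing in for a JDBC-backed database. The `ingest_query` helper and the `customers` table are hypothetical names for illustration, not code from the book. Bridge libraries such as JayDeBeApi expose a DB-API-style connection on top of JDBC drivers, so a similar pattern may apply there, but check that library's documentation.

```python
import sqlite3
from collections.abc import Iterator


def ingest_query(conn, query: str, batch_size: int = 2) -> Iterator[list[dict]]:
    """Hypothetical helper: run a SQL query and yield rows in batches of dicts.

    Uses only DB-API 2.0 (PEP 249) cursor methods, so it is not tied to sqlite3.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # cursor.description has one sequence per column; item 0 is the column name.
        columns = [col[0] for col in cursor.description]
        while True:
            rows = cursor.fetchmany(batch_size)  # avoids loading the full result at once
            if not rows:
                break
            # Key each row by column name: the dict shape PyMongo accepts as a document.
            yield [dict(zip(columns, row)) for row in rows]
    finally:
        cursor.close()


# Hypothetical source table, created in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ana", "BR"), ("Ben", "UK"), ("Chen", "CN")],
)

batches = list(ingest_query(conn, "SELECT id, name, country FROM customers ORDER BY id"))
conn.close()

for batch in batches:
    print(batch)
```

With `batch_size=2` and three rows, `batches` holds two lists: the first with Ana and Ben, the second with Chen. A real pipeline would use a much larger batch size; fetching in batches keeps memory use bounded when the source table is large.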