Data Observability for Data Engineering

By: Michele Pinto, Sammy El Khammal

Overview of this book

In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
Table of Contents (17 chapters)

  • Part 1: Introduction to Data Observability
  • Part 2: Implementing Data Observability
  • Part 3: How to adopt Data Observability in your organization
  • Part 4: Appendix

Getting the metadata of the data sources

Data is the fuel of a data application. The data sources that the application uses have to be correctly identified in the logs. If an issue occurs at a data source and you need to perform a deeper analysis, the logs should give you the information you need to retrieve the data. In this section, we will see how a data source can be identified.

Data source

To identify the data that’s used by an application, we need to define the metadata of the data source. Metadata is data about the data.

The metadata of a data source comprises all the elements that allow you to recognize that data source. Let’s explore them:

  • The file’s location: This gives you the address of the data source and helps you retrieve the data when you need it. The location can be a path on your local filesystem or on the company’s filesystem. It can also be a connection string if the data is in a table located...
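
As a minimal sketch of how a location could be recorded in the logs, the Python snippet below builds a small metadata record from either a file path or a connection string and writes it as a JSON log line. Everything named here is an assumption made for illustration: the functions `describe_location` and `log_data_source`, the logger name, and the keys `location` and `kind` are hypothetical and do not come from the book. The sketch also strips any credentials from a connection string before logging, which is a design choice of this example rather than a requirement stated in the text.

```python
import json
import logging
from urllib.parse import urlsplit

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("data_source_metadata_sketch")


def describe_location(location: str) -> dict:
    """Build a small metadata record for a data source location.

    The keys "location" and "kind" are illustrative assumptions,
    not a schema defined by the book.
    """
    parts = urlsplit(location)
    # Anything without "scheme://" (or with a file:// scheme) is treated as a file path.
    if "://" not in location or parts.scheme == "file":
        return {"location": location, "kind": "file"}
    # Otherwise treat it as a connection string and drop user:password
    # so that credentials never reach the logs.
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    safe_location = f"{parts.scheme}://{host}{port}{parts.path}"
    return {"location": safe_location, "kind": "connection_string"}


def log_data_source(location: str) -> dict:
    """Log the metadata record as one JSON line and return it."""
    record = describe_location(location)
    logger.info("data_source %s", json.dumps(record, sort_keys=True))
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_data_source("data/orders.csv")
    log_data_source("postgresql://user:secret@db.example.com:5432/sales")
```

Logging the record as a single JSON line keeps it easy to search for later, so that when an issue occurs at a data source the address needed to retrieve the data can be found directly in the logs.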