Data Observability for Data Engineering

By: Michele Pinto, Sammy El Khammal

Overview of this book

In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
Table of Contents (17 chapters)

  • Part 1: Introduction to Data Observability
  • Part 2: Implementing Data Observability
  • Part 3: How to adopt Data Observability in your organization
  • Part 4: Appendix

Computing observability metrics

The following data observability elements are known as data quality metrics. This category groups everything we consider to be observability metrics. These observations are statistics computed on the data you manipulate:

  • Distribution observations: Minimum, maximum, mean, standard deviation, skewness and kurtosis, quantiles, and so on
  • Categorical stats: Number of categories, percentage of each category, and so on
  • Completeness observations: Number of rows and number of missing values
  • Freshness information: Timestamp of the data itself
  • KPIs: Key performance indicators and other custom metrics worth checking, for technical or business purposes
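
As a rough illustration of the categories above, the following minimal sketch computes a few of these observations using only the Python standard library. The function names, the sample columns (`amounts`, `countries`, `updated_at`), and the `total_amount` KPI are hypothetical and chosen for this example only; they are not taken from the book's project. The sketch also assumes each numeric column holds at least two non-missing values, since the sample standard deviation needs them.

```python
import statistics
from collections import Counter
from datetime import datetime


def numeric_metrics(values):
    """Distribution and completeness observations for one numeric column.

    Missing values are represented as None. Assumes at least two
    non-missing values are present.
    """
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "missing_count": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stddev": statistics.stdev(present),  # sample standard deviation
        "quartiles": statistics.quantiles(present, n=4),
    }


def categorical_metrics(values):
    """Number of categories and share of each category (missing values excluded)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "category_count": len(counts),
        "category_share": {k: c / len(present) for k, c in counts.items()},
    }


def freshness(timestamps):
    """Most recent timestamp found in the data itself."""
    return max(timestamps)


# Hypothetical sample data, for illustration only.
amounts = [10.0, 20.0, None, 30.0, 40.0]
countries = ["FR", "BE", "FR", None]
updated_at = [datetime(2023, 1, 1), datetime(2023, 3, 15), datetime(2023, 2, 1)]

amount_stats = numeric_metrics(amounts)
country_stats = categorical_metrics(countries)
latest = freshness(updated_at)

# A custom KPI: here, simply the total of the non-missing amounts.
total_amount = sum(v for v in amounts if v is not None)
```

Skewness and kurtosis are left out because the standard library has no function for them; a statistics package would typically be used for those.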

The metrics you compute depend on the circumstances, so each one needs to be linked to the context in which it was computed. These metrics can change with how the data is used, the filters you apply, and the application run. Figure 4.7 shows an example of multiple contexts for...