Data Observability for Data Engineering

By: Michele Pinto, Sammy El Khammal

Overview of this book

In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
Table of Contents (17 chapters)
Part 1: Introduction to Data Observability
Part 2: Implementing Data Observability
Part 3: How to adopt Data Observability in your organization
Part 4: Appendix

Analyzing the application

A common way to understand what happens in an application is to replay its course after it has run. A SQL application is a good example. When you query a SQL database, for instance through a JDBC connector, the database records the access in its logs. These logs can contain a lot of information: who queried the database, what they queried, when the query was executed, and sometimes how long the query took to process, how many bytes were retrieved, and so on.
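To make the contents of such a log concrete, here is a minimal Python sketch that turns raw log lines into structured records. The line layout, the field names, and the `AccessLogEntry` and `parse_log_line` names are all assumptions made for illustration; every database engine writes its logs in its own format, so the pattern would need to be adapted.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical log layout invented for this sketch; real engines use their own formats.
LOG_PATTERN = re.compile(
    r'(?P<timestamp>\S+) user=(?P<user>\S+) duration_ms=(?P<duration_ms>\d+) '
    r'bytes=(?P<bytes_retrieved>\d+) query="(?P<query>.*)"$'
)


@dataclass
class AccessLogEntry:
    executed_at: datetime   # when the query was executed
    user: str               # who queried the database
    duration_ms: int        # how long the query took
    bytes_retrieved: int    # how many bytes were returned
    query: str              # what was queried


def parse_log_line(line):
    """Return an AccessLogEntry, or None if the line does not match the assumed layout."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return AccessLogEntry(
        executed_at=datetime.fromisoformat(match["timestamp"]),
        user=match["user"],
        duration_ms=int(match["duration_ms"]),
        bytes_retrieved=int(match["bytes_retrieved"]),
        query=match["query"],
    )


# Made-up sample lines, including one that does not follow the layout.
SAMPLE_LOG = [
    '2024-01-15T10:32:07+00:00 user=alice duration_ms=42 bytes=2048 query="SELECT * FROM orders"',
    '2024-01-15T10:33:11+00:00 user=bob duration_ms=310 bytes=50331 '
    'query="SELECT id, total FROM invoices WHERE total > 100"',
    'malformed line',
]

entries = [entry for entry in map(parse_log_line, SAMPLE_LOG) if entry is not None]
for entry in entries:
    print(entry.user, entry.executed_at.isoformat(), entry.duration_ms, entry.query)
```

Lines that do not match are skipped rather than raising an error, which is one reasonable choice when a log file mixes several kinds of records.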

Figure 3.4 illustrates this situation. Users continuously query a central SQL database. This creates a log file, a kind of journal that contains the records of the queries:

Figure 3.4 – Logging strategy for a SQL logs analyzer
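The following Python sketch imitates this setup on a small scale, assuming an in-memory SQLite database in place of the central SQL database. A real database engine writes its own access logs; here, a hypothetical helper named `run_logged_query` stands in for that mechanism and appends one record per query to a list named `journal`. The table, the users, and the record fields are invented for the example.

```python
import sqlite3
import time
from datetime import datetime, timezone


def run_logged_query(conn, user, sql, journal):
    """Run `sql` on `conn` and append a record of the call to `journal`."""
    started = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    journal.append({
        "user": user,                                            # who
        "executed_at": datetime.now(timezone.utc).isoformat(),   # when
        "query": sql,                                            # what
        "duration_ms": round((time.perf_counter() - started) * 1000, 3),
        "rows_returned": len(rows),
    })
    return rows


# Stand-in for the central database, filled with made-up data.
central_db = sqlite3.connect(":memory:")
central_db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
central_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 25.0)])

# Two users query the same database; each call leaves a trace in the journal.
journal = []
run_logged_query(central_db, "alice", "SELECT * FROM orders", journal)
run_logged_query(central_db, "bob", "SELECT SUM(amount) FROM orders", journal)

for record in journal:
    print(record)
```

Logging at the application level like this is only a stand-in. When the database already produces access logs, reading those is usually preferable, because they also capture queries issued by clients you do not control.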

That said, these logs can be extremely valuable for observability purposes. By using strategies to retrieve and analyze the logs, the data team can rebuild data transformation...