Data Observability for Data Engineering

By: Michele Pinto, Sammy El Khammal

Overview of this book

In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Introduction to Data Observability

Chapter 1: Fundamentals of Data Quality Monitoring

Learning about the maturity path of data in companies

Identifying information bias in data

Exploring the seven dimensions of data quality

Turning data quality into SLAs

Indicators of data quality

Alerting on data quality issues

Summary

Chapter 2: Fundamentals of Data Observability

Technical requirements

From data quality monitoring to data observability

Three principles of data observability

Data observability in IT observability

Key components of data observability

Data observability in the enterprise ecosystem

Summary

Part 2: Implementing Data Observability

Chapter 3: Data Observability Techniques

Analyzing the data

Analyzing the application

Advanced techniques for data observability – distributed tracing

Summary

Chapter 4: Data Observability Elements

Technical requirements

Prerequisites and installation requirements

Static and dynamic elements

Defining the data observability context

Getting the metadata of the data sources

Mastering lineage

Computing observability metrics

Data observability for AI models

Summary

Chapter 5: Defining Rules on Indicators

Technical requirements

Determining SLOs

Turning SLOs into rules

Project – continuous validation of the data

Summary

Part 3: How to adopt Data Observability in your organization

Chapter 6: Root Cause Analysis

Data incident management

Anomaly detection

Summary

Chapter 7: Optimizing Data Pipelines

Concepts of data pipelines and data architecture

Rationalizing the costs

Summary

Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability

Defining and understanding data teams

Data mesh, data quality, and data observability – a virtuous circle

The first steps toward data observability and how to measure success

Measuring success

Summary

Part 4: Appendix

Chapter 9: Data Observability Checklist

Challenges of implementing data observability

Checklist to implement data observability

Summary

Chapter 10: Pathway to Data Observability

Technical roadmap to include data observability

Implementing data observability in a project

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Key components of data observability

In this section, we will look at examples of data observability metrics that are collected from inside applications, and at the issues those metrics can reveal. We will focus on detecting issues, and to do so, we will create visuals of data observability issues in a Jupyter notebook.

If you want to follow the example, you can find it in the Chapter2 section of the GitHub repository. The name of the notebook is Visualise_Observability_Issues.ipynb.

In this part, we will focus on three kinds of issue: a timeliness issue, a completeness issue, and an accuracy issue.

The dataset that we provide is a basic example of marketing and sales data. The data represents the orders made on a web shop and consists of the following fields:

date: The date of the order
guid: A unique ID for the order
email: The email address linked to the order
page_visited: The number of pages the customer visited on the website
duration: How long the customer...
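
To make the three kinds of issue concrete, here is a minimal sketch of one check per issue, using only the date, guid, email, and page_visited fields described above. It is not taken from the book's notebook: the sample rows, the function names (missing_dates, null_ratio, out_of_range), and the choice of checks are all assumptions made for illustration, and the notebook itself may compute and plot these metrics differently.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows using the documented fields (not the book's data).
orders = [
    {"date": date(2023, 1, 1), "guid": "a1", "email": "ann@example.com", "page_visited": 5},
    {"date": date(2023, 1, 1), "guid": "a2", "email": None, "page_visited": 3},
    {"date": date(2023, 1, 2), "guid": "a3", "email": "bob@example.com", "page_visited": -2},
    {"date": date(2023, 1, 4), "guid": "a4", "email": "cy@example.com", "page_visited": 7},
]


def missing_dates(rows):
    """Timeliness: days between the first and last order that have no rows."""
    seen = {r["date"] for r in rows}
    first, last = min(seen), max(seen)
    all_days = (
        date.fromordinal(n)
        for n in range(first.toordinal(), last.toordinal() + 1)
    )
    return [d for d in all_days if d not in seen]


def null_ratio(rows, field):
    """Completeness: share of rows where the field is missing."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def out_of_range(rows, field, minimum=0):
    """Accuracy: guids of rows whose value falls below a plausible minimum."""
    return [
        r["guid"]
        for r in rows
        if r.get(field) is not None and r[field] < minimum
    ]


gaps = missing_dates(orders)
email_nulls = null_ratio(orders, "email")
bad_pages = out_of_range(orders, "page_visited")
rows_per_day = Counter(r["date"] for r in orders)  # daily volume, useful for plotting

print("Days with no orders:", gaps)
print("Share of missing emails:", email_nulls)
print("Orders with impossible page counts:", bad_pages)
```

Each function returns a plain value, so the results can be fed to whatever plotting library the notebook uses. For example, rows_per_day could be drawn as a bar chart in which a day with no data shows up as a gap.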
