Data Observability for Data Engineering

By: Michele Pinto, Sammy El Khammal

Overview of this book

In the information age, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you’re familiar with the techniques and elements of data observability, you’ll get hands-on with a practical Python project to reinforce what you’ve learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with a solid grasp of data observability, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Introduction to Data Observability

Chapter 1: Fundamentals of Data Quality Monitoring

Learning about the maturity path of data in companies

Identifying information bias in data

Exploring the seven dimensions of data quality

Turning data quality into SLAs

Indicators of data quality

Alerting on data quality issues

Summary

Chapter 2: Fundamentals of Data Observability

Technical requirements

From data quality monitoring to data observability

Three principles of data observability

Data observability in IT observability

Key components of data observability

Data observability in the enterprise ecosystem

Summary

Part 2: Implementing Data Observability

Chapter 3: Data Observability Techniques

Analyzing the data

Analyzing the application

Advanced techniques for data observability – distributed tracing

Summary

Chapter 4: Data Observability Elements

Technical requirements

Prerequisites and installation requirements

Static and dynamic elements

Defining the data observability context

Getting the metadata of the data sources

Mastering lineage

Computing observability metrics

Data observability for AI models

Summary

Chapter 5: Defining Rules on Indicators

Technical requirements

Determining SLOs

Turning SLOs into rules

Project – continuous validation of the data

Summary

Part 3: How to Adopt Data Observability in Your Organization

Chapter 6: Root Cause Analysis

Data incident management

Anomaly detection

Summary

Chapter 7: Optimizing Data Pipelines

Concepts of data pipelines and data architecture

Rationalizing the costs

Summary

Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability

Defining and understanding data teams

Data mesh, data quality, and data observability – a virtuous circle

The first steps toward data observability and how to measure success

Measuring success

Summary

Part 4: Appendix

Chapter 9: Data Observability Checklist

Challenges of implementing data observability

Checklist to implement data observability

Summary

Chapter 10: Pathway to Data Observability

Technical roadmap to include data observability

Implementing data observability in a project

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Fundamentals of Data Quality Monitoring

Welcome to the exciting world of Data Observability for Data Engineering!

As you open the pages of this book, you will embark on a journey that will immerse you in data observability. The knowledge within this book is designed to equip you, as a data engineer, data architect, data product owner, or data engineering manager, with the skills and tools necessary to implement best practices in your data pipelines.

In this book, you will learn how data observability can help you build trust in your organization. Observability provides insights directly from within the process, offering a fresh approach to monitoring. It’s a method for determining whether the pipeline is functioning properly, especially in terms of adhering to its data quality standards.
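To make “insights directly from within the process” a little more concrete, here is a minimal sketch, assuming a pipeline step is a plain Python function that takes and returns a list of records. The `observed` decorator, the in-memory `METRICS` list, and the metric names are hypothetical illustrations, not the book’s API: each time the wrapped step runs, it records input and output row counts and the step’s duration.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical in-memory metric store; a real pipeline would send these
# measurements to a monitoring backend instead.
METRICS: List[Dict[str, Any]] = []


def observed(step: Callable[[List[Record]], List[Record]]) -> Callable[[List[Record]], List[Record]]:
    """Wrap a pipeline step so it emits metrics from inside the process."""

    @wraps(step)
    def wrapper(records: List[Record]) -> List[Record]:
        start = time.perf_counter()
        result = step(records)
        METRICS.append(
            {
                "step": step.__name__,
                "rows_in": len(records),
                "rows_out": len(result),
                "duration_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@observed
def drop_missing_turnover(records: List[Record]) -> List[Record]:
    # Example step: keep only records that carry a turnover value.
    return [r for r in records if r.get("turnover") is not None]


if __name__ == "__main__":
    batch = [{"turnover": 120.0}, {"turnover": None}, {"turnover": 95.5}]
    drop_missing_turnover(batch)
    print(METRICS[-1]["rows_in"], METRICS[-1]["rows_out"])  # 3 2
```

Because the measurements are taken by the step itself, a sudden drop between `rows_in` and `rows_out` becomes visible without anyone opening a dashboard.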

Let’s get real for a moment. In our world, where we’re swimming in data, it’s easy to feel like we’re drowning. Data observability isn’t just some fancy term – it’s your life raft. Without it, you’re flying blind, making decisions based on guesswork. Who wants to be in that hot seat when data disasters strike? Not you.

This book isn’t just another item on your reading list; it’s the missing piece in your data puzzle. It’s about giving you the superpower to spot the small issues in your data before they turn into full-blown catastrophes. Think about the cost, not just in dollars, but in sleepless nights and lost trust, when data incidents occur. Scary, right?

But here’s the kicker: data observability isn’t just about avoiding nightmares; it’s about building a foundation of trust. When your data’s in check, your team can make bold, confident decisions without that nagging doubt. That’s priceless.

Data observability is not just a buzzword – we are deeply convinced it is the backbone of any resilient, efficient, and reliable data pipeline. This book will take you on a comprehensive exploration of the core principles of data observability, the techniques you can use to develop an observability approach, the challenges faced when implementing it, and the best practices being employed by industry leaders. This book will be your compass in the vast universe of data observability by providing you with various examples that allow you to bridge the gap between theory and practice.

The knowledge in this book is organized into four essential parts. In part one, we will lay the foundation by introducing the fundamentals of data quality monitoring and how data observability takes it to the next level. This crucial groundwork will ensure you understand the core concepts and will set the stage for the next topics.

In part two, we will move on to the practical aspects of implementing data observability. You will dive into various techniques and elements of observability and learn how to define rules on indicators. This part will provide you with the skills to apply data observability in your projects.

The third part will focus on adopting data observability at scale in your organization. You will discover the main benefits of data observability by learning how to conduct root cause analysis, how to optimize pipelines, and how to foster a culture change within your team. This part is essential to ensure the successful implementation of a data observability program.

Finally, the fourth part will contain additional resources focused on data engineering, such as a data observability checklist and a technical roadmap to implement it, leaving you with strong takeaways so that you can stand on your own two feet.

Let’s start with a hypothetical scenario. You are a data engineer, back from your holidays and ready to start the quarter. You have a lot of new projects for the year. However, the second you reach your desk, Lucy from the marketing team calls out to you: “Last month’s marketing report is totally wrong – please fix it ASAP. I need to update my presentation!”

This is annoying: all the work scheduled for the day is delayed, and you need to check the numbers. You open your Tableau dashboard and start a Zoom meeting with the marketing team. The first task of the day is to understand what Lucy means by “wrong”. The turnover does seem odd. It’s time to look at the SQL database feeding the dashboard. There, you see the same issue. This is strange and will require more investigation.

After hours of tedious manual checks, three different teams contacted, and 12 emails sent, you finally find the culprit: an ingestion script feeding the company’s master database was modified to express the turnover in thousands of dollars instead of dollars. Because the data team didn’t know that the metric was used by the marketing team, nobody passed on the change, and the pipeline was fed with the wrong data.

It’s not the first time this has happened. Hours of productivity are ruined by firefighting data issues. It’s decided – you need to implement a new strategy to avoid this.

Observability is closely tied to data quality, which is often defined through measurable indicators of the data. Data quality is one thing, but monitoring it is something else! In this chapter, we will explore the principles of data quality, see how they can guide you on your data observability journey, and show why the information bias between stakeholders is key to understanding the need for data quality and observability in a data pipeline.

Data quality comes from the need to ensure correct and sustainable data pipelines. We will look at the different stakeholders of a data pipeline and describe why they need data quality. We will also define data quality through several concepts, which will help you see how stakeholders can build common ground.

By the end of this chapter, you will understand how data quality can be monitored and turned into metrics, preparing the ground for data observability.
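As a first taste of turning data quality into metrics, the sketch below computes two simple indicators on a batch of records, the completeness of a column and the row count, and compares each with a threshold. The `Indicator` class, the `check_batch` helper, the indicator names, and the thresholds are hypothetical; the sketch also assumes every indicator is “higher is better”.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass
class Indicator:
    name: str
    value: float
    threshold: float

    @property
    def ok(self) -> bool:
        # Assumes "higher is better": the indicator passes when its value
        # meets or exceeds its threshold.
        return self.value >= self.threshold


def completeness(records: List[Record], column: str) -> float:
    """Share of records where `column` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(column) is not None)
    return filled / len(records)


def check_batch(records: List[Record]) -> List[Indicator]:
    # Hypothetical thresholds: 95% of rows must carry a turnover,
    # and a batch must contain at least 3 rows.
    return [
        Indicator("turnover_completeness", completeness(records, "turnover"), 0.95),
        Indicator("row_count", float(len(records)), 3.0),
    ]


if __name__ == "__main__":
    batch = [{"turnover": 10.0}, {"turnover": None}, {"turnover": 12.5}, {"turnover": 11.0}]
    for ind in check_batch(batch):
        status = "OK" if ind.ok else "ALERT"
        print(f"{status} {ind.name}={ind.value:.2f} (threshold {ind.threshold})")
```

With the sample batch, completeness is 0.75, so that indicator raises an alert while the row count passes. Keeping the value and its threshold together makes it straightforward to later express such thresholds as agreements with stakeholders.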

In this chapter, we’ll cover the following topics:

Learning about the maturity path of data in companies
Identifying information bias in data
Exploring the seven dimensions of data quality
Turning data quality into SLAs
Indicators of data quality
Alerting on data quality issues
