Book Image

Driving Data Quality with Data Contracts

By : Andrew Jones
Book Image

Driving Data Quality with Data Contracts

By: Andrew Jones

Overview of this book

Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, lacks trust, and fails to deliver the promised value. With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability of the data to those who know it best—the data generators—and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products. By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.
Table of Contents (16 chapters)
1
Part 1: Why Data Contracts?
4
Part 2: Driving Data Culture Change with Data Contracts
8
Part 3: Designing and Implementing a Data Architecture Based on Data Contracts

A Brief History of Data Platforms

Before we can appreciate why we need to make a fundamental shift to a data contracts-backed data platform in order to improve the quality of our data, and ultimately the value we can get from that data, we need to understand the problems we are trying to solve. I’ve found the best way to do this is to look back at the recent generations of data architectures. By doing that, we’ll see that despite the vast improvements in the tooling available to us, we’ve been carrying through the same limitations in the architecture. That’s why we continue to struggle with the same old problems.

Despite these challenges, the importance of data continues to grow. As it is used in more and more business-critical applications, we can no longer accept data platforms that are unreliable, untrusted, and ineffective. We must find a better way.

By the end of this chapter, we’ll have explored the three most recent generations of data architectures at a high level, focusing on just the source and ingestion of upstream data, and the consumption of data downstream. We will gain an understanding of their limitations and bottlenecks and why we need to make a change. We’ll then be ready to learn about data contracts.

In this chapter, we’re going to cover the following main topics:

  • The enterprise data warehouse
  • The big data platform
  • The modern data stack
  • The state of today’s data platforms
  • The ever-increasing use of data in business-critical applications