Driving Data Quality with Data Contracts

By: Andrew Jones

Overview of this book

Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, lacks trust, and fails to deliver the promised value. With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability for the data to those who know it best (the data generators) and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products. By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.
Table of Contents (16 chapters)

  • Part 1: Why Data Contracts?
  • Part 2: Driving Data Culture Change with Data Contracts
  • Part 3: Designing and Implementing a Data Architecture Based on Data Contracts

Creating a data contract

We’ll start by defining a specification for data generators to create a data contract. We’ll discuss why we have chosen to define it in this way, and how it acts as the foundation of our sample implementation.

This data contract underpins the contract-driven architecture we’ll build out in this chapter. It drives the following resources and services:

  • A BigQuery table, acting as the interface to the data.
  • Code libraries for the data generators to use, produced by converting our data contract to JSON Schema and using existing open source libraries.
  • A schema registry, so the schemas are available to others. Again, we use our JSON Schema representation of the data contract to interact with it.
  • An anonymization service, which uses the data contract directly to anonymize some data.
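
As a rough illustration of how one contract can drive several of the resources listed above, here is a minimal Python sketch. The contract layout, its keys (`fields`, `required`, `anonymization`), the type mapping, and all function names are assumptions made for this example and are not the specification this chapter goes on to define. The sketch derives a JSON Schema document, a list of column definitions in the name/type/mode shape used by BigQuery schema files, and the set of fields an anonymization service would need to act on.

```python
import json

# Hypothetical data contract. The structure and key names are illustrative
# assumptions, not the specification defined in this chapter.
CONTRACT = {
    "name": "customer",
    "description": "A customer of our service.",
    "fields": {
        "id": {"type": "string", "required": True, "description": "Unique ID."},
        "email": {"type": "string", "required": True, "anonymization": "hash"},
        "age": {"type": "integer", "required": False},
    },
}

# Assumed mapping from contract field types to BigQuery column types.
BIGQUERY_TYPES = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
}


def to_json_schema(contract):
    """Build a JSON Schema document describing one record of the contract."""
    properties = {}
    for name, spec in contract["fields"].items():
        prop = {"type": spec["type"]}
        if "description" in spec:
            prop["description"] = spec["description"]
        properties[name] = prop
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": contract["name"],
        "type": "object",
        "properties": properties,
        "required": [
            name for name, spec in contract["fields"].items() if spec.get("required")
        ],
    }


def to_bigquery_columns(contract):
    """Build column definitions (name, type, mode) for the table interface."""
    return [
        {
            "name": name,
            "type": BIGQUERY_TYPES[spec["type"]],
            "mode": "REQUIRED" if spec.get("required") else "NULLABLE",
        }
        for name, spec in contract["fields"].items()
    ]


def fields_to_anonymize(contract):
    """Return the fields that carry an anonymization strategy."""
    return {
        name: spec["anonymization"]
        for name, spec in contract["fields"].items()
        if "anonymization" in spec
    }


if __name__ == "__main__":
    print(json.dumps(to_json_schema(CONTRACT), indent=2))
    print(json.dumps(to_bigquery_columns(CONTRACT), indent=2))
    print(fields_to_anonymize(CONTRACT))
```

The point of the sketch is that each downstream artifact is a pure function of the contract, so a change to the contract propagates to the table definition, the schema, and the anonymization rules without separate manual edits.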

The following diagram shows how each of these resources is driven by the data contract...