Driving Data Quality with Data Contracts

By: Andrew Jones

Overview of this book

Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, is not trusted, and fails to deliver the promised value. With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability for the data to those who know it best (the data generators) and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products. By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.

Table of Contents (16 chapters)

Part 1: Why Data Contracts?
Part 2: Driving Data Culture Change with Data Contracts
Part 3: Designing and Implementing a Data Architecture Based on Data Contracts

The ever-increasing use of data in business-critical applications

Despite all these challenges, data produced on a data platform is increasingly being used in business-critical applications.

This is for good reason! It’s well accepted that organizations that make effective use of data can gain a real competitive advantage. Increasingly, these are not traditional tech companies but organizations across almost all industries, as technology and data become more important to their business. This has led to organizations investing heavily in areas such as data science, looking to gain similar competitive advantages (or at least, not get left behind!).

However, for these data projects to be successful, more of our data needs to be accessible to people across the organization. We can no longer use just a small percentage of our data to provide top-level business metrics and nothing more.

This can be clearly seen in the consumer sector, where to be competitive you must provide a state-of-the-art customer experience, and that requires the atomic use of data at every customer touchpoint. A report from McKinsey (https://www.mckinsey.com/industries/retail/our-insights/jumpstarting-value-creation-with-data-and-analytics-in-fashion-and-luxury) estimated that the 25 top-performing retailers were digital leaders. They were 83% more profitable and took over 90% of the sector’s gains in market capitalization.

Many organizations are, of course, aware of this. An industry report by Anmut in 2021 (https://www.anmut.co.uk/wp-content/uploads/2021/05/Amnut-DLR-May2021.pdf) illustrated both the perceived importance of data to organizations and the problems they have utilizing it when it stated this in its executive summary:

We found that 91% of business leaders say data’s critical to their business success, 76% are investing in business transformation around data, and two-thirds of boards say data is a material asset.

Yet, just 34% of businesses manage data assets with the same discipline as other assets, and these businesses are reaping the rewards. This 34% spend most of their data investment creating value, while the rest spend nearly half of their budget fixing data.

It’s this lack of discipline in managing their data assets that is really harming organizations. It manifests itself as a lack of expectations throughout the pipeline, which then permeates the entire data platform and reaches the datasets within the data warehouse, which themselves have ill-defined expectations for their downstream users or data-driven products.

The following diagram shows a typical data pipeline and how at each stage the lack of defined expectations ultimately results in the consumers losing trust in business-critical data-driven products:

Figure 1.6 – The lack of expectations throughout the data platform

Again, in the absence of these expectations, users will optimistically assume the data is more reliable than it is. Now, though, the inevitable downtime affects not just internal KPIs and reporting but also revenue-generating services used by external customers. Just like internal users, these customers will start losing trust, but this time they are losing trust in the product and the company, which can eventually cause real damage to the company’s brand and reputation.

As the importance of data continues to increase and it finds its way into more business-critical applications, it becomes imperative that we greatly increase the reliability of our data platforms to meet the expectations of our users.