Practical Data Quality

By: Robert Hawker

Overview of this book

Poor data quality can lead to increased costs, hinder revenue growth, compromise decision-making, and introduce risk into organizations. It can also leave employees, customers, and suppliers finding every interaction with the organization frustrating. Practical Data Quality provides a comprehensive view of managing data quality within your organization, covering everything from business cases through to permanently embedding the improvements that you make. Each chapter explains a key element of data quality management, from linking strategy and data together to profiling and designing business rules that reveal bad data. The book outlines a suite of tried-and-tested reports that highlight bad data and allow you to develop a plan to make corrections. Throughout the book, you’ll work with real-world examples and utilize re-usable templates to accelerate your initiatives. By the end of this book, you’ll have gained a clear understanding of every stage of a data quality initiative and be able to drive tangible results for your organization at pace.
Table of Contents (16 chapters)
Part 1 – Getting Started
Part 2 – Understanding and Monitoring the Data That Matters
Part 3 – Improving Data Quality for the Long Term

Basics of data profiling

Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column. For example, for both values and string lengths, the minimum, maximum, mean, and median are provided to help identify outliers.
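
To make these measures concrete, here is a minimal sketch of a column profiler in plain Python. It is not taken from any particular profiling tool: the function names `summarize` and `profile_column`, the sample `order_amounts` column, and the choice to treat `None` and empty strings as missing are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median


def summarize(numbers):
    """Return the minimum, maximum, mean, and median of a non-empty list."""
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": mean(numbers),
        "median": median(numbers),
    }


def profile_column(values):
    """Profile one column supplied as a list of values.

    Assumption: None and the empty string both count as missing.
    """
    present = [v for v in values if v is not None and v != ""]
    profile = {
        "count": len(values),
        # Share of rows that actually hold a value.
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        # A simple view of the distribution: the most frequent values.
        "top_values": Counter(present).most_common(3),
    }
    if present:
        # String length statistics are computed on the text form of each value.
        profile["length"] = summarize([len(str(v)) for v in present])
    # Value statistics only make sense when every present value is numeric.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and is_numeric:
        profile["value"] = summarize(present)
    return profile


# Hypothetical column with one missing entry and one suspiciously large value.
order_amounts = [120, 95, None, 110, 9800]
result = profile_column(order_amounts)
print(result["completeness"])      # 0.8
print(result["value"]["mean"])     # 2531.25
print(result["value"]["median"])   # 115.0
```

In this sample, the mean (2531.25) sits far above the median (115.0), which is the kind of gap that points to an outlier such as the 9800 entry.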

Most of you will have some experience in data profiling – even if you have not heard the term before. The first task that many people perform when looking at an unfamiliar set of data is to open it in a spreadsheet tool and apply a filter (the autofilter feature in Microsoft Excel, for example) to all the columns. They will check the values in each column to see whether every row falls under just a couple of distinct values or whether there are many. People look to see whether the data is a number, a date, text, and so on. It’s quite common to look for the smallest and largest values. Even this basic action is an...