Practical Data Quality

By: Robert Hawker

Overview of this book

Poor data quality can lead to increased costs, hinder revenue growth, compromise decision-making, and introduce risk into organizations. It also leaves employees, customers, and suppliers frustrated in every interaction with the organization. Practical Data Quality provides a comprehensive view of managing data quality within your organization, covering everything from building the business case through to embedding improvements permanently. Each chapter explains a key element of data quality management, from linking strategy and data together to profiling and designing the business rules that reveal bad data. The book outlines a suite of tried-and-tested reports that highlight bad data and allow you to develop a plan to make corrections. Throughout the book, you’ll work with real-world examples and utilize reusable templates to accelerate your initiatives. By the end of this book, you’ll have gained a clear understanding of every stage of a data quality initiative and be able to drive tangible results for your organization at pace.
Table of Contents (16 chapters)

Part 1 – Getting Started
Part 2 – Understanding and Monitoring the Data That Matters
Part 3 – Improving Data Quality for the Long Term

Causes of bad data

Any of these impacts can cause critical damage to an organization, and no organization deliberately plans for its data quality to become poor enough to suffer them. So how does it happen? How does an organization neglect its data so badly that it can no longer achieve its objectives?

Lack of a data culture

Successful organizations try to put a holistic data culture in place. Everyone is educated on the basics of looking after data and the importance of having good data. They consider what they have learned when performing their day-to-day tasks. This is often referred to as the promotion of good data literacy.

Putting a strong data culture in place is a key building block when trying to ensure data remains at an acceptable level of quality for the business to succeed in its objectives. The data culture includes how everyone thinks about data. Many leaders will say that they treat data like an asset, but this can be quite superficial. Doug Laney’s book, Infonomics, explains this best:

“Consider your company’s well-honed supply chain and asset management practices for physical assets, or your financial management and reporting discipline. Do you have similar accounting and asset management practices in place for your “information assets?” Most organizations do not.” (Laney, 2017)

Laney makes an interesting point. Accounting standards allow organizations to value intangible assets – for example, patents, copyrights, and goodwill. These are logged on an asset register and are depreciated over time as their value diminishes. Why do we not do this with data as well? If data had a value attributed to it, then initiatives to eliminate practices that erode that value would be better received.

We will return to this in later chapters, but for now, suffice it to say that having a data culture is a key building block when striving for good data quality. Many organizations make statements about treating data as an asset and having a data culture, without really taking practical steps to make this so.

Prioritizing process speed over data governance

There is always tension between the speed of a business process and the level of data governance applied to the steps of that process. Efforts to govern and manage data can often be seen as red tape.

Sometimes, the desire for high process speed conflicts with the enforcement of data governance rules. There may even be financial incentives for process owners to keep processes shorter than a certain number of days or hours. In these cases, process owners may ask for the data entry process to be simplified and the rules removed.

In the short term, this may result in an improved end-to-end process speed – for example, in procurement, initial requests may be turned into purchase orders more quickly than before. However, as shown in Figure 1.3, a fast process with few data entry rules will result in poor data quality (box 1) and this is unsustainable.

In all these cases, the organization experiences what we call data and process breakdown – the dreaded box 2 in Figure 1.3. The initial data entry process is now rapid, but the follow-on processes are seriously and negatively impacted. For example, if supplier bank details are not collected accurately in the initial process, then the payment process will not be completed successfully. The accounts payable team will have to contact the supplier to request the correct details. If the contact details have also not been collected properly, then the team will have a mystery to solve before they can do their job! For one supplier, this can be frustrating, but for large organizations with thousands of suppliers and potentially millions of payments, processes are usually highly automated, and gaps like these become showstopping issues:

Figure 1.3 – Balance of process speed and data quality – avoiding data and process breakdown
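
To make the bank details example concrete, here is a minimal sketch of the kind of data entry rule that box 1 skips and box 2 pays for. It is written in Python purely for illustration; the field names, the IBAN pattern, and the email check are assumptions, not rules taken from this book or from any particular system.

import re

def validate_supplier(record: dict) -> list[str]:
    """Return a list of data quality issues found in a supplier record."""
    issues = []

    # Payments fail downstream if bank details are missing or malformed.
    iban = record.get("iban", "").replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        issues.append("Bank details missing or not a plausible IBAN")

    # Accounts payable cannot chase corrections without contact details.
    email = record.get("contact_email", "")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        issues.append("Contact email missing or malformed")

    return issues

# Example: a record with bank details but no contact email fails one rule.
record = {"name": "Acme Ltd", "iban": "GB82 WEST 1234 5698 7654 32", "contact_email": ""}
print(validate_supplier(record))   # ['Contact email missing or malformed']

Rules like this run at the moment of entry; the same checks can also be run in bulk as profiling reports over records that already exist.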

When establishing new processes, most organizations start in box 3, where the rules have been established but they are inefficient. For example, rules are applied in spreadsheet-based forms, but the form must be approved by three different people before data can be entered into a system. Some organizations (typically those in regulated industries) move further to the right into box 6 – where the data governance is so complex that process owners feel compelled to act. This often leads to a move back to box 1 – where the process owner instructs their team to depart from the data governance rules, sacrificing data quality for process speed. Again, this brings the data and process breakdown scenario into sharp focus.

Through technology, organizations tend to move to box 4 – for example, a web-based form is added for data input that validates data, connects to the underlying system to save the valid data, and automatically orchestrates approvals as appropriate. As these processes are improved over time, there is the opportunity to move to box 5 – for example, by adding lookups to databases of companies (for example, Dun and Bradstreet) to collect externally validated supplier data, including additional attributes such as details of the supplier risk and company ownership details. In the best cases, good master data management can contribute to a higher process speed than would otherwise have been possible.
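
As a rough illustration of that move from box 4 to box 5, the sketch below takes an entered supplier record and overlays attributes from an external company-data source. The lookup function and its fields are hypothetical stand-ins; a real integration with a provider such as Dun and Bradstreet would go through the provider’s own API, with authentication and proper entity matching.

from typing import Optional

def lookup_company(registration_number: str) -> Optional[dict]:
    """Hypothetical external lookup - replace with a real provider call."""
    reference_data = {
        "12345678": {
            "legal_name": "Acme Limited",
            "ownership": "Acme Holdings plc",
            "risk_rating": "low",
        }
    }
    return reference_data.get(registration_number)

def enrich_supplier(record: dict) -> dict:
    """Overlay externally validated attributes onto the entered record."""
    match = lookup_company(record.get("registration_number", ""))
    if match is None:
        # No external match: flag for review rather than saving the record as-is.
        return {**record, "review_required": True}
    return {**record, **match, "review_required": False}

entered = {"name": "ACME LTD.", "registration_number": "12345678"}
print(enrich_supplier(entered))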

There can be significant shifts in an organization’s position within this model when there is major organizational change, such as a re-organization that removes roles relating to data management, or a merger with another organization.

Mergers and acquisitions

In merger and acquisition scenarios, two different datasets often need to be brought together from different systems – for example, datasets from two different ERP systems are migrated to a single ERP. These projects usually have extremely aggressive timelines because of the difficulty of running a newly combined business across multiple systems of record and the cost of maintaining the legacy systems.

When data is migrated on an aggressive timeline, the typical problems are as follows:

  • Data is not de-duplicated across the two different source systems (for example, the same supplier exists in both former organizations and two copies get created in the new system) – see the matching sketch after this list
  • Data is migrated as-is without being adjusted to work in the new system – which may have different data requirements
  • Data was of poor quality in one or more of the legacy systems, but there is no time to enhance it in the project timeline
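
The de-duplication problem in the first bullet can be sketched with standard-library fuzzy matching of supplier names across the two legacy systems. The supplier names, the crude normalisation, and the 0.85 similarity threshold are illustrative assumptions; real matching would also compare tax IDs, addresses, and bank details before merging records.

from difflib import SequenceMatcher

def normalise(name: str) -> str:
    """Crude normalisation so trivially different spellings can still match."""
    return name.lower().replace(".", "").replace(",", "").replace(" limited", " ltd").strip()

def find_duplicates(system_a: list[str], system_b: list[str], threshold: float = 0.85):
    """Yield supplier pairs from the two legacy systems that look like the same entity."""
    for a in system_a:
        for b in system_b:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                yield a, b, round(score, 2)

erp_one = ["Acme Limited", "Globex Corporation"]
erp_two = ["ACME Ltd.", "Initech GmbH"]
for a, b, score in find_duplicates(erp_one, erp_two):
    print(f"Possible duplicate: '{a}' / '{b}' (similarity {score})")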

After a merger, there is usually significant investment in the harmonization of systems and processes, and the migration is funded as part of that work. However, if the migration runs into these problems and bad data is created in the new systems, a budget is rarely set aside to resolve those problems in a business-as-usual context.