Data Governance Handbook

By: Wendy S. Batchelder

Overview of this book

2.5 quintillion bytes! This is the amount of data being generated every single day across the globe. As this number continues to grow, understanding and managing data becomes more complex. Data professionals know that it’s their responsibility to navigate this complexity and ensure effective governance, empowering businesses with the right data, at the right time, and with the right controls. If you are a data professional, this book will equip you with valuable guidance to conquer data governance complexities with ease. Written by a three-time chief data officer in global Fortune 500 companies, the Data Governance Handbook is an exhaustive guide to understanding data governance, its key components, and how to successfully position solutions in a way that translates into tangible business outcomes. By the end, you’ll be able to successfully pitch and gain support for your data governance program, demonstrating tangible outcomes that resonate with key stakeholders.
Table of Contents (24 chapters)

  • Part 1: Designing the Path to Trusted Data
  • Part 2: Data Governance Capabilities Deep Dive
  • Part 3: Building Trust through Value-Based Delivery
  • Part 4: Case Study

A brief overview of the data governance components

Now that we have classified data solutions into assets and liabilities and defined how to calculate value, let’s dive into the components in further detail. I prefer to group the components of data governance into building blocks. I prefer this approach, and have used this framing in several companies, because it allows the organization to tie each building block directly to specific and straightforward outcomes. The first building block, policy and standards, is relatively basic and can be designed with a small team. This is a great place to get started in developing a data governance program.

Policy and standards

The purpose of this building block is to define data ownership and the structures needed to design accountability to manage your organization’s data as an asset. This building block will ensure effective, sustainable, and standardized data governance on which the company can depend. This building block is a prerequisite for future building blocks because it defines what is required to drive effective data governance and who needs to be involved. Additionally, the components of this building block can be created in a simplified way and can be expanded as the company matures in its data journey.

An easy place to start is to draft a simple and straightforward data governance policy. The purpose of a data governance policy is to tell the company what they need to do, why, and who is accountable.

The objectives of a strong data policy include the following:

  • Establish a single policy and set of standards for data management
  • Establish the capabilities and data assets that are in scope for the policy and, in turn, for the office of the Chief Data and Analytics Officer
  • Define the accountability and responsibilities for the implementation of the policy and the operationalization of data management capabilities
  • Set minimum standards for data management, specifically for governance, quality, and meta- and master data management
  • Define the procedures and usage requirements for tools to drive the consistent and robust adoption of minimum standards
  • Allow flexibility where appropriate to ease implementation
  • Define what is out of the scope of the policy

As with any policy, it is important to identify the owner of the data governance policy, who will be accountable for managing the policy by refreshing it at least annually, updating the content, and evangelizing it to the company. It is also the owner’s responsibility to ensure buy-in from key stakeholders across the company. Ideally, this owner would be a chief data officer, head of data governance, or similar role. If your company does not have a data leader in the role yet, another option would be a chief information officer, chief information security officer, chief privacy officer, or even a general counsel.

A policy does not need to be lengthy to be effective. Ideally, the policy would set forth the basics and would be supported by more specific and topically focused data standards. This approach often allows the policy to go through a more formalized corporate governance approval process while allowing for slightly easier updates to the data standards as your organization matures. I recommend implementing a data standard for each of the core capabilities addressed in Part 2 of this book, plus one for any area specific to your business that requires additional guidance for data stakeholders. Remember, the policy sets forth the minimum expectations for the company.

To get started in developing your data governance policy, a suggested data governance policy outline may contain the following:

  1. Purpose and scope statement (for example, to transform how the company utilizes data by creating additional revenue streams and simultaneously reducing data risk)
  2. The owner (for example, a Chief Data Officer)
  3. Reviewers/contributors and titles (for example, Head of IT, COO, and data stewards)
  4. Sign-off/approval (for example, CEO, CFO, and so on)
  5. Data governance requirements
  6. Roles and responsibilities for implementation
  7. Feedback loops for improvements and/or additions
  8. Measures of success
  9. Compliance/audit expectations and frequency
  10. Glossary of terms

Data governance policy example

The following is an example of an enterprise data governance policy:

Owner: Chief Data & Analytics Officer

Last Approval: 12/31/2023

Policy Leader: Head of Data Governance

Contributors:

  • Head of Information Technology/CIO
  • Head of Human Resources/CHRO
  • Head of Marketing/CMO
  • Head of Sales/CRO
  • Product/Business Unit Leaders
  • Product/Business Unit Data Stewards

Purpose and scope

This data management policy applies to all data held or processed by the company, which may include customer data, transactional data, financial information, regulatory and risk reporting, and any other data related to the business of the company. This data may be first-party data, derived data, or data acquired from another company (third-party data). The outcomes of this policy are the following:

  • Reduce risk
  • Unlock revenue opportunities
  • Drive operational efficiencies

Introduction

The company is responsible for ensuring all data is accurate, complete, secure, and accessible only to those who require access to fulfill their job responsibilities. This policy sets forth the requirements for the enterprise to deliver on the outcomes established above.

Data governance

Data governance establishes the requirements and standards for all corporate data deemed “in scope” of this policy in the purpose and scope section above. The purpose of the data governance capabilities established within this policy is to drive enhanced transparency and accountability for our company’s data and to drive improved consistency, control, and oversight for how data is managed, stored, and used going forward.

Roles and responsibilities

  1. Enterprise Data Committee: An Enterprise Data Committee will be established, chaired by the Chief Data & Analytics Officer, to provide an oversight and prioritization body to manage data and analytics initiatives and issue remediation enterprise-wide. A Data Domain Executive will be required to sit on this committee to ensure appropriate prioritization across all data domains.
  2. The Chief Information Officer will partner with the Chief Data & Analytics Officer to ensure technical requirements and systems are provided in support of the data and analytical needs of the organization, both for the Office of the Chief Data & Analytics Officer and for all functional data domains enterprise-wide.
  3. A Data Domain Executive will be established for each functional area to ensure the appropriate focus, funding, and resourcing is established and maintained to manage data in accordance with both this policy and the needs of the business.
  4. Data Stewards will be assigned by each Data Domain Executive to ensure the day-to-day execution of data requirements is completed in accordance with policy and the needs of the business. Data Stewards will also be required to work with the Office of the Chief Data & Analytics Officer to ensure that transparency of progress and ongoing operational effectiveness is maintained for leadership, regulators, and across domains.

Requirements

This section provides the minimum expectations for compliance.

Data Governance

Each data domain will develop a plan to drive compliance with this policy to operationalize the requirements within their data domain. The Data Domain Executive will ensure appropriate prioritization, whereas the Data Stewards will execute the plan on behalf of the Data Domain Executive. Additionally, Technical Data Stewards will support the delivery of all technical requirements to ensure compliance with this policy and the broader needs of the business. The minimum requirements are the following:

  1. Identify all data assets and systems
  2. Identify all data and technical data stewards for each asset and system
  3. Assign each asset and system to the appropriate data domain
  4. Develop a plan to meet the requirements for each asset and system and maintain compliance going forward

Data Cataloging

The purpose of data cataloging is to centrally manage and publish business and technical metadata across the organization to enable accelerated discovery of the data available across the organization in a clear, transparent manner. As data cataloging is implemented, the Chief Data & Analytics Office will evaluate metadata to determine the best source of truth for a given data asset and identify opportunities to reduce proliferation and redundancy across the company. This will further simplify our data ecosystem over time and reduce the costs of duplicate data handling/management and storage. The minimum requirements to be published in the Enterprise Data Catalog are the following:

  1. Description of the data asset/system
  2. Technical metadata
  3. Description of schemas and tables
  4. Identification of critical data elements (CDEs)
  5. Business definitions for CDEs
  6. Data classification for all data elements within the asset/system in accordance with the company data classification policy

Data Quality

The purpose of data quality is to ensure the data is fit for use. The following requirements have been set forth with the aim to centrally develop data quality rules, provide profiling resources and tooling, and monitor data hygiene to ensure the data can be trusted for analytical and business use and identify issues requiring disclosure and/or remediation. The minimum requirements are the following:

  1. Define the data quality rules for each CDE and enter this into the enterprise data quality tool
  2. Enable CDEs for data quality monitoring
  3. Provide data quality dashboards to transparently report on current quality levels
  4. Identify data quality issues and create plans to address material data quality issues

Policy Management

  • Feedback Loops: Feedback about the policy and/or questions about policy implementation should be directed to the Policy Leader defined above.
  • Measures of Success: A robust enterprise data governance scorecard will be established for each data domain and at the corporate level. Progress in complying with this policy will be reported periodically to the Office of the Chief Data & Analytics Officer and to the Enterprise Data Committee. Further measures of success may be required.
  • Compliance/Audit: Internal audits, external audits, and regulatory bodies may audit this policy for compliance on a regular basis. All requests for audit should be disclosed to the Office of the Chief Data & Analytics Officer so that requests can be co-ordinated and driven through the Enterprise Data Committee.
  • Frequency: This policy will be reviewed, updated, and re-approved at least annually.
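The data quality requirements in the example policy (define rules for each CDE, monitor them, and report current quality levels) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the two CDE names, the rules themselves, and the sample records are hypothetical, and a real program would use the company's enterprise data quality tool rather than hand-written checks.

```python
from datetime import date
from typing import Callable


def is_non_empty(value: object) -> bool:
    """Rule: the value is a string with at least one non-space character."""
    return isinstance(value, str) and value.strip() != ""


def is_iso_date(value: object) -> bool:
    """Rule: the value parses as an ISO date such as 2021-04-12."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data quality rules, one per critical data element (CDE).
rules: dict[str, Callable[[object], bool]] = {
    "employee_id": is_non_empty,
    "hire_date": is_iso_date,
}


def pass_rates(records: list[dict], rules: dict[str, Callable[[object], bool]]) -> dict[str, float]:
    """Return the share of records that pass each CDE's rule."""
    if not records:
        return {}
    return {
        cde: sum(1 for r in records if rule(r.get(cde))) / len(records)
        for cde, rule in rules.items()
    }


# Made-up sample records: one blank ID, one badly formatted date, one missing date.
records = [
    {"employee_id": "E001", "hire_date": "2021-04-12"},
    {"employee_id": "", "hire_date": "2022-01-03"},
    {"employee_id": "E003", "hire_date": "03/15/2020"},
    {"employee_id": "E004", "hire_date": None},
]

rates = pass_rates(records, rules)
for cde, rate in rates.items():
    print(f"{cde}: {rate:.0%} of records pass")
```

The pass rate per CDE is the kind of figure a data quality dashboard would report, and a low rate is a prompt to raise a data quality issue and plan its remediation.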

Now that we’ve reviewed what makes up a great policy, let’s pivot into the key roles and responsibilities for a data governance program.

Roles and responsibilities

Any data governance expert will tell you that people are the key to a successful data governance program. People are responsible for caring for the data: ensuring its accuracy, confirming that it is fit for use, and finding ways to improve it. This concept is called data stewardship. Data stewardship requires collaboration to drive success. The executive identified for each data domain appoints a data steward to drive day-to-day activities for the data domain.

The key responsibilities of data stewards include the following:

  • Serve as the single point of leadership for their data domain
  • Ensure the data domain executive is kept informed on key activities
  • Ensure the funding necessary for adequate data management is secured and allocated properly to data management activities
  • Collect data requirements for the data domain and execute the data management requirements across the data domain

The key responsibilities of the office of the Chief Data and Analytics Officer include the following:

  • Define data policy, publish the policy, and review for updates at least annually
  • Lead data domain executives and data stewards through requirements, ensuring a comprehensive understanding
  • Provide data tooling to drive the enterprise-wide enablement of data management
  • Streamline, to the extent possible, compliance with the policy
  • Report regularly to the executive team and, when required, to regulators and the board of directors

Important note

One of the hardest parts of data governance is gaining the collaboration required to drive the outcomes the organization needs to leverage its data for results.

In every company I have worked in, the intention was almost always good: the people wanted to collaborate. Data governance experts wanted to collaborate to drive success; however, competing priorities, a lack of a clear vision, and difficulty measuring the impact of a data governance program often led to data stewardship being deprioritized. Ultimately, organizations that drive successful data governance initiatives recognize the importance of good data governance and that it is more than just understanding records, fields, and tables. They recognize that it is more than just building another data warehouse. They recognize that people are the center of success and that identifying the individuals responsible and accountable for strong data is the cornerstone of any data governance program.

For data to be good, stewards must be good. Good stewards take accountability for their data’s quality, access, and overall management. The best data stewards I have worked with ensure the buy-in of the business users is achieved in every step of the governance process because the goal of data governance is not just clean data; it’s enabling the users of the data to confidently and easily use data in pursuit of their business objectives.

Let’s compare two examples to illustrate this:

  • Example 1: A business user, person A, needs data X to report to a regulator. In a well-governed data environment, person A goes to the enterprise data catalog, where person A can search for data X. Person A finds the metric they need to report to the regulator, but they have a few questions. In the catalog, person B is identified as the data steward. Person A can reach out to person B to ask questions and learn more about data X to confirm it is the appropriate metric to share with the regulator.
  • Example 2: Here is a company with little to no data governance. A business user, person A, needs to get data X for the regulator but does not know where to start. There is no catalog for the data, so they ask the person they think might know about the data, person C. Person C suggests that person A call person D. Person A calls person D, and so on. Days or even months go by, and person A does not feel confident they have the right information for the regulator but provides the best information they know of. Ultimately, the regulator does not have confidence in the data because person A does not have confidence in the data.
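To make Example 1 concrete, here is a minimal sketch of a catalog lookup that returns the data steward to contact. The `CatalogEntry` fields, the `find_entries` helper, and the sample entries are all hypothetical. A real enterprise data catalog is a dedicated tool, but the idea is the same: each entry names someone accountable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One published data asset (fields are illustrative, not a standard)."""
    name: str
    description: str
    data_domain: str
    data_steward: str


def find_entries(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Case-insensitive search over entry names and descriptions."""
    term = term.lower()
    return [
        entry
        for entry in catalog
        if term in entry.name.lower() or term in entry.description.lower()
    ]


# Made-up catalog contents for illustration.
catalog = [
    CatalogEntry("headcount_by_region", "Active employee headcount per region", "HR", "Person B"),
    CatalogEntry("net_revenue", "Net revenue by quarter", "Finance", "Person E"),
]

matches = find_entries(catalog, "headcount")
for entry in matches:
    print(f"{entry.name}: contact {entry.data_steward} ({entry.data_domain} domain)")
```

In the ungoverned situation of Example 2 there is nothing to search, so the lookup turns into a chain of phone calls.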

Often, aligning data stewards in an organization is easier said than done. One of the easiest approaches to getting started is to begin at the executive level. I refer to these individuals as data domain executives. These are the individuals who are ultimately accountable for the data owned by their division of the organization. An example would be a Chief People Officer (CPO) being assigned as the data domain executive for human resources data. This would make the CPO ultimately accountable for human resources data. The CPO would delegate the day-to-day activities to their data steward, who would be responsible for ensuring human resources data is managed according to data governance policy and standards.

As a part of your data governance program, one of the first activities should be to identify the data domain executives for the organization. In my experience, grouping the logical types of data into data domains and assigning a data domain executive to each is the best place to start. Once the data domains and data domain executives are defined, you will have named the executives responsible for the data of the entire organization. These individuals should make up your sponsors, and should you choose to start an enterprise data committee or council, they should become your voting members:

Figure 1.3 – Identification of data domain executives and data/technical data stewards

Depending on the size of your organization, you may have a third key role in the business side of data management. For larger organizations, you may consider establishing a more senior-level person to be a data domain manager. This person would have the role of the data steward, as described in the preceding section, and would likely have a 1:M relationship with data stewards who either report to them or have a dotted-line relationship with the business. In one organization I worked for, we had a data domain manager for the division of the organization, and for each sub-group, a data steward was defined. It looked like this:

Figure 1.4 – Example of data domain appointments

Governance forums

An enterprise data committee is an effective way to align an organization around a data strategy and a data governance program, and it serves as a prioritization and escalation body. Each data domain executive who participates in the committee should assign a data steward to drive activities within their data domain. Often, organizations establish sub-groups (e.g., human resources data councils or human resources data working groups) to carry out the implementation and ongoing governance of the respective domain. This allows data governance activities to be implemented more deeply within the organization. The Chief Data Officer (or equivalent) should chair this committee and be responsible and accountable for managing the agenda, cadence, and facilitation, as well as reporting progress upward in the organization to the C-suite and the board of directors.

Figure 1.5 – Example of how various data governance forums work together

Important note

If you notice that attendance at the enterprise data committee is being delegated below the data domain executive level, you may have a problem. Having someone delegate once may not be of concern (e.g., for vacation); however, if you start to see a pattern, either by a particular domain or across the board, it is a signal that your committee is not providing value. Quickly reach out to the data domain executive and seek to understand what is driving the delegation. Simply ask, “I noticed that you have delegated the last few EDC meetings to someone on your team. Is there anything I can do to make the meeting more engaging for you? Your perspective is critical for the whole committee, and I want to make sure it’s a valuable use of your time.”

There are other key roles that may play a part in your data governance program, including BU stakeholders (often users of data) and the information technology team. We will dig deeper in the metadata chapter in Part 2, but for now, note that I also included the IT application owner in the preceding diagram (see IT). The IT application owner plays a key role in the success of any data governance program: this is the individual responsible for the technical implementation of any data governance requirements set forth by the enterprise data governance policy and by the data domain executive or their delegate. We will get deeper into operating models in Chapter 3.

Reporting on governance progress

Ideally, as implementation progresses, the enterprise data committee should receive ongoing reporting to demonstrate improvements in data governance. One way to report this information is through the use of an enterprise data governance scorecard (EDG Scorecard). The EDG Scorecard should provide a transparent status of how well the company is doing in implementing data governance capabilities and how well those capabilities are sustained after implementation. Ultimately, the EDG Scorecard should communicate to its users how the company is doing in terms of making data easier to find, understand, and, ultimately, trust.

Before trying to design and implement the EDG Scorecard for the entire company, I recommend selecting one data domain to pilot this process. I prefer to start with a data domain that has at least slightly more mature data governance practices than other domains. One data domain that tends to be a bit more mature in most organizations is finance and/or regulatory reporting. By piloting a data domain, the other data domain executives, who make up the enterprise data committee, get a sense of what an EDG Scorecard looks like and how they should be implementing it for their respective data domain.

Sample implementation metrics

Metrics should be defined to measure the implementation of the data management policy as well as the operational effectiveness of capabilities on an ongoing basis. To measure the implementation of the policy across domains, each domain should measure progress and report that progress to the office of the chief data & analytics officer on a regular basis (bi-weekly, monthly, or quarterly, based on your organizational expectations).

Examples for the office of the Chief Data and Analytics Officer include the following:

  • The total number of data domains (this becomes the denominator for the following metrics)
  • The number of domains with an identified/confirmed data domain executive
  • The number of domains with identified/confirmed data steward(s)
  • The number of domains with systems of record identified
  • The number of domains with systems of record assigned to system owners
  • The number of domains with critical data elements identified for each system of record
  • The number of domains with data quality rules written and executed for each system of record
  • The number of domains with business glossaries established for each system of record
  • The number of domains with data dictionaries established for each system of record
  • The number of domains with reference data adopted for each system of record

Use case

A large multinational company has seven business units and four corporate functions. A chief data & analytics office was established to advance the use of data, analytics, and AI. As a part of the office’s first-year strategy, the CDAO organized a team focused on data management maturity. With this focus, the head of enterprise data governance was tasked with the formal development of a data policy and a scorecard to track the implementation thereof.

The head of enterprise data governance developed an enterprise data policy to establish expectations for the organization and defined the key roles and responsibilities to drive compliance with the policy. The team identified 12 data domains: one for each of the seven business units, one for financial data, one for risk data, one for marketing data, one for employee/HR data, and one for master and reference data, which is owned by the CDAO.

To track the implementation progress across the 12 domains identified, the following scorecard was developed for the office of the chief data & analytics officer. This report is updated bi-weekly and reported to the executive team, the data domain executives, and the data stewards, and it is reviewed in the enterprise data committee meetings on a monthly basis.

Figure 1.6 – Example of a data governance scorecard for the chief data and analytics office

Figure 1.6 – Example of a data governance scorecard for the chief data and analytics office

Examples for the data domains/data stewards include the following:

  • The total number of systems of record assigned to the data domain (this becomes the denominator for the following metrics)
  • The number of systems of record with critical data elements identified
  • The number of systems of record with data quality rules written and executed
  • The number of systems of record with business glossaries established
  • The number of systems of record with data dictionaries established
  • The number of systems of record with reference data adopted

Each data domain should track its own respective metrics and submit the report to the office of the Chief Data and Analytics Officer on a bi-weekly basis.

Implementation Scorecard for the HR Data Domain

Owner: Chief People Officer

Primary Contact: VP of HR Analytics

Purpose: The following scorecard is designed to report on the progress in implementing the core requirements of the Enterprise Data Policy. Twelve Data Domains have been identified and established to manage the data for the company. The following scorecard is for monitoring the adoption of the HR Data Domain.

Metric | Progress | Percent complete
Systems with CDEs Identified | 2 of 5 | 40%
Systems with Data Quality Rules Written and Executed | 1 of 5 | 20%
Systems with Business Glossaries Established | 1 of 5 | 20%
Systems with Data Dictionaries Established | 0 of 5 | 0%
Systems with Reference Data Adopted | 0 of 5 | 0%

Status key: Green = On Track | Yellow = At Risk | Red = Past Due

Figure 1.7 – Example of a data governance scorecard for a data domain
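Each percentage on a domain scorecard is the count of compliant systems divided by the number of systems of record assigned to the domain, so with five systems, 1 of 5 works out to 20%. A minimal sketch of that arithmetic follows; the function name `domain_scorecard` and the dictionary layout are assumptions, and the counts are the HR figures from the example scorecard.

```python
def domain_scorecard(total_systems: int, completed: dict[str, int]) -> dict[str, tuple[str, int]]:
    """Map each metric to ('n of total', whole-number percent complete)."""
    if total_systems <= 0:
        raise ValueError("total_systems must be positive")
    rows = {}
    for metric, count in completed.items():
        if not 0 <= count <= total_systems:
            raise ValueError(f"{metric}: count {count} is outside 0..{total_systems}")
        rows[metric] = (f"{count} of {total_systems}", round(100 * count / total_systems))
    return rows


# Counts taken from the HR data domain example (five systems of record).
hr = domain_scorecard(
    5,
    {
        "CDEs identified": 2,
        "Data quality rules written and executed": 1,
        "Business glossaries established": 1,
        "Data dictionaries established": 0,
        "Reference data adopted": 0,
    },
)

for metric, (progress, percent) in hr.items():
    print(f"{metric}: {progress} ({percent}%)")
```

Validating that each count stays within the denominator catches a common reporting slip before the scorecard reaches the enterprise data committee.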

Related teams and capabilities needed for success

No data team will function alone. To be successful in implementation, partnership is the most valuable skill at your disposal. There are a series of key functions that you will need to build strong and sustainable relationships with. First are the business functions. You must build trust with each business function you support. This begins and ends with deep listening. Deep listening requires you to listen with the sole purpose of learning. You are listening to learn, not to respond. Your business function leaders will provide you with their needs. It is your job to listen and be able to take their needs back to your data team to inform and build out the data strategy. Trust will be earned as you deliver against their needs. Initially, hearing them out, regardless of their past experiences with data, is the most important place to start.

Information technology

Your company’s information technology team (IT team), led by a chief information officer (CIO), will likely support your success by delivering the infrastructure (at minimum) to support your team’s data solutions. The relationship you build with your CIO and their key leaders is critical to your success. I’ve found the quality of the relationship between CDO and CIO to be one of the most powerful indicators of a CDO’s success. In cases where the CDO and CIO have a high-functioning and trusting relationship, I have witnessed success for both leaders. In cases where the relationship is not high-functioning or, at worst, competitive, the success of both leaders is hindered.

Information security

Additionally, with the rise and formalization of the role of the CISO and the increasing threat landscape across all industries, the importance of the CISO has never been higher. One of the core tenets of the liability side of the data equity equation is the protection of your company’s data. This shows up in data security, of course, but also spans and tightly connects with technical metadata management. In short, you cannot protect data if you don’t know where it is. Technical metadata is a critical enabler that supports an understanding of what data you have, where it is stored, where it moves, how to classify it, and then what controls are needed to properly protect and secure it. We will cover more about this topic in Chapters 6 and 7. As a data leader, this capability will enable the success of your CISO, and therefore, I suggest you engage early and often with them throughout your data governance implementation.

Chief financial officer

As you work through building relationships and, ultimately, building out a stakeholder list, I encourage you to think of both business functions and corporate functions. Your stakeholders should include the CIO and CISO, as mentioned in the preceding section, as well as the CFO, head of privacy, general counsel, human resources, and operations, in addition to business functions. If you work in the technology field, product and engineering leaders are critical, whereas in financial services, groups such as risk management and regulators may be relevant. The point is to look at the company at large, including all the stewards and users of data, and identify the key leaders you need to work with to partner effectively. Most data leaders who do not succeed in their role fail because they fail to build relationships and/or fail to build trust through value creation.

Human resources

Your human resources leader will be a key stakeholder and strategic partner in helping you design your organization and develop a hiring, training, and retention strategy for talent management.

Privacy and legal

Privacy and legal teams will be key drivers for many of the capabilities your team will enable. For example, while the legal team should define data retention standards and the privacy team should define the types of information classification (private or company confidential, for example), it is your team’s responsibility to implement the capabilities to discover, inventory, and auto-classify the data in accordance with these policies. Regulations such as GDPR or CCPA drive much of this work across industries, but some industries have special considerations, such as BCBS 239 or HIPAA. Be sure to work with your legal and privacy teams to find the right regulations to follow.

Additionally, be mindful that emerging state regulations are coming quickly. At the time of this writing, the State of Iowa was the most recent state in the US to release a privacy law. It will go into effect January 2025.