Book Image

Microsoft SQL Server 2008 R2 Master Data Services

Book Image

Microsoft SQL Server 2008 R2 Master Data Services

Overview of this book

Table of Contents (18 chapters)
Microsoft SQL Server 2008 R2 Master Data Services
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Data quality


If effort is going to be made to distribute a single copy of master data across the organization, then it stands to reason that the data should be of high quality.

It is the responsibility of the MDM solution to ensure data quality - it cannot simply be left to chance that data coming from multiple data sources, or even a single source, will be immediately fit for consumption. Instead, it is likely that one or more of the following problems will need to be corrected:

  • Lack of consistency—The existence of different attribute values across two or more members when the true attribute values are semantically the same. For example, the CRM could hold the Customer Genders as "Male" and "Female", whereas the ERP could hold the Genders as "M" and "F".

  • Incomplete data—NULL or blank data for a given entity attribute. For example, the Sales Order Processing system could hold the Customer's First Date of purchase. In some systems, this would be calculated and could therefore be relied upon without problems. However, if it's a manually entered field, then having it as NULL or blank would mean any systems that want to use that field for analytical purposes would not be able to do so.

  • Format issues—Data that is entered and stored in an incorrect format. For example, customer names being entered in upper case, rather than title case.

  • Out of range—Numerical attribute values that are outside the bounds of what the business deems acceptable. For example, a stationary manufacturer may sell pens between the range of $5 – $20. Therefore, if a product price of $500 is encountered for a pen, then it is definitely incorrect.

  • Complex data issues—A situation that can occur whereby an attribute value is correct by itself, but incorrect in the context of the member's other attribute values. For example, a Product Recommend Retail Price may well be correct at $1, but not if the Cost of Manufacture is $5.

  • Data duplication—The existence of duplicate members for a given entity. As we've seen in one of the previous examples, this can arise due to different systems holding the same information, resulting in us getting two customers with slightly different names that are in fact exactly the same customer.

Data cleansing should be applied to the data before it enters the MDM database, according to a set of pre-defined business rules that check for the specific problems that can occur with the data. The results of any data cleaning should be logged so that reports can be created in order to meet compliance or auditing requirements.

Prior to the development of the data cleansing routines, a detailed data analysis, known as data profiling, should be carried out on the source data to anticipate where the problems may lie. This should be combined with discussions with users and source system experts to understand any additional data quality issues not uncovered by the data profiling.

All the major vendors, such as Microsoft, IBM, and Oracle, produce ETL (Extract Transform Load) tools that contain functionality to assist with data cleaning and data profiling. In addition, there are some more specific tools on the market that are able to assist with the sometimes complex area of de-duplication, especially for areas such as Customer names and addresses.