Microsoft SQL Server 2008 R2 Master Data Services

Different approaches to Master Data Management


As with many IT solutions, there are several architectural approaches to choose from when implementing MDM: three, to be exact, each with its own advantages and disadvantages.

The three approaches to choose from are:

  • Transaction Hub

  • Registry

  • Hybrid

Each approach is explained in turn next.

Transaction Hub

The Transaction Hub approach is the simplest of the three approaches to explain, but it's probably fair to say that it's also the hardest to implement.

The idea behind the Transaction Hub approach is to have a single repository of master data that all applications can connect to directly in order to retrieve the master data that's needed. A diagram of this approach is shown below:

Allowing each application to retrieve data directly from a single database involves altering all applications to make their database connections to the central database. For example, the CRM application may be set up to take customer data (such as Full Name, Street Name, State, and other attributes) from a SQL table called 'Cust' in a database called 'CRM'. Implementing a Transaction Hub approach to MDM would mean that the application would instead have to take customer data, potentially in a different structure (perhaps First Name and Surname instead of Full Name), from a table called 'Customer' in a Transaction Hub MDM repository database called 'MDM', for example.
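
As a rough illustration, using the table, column, and database names from the example above, the change to the application's data access might look like the following T-SQL sketch (the exact column names are assumptions):

```sql
-- Before MDM: the CRM application reads its own local table
SELECT CustomerID, FullName, StreetName, State
FROM CRM.dbo.Cust;

-- After adopting a Transaction Hub: the same read must be re-pointed
-- at the central repository, which has a different structure
SELECT CustomerID, FirstName, Surname, StreetName, State
FROM MDM.dbo.Customer;
```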

Prior to the existence of the MDM program, the business applications would have been able to insert, update, and delete records as much as they wanted. This is still allowed with the Transaction Hub approach, with the original application user interface facilitating the change, but the data part of the change being sent to the central database. A locking strategy is often employed to ensure that users cannot update the same record at the same time.
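
One common locking strategy is optimistic concurrency, sketched below in T-SQL. The RowVersion column and the parameter names are illustrative assumptions, not part of any particular product:

```sql
-- Optimistic concurrency: the update only succeeds if the record has
-- not changed since this user originally read it
UPDATE MDM.dbo.Customer
SET    FirstName = @FirstName,
       Surname   = @Surname
WHERE  CustomerID = @CustomerID
  AND  RowVersion = @OriginalRowVersion;

-- Zero rows affected means another user updated the record first
IF @@ROWCOUNT = 0
    RAISERROR('The record has been modified by another user.', 16, 1);
```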

The main advantage of the Transaction Hub approach is immediately obvious. By forcing all systems to retrieve data from a single database, the problem of having duplicate entries across different databases is eradicated.

The problem with the Transaction Hub approach is that most organizations will simply not be able to alter the data tier of their operational applications. Altering the database connection string is one thing, but altering the whole data tier of an off-the-shelf application is very difficult to achieve.

For these reasons, the Transaction Hub approach is seen more as an idealistic approach to MDM, whereas other methods are more realistic to achieve.

Registry

The Registry approach involves the creation of a single database that maintains a mapping system in order to return a single, unique view of a given master data entity. This can be used for reporting, or for any operational reasons that need to obtain a view of a master data entity, such as a mail shot to all customers. The solution typically includes some sort of front-end application that connects to the registry application in order to maintain the mapping.

For each entity that is managed by the Registry, the database holds the following information:

  • Member key/unique identifier

  • One or more corresponding source system keys

  • Optional metadata and business rules for survivorship
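
As an illustrative sketch, a Registry table for Customer, assuming two source systems (a CRM system and a Sales Order Processing, or 'SOP', system), might look like this; all object names here are assumptions:

```sql
-- Hypothetical Registry mapping table for the Customer entity
CREATE TABLE dbo.CustomerRegistry
(
    MemberKey       INT IDENTITY(1,1) PRIMARY KEY, -- unique member key
    CrmCustomerID   INT NULL,         -- key of the record in the CRM system
    SopCustomerID   INT NULL,         -- key of the record in the SOP system
    SurvivingSource VARCHAR(10) NULL  -- optional survivorship rule, e.g. 'CRM'
);
```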

No additional attributes of the entities (for example, Name, Address in the case of a Customer) are held in the Registry system. Instead, these attributes are actually retrieved from the source systems directly, based on looking up the source system key value that is now held in the Registry. An example of how this would work for managing Customer master data derived from two Customer source systems is shown below:

When a given master data entity must be queried or extracted by a downstream system, a distributed query is executed, initiated from the MDM database, as shown. This query joins the source systems together using the unique member key that is maintained in the registry mapping.
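
Such a distributed query might be sketched as follows. The linked server names 'CrmServer' and 'SopServer', and the shape of the registry mapping table, are assumptions for the purpose of illustration:

```sql
-- Resolve a single view of Customer across two source systems;
-- COALESCE implements a simple 'first non-null wins' survivorship rule
SELECT r.MemberKey,
       COALESCE(c.FullName, s.FirstName + ' ' + s.Surname) AS CustomerName,
       COALESCE(c.State, s.State)                          AS State
FROM   dbo.CustomerRegistry AS r
LEFT JOIN CrmServer.CRM.dbo.Cust     AS c ON c.CustomerID = r.CrmCustomerID
LEFT JOIN SopServer.SOP.dbo.Customer AS s ON s.CustomerID = r.SopCustomerID;
```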

Given that the non-key attributes of the entities reside in the original line-of-business systems, the Registry approach typically delivers a read-only master data solution. As it only maintains a mapping, there is no interface or mechanism for allowing a user to update a given attribute (such as a Product Category, for example) before distributing that update back to the source systems.

The fact that updates are not allowed simplifies the implementation of a Registry system, although the approach is not without complication. If we have two legitimate sources that update Customer data (such as the Sales Order Processing System and the CRM), then there is a possibility that we will encounter duplicates. A set of business rules should be set up to deal with this eventuality, by picking a single surviving record when a duplicate has been found. The survivorship process of removing the duplicates happens in the distributed query, which returns the single merged record, but there is no process of pushing the corrected changes back to the source systems.

The advantage of the Registry approach is that it's possible to extract a real-time view of a given master data entity without interfering with the various source systems. It's also relatively easy to set up, given that a variety of relational databases are capable of issuing distributed queries.

The main problem with the registry approach is that the large, cross-database queries that are required in order to combine the data are often slow to execute, especially if the data sets are large.

In addition, any time a new system is created or purchased, the distributed queries will have to be rewritten to incorporate the new source.

Hybrid

The Hybrid approach is a blend of the other two architectural styles, including a separate database as per the Registry style, but also holding all, or most, of an entity's attributes, as seen with the Transaction Hub approach.

The aforementioned database is the core of the Hybrid solution, and contains the following data per entity that is to be managed:

  • Member Key/Unique Identifier

  • The full set of entity attributes that must be shared across different applications

  • Metadata and business rules for survivorship and data cleaning
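
A hypothetical core table in a Hybrid MDM database for the Customer entity, holding the shared attributes alongside the mapping back to the source systems, might look like this (all names are illustrative):

```sql
-- Core Hybrid MDM table: full shared attribute set plus source lineage
CREATE TABLE dbo.Customer
(
    MemberKey     INT IDENTITY(1,1) PRIMARY KEY, -- unique identifier
    FirstName     VARCHAR(100) NOT NULL,
    Surname       VARCHAR(100) NOT NULL,
    StreetName    VARCHAR(200) NULL,
    State         VARCHAR(50)  NULL,
    CrmCustomerID INT NULL,   -- mapping back to the CRM source record
    SopCustomerID INT NULL    -- mapping back to the SOP source record
);
```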

The key concept behind the Hybrid model is that the line of business applications continue to connect directly to their original data stores, as they did before the MDM program existed. In some cases, the original applications update the local data stores directly, carrying out inserts, updates, and deletes for master data entities.

The role of the Hybrid MDM database is to connect to the individual application databases and to detect changes in the sources for the master data entities that are being managed. The changes that get detected are physically propagated to the MDM database, which maintains a separate, clean and conformed copy of the data.
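
On SQL Server 2008, one possible mechanism for this change detection is Change Data Capture (an Enterprise edition feature); the following is a minimal sketch, run against the hypothetical 'CRM' source database from the earlier examples:

```sql
-- Enable Change Data Capture on the source database and table so that
-- an ETL process can pick up inserts, updates, and deletes
USE CRM;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Cust',
     @role_name     = NULL;
-- Changes can then be read from the generated change table and
-- propagated into the clean, conformed copy in the MDM database.
```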

Once data has been picked up from the application database, a number of operations may be applied to the data:

  • Data quality checks and business rules will be run on the data to correct any problems that can be automatically fixed

  • De-duplication and survivorship may occur so that an entity such as Customer gets conformed from multiple data sources

  • Nominated users may enrich the data to add additional attribute values that are not present in the original data source
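
As a small illustration of the first of these operations, an automated data quality rule might standardize inconsistent values in the MDM copy; the table and column names below follow the earlier, assumed examples:

```sql
-- Automatic fix: standardize free-text state values to abbreviations
UPDATE dbo.Customer
SET    State = 'WA'
WHERE  State IN ('Washington', 'Wash.');
```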

At a high level, the above will ensure that the new MDM database now contains a clean set of members that meet the data quality and structure rules as defined by the business.

Now comes the task of distributing the data out to any system that needs it, which from an operational system perspective can be something of a challenge. We know that with the Hybrid approach, all applications continue to connect to their own data sources. The complication arises if the source applications continue to insert and update data. It may be possible to prevent, or lock, users out of updating entities directly in the source applications, but this will not always be the case. Therefore, when data is sent back to the source applications, updates to the source application master data must be made with the understanding that update conflicts could occur.
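
One way to push updates back to a source application can be sketched in T-SQL as an update through a linked server; the linked server name 'CrmServer' and the mapping column are assumptions:

```sql
-- Push cleaned customer names from the MDM database back to the CRM
-- source. In practice a conflict check (for example, on a rowversion
-- column) would be added to detect rows the CRM application has
-- changed in the meantime.
UPDATE c
SET    c.FullName = m.FirstName + ' ' + m.Surname
FROM   CrmServer.CRM.dbo.Cust AS c
JOIN   dbo.Customer AS m ON m.CrmCustomerID = c.CustomerID
WHERE  c.FullName <> m.FirstName + ' ' + m.Surname;
```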

An overview of the Hybrid approach is shown below:

New applications that get implemented can either source master data from the MDM database, usually via web services, or simply fit into the same synchronization approach that is indicated above. The choice of whether to connect a new application directly will largely depend on whether it is developed in-house or purchased.

The Analytical MDM scenario presented earlier in the chapter, that of adding an extra Customer attribute, namely Industry Classification, is dealt with elegantly by the Hybrid approach. Attributes that are needed by the source systems are sent back to the source systems, whereas in this case the Industry Classification would only be sent to the data warehouse.

Incidentally, the Transaction Hub approach also deals with the "extra attribute problem", but is likely to be less appealing to most organizations when compared to the Hybrid approach. The Hybrid approach has its challenges, but they are around pushing data into source applications, rather than re-architecting those applications.