It's time to migrate master data management upstream to an integration hub or, ideally, an enterprise MDM system. And if you have yet to do anything about data consistency, take these four steps toward integration and stewardship.

By Warren Thornthwaite

February 7, 2007


I admit I'm sometimes a little slow to embrace market trends, because many of them turn out to be more hype than substance. But that's not always the case. Take master data management, which has sat near the top of the technology buzzword list for the past few years. A decade ago, the idea of creating and maintaining a single source for information about customers, products and other entities was considered a pipe dream; today, that idea is increasingly real.


This article examines the problems addressed and the business value delivered by master data management. We'll also look at the three common approaches to creating and managing master data, detailing the pros and cons of each approach. Finally, we'll give you four solid recommendations on how to proceed with master data management no matter which approach makes the most sense for your organization.

Source System Disparities

The difficulties in creating a single view of the enterprise stem from the quality of source-system data. One of the biggest challenges is that multiple source systems collect and maintain their own versions of the same data elements. We find entries for the same customer in separate systems when the Web registration screen captures customer name and address while the shipping system keeps another copy of the customer's name and address. When that customer calls for support, we create another customer record and capture another address. The same customer may even have duplicate entries within the same system, as when the customer registers twice on the same Web site.

One of the goals of ERP systems was to address this problem by creating a single, centralized transaction system with data elements that are shared across all transaction processes. The "E" stands for "enterprise," right? Yet most organizations have not reached this level of source-system integration: they either have systems outside the ERP environment or run the ERP system independently in different parts of the business. In other words, multiple divisions or business units may maintain separate customer sets, sometimes even in the same physical table. When it comes time to create an enterprise information resource, these "separate" divisions have usually been working with many of the same customers.

The problem gets worse when you have to include data from external sources. External customer demographic data, retail product sales data and even product data from manufacturers or distributors rarely have the same customer or product keys as your internal systems.

The Need for Master Data

Clean master data can benefit the organization by increasing revenues and profits and improving productivity and customer satisfaction. The requirement for integrated data at the operational level has coalesced under the general term master data management. MDM is the idea that there should be a single source for information about a customer or product: the master source. Ideally, all transaction systems will interact with the master source to get the most current values for a particular attribute, like name or address. MDM solves the data disparity problem where it should be solved: once, at the source.

It's not enough to keep track of the correct current value of an attribute. Analytic and compliance-reporting needs also demand that the MDM function track the history of changes to those values. This requirement makes MDM more complex, but it doesn't fundamentally change what needs to happen, or how.
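
To make the history requirement concrete, here's a minimal sketch in Python (all names are hypothetical, not from any particular MDM product) of an attribute that keeps its current value alongside a dated change history, so you can answer both "what is the address now?" and "what was the address last January?":

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AttributeVersion:
    value: str
    effective_date: date  # when this value became current

@dataclass
class MasterAttribute:
    """One master data attribute: current value plus full change history."""
    history: list[AttributeVersion] = field(default_factory=list)

    def update(self, value: str, effective: date) -> None:
        # Record a new version only when the value actually changes;
        # assumes updates arrive in chronological order.
        if not self.history or self.history[-1].value != value:
            self.history.append(AttributeVersion(value, effective))

    @property
    def current(self) -> str:
        return self.history[-1].value

    def as_of(self, when: date) -> str | None:
        """The value that was current on a given date (compliance reporting)."""
        versions = [v for v in self.history if v.effective_date <= when]
        return versions[-1].value if versions else None

# Track a customer's address over time.
address = MasterAttribute()
address.update("12 Elm St", date(2005, 3, 1))
address.update("98 Oak Ave", date(2006, 8, 15))
print(address.current)                  # 98 Oak Ave
print(address.as_of(date(2006, 1, 1)))  # 12 Elm St
```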

There are three common approaches to creating MDM systems. 1) Let the data warehouse do it. This approach is the least disruptive to existing systems, but it has hidden costs and limitations. 2) Create an operational MDM integration hub that consolidates data from multiple sources on a transactional basis. This approach surfaces the hidden costs and improves the timeliness and accessibility of the master data, but it still leaves multiple, disparate copies of data in the various transaction systems. 3) Create an Enterprise MDM system that serves as the system of record for transaction data and facilitates sharing of that data as needed. Let's take a closer look at each approach.

Approach 1: The Conformed Data Warehouse

The goal of creating a true enterprise understanding of the business has forced the data warehouse to deal with disparate sources. A decade ago, most transaction systems managers did not care about data integration because it was not a business priority; their top priority was meeting specific transaction needs as quickly as possible. The idea that the transaction systems should create and manage a centralized master data set was considered, at best, a grand but impractical vision; more often, it was considered a good joke.

If we on the data warehouse team wanted a single master version of each dimension, we had to build it ourselves. We used the extract, transform and load (ETL) process to pull data from multiple sources and cleanse, align, standardize and integrate it to create what we call an enterprise conformed dimension (see "Approach 1: Master Data in the Conformed Data Warehouse").
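
As a small illustration of the cleansing and standardization step, here's a sketch in Python. The rules shown (whitespace collapsing, case folding, a toy state-abbreviation table) are hypothetical stand-ins for the far richer rule sets real ETL tools apply:

```python
import re

# Illustrative subset only; a real table covers every state and territory.
STATE_ABBREVIATIONS = {"CALIFORNIA": "CA", "NEW YORK": "NY"}

def standardize_name(raw: str) -> str:
    """Collapse whitespace and uppercase so 'Acme  Corp ' == 'ACME CORP'."""
    return re.sub(r"\s+", " ", raw).strip().upper()

def standardize_state(raw: str) -> str:
    """Map spelled-out state names to their postal abbreviations."""
    cleaned = standardize_name(raw)
    return STATE_ABBREVIATIONS.get(cleaned, cleaned)

# The same customer, captured differently by two source systems,
# standardizes to identical values and can now be aligned.
print(standardize_name("  Acme   Corp "))  # ACME CORP
print(standardize_name("ACME CORP"))       # ACME CORP
print(standardize_state("California"))     # CA
```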

The master conformed dimension contains attributes that are common across transaction systems, such as customer name, as well as attributes that are unique to individual systems, such as ship-to address. It also maps the source system keys to a single master surrogate key for each member of the entity. This mapping is the Rosetta stone that ties data from one source system to another.
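
In miniature, the Rosetta stone is just a lookup from (source system, source key) to the master surrogate key. Here's a sketch in Python, with hypothetical system names and keys:

```python
# Master key map: (source system, source key) -> master surrogate key.
key_map = {
    ("web_registration", "W-1001"): 1,
    ("shipping",         "S-9944"): 1,  # same person, different source key
    ("support",          "T-0207"): 1,
    ("web_registration", "W-1002"): 2,
}

def master_key(source_system: str, source_key: str) -> int | None:
    """Resolve a source system's key to the enterprise surrogate key."""
    return key_map.get((source_system, source_key))

# A shipping record and a support record resolve to the same customer,
# so their facts can be tied together in the warehouse.
assert master_key("shipping", "S-9944") == master_key("support", "T-0207")
```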

Once the master dimension is in place, the effort does not end. Changes and additions to the shared attributes from different source systems must be entered into the dimension. Also, the dimension management system is usually called upon to identify the best version of a shared attribute out of all possible versions that exist. This determination, known as survivorship, involves complex business rules and comparisons, and it's part of the ETL pipeline (described in the ETL Toolkit book).

To many, having the data warehouse create and manage master data feels wrong because it flies in the face of the fundamental tenets of quality. In the 1950s, W. Edwards Deming and his counterparts taught us that you must fix quality problems at their source or be doomed to fix them forever. Every night, the ETL process pushes its Sisyphean rock to the top of the hill, only to have the transaction systems knock it back down the next day. Unfortunately, a majority of organizations have been unwilling to address the problem any other way, so we had to do it in the data warehouse.

Approach 2: The MDM Integration Hub

Most organizations have multiple customer touch points; for example, a separate customer relationship management system and a Web-based customer interface may each keep some of their own customer data. Companies that want to address the problem of disparate data might begin by leaving each silo transaction data store in place while building a centralized master data store. A copy of all the attributes is placed in the MDM system, and all future updates are replicated into the MDM system from each source. As a result, the MDM system will contain multiple versions of the same attributes as they are collected by the various source systems. We call this version of the MDM system the Integration Hub because it serves as the point of integration for downstream systems (and, in some cases, for source systems as well).

This MDM approach usually generates the same valuable outputs as the data warehouse version: the source key mapping table, and the best, or master, value for each attribute – the equivalent of the survivorship function in the ETL process. Most MDM systems of this type employ complex rules that assess the accuracy and validity of each source for a given attribute and determine the best, or master, value for that attribute. These rules include an assessment of the relative accuracy of the different sources, how recently the source has been updated, where the update came from and so on. The Data Steward is responsible for creating and maintaining these rules through the MDM front end (see "Approach 2: The MDM Integration Hub").
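
A survivorship rule of this kind can be sketched in a few lines of Python. The source rankings and the tie-breaking policy below are hypothetical; in practice, the data steward maintains these rules through the MDM front end:

```python
from datetime import date

# Relative trust in each source for a given attribute; lower rank wins ties.
# Hypothetical rankings, maintained by the data steward in a real system.
SOURCE_RANK = {"crm": 1, "web_registration": 2, "shipping": 3}

def survive(candidates: list[tuple[str, str, date]]) -> str:
    """Pick the master value from (source, value, last_updated) candidates.

    Prefer the most recently updated value; break ties with source rank.
    """
    best = max(candidates, key=lambda c: (c[2], -SOURCE_RANK.get(c[0], 99)))
    return best[1]

# Three systems hold three versions of the same customer's address.
versions = [
    ("crm",              "12 Elm St",  date(2006, 5, 1)),
    ("web_registration", "98 Oak Ave", date(2006, 11, 3)),
    ("shipping",         "12 Elm St.", date(2006, 2, 9)),
]
print(survive(versions))  # 98 Oak Ave -- the newest update wins
```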

The Integration Hub approach does not attempt to fix the problem at the source; it simply moves the cleansing, aligning, standardizing and integration upstream a bit. The Integration Hub is better than the data warehouse approach because it can operate at transaction service levels, letting some of the operational systems make calls to the Integration Hub to get the most current value of an attribute. It also becomes a data provider for downstream systems, like the data warehouse. At last, the data warehouse can stop pushing the rock up the hill. Applying this approach to customer data is essentially what customer data integration (CDI) is all about.

Approach 3: The Enterprise MDM System

The third approach to MDM is to create a centralized database that holds the master version of all attributes of a given object, along with the interface functions needed to tie this master data back to the transaction systems. In other words, the master database serves as the data store of record for all participating transaction processes. Rather than having each system maintain its own customer data, transaction systems all create, query and update attributes in the centralized MDM system. The user-facing portion of each system must first identify the exact customer or product involved in a given transaction and obtain its unique key from the MDM system. The MDM system must keep all attributes that are (or might be) required by more than one system and must also allow each system to define its own custom attributes. In addition, master data systems must offer standard APIs to all client systems (see "Approach 3: The Enterprise MDM System").
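
To show the shape of such an API, here's a deliberately simplified sketch in Python. The method names are hypothetical, not those of any particular MDM product, but the pattern is the point: resolve a key first, then read and write every shared attribute through the master store:

```python
import itertools

class CustomerMaster:
    """Sketch of the API surface an Enterprise MDM system might expose."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._records: dict[int, dict[str, str]] = {}

    def create(self, **attributes: str) -> int:
        """Register a new customer and return the master key."""
        key = next(self._next_key)
        self._records[key] = dict(attributes)
        return key

    def get(self, key: int, attribute: str) -> str:
        """Read the single current master value of an attribute."""
        return self._records[key][attribute]

    def update(self, key: int, attribute: str, value: str) -> None:
        """Write an attribute; every participating system sees the change."""
        self._records[key][attribute] = value

# An order-entry system and a support system share one record.
mdm = CustomerMaster()
key = mdm.create(name="ACME CORP", address="12 Elm St")
mdm.update(key, "address", "98 Oak Ave")  # support desk corrects the address
print(mdm.get(key, "address"))            # order entry now sees 98 Oak Ave
```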

The Enterprise MDM system does not solve the problem of disparate data by itself; one system can still define an attribute that is the equivalent of an existing attribute under a different name. A data steward must be responsible for each master data object, with cross-system responsibility for uniquely defining each attribute and monitoring its usage. As Kimball Group has long espoused, strong, centralized data governance is the mandatory foundation for successfully creating information that's conformed across the enterprise.

This single, centralized database is obviously the ideal solution, but it requires every transaction system that shares data to give up ownership of the shared data sets. This is not trivial: it means rewriting core parts of each system, or at least redirecting their data requests to the external MDM system. On the other hand, the Enterprise MDM system is an ideal place to start a service-oriented architecture (SOA) deployment, because a large number of clients could take advantage of the obvious reusability of the service. From what we've seen, far fewer than 10 percent of organizations have an Enterprise MDM system in place.

Four Steps Toward MDM

Regardless of how far you go toward creating an Enterprise MDM system, there are several common steps you need to take:

Set up a data governance function. First, create a permanent data governance function that includes, or is managed by, the business side. This group must drive agreement on names, definitions, systems of record and business rules for every entity and attribute in question. Create an organizational process for ongoing review and enforcement to ensure correct usage and appropriate sharing of data.

Integrate existing data. You must take a first pass at integrating data from the various sources in the organization. This may be fairly simple for some attributes, like Product Category, once the business rules are defined. However, it is a massive effort if you need to match and deduplicate more complex entities such as Customer. You will never be 100 percent right. Remember that the resulting integrated data must be maintained. If this is as far as you get, consider yourself above average.
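
To give a feel for why customer matching is such a massive effort, here's a toy Python sketch using simple string similarity from the standard library. Production matching engines use far more sophisticated techniques (phonetic codes, address parsing, probabilistic scoring), and the threshold below is illustrative only:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real matching tools go much further."""
    return SequenceMatcher(None, a.upper().strip(), b.upper().strip()).ratio()

def is_probable_match(rec1: dict, rec2: dict, threshold: float = 0.7) -> bool:
    """Flag two customer records as likely duplicates if both the name and
    the address are similar enough. The threshold is illustrative, not a
    recommendation."""
    return (
        similarity(rec1["name"], rec2["name"]) >= threshold
        and similarity(rec1["address"], rec2["address"]) >= threshold
    )

a = {"name": "Acme Corp",        "address": "12 Elm Street"}
b = {"name": "ACME Corporation", "address": "12 Elm St."}
print(is_probable_match(a, b))  # True with this loose threshold; borderline
                                # pairs still need review by a data steward
```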

Work toward the ideal. Rewriting existing systems and reworking business processes to use the master data system rather than local data sets is a massive undertaking. The organization must be committed to correcting the problem at the source. This effort involves data governance resources, development resources, purchased software and an overall redesign of the core systems and the development process.

Enlist others. If you are getting no help from the larger organization, remember that the results of integration have value to downstream systems such as CRM and customer service. Try to enlist those teams in the fight to create an integration hub or enterprise MDM system. Every organization should have a data stewardship function to define shared data elements and monitor their usage. You should also practice MDM in the form of conformed dimensions built in the data warehouse ETL process. Beyond this, the organization should work on moving the MDM function upstream into the operational systems, initially as an integration hub and ultimately as an enterprise master data management system.

Warren Thornthwaite is a member of the Kimball Group, teaches at Kimball University and is coauthor of The Microsoft Data Warehouse Toolkit (Wiley, 2006) and The Data Warehouse Lifecycle Toolkit (Wiley, 1998). Write to him at [email protected].
