The Perception of Master Data Management
Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of its critical data to one file, called a master file, which provides a common point of reference. When properly done, MDM streamlines data sharing among personnel and departments. In addition, MDM can facilitate computing in multiple system architectures, platforms and applications. The benefits of the MDM paradigm increase as the number and diversity of organizational departments, worker roles and computing applications expand. For this reason, MDM is more likely to be of value to large or complex enterprises than to small, medium-sized or simple ones. When companies merge, the implementation of MDM can minimize confusion and optimize the efficiency of the new, larger organization. For MDM to function at its best, all personnel and departments must be taught how data is to be formatted, stored and accessed. Frequent, coordinated updates to the master data file are also essential.
Master data management (MDM) is meant to deliver a near real-time, hub-based and synchronized master record of information to any seat or point of use in the organization. Master records are created with data that is defined, integrated and reconciled from multiple systems (customer relationship management, financial, supply chain, marketing, etc.) and classified by type (e.g., product master, customer master, location master). MDM is often pursued by data type through programs that address customer data integration (CDI) or product information management (PIM), though many observers believe true MDM requires reconciliation of all data types. Critical to MDM are the notions of data quality and matching, which technology tools can help to automate.
Most software systems have lists of data that are shared and used by several of the applications that make up the system. For example, a typical ERP system as a minimum will have a Customer Master, an Item Master, and an Account Master. This master data is often one of the key assets of a company. It’s not unusual for a company to be acquired primarily for access to its Customer Master data.
Essential data types
Master data describes the core entities of the business, typically customers, products, assets, and employees. Each is created through a different business process:
- Customer: created when a prospect visits a Web site or facility and an account is opened
- Product: created when an item is purchased or manufactured, often with supply chain management (SCM) involvement
- Asset: created when a unit is acquired by opening a purchase order, usually subject to an approval process
- Employee: created when HR hires, involving numerous forms, orientation, benefits selection, asset allocations, and office assignments
MDM project plan
An MDM project plan will be influenced by requirements, priorities, resource availability, time frame, and the size of the problem. Most MDM projects nevertheless share a common set of phases: identifying the sources, producers, and consumers of master data; developing the master-data model; choosing a toolset; generating and testing the master data; and establishing processes to maintain it.
MDM is a complex process that can go on for a long time. Like most things in software, the key to success is to implement MDM incrementally, so that the business realizes a series of short-term benefits while the complete project is a long-term process. No MDM project can be successful without the support and participation of the business users. IT professionals do not have the domain knowledge to create and maintain high-quality master data. Any MDM project that does not include changes to the processes that create, maintain, and validate master data is likely to fail. The rest of this paper will cover the details of the technology and processes for creating and maintaining master data.
Creating a Master List
Whether you buy a tool or decide to roll your own, there are two basic steps to creating master data: clean and standardize the data, and match data from all the sources to consolidate duplicates. Before you can start cleaning and normalizing your data, you must understand the data model for the master data. As part of the modeling process, the contents of each attribute should have been defined, along with a mapping from each source system to the master-data model. This information is used to define the transformations necessary to clean your source data.
Cleaning the data and transforming it into the master data model is very similar to the Extract, Transform, and Load (ETL) processes used to populate a data warehouse. If you already have ETL tools and transformations defined, it might be easier to modify these as required for the master data, instead of learning a new tool. Typical data-cleansing functions include normalizing data formats (phone numbers, dates), replacing missing values with explicit markers, standardizing abbreviations and spellings, and parsing combined fields such as a full name into first and last name.
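As a minimal sketch of what such cleansing functions might look like, the following Python fragment normalizes phone numbers, expands street abbreviations, and fills in missing values. The lookup table and field names are illustrative assumptions; a real deployment would drive these rules from reference data rather than hard-coded dictionaries.

```python
import re

# Hypothetical lookup table -- real systems would load this from reference data.
STREET_ABBREVIATIONS = {"st": "Street", "st.": "Street", "ave": "Avenue", "ave.": "Avenue"}

def normalize_phone(raw: str) -> str:
    """Strip punctuation and format 10-digit numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return f"{digits[0:3]}-{digits[3:6]}-{digits[6:]}"
    return raw  # anything else goes to the error table for hand processing

def standardize_address(raw: str) -> str:
    """Expand common street-type abbreviations to one canonical form."""
    return " ".join(STREET_ABBREVIATIONS.get(w.lower(), w) for w in raw.split())

def cleanse_record(record: dict) -> dict:
    """Apply the cleansing functions to one source record."""
    cleansed = dict(record)
    cleansed["phone"] = normalize_phone(record.get("phone", ""))
    cleansed["address"] = standardize_address(record.get("address", ""))
    # Replace missing values with an explicit marker rather than a blank.
    cleansed["industry"] = record.get("industry") or "Unknown"
    return cleansed
```

Each function handles one narrow rule, so records that fail can be routed to an error table, as described below, without blocking the rest of the batch.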
Most tools will cleanse the data that they can, and put the rest into an error table for hand processing. Depending on how the matching tool works, the cleansed data will be put into a master table or a series of staging tables. As each source is cleansed, the output should be examined to ensure the cleansing process is working correctly.
Matching master-data records to eliminate duplicates is both the hardest and most important step in creating master data. False matches can actually lose data (two Acme Corporations become one, for example), and missed matches reduce the value of maintaining a common list. The matching accuracy of MDM tools is one of the most important purchase criteria. Some matches are trivial to do. If you have Social Security numbers for all your customers, or if all your products use a common numbering scheme, a database JOIN will find most of the matches. This hardly ever happens in the real world, however, so matching algorithms are normally very complex and sophisticated. Customers can be matched on name, maiden name, nickname, address, phone number, credit-card number, and so on, while products are matched on name, description, part number, specifications, and price. The more attributes that match, and the closer each match, the higher the degree of confidence the MDM system has in the match. This confidence factor is computed for each candidate pair, and if it surpasses a threshold, the records are considered a match. The threshold is normally adjusted depending on the consequences of a false match. For example, you might specify that if the confidence level is over 95 percent, the records are merged automatically, and if the confidence is between 80 percent and 95 percent, a data steward should approve the match before the records are merged.
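The threshold scheme above can be sketched in a few lines of Python. This is not how a commercial MDM matching engine works internally; it uses a simple string-similarity measure as a stand-in for real matching algorithms, and the weights and thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Illustrative weights and thresholds; real MDM tools tune these per deployment.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
AUTO_MERGE = 0.95       # above this, merge automatically
STEWARD_REVIEW = 0.80   # above this, route to a data steward

def similarity(a: str, b: str) -> float:
    """Crude stand-in for a real matching algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_confidence(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-attribute similarities, in [0, 1]."""
    return sum(w * similarity(rec_a.get(f, ""), rec_b.get(f, ""))
               for f, w in WEIGHTS.items())

def match_decision(rec_a: dict, rec_b: dict) -> str:
    """Apply the two-threshold policy described in the text."""
    conf = match_confidence(rec_a, rec_b)
    if conf >= AUTO_MERGE:
        return "merge"
    if conf >= STEWARD_REVIEW:
        return "review"
    return "no-match"
```

The key design point is the middle band: rather than forcing a binary decision, uncertain pairs are queued for a human data steward, which is what keeps false merges rare without discarding likely duplicates.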
Most merge tools merge one set of input into the master list, so the best procedure is to start the list with the data in which you have the most confidence, and then merge the other sources in one at a time. If you have a lot of data and a lot of problems with it, this process can take a long time. You might want to start with the data from which you expect to get the most benefit having consolidated; run a pilot project with that data, to ensure your processes work and you are seeing the business benefits you expect; and then start adding other sources, as time and resources permit. This approach means your project will take longer and possibly cost more, but the risk is lower. This approach also lets you start with a few organizations and add more as the project demonstrates success, instead of trying to get everybody on board from the start.
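The one-source-at-a-time merge order can be expressed as a simple fold over sources in trust order. This sketch assumes duplicate records have already been matched to a shared key (the matching step above); the survivorship rule shown, first trusted source wins, is one common policy among several.

```python
def merge_sources(sources):
    """Fold source lists into a master list in priority order.

    `sources` is a list of {matched_key: {attr: value}} dicts, ordered
    from most trusted to least trusted. Earlier sources win conflicts;
    later sources may only fill attributes that are still missing.
    """
    master = {}
    for source in sources:
        for key, record in source.items():
            if key not in master:
                master[key] = dict(record)
            else:
                for attr, value in record.items():
                    master[key].setdefault(attr, value)  # fill gaps only
    return master
```

Starting the fold with the highest-confidence source mirrors the advice above: the pilot establishes a trustworthy core list, and each additional source can only enrich it, not silently overwrite it.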
At this point, if your goal was to produce a list of master data, you are done. Print it out or burn it to a CD, and move on. If you want your master data to stay current as data is added and changed, you will have to develop infrastructure and processes to manage the master data over time. The next section provides some options on how to do just that.
Master data management best practices
When considering a new discipline like master data management (MDM), it’s only natural to seek out people who have been there and done that.
But MDM best practices are still emerging and it’s not easy to get organizations to talk about their MDM experiences. Kalido Inc., a Burlington, Mass.-based MDM technology vendor, admits that it has a hard time getting customers to talk to the press.
All this secrecy around successful MDM programs doesn’t help companies looking for best practices, which is partly why Kalido sponsored a customer audit and MDM best practices study by San Mateo, Calif.-based analyst firm Ventana Research. Its researchers examined the best practices of five anonymous Kalido customers to reach their conclusions. The Ventana study, an experienced consultant, and a European telecom maker finally shed some light on the best (and worst) practices for MDM success.
1. Get business involved — or in charge.
“MDM has to be driven by business needs, otherwise it may turn out to be just another database that must be synchronized with all the other ones,” said David Loshin, president of Knowledge Integrity Inc., a Silver Spring, Md.-based consultancy that provides an MDM strategy development service and has worked on enterprise-scale initiatives.
Similarly, the Ventana study found that businesspeople, rather than IT, should drive the process. Support ranging from C-level executives to senior managers to business end users was critical for success, Ventana found. It’s often hard to motivate an organization to get behind the dry prospect of MDM, but early enterprise-wide support is important in the long run, users said. If key corporate goals are tied to the project through a solid business case, it should be a straightforward task to demonstrate benefits and generate excitement.
2. Allow ample time for evaluation and planning.
Plan at least three months for evaluation, talk to reference customers, and do a proof-of-value project with samples of real company data, Kalido users told Ventana researchers. Don’t underestimate the time and expertise needed to develop foundational data models, users said.
“It’s more complex than people realize — and that requires starting early and using real data for planning,” said David Waddington, a Ventana vice president and research director who worked on the study.
IT’s cooperation was an area of concern, as some companies have experienced delays in projects waiting for permission and access rights, Ventana found.
3. Have a big vision, but take small steps.
Consider the ultimate goal, but limit the scope of the initial deployment, users told Ventana. Once MDM is working in one place, extend it step by step, they advised. Business processes, rather than technology, are often the limiting factor, they said, so it’s important to get end-user input early in the process.
“If you’re just interested in getting consistent customer data, it’s very important to do that against the bigger background of ‘how am I going to manage all of my master data longer term?'” Waddington explained. “Then you don’t end up in the situation [of] having to link together a whole lot of different solutions.”
4. Consider potential performance problems.
Performance is the 800-pound gorilla quietly lurking in the MDM discussion, Loshin cautioned.
Different architectures can mean different performance penalties. For example, if a company uses the master hub style of MDM, record creation flows through a single point, which can become a bottleneck. Also, with many applications relying on MDM, the workflow, system priorities and order of operations become critical issues to consider up front. How companies solve this potential performance problem varies, Loshin said, because it’s inherently related to their unique architectures.
5. Institute data governance policies and processes.
Allow time and money for people and process change management, and don’t underestimate the size of the job, experts agreed. Swedish telecom equipment maker Ericsson learned that the politics of data governance can be quite difficult, according to Roderick Hall, senior project manager. Long before deploying SAP MDM, the Stockholm-based company instituted a master data group to manage critical data assets. It’s a “shared services” group that provides services to both IT and business. The group started as part of the finance department, but the function changed with the realization that master data management was a company-wide concern, Hall said. Their job isn’t always easy.
Although some departments, such as finance, saw the value of centralizing master data management, Hall said, other groups were reluctant to give up data ownership.
“To get acceptance of the fact that people have got to give up the freedom to correct their own master data to some faceless group in Stockholm [where the master data group is located] has been a pretty hard battle,” Hall said.
6. Carefully plan deployment.
MDM is still relatively new, so training of business and technical people is more important than ever, Ventana found. Untrained or semi-trained systems integrators, and attempts to outsource the work, caused major problems and project delays for MDM users, Waddington said.
Then, there’s the prospect of rolling out a program that has an impact on many critical processes and systems — no trivial concern. Loshin recommended that companies should plan an MDM transition strategy that allows for static and dynamic data synchronization.
“Trying to adjust the underlying infrastructure without affecting day-to-day operations can be as challenging as fixing potholes in the highway without disrupting traffic,” Loshin said.
There are three basic styles of architecture used for MDM hubs: the registry, the repository, and the hybrid approach. The hybrid approach is really a continuum of approaches between the two extremes of registry and repository.
While master data management solutions may take many forms, most of them share similar architecture. This architecture is what allows for the accurate, consistent management of data and data processes by maintaining a structured environment under which MDM tools can operate. At the core of these systems is the MDM hub, a database in which master data is cleaned, collected and stored. MDM solutions may use multiple hubs to govern different sets of data, such as product information, customer data and site data, and each hub generally utilizes one of three common models: transaction/repository, registry, or hybrid.
In a transaction/repository-style hub, all relevant data is stored in and accessed from a single database, which must contain all of the information needed by the different applications that access it. All data is consolidated and centralized, and published to the individual data sources after it has been linked and matched. This style of hub allows a single source of data to be created, minimizing duplication by making it easier to detect as data is collected and cleaned. However, the transaction/repository style has drawbacks as well. Existing applications may have to be modified to use the master data, and in some cases this is not possible. Intermediary applications and services may be needed to sit between the MDM software and the data-dependent applications, which can add to costs. Also, data models need to be complex enough to include all relevant information for the applications that utilize them, but not so large that they become unwieldy.
Registry-style hubs, in contrast, do not store master data in the hub; rather, master data is maintained within native application databases. The hub instead stores lists of keys with which to access all relevant attributes for a specific master data entity, linking these attributes between application databases. The registry-style hub allows applications to remain largely intact, as all data is managed within native databases. However, when a request is made to access master data, the data must be located, a query distributed across numerous databases, and the results assembled, all in real time; as the number of source databases grows, this can become increasingly inefficient. In addition, duplicate data entities can reside on different databases, or even within the same database, and while consolidation and cleaning of individual databases would be ideal, it is not always practical. Another disadvantage is that when new databases are to be included in the hub registry, new keys must be added to the existing tables, which may also require altering how queries are generated.
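A registry-style lookup can be sketched as a key map plus a federated fetch. The system names, keys, and attributes below are all hypothetical; the point is only that the hub holds keys, not values, and every read fans out to the owning source systems.

```python
# The registry maps each master entity to (source system, local key) pairs.
REGISTRY = {
    "CUST-001": [("crm", "C-9931"), ("billing", "8842")],
}

# Stand-ins for the native application databases (hypothetical contents).
SOURCE_SYSTEMS = {
    "crm":     {"C-9931": {"name": "Acme Corp", "segment": "Enterprise"}},
    "billing": {"8842":   {"credit_limit": 50000}},
}

def fetch_master_record(master_id: str) -> dict:
    """Assemble a master view by querying each source listed in the registry.

    Note that cost grows with the number of registered sources: every
    read triggers one lookup per source system, which is the
    inefficiency the text describes.
    """
    assembled = {}
    for system, local_key in REGISTRY[master_id]:
        assembled.update(SOURCE_SYSTEMS[system][local_key])
    return assembled
```

Adding a new source system means appending a new (system, key) pair to each affected registry entry, which is why onboarding new databases can ripple into query-generation changes.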
Figure 1. MDM hub architecture
Hybrid style hubs utilize methods from both transaction/repository and registry style hubs, and try to address some of the issues present in each. Since it may not be practical to update existing applications or to send inefficient, massive queries across several databases, the hybrid system combines some of the advantages present in the other models by leaving master data on the native databases, generating keys and IDs to access this data, but replicating some of its important attributes to the hub. When queries are made, the hub can service the more common requests, and queries only need to be distributed for the less-used attributes, which results in a more efficient process. While the hybrid style combines advantages of both of its parent models, it has its own disadvantages. Since it stores replicated data from outlying databases, it may run into updating issues, and, like the transaction/repository style, deciding which attributes to store, naming to be used and format to store them in can create problems.
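The hybrid trade-off, common attributes replicated into the hub, rare ones fetched from the owning system, can be sketched as follows. This is a toy model under stated assumptions: `fetch_remote` stands in for a federated query, and which attributes count as "common" is a design decision made up for the example.

```python
class HybridHub:
    """Hybrid-style hub sketch: hot attributes are replicated locally,
    everything else is fetched from the owning source system on demand."""

    def __init__(self, fetch_remote):
        # fetch_remote(master_id, attr) is a stand-in for a federated query.
        self.fetch_remote = fetch_remote
        self.cache = {}  # master_id -> {attr: value} for replicated attributes

    def replicate(self, master_id, attrs):
        """Copy selected common attributes from a source into the hub."""
        self.cache.setdefault(master_id, {}).update(attrs)

    def get(self, master_id, attr):
        local = self.cache.get(master_id, {})
        if attr in local:
            return local[attr]                    # common request: served locally
        return self.fetch_remote(master_id, attr)  # rare attribute: federate
```

The replicated cache is exactly where the update problem mentioned above lives: any change to a replicated attribute in a source system must be propagated into `cache`, or the hub serves stale values.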
The heterogeneous (and proprietary) nature of MDM’s components and modules makes training and prototyping the first priority for an IT shop that has just embarked on an MDM implementation. DBAs, System Administrators and Basis professionals should look very closely at MDM for opportunities to implement best practices learned on other application suites. Solution Architects, Developers and Data Modelers should attempt to apply and scale their existing SDLC discipline for design, development, documentation and production support to MDM.