The Perception of Master Data Management



Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of its critical data to one file, called a master file, which provides a common point of reference. When properly done, MDM streamlines data sharing among personnel and departments. In addition, MDM can facilitate computing in multiple system architectures, platforms and applications. The benefits of the MDM paradigm increase as the number and diversity of organizational departments, worker roles and computing applications expand. For this reason, MDM is more likely to be of value to large or complex enterprises than to small, medium-sized or simple ones. When companies merge, the implementation of MDM can minimize confusion and optimize the efficiency of the new, larger organization. For MDM to function at its best, all personnel and departments must be taught how data is to be formatted, stored and accessed. Frequent, coordinated updates to the master data file are also essential.


Master data management (MDM) is meant to deliver a near real-time, hub-based and synchronized master record of information to any seat or point of view in the organization. Master records are created with data that is defined, integrated and reconciled from multiple systems (customer relationship management, financial, supply chain, marketing, etc.) and classified by type (e.g. product master, customer master, location master). MDM is often pursued by data type through programs that address customer data integration (CDI) or product information management (PIM), though many observers believe true MDM requires reconciliation of all data types. Critical to MDM are the notions of data quality and matching, which technology tools can help to automate.

Master Data

Most software systems have lists of data that are shared and used by several of the applications that make up the system. For example, a typical ERP system will have, at a minimum, a Customer Master, an Item Master, and an Account Master. This master data is often one of the key assets of a company. It’s not unusual for a company to be acquired primarily for access to its Customer Master data.

Essential data types

There are essentially five types of data in corporations:

  • Unstructured—This is data found in e-mail, white papers like this, magazine articles, corporate intranet portals, product specifications, marketing collateral, and PDF files.
  • Transactional—This is data related to sales, deliveries, invoices, trouble tickets, claims, and other monetary and non-monetary interactions.
  • Metadata—This is data about other data and may reside in a formal repository or in various other forms such as XML documents, report definitions, column descriptions in a database, log files, connections, and configuration files.
  • Hierarchical—Hierarchical data stores the relationships between other data. It may be stored as part of an accounting system or separately as descriptions of real-world relationships, such as company organizational structures or product lines. Hierarchical data is sometimes considered a super MDM domain, because it is critical to understanding and sometimes discovering the relationships between master data.
  • Master—Master data are the critical nouns of a business and fall generally into four groupings: people, things, places, and concepts. Further categorizations within those groupings are called subject areas, domain areas, or entity types. For example, within people, there are customer, employee, and salesperson. Within things, there are product, part, store, and asset. Within concepts, there are things like contract, warranty, and license. Finally, within places, there are office locations and geographic divisions. Some of these domain areas may be further divided. Customer may be further segmented, based on incentives and history. A company may have normal customers, as well as premier and executive customers. Product may be further segmented by sector and industry. The requirements, life cycle, and CRUD cycle for a product in the Consumer Packaged Goods (CPG) sector are likely very different from those of a product in the clothing industry. The granularity of domains is essentially determined by the magnitude of differences between the attributes of the entities within them.

Life Cycle: CRUD cycle

Master data can be described by the way that it is created, read, updated, deleted, and searched. This life cycle is called the CRUD cycle.






Create
  • Customer: Customer visit such as to Web site or facility; account
  • Product: Product purchased or manufactured; SCM involvement
  • Asset: Unit acquired by opening a PO; approval process necessary
  • Employee: HR hires, numerous forms, orientation, benefits selection, asset allocations, office assignments

Read
  • Customer: Contextualized views based on credentials of viewer
  • Product: Periodic inventory catalogues
  • Asset: Periodic reporting purposes, figuring depreciation, verification
  • Employee: Office access, reviews, insurance-claims, immigration

Update
  • Customer: Address, discounts, phone number, preferences, credit accounts
  • Product: Packaging changes, raw materials changes
  • Asset: Packaging changes, raw materials changes
  • Employee: Immigration status, marriage status, level increase, raises, transfers

Delete
  • Customer: Death, bankruptcy, liquidation, do-not-call
  • Product: Canceled, replaced, no longer available
  • Asset: Obsolete, sold, destroyed, stolen, scrapped
  • Employee: Termination, death

Search
  • Customer: CRM system, call-center system, contact-management system
  • Product: ERP system, orders-processing system
  • Asset: GL tracking, asset DB management
  • Employee: HR LOB system

Data to be Managed

  • Behavior
  • Life Cycle
  • Cardinality
  • Lifetime
  • Complexity
  • Value
  • Volatility

MDM project plan

An MDM project plan will be influenced by requirements, priorities, resource availability, time frame, and the size of the problem. Most MDM projects include at least these phases:

  • Identify sources of master data.
  • Identify the producers and consumers of the master data.
  • Collect and analyze metadata about your master data.
  • Appoint data stewards.
  • Implement a data-governance program and data-governance council.
  • Develop the master-data model.
  • Choose a toolset.
  • Design the infrastructure.
  • Generate and test the master data.
  • Modify the producing and consuming systems.
  • Implement the maintenance processes.

MDM is a complex process that can go on for a long time. Like most things in software, the key to success is to implement MDM incrementally, so that the business realizes a series of short-term benefits while the complete project is a long-term process. No MDM project can be successful without the support and participation of the business users. IT professionals do not have the domain knowledge to create and maintain high-quality master data. Any MDM project that does not include changes to the processes that create, maintain, and validate master data is likely to fail. The rest of this paper will cover the details of the technology and processes for creating and maintaining master data.

Creating a Master List

Whether you buy a tool or decide to roll your own, there are two basic steps to creating master data: clean and standardize the data, and match data from all the sources to consolidate duplicates. Before you can start cleaning and normalizing your data, you must understand the data model for the master data. As part of the modeling process, the contents of each attribute were defined, and a mapping was defined from each source system to the master-data model. This information is used to define the transformations necessary to clean your source data.

Cleaning the data and transforming it into the master data model is very similar to the Extract, Transform, and Load (ETL) processes used to populate a data warehouse. If you already have ETL tools and transformations defined, it might be easier just to modify these as required for the master data, instead of learning a new tool. Here are some typical data-cleansing functions:

  • Normalize data formats. Make all the phone numbers look the same, transform addresses (and so on) to a common format.
  • Replace missing values. Insert defaults, look up ZIP codes from the address, look up the Dun & Bradstreet number.
  • Standardize values. Convert all measurements to metric, convert prices to a common currency, change part numbers to an industry standard.
  • Map attributes. Parse the first name and last name out of a contact-name field, move Part# and partno to the PartNumber field.

Most tools will cleanse the data that they can, and put the rest into an error table for hand processing. Depending on how the matching tool works, the cleansed data will be put into a master table or a series of staging tables. As each source is cleansed, the output should be examined to ensure the cleansing process is working correctly.
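As a rough illustration of the cleansing functions and the error table described above, here is a minimal Python sketch. The field names (contact_name, phone, country), the US-style phone format, and the default country are assumptions made for the example, not part of any particular MDM tool.

```python
import re

DEFAULT_COUNTRY = "US"  # assumed default used to replace a missing value


def normalize_phone(raw):
    """Return a 10-digit number as NNN-NNN-NNNN, or None if it cannot be parsed."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code of 1
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def split_contact_name(contact):
    """Map a single contact-name field to (first, last), or None if ambiguous."""
    parts = (contact or "").split()
    if len(parts) < 2:
        return None
    return parts[0], " ".join(parts[1:])


def cleanse(source_rows):
    """Cleanse what can be cleansed; route the rest to an error list."""
    master_rows, error_rows = [], []
    for row in source_rows:
        phone = normalize_phone(row.get("phone"))
        name = split_contact_name(row.get("contact_name"))
        if phone is None or name is None:
            error_rows.append(row)  # left for hand processing
            continue
        master_rows.append({
            "FirstName": name[0],
            "LastName": name[1],
            "Phone": phone,
            "Country": row.get("country") or DEFAULT_COUNTRY,
        })
    return master_rows, error_rows
```

Calling cleanse() on one source at a time and inspecting both returned lists mirrors the advice to examine the output of each source as it is cleansed.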

Matching master-data records to eliminate duplicates is both the hardest and most important step in creating master data. False matches can actually lose data (two Acme Corporations become one, for example) and missed matches reduce the value of maintaining a common list. The matching accuracy of MDM tools is one of the most important purchase criteria. Some matches are pretty trivial to do. If you have Social Security numbers for all your customers, or if all your products use a common numbering scheme, a database JOIN will find most of the matches. This hardly ever happens in the real world, however, so matching algorithms are normally very complex and sophisticated. Customers can be matched on name, maiden name, nickname, address, phone number, credit-card number, and so on, while products are matched on name, description, part number, specifications, and price. The more attributes that match and the closer each match is, the higher the degree of confidence the MDM system has in the match. This confidence factor is computed for each match, and if it surpasses a threshold, the records match. The threshold is normally adjusted depending on the consequences of a false match. For example, you might specify that if the confidence level is over 95 percent, the records are merged automatically, and if the confidence is between 80 percent and 95 percent, a data steward should approve the match before the records are merged.
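The confidence-and-threshold idea can be sketched in a few lines of Python. This is only an illustration: the attribute weights are invented for the example, and the string similarity comes from the standard library's difflib rather than from the far more sophisticated algorithms that commercial matching tools use. The 95 percent and 80 percent thresholds are the ones from the example above.

```python
from difflib import SequenceMatcher

# Hypothetical weights; a real tool would tune these per domain.
WEIGHTS = {"name": 0.4, "address": 0.3, "phone": 0.3}
AUTO_MERGE_THRESHOLD = 0.95  # above this, merge automatically
REVIEW_THRESHOLD = 0.80      # from here up to 0.95, ask a data steward


def similarity(a, b):
    """Similarity of two strings in [0, 1]; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_confidence(rec_a, rec_b):
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(rec_a.get(field), rec_b.get(field))
        for field, weight in WEIGHTS.items()
    )


def classify(confidence):
    """Turn a confidence factor into an action."""
    if confidence > AUTO_MERGE_THRESHOLD:
        return "merge"
    if confidence >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"
```

Keeping the thresholds as named constants makes it easy to raise them for domains where a false match is costly.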

Most merge tools merge one set of input into the master list, so the best procedure is to start the list with the data in which you have the most confidence, and then merge the other sources in one at a time. If you have a lot of data and a lot of problems with it, this process can take a long time. You might want to start with the data that you expect to benefit most from consolidating; run a pilot project with that data, to ensure your processes work and you are seeing the business benefits you expect; and then start adding other sources, as time and resources permit. This approach means your project will take longer and possibly cost more, but the risk is lower. This approach also lets you start with a few organizations and add more as the project demonstrates success, instead of trying to get everybody on board from the start.
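A minimal sketch of merging sources one at a time, starting with the most trusted, might look like the following. The exact-match key on name and postal code and the "existing value wins" survivorship rule are simplifying assumptions for the example; a real merge tool would use fuzzy matching and configurable rules.

```python
def match_key(record):
    """Hypothetical exact-match key; stands in for a real matching step."""
    return (record["name"].strip().lower(), record["postal_code"].strip())


def merge_source(master, source_rows):
    """Merge one source into the master list, keyed by match_key."""
    for row in source_rows:
        key = match_key(row)
        if key not in master:
            master[key] = dict(row)  # new entity: add it as-is
            continue
        existing = master[key]
        for field, value in row.items():
            # Values already in the master come from a more trusted
            # source, so only fill in fields that are still empty.
            if not existing.get(field) and value:
                existing[field] = value
    return master


def build_master(sources_by_trust):
    """Merge sources in order, most trusted first."""
    master = {}
    for rows in sources_by_trust:
        merge_source(master, rows)
    return master
```

Because each call to merge_source() handles a single source, a pilot can stop after the first one or two sources and add the rest later.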

Another factor to consider when merging your source data into the master list is privacy. When customers become part of the customer master, their information might be visible to any of the applications that have access to the customer master. If the customer data was obtained under a privacy policy that limited its use to a particular application, you might not be able to merge it into the customer master. You might want to add a lawyer to your MDM planning team.

At this point, if your goal was to produce a list of master data, you are done. Print it out or burn it to a CD, and move on. If you want your master data to stay current as data is added and changed, you will have to develop infrastructure and processes to manage the master data over time. The next section provides some options on how to do just that.

Master data management best practices

When considering a new discipline like master data management (MDM), it’s only natural to seek out people who have been there and done that.

But MDM best practices are still emerging and it’s not easy to get organizations to talk about their MDM experiences. Kalido Inc., a Burlington, Mass.-based MDM technology vendor, admits that it has a hard time getting customers to talk to the press.

All this secrecy around successful MDM programs doesn’t help companies looking for best practices, which is partly why Kalido sponsored a customer audit and MDM best practices study by San Mateo, Calif.-based analyst firm Ventana Research. Its researchers examined the best practices of five anonymous Kalido customers to reach their conclusions. The Ventana study, an experienced consultant, and a European telecom maker finally shed some light on the best (and worst) practices for MDM success.

1. Get business involved — or in charge.

“MDM has to be driven by business needs, otherwise it may turn out to be just another database that must be synchronized with all the other ones,” said David Loshin, president of Knowledge Integrity Inc., a Silver Spring, Md.-based consultancy that provides an MDM strategy development service and has worked on enterprise-scale initiatives.

Similarly, the Ventana study found that businesspeople, rather than IT, should drive the process. Support ranging from C-level executives to senior managers to business end users was critical for success, Ventana found. It’s often hard to motivate an organization to get behind the dry prospect of MDM, but early enterprise-wide support is important in the long run, users said. If key corporate goals are tied to the project through a solid business case, it should be a straightforward task to demonstrate benefits and generate excitement.

2. Allow ample time for evaluation and planning.

Plan at least three months for evaluation, talk to reference customers, and do a proof-of-value project with samples of real company data, Kalido users told Ventana researchers. Don’t underestimate the time and expertise needed to develop foundational data models, users said.

“It’s more complex than people realize — and that requires starting early and using real data for planning,” said David Waddington, a Ventana vice president and research director who worked on the study.

IT’s cooperation was an area of concern, as some companies have experienced delays in projects waiting for permission and access rights, Ventana found.

3. Have a big vision, but take small steps.

Consider the ultimate goal, but limit the scope of the initial deployment, users told Ventana. Once MDM is working in one place, extend it step by step, they advised. Business processes, rather than technology, are often the limiting factor, they said, so it’s important to get end-user input early in the process.

“If you’re just interested in getting consistent customer data, it’s very important to do that against the bigger background of ‘how am I going to manage all of my master data longer term?'” Waddington explained. “Then you don’t end up in the situation [of] having to link together a whole lot of different solutions.”

4. Consider potential performance problems.

Performance is the 800-pound gorilla quietly lurking in the MDM discussion, Loshin cautioned.

Different architectures can mean different performance penalties. For example, if a company uses the master hub style of MDM, record creation flows through a single point, which can become a bottleneck. Also, with many applications relying on MDM, the workflow, system priorities and order of operations become critical issues to consider up front. How companies solve this potential performance problem varies, Loshin said, because it’s inherently related to their unique architectures.

5. Institute data governance policies and processes.

Allow time and money for people and process change management, and don’t underestimate the size of the job, experts agreed. Swedish telecom equipment maker Ericsson learned that the politics of data governance can be quite difficult, according to Roderick Hall, senior project manager. Long before deploying SAP MDM, the Stockholm-based company instituted a master data group to manage critical data assets. It’s a “shared services” group that provides services to both IT and business. The group started as part of the finance department, but the function changed with the realization that master data management was a company-wide concern, Hall said. Their job isn’t always easy.

Although some departments, such as finance, saw the value of centralizing master data management, Hall said, other groups were reluctant to give up data ownership.

“To get acceptance of the fact that people have got to give up the freedom to correct their own master data to some faceless group in Stockholm [where the master data group is located] has been a pretty hard battle,” Hall said.

6. Carefully plan deployment.

MDM is still relatively new, so training of business and technical people is more important than ever, Ventana found. Using untrained or semi-trained systems integrators and outsourcing attempts caused major problems and project delays for MDM users, Waddington said.

Then, there’s the prospect of rolling out a program that has an impact on many critical processes and systems, which is no trivial concern. Loshin recommended that companies plan an MDM transition strategy that allows for static and dynamic data synchronization.

“Trying to adjust the underlying infrastructure without affecting day-to-day operations can be as challenging as fixing potholes in the highway without disrupting traffic,” Loshin said.

MDM Architecture

There are three basic styles of architecture used for MDM hubs: the registry, the repository, and the hybrid approach. The hybrid approach is really a continuum of approaches between the two extremes of registry and repository.

While master data management solutions may take many forms, most of them share similar architecture. This architecture is what allows for the accurate, consistent management of data and data processes by maintaining a structured environment under which MDM tools can operate. At the core of these systems is the MDM hub, a database in which master data is cleaned, collected and stored. MDM solutions may use multiple hubs to govern different sets of data, such as product information, customer data and site data, and each hub generally utilizes one of three common models: transaction/repository, registry, or hybrid.

In a transaction/repository-style hub, all relevant data is stored and accessed from a single database, and the database must contain all of the information needed by the different applications which access it. All data is consolidated and centralized, and published to the individual data sources after it has been linked and matched. This style of hub allows for a single source of data to be created, minimizing duplication by making it easier to detect as data is collected and cleaned. However, the transaction/repository style has drawbacks as well. Existing applications may have to be modified to use the master data, and in some cases this is not possible. Different applications and services which serve as an interim interface between the MDM software and the data-dependent applications may be needed, and this can add to costs. Also, data models need to be complex enough to include all relevant information for the applications that utilize them, but not so large that they become unwieldy.

Registry style hubs, in contrast, do not store master data in the hub; instead, master data is maintained within native application databases. The hub stores lists of keys with which to access all relevant attributes for a specific master data entity, linking these attributes between application databases. The registry style hub allows applications to remain fairly intact, as all data is managed within native databases. However, when a request is made for master data, the data must be located, a query must be distributed across numerous databases, and the results must be assembled, all in real time. As the number of source databases grows, this can become increasingly inefficient. In addition, duplicate data entities can reside on different databases, or even within the same database, and while consolidation and cleaning of individual databases would be ideal, it is not always practical. Another disadvantage is that when new databases are to be included in the hub registry, new keys must be added to the existing tables, which may also require altering how queries are generated.
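To make the registry idea concrete, here is a small Python sketch in which plain dictionaries stand in for the native application databases. All system names, keys, and attributes are hypothetical; the point is only that the hub holds keys, not data, and has to visit every source at request time.

```python
# Hypothetical native application databases (stand-ins for real systems).
CRM_DB = {"C-17": {"name": "Acme Corp", "phone": "555-010-1234"}}
BILLING_DB = {"B-902": {"name": "Acme Corp", "credit_limit": 50000}}
SOURCES = {"crm": CRM_DB, "billing": BILLING_DB}

# The registry hub stores only cross-reference keys, no attributes.
REGISTRY = {"MASTER-1": [("crm", "C-17"), ("billing", "B-902")]}


def fetch_master_record(master_id):
    """Assemble a master record by querying every source at request time."""
    record = {}
    for system, local_key in REGISTRY[master_id]:
        for field, value in SOURCES[system][local_key].items():
            # First source listed wins when two systems hold the same field.
            record.setdefault(field, value)
    return record
```

Every call touches each registered source, which is why lookups slow down as more databases are added to the registry.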

Figure 1. MDM hub architecture

Hybrid style hubs utilize methods from both transaction/repository and registry style hubs, and try to address some of the issues present in each. Since it may not be practical to update existing applications or to send inefficient, massive queries across several databases, the hybrid system combines some of the advantages present in the other models by leaving master data on the native databases, generating keys and IDs to access this data, but replicating some of its important attributes to the hub. When queries are made, the hub can service the more common requests, and queries only need to be distributed for the less-used attributes, which results in a more efficient process. While the hybrid style combines advantages of both of its parent models, it has its own disadvantages. Since it stores replicated data from outlying databases, it may run into updating issues, and, as with the transaction/repository style, deciding which attributes to store, what names to use and what format to store them in can create problems.
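The hybrid lookup path can be sketched as follows, again with dictionaries standing in for real systems. The choice of which attributes are replicated to the hub, and all names and values, are assumptions made for the example.

```python
# Hypothetical native database holding the less-used attributes.
BILLING_DB = {"B-902": {"credit_limit": 50000, "payment_terms": "NET30"}}
SOURCES = {"billing": BILLING_DB}

# The hybrid hub keeps keys to the sources plus copies of common attributes.
HUB = {
    "MASTER-1": {
        "replicated": {"name": "Acme Corp", "phone": "555-010-1234"},
        "keys": [("billing", "B-902")],
    }
}


def get_attribute(master_id, field):
    """Return (value, where_it_came_from) for one attribute."""
    entry = HUB[master_id]
    if field in entry["replicated"]:
        return entry["replicated"][field], "hub"  # common request, no fan-out
    for system, local_key in entry["keys"]:
        row = SOURCES[system][local_key]
        if field in row:
            return row[field], system  # less-used attribute, fetched remotely
    raise KeyError(field)
```

The replicated copies are also where the updating issues come from: whenever a source system changes a replicated attribute, the hub copy has to be refreshed.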


The heterogeneous (and proprietary) nature of MDM’s components and modules makes training and prototyping the first priority for an IT shop that has just embarked on an MDM implementation. DBAs, System Administrators and Basis professionals should look very closely at MDM for opportunities to implement best practices learned on other application suites. Solution Architects, Developers and Data Modelers should attempt to apply and scale their existing SDLC discipline for design, development, documentation and production support to MDM.






Source by V V Narendra Kumar
