Voting rights issues are at stake in a case in South Carolina that poses a classic and complicated MDM problem.
As anyone who has tried to compile a “reliable” list of, say, customers from equally “reliable” sources--websites, CRM and ERP systems, sales and marketing applications, established third-party data sources, and the like--has found out, this can be difficult. There are three reasons for this:
1) Data is fragmented across sources, both vertically and horizontally. Some sources hold only some attributes: one has name, date of birth, and address, while another has name and Social Security number. That is vertical fragmentation. Other sources hold only sub-lists of customers, which is horizontal fragmentation: websites have only the users who actually use the site, while CRM systems have current and some past customers.
2) Data quality in nearly all our sources and systems is--and there's no kind way to say this--atrocious. Names are incomplete, Social Security numbers unreliable, addresses incorrect, and different sources have different values for the same attribute.
3) Various organizations have different perspectives on the interpretation of information. Take something as simple as gender: To begin with, is “gender” the same as “sex”? Of course not. How many categories of gender (or sex, if you will) do we need to define? Read this and decide for yourself. Bottom line: the definition of gender in your sources or databases will depend on what purpose the information serves at the source. The same is true for many other seemingly obvious entities and attributes.
As a result, it becomes incredibly difficult to take a list of customers from, say, your website and another list from your legacy system, and merge them into one comprehensive, complete, reliable list: How do you determine if “G. P. Burdell” with birth date June 8, 1927, is the same as “George Burdel” with birth date 08-06-27?
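To make the Burdell question concrete, here is a minimal sketch in Python of how such a comparison might be approached. It is an illustration under stated assumptions, not how any particular MDM product works: the function names, the 0.85 similarity threshold, the list of date formats, and the rule that a two-digit year never lies after a reference date are all hypothetical choices made for this example.

```python
from datetime import date, datetime
from difflib import SequenceMatcher

# Assumed formats: a spelled-out date, plus day-first and month-first numeric forms.
DATE_FORMATS = ("%B %d, %Y", "%d-%m-%y", "%m-%d-%y")

def candidate_dates(text, as_of):
    """Return every calendar date the string could mean under the assumed formats."""
    found = set()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt).date()
            # Assumption: a birth date cannot fall after the reference date,
            # so a two-digit year that lands in the future is moved back a century.
            if parsed > as_of:
                parsed = parsed.replace(year=parsed.year - 100)
        except ValueError:
            continue  # not parseable in this format, or no valid date a century earlier
        found.add(parsed)
    return found

def name_tokens(name):
    """Lowercase the name and drop periods and commas around each part."""
    return [t.strip(".,").lower() for t in name.split() if t.strip(".,")]

def names_compatible(a, b, threshold=0.85):
    """Same first initial and a sufficiently similar last name."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    surname_score = SequenceMatcher(None, ta[-1], tb[-1]).ratio()
    return ta[0][0] == tb[0][0] and surname_score >= threshold

def possible_match(rec_a, rec_b, as_of):
    """Two records may be the same person if names agree and some date reading is shared."""
    shared = candidate_dates(rec_a["dob"], as_of) & candidate_dates(rec_b["dob"], as_of)
    return names_compatible(rec_a["name"], rec_b["name"]) and bool(shared)

a = {"name": "G. P. Burdell", "dob": "June 8, 1927"}
b = {"name": "George Burdel", "dob": "08-06-27"}
print(possible_match(a, b, as_of=date(2012, 1, 1)))  # True
```

Note that “08-06-27” can be read as either June 8 or August 6, so the sketch keeps both readings and only asks whether the two records share one. A real matching engine would weigh far more evidence (address, identifiers, source reliability) before declaring a match.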
This is where master data management comes in. MDM takes in data from multiple sources, weeds out duplicates, and creates a “golden record” for each entity instance--e.g. citizen, customer, supplier, etc. It takes the best value for each attribute from the sources, using fuzzy logic and algorithms to match the data. In our example, MDM determines that the correct person is George P. Burdell with birth date June 8, 1927. This approach is no silver bullet, of course, and can be expensive to implement, but it does a good job of creating a unique list of entities.
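The “best value for each attribute” step can be sketched in a few lines of Python. This is a simplified illustration, not a description of any vendor's engine: the `golden_record` function, the source names, the sample values, and the survivorship rule (take the first non-empty value from the most trusted source) are all assumptions made for the example. It also assumes the records have already been matched to the same person.

```python
def golden_record(matched, source_priority):
    """Merge records already matched to one entity into a single record.

    Assumed survivorship rule: for each attribute, keep the first non-empty
    value found when sources are visited from most to least trusted.
    Raises ValueError if a record names a source missing from source_priority.
    """
    ranked = sorted(matched, key=lambda rec: source_priority.index(rec["source"]))
    golden = {}
    for rec in ranked:
        for attr, value in rec.items():
            if attr == "source" or value in (None, ""):
                continue  # skip bookkeeping fields and empty values
            golden.setdefault(attr, value)  # an earlier, more trusted source wins
    return golden

# Hypothetical, vertically fragmented records for the same person.
records = [
    {"source": "web", "name": "G. Burdell", "email": "gpb@example.com"},
    {"source": "crm", "name": "George P. Burdell", "dob": "1927-06-08", "address": ""},
]
print(golden_record(records, source_priority=["crm", "web"]))
```

Here the name and birth date come from the CRM record, while the e-mail address survives from the website because the CRM has none. Composing a fuller name out of two partial ones, as in the Burdell example, would need name-specific logic that this sketch leaves out.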
Coming back to voter identification, it’s not clear if either the state of South Carolina or the DOJ is using (or plans to use) MDM to arrive at an agreeable voter count. However, the matter is likely headed to the Supreme Court, so maybe that’s where we'll see an MDM solution implemented. I can see it coming in handy in a variety of situations in years to come, and justice will be well served.
Rajan Chandras has more than 20 years of experience advising and leading business technology initiatives, with a focus on strategy and information management. Write him at rchandras at gmail dot com.