SmartAdvice: Clean And Manage Company Data To Learn What Information You've Got - InformationWeek

SmartAdvice: Clean And Manage Company Data To Learn What Information You've Got

It's worth the effort to clean data files and manage your company's data before it manages you, The Advisory Council says. Also, adopt enforceable HR policies to discourage users from downloading programs that may contain malware; and five steps to consider in launching a mentoring program.

Editor's Note: Welcome to SmartAdvice, a weekly column by The Advisory Council (TAC), an advisory service firm. The feature answers three questions of core interest to you, ranging from career advice to enterprise strategies to how to deal with vendors. Submit questions directly to [email protected]

Question A: We're plagued by data quality problems such as inconsistent, incorrect, and redundant customer data (e.g., multiple records for the same customer, but with different spellings). What can we do about it?

Our advice: Murphy's law, applied to data, says that if the same data is stored in two different places, the two copies will become inconsistent. Traditionally, redundancy and data hygiene haven't been carefully controlled.

Modern enterprise systems address this problem by implementing data-synchronization procedures between integrated systems via publish-and-subscribe or request-and-respond interfaces or, better yet, a single instance of truth across all applications. Modern database systems incorporate data-integrity subsystems to manage data of all types and metadata in the repository, including:

  • Intra-record integrity to enforce constraints on data item values and types;

  • Referential integrity to enforce the validity of references between records; and

  • Concurrency control for multiple users.
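As a rough illustration of the first two kinds of integrity, here is a minimal sketch using Python's built-in sqlite3 module. The customer and customer_order tables and their columns are hypothetical, invented for this example; NOT NULL and CHECK stand in for intra-record integrity, and the foreign key stands in for referential integrity. Concurrency control is not shown.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical, constrained schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE customer (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL CHECK (length(trim(name)) > 0),
            email TEXT UNIQUE
        )""")
    conn.execute("""
        CREATE TABLE customer_order (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount      REAL NOT NULL CHECK (amount >= 0)
        )""")
    return conn

def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With constraints like these in place, a blank customer name or an order that points at a nonexistent customer is rejected at insert time rather than discovered later during cleanup.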

Yet data-hygiene problems persist and are pervasive, whether because of mergers, changes in business requirements or business processes, or simply the growing volume of data. The best first step on the road to recovery is to assess the quality of the data, judging whether it is good (valid) or bad (invalid) through a systematic audit process. Validity is a measure of the relevance of the data to the process or analysis at hand.

The next step is to prioritize the data into A, B, and C priorities:

  • A-priority data must contain close to zero defects, because an error or omission could have a high cost of failure. For example, misspelling a customer's name could cost you the customer;

  • B-priority data is important second-priority data. For example, a misspelling in a catalog product description may be embarrassing, but it wouldn't detract from understanding the product; and

  • C-priority data is optional or noncritical data where the cost of omission and error is marginal, such as demographic data gathered only for statistical aggregation.

You must address cleansing of A-priority data. Then address B-priority data as resources allow, and as appropriate based on a cost-benefit analysis.

Data scrubbing, also called data cleansing, is the process of amending or removing data that is incorrect, incomplete, improperly formatted, or duplicated. Using a data-scrubbing tool can save a significant amount of time and can be less costly than fixing errors manually.

Begin with a small sample of 50 to 100 occurrences of the A- and B-priority data and measure it for accuracy to get an idea of the extent of any accuracy problems.
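The sampling step might be sketched as follows. This is only an illustration: the function name sample_accuracy, the default sample size of 100, and the idea of passing in your own is_valid rule are assumptions made for the example, not part of any particular tool.

```python
import random

def sample_accuracy(records, is_valid, sample_size=100, seed=0):
    """Estimate accuracy from a random sample of records.

    records  -- a list of records (any type)
    is_valid -- caller-supplied rule returning True for an accurate record
    Returns (accuracy_rate, invalid_records); the rate is None for empty input.
    """
    rng = random.Random(seed)          # fixed seed so the audit is repeatable
    n = min(sample_size, len(records))
    if n == 0:
        return None, []
    sample = rng.sample(records, n)
    invalid = [r for r in sample if not is_valid(r)]
    return 1 - len(invalid) / n, invalid
```

For instance, a hypothetical rule for customer records could require a nonblank name and a five-digit postal code. The invalid records returned alongside the rate give you concrete cases to review by hand.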

Matching data stored redundantly in disparate databases is one of the more painful data-cleansing problems. You should first seek to consolidate any duplicate records within a single file or database. Keep a cross-reference table to relate the surviving "occurrence-of-record" to the records that previously existed. This table is used to redirect any business transaction that uses "old" identifiers to the occurrence-of-record. Also maintain an audit file with before and after images of the data to ensure you can reconstruct the original records.
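A minimal sketch of consolidation within a single file, assuming records are dictionaries with hypothetical "id" and "name" fields. Matching here uses the standard library's difflib similarity ratio with an arbitrary 0.85 threshold; commercial data-scrubbing tools use far more sophisticated matching, so treat this as an illustration of the bookkeeping (cross-reference table and audit images) rather than of a production matching algorithm.

```python
import copy
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())

def similar(a, b, threshold=0.85):
    """True when two names are close enough to be treated as one customer."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def consolidate(records, threshold=0.85):
    """Merge duplicates; return (survivors, cross_reference, audit_log)."""
    survivors = []  # the surviving occurrence-of-record rows
    xref = {}       # old id -> id of the surviving record
    audit = []      # before and after images for every merge
    for rec in records:
        match = next((s for s in survivors
                      if similar(s["name"], rec["name"], threshold)), None)
        if match is None:
            survivors.append(copy.deepcopy(rec))
            xref[rec["id"]] = rec["id"]
            continue
        before = copy.deepcopy(match)
        # Fill in only the fields the survivor is missing.
        for key, value in rec.items():
            if key != "id" and not match.get(key):
                match[key] = value
        xref[rec["id"]] = match["id"]
        audit.append({"merged": copy.deepcopy(rec),
                      "before": before,
                      "after": copy.deepcopy(match)})
    return survivors, xref, audit
```

A transaction that arrives with an old identifier can then be redirected by looking it up in the cross-reference mapping, and any merge can be undone from the audit entries.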

Then de-duplicate and consolidate the records within all the other redundant files, selecting the most reliable values for propagation. Correct and synchronize data values at each source for consistency to the extent possible. Maintain your cross-reference table of related occurrences.
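One simple way to select "the most reliable values" is to rank your sources and take each field from the highest-ranked source that actually has a value. The sketch below assumes exactly that rule; the function name, the source names, and the ranking itself are hypothetical, and your own reliability criteria (recency, completeness, owner of record) may well differ.

```python
def pick_reliable_values(candidates, source_rank):
    """Build one consolidated record from several sources.

    candidates  -- list of (source_name, record_dict) pairs for one customer
    source_rank -- dict mapping source_name to rank; lower means more reliable
    Unranked sources are treated as least reliable.
    """
    ordered = sorted(candidates,
                     key=lambda c: source_rank.get(c[0], float("inf")))
    consolidated = {}
    for _, record in ordered:
        for field, value in record.items():
            # Keep the first nonblank value seen, i.e. the most reliable one.
            if value not in (None, "") and field not in consolidated:
                consolidated[field] = value
    return consolidated
```

The consolidated values are what you would propagate back to each source, updating the cross-reference table as you go.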

The goal of data management is to provide the infrastructure to transform raw data into consistent, accurate, and reliable corporate information. Its foundation consists of a two-step process:

  • Data profiling -- Understanding the quality of the data you have; and

  • Data cleansing and integration -- Combining similar data from multiple sources.
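To make the profiling step concrete, here is a minimal sketch that summarizes each field in a set of records: how often it is filled in, how many distinct values it has, and which values are most common. The report layout and the function name profile are assumptions for illustration; dedicated profiling tools report much more.

```python
from collections import Counter

def profile(records):
    """Summarize fill rate and value distribution for every field.

    records -- list of dicts; missing keys, None, and "" count as unfilled.
    """
    total = len(records)
    fields = sorted({field for rec in records for field in rec})
    report = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        filled = [v for v in values if v not in (None, "")]
        counts = Counter(filled)
        report[field] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "distinct": len(counts),
            "most_common": counts.most_common(3),
        }
    return report
```

A low fill rate, or an unexpectedly high distinct count on a field that should hold a handful of codes, is usually a good pointer to where cleansing effort should go first.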

Begin with discovery; end with enlightenment!

-- Peter Taglia
