Data acquired from mergers, acquisitions, foreign divisions, third-party data providers, and customers or partners entering information on their own should be suspect, analysts and database managers say. "But the worst thing of all is the paper document," First Health's Boeving says, because it can include errors from both the person who filled out the document and the data-entry clerk.
Even companies that have engaged in data cleansing once often lapse on maintenance. That leads to data-quality drift, in which quality declines without input from anyone, simply because the data no longer matches what's happening in the real world. Marriage, death, divorce, moves, change of business, and change of product suppliers all create drift, which is estimated at upwards of 2% a month for the U.S. population, according to Trillium Software.
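To put the drift estimate in perspective, the short calculation below compounds a monthly drift rate over time. It is only a rough sketch: it assumes the 2% figure applies uniformly and independently each month and that a record, once out of date, stays out of date until someone fixes it. The function name `stale_fraction` is made up for this illustration.

```python
def stale_fraction(monthly_drift: float, months: int) -> float:
    """Expected share of records that are out of date after `months`.

    Assumes a constant monthly drift rate applied to the records that
    are still accurate, with no cleanup in between (an assumption).
    """
    still_accurate = (1.0 - monthly_drift) ** months
    return 1.0 - still_accurate


if __name__ == "__main__":
    # 2% a month, left alone for a year
    print(f"{stale_fraction(0.02, 12):.1%}")
```

Under those assumptions, a customer file left untouched for a year would have roughly a fifth of its records out of step with reality.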
"We try to stay on top of company-executive changes, but even with our research and subscription to multiple business-data providers, we sometimes end up listing two different CEOs," says Bill Schumacher, senior VP of content at business data aggregator and provider OneSource Information Services Inc.
Most companies still favor building their own data-scrubbing tools because simple data validation is relatively easy to program, and they think simple data cleansing is all they need. Building in-house is also much less expensive than the $100,000 to $200,000 that data-quality software can cost.
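The kind of simple validation that companies write for themselves might look something like the sketch below. The field names (`name`, `state`, `zip`) and the rules are assumptions chosen for illustration, not taken from any company mentioned here. Note what it cannot do: it checks format only, so a well-formed but wrong ZIP code passes.

```python
import re

# Format-only rules; they say nothing about whether the values are true.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # 5-digit or ZIP+4
STATE_RE = re.compile(r"[A-Z]{2}")       # two uppercase letters


def validate_record(record: dict) -> list:
    """Return a list of problems found in one (hypothetical) customer record."""
    problems = []
    if not (record.get("name") or "").strip():
        problems.append("name is missing")
    if not STATE_RE.fullmatch(record.get("state") or ""):
        problems.append("state is not a two-letter code")
    if not ZIP_RE.fullmatch(record.get("zip") or ""):
        problems.append("zip is not 5 or 9 digits")
    return problems


if __name__ == "__main__":
    print(validate_record({"name": "Pat Lee", "state": "MA", "zip": "02139"}))
    print(validate_record({"name": " ", "state": "Mass", "zip": "2139"}))
```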
But complex data validation isn't easy to program or anticipate, so high-end data-quality tools offer much more. These suites, which vendors say typically take two to five days to implement, include batch and real-time audits, repairs, and record matching, as well as augmentation of records with additional data, such as geospatial data, worldwide postal-code information, or new company-affiliation or part information.
Sheer power is also a good reason to move to a commercial data-cleansing tool, says Peter Harvey, president and CEO of Intellidyn, a data-management and research firm. One of Intellidyn's jobs is to give clients credit data on U.S. consumers. "I have to update the credit history of everyone in the U.S. on my system on a frequent basis," Harvey says. The multiterabyte database is updated and checked using tools from DataFlux Corp. "This used to take days to do using home-built tools. Now we can perform the entire upload and clean in 16 hours," Harvey says.
Other companies use data-quality software to mitigate risk. FedEx's Insight service lets business customers enter their address, instead of account number or individual tracking codes, to see delivery information on single or multiple packages.
Since the "ship to" information is entered via a shipper-originated automated airbill, typos and other mistakes can enter the database at FedEx. FedEx is using data-quality tools from Trillium Software to compare and repair that information. Trillium's data-quality tools check the information against the customer's data, which is input into Insight upon registration for the online service. This ensures all shipments to a person or company are correctly grouped. It also means customers don't have to guess at misspellings of their names, companies, or addresses in order to find shipments coming to them. "We want to use the shipper-input information, but we just can't bet the initiative on them providing 100% accuracy," FedEx's Lesser says.
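The general idea of comparing shipper-entered text against a customer's registered details can be illustrated with a small fuzzy-matching sketch. This is not how Trillium's tools or FedEx's systems work internally, which the article does not describe; it uses Python's standard `difflib` module as a stand-in, and the function names, the sample company names, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib
import re
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def match_recipient(entered: str, registered: List[str],
                    cutoff: float = 0.8) -> Optional[str]:
    """Return the registered name closest to `entered`, or None if no
    candidate reaches the (assumed) similarity cutoff."""
    by_normalized = {normalize(name): name for name in registered}
    hits = difflib.get_close_matches(
        normalize(entered), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[hits[0]] if hits else None


if __name__ == "__main__":
    customers = ["Acme Industrial Supply", "Northwind Traders"]
    print(match_recipient("Acme Indutrial Suply", customers))
    print(match_recipient("Zebra Corp", customers))
```

A misspelled "ship to" name that is close enough to a registered one is grouped with it, while an unrelated name is left unmatched rather than forced into the nearest bucket. Commercial tools go well beyond this kind of string similarity, but the sketch shows why customers need not guess at the misspellings.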
Strategies to fight data degradation range from the simple to the very complex, with results that typically map closely to the effort. That's why First Health, FleetBoston, FedEx, and OneSource take it so seriously. "Our executives don't ask about ROI. They ask if we'll lose out on an opportunity by not doing this," Boeving says. And your company's loss may be another's gain.