A home in the small town of Valparaiso, Ind., valued at $121,900 somehow wound up recorded in Porter County's computer system as being worth a whopping $400 million. Naturally, the figure ended up on documents used to calculate tax rates. By the time the blunder was uncovered in February, the damage was done: Valparaiso, its school district, and government agencies were forced to slash their budgets by $3.1 million when they found they wouldn't be getting the tax dollars after all.
It's a nightmare scenario--and one like it could be yours. Bad data remains a major cause of botched marketing campaigns, failed CRM and data warehouse projects, angry customers, and lunkhead decisions. Despite all we know about the importance of data scrubbing and quality management, many companies are still using data that's redundant, incomplete, conflicting, outdated, and just plain wrong.
Bad data isn't a new problem, but urgency in dealing with it is at an all-time high. Customers are voicing anger at the mistargeted marketing pitches and poor service that result from off-the-mark data, and they're taking their business elsewhere. Companies are investing billions of dollars in CRM applications and data integration projects to gain a better view of their customers--only to discover that conflicting data makes them blind. "Our marketing effectiveness leads to our sales effectiveness, which leads to our service effectiveness. Data quality is key to the success of that," says Chuck Scoggins, VP of customer solutions at Hilton Hotels. "If you don't have quality data, that whole chain breaks down."
Managers and employees increasingly base decisions on insights gleaned from performance management applications and dashboards. But business intelligence tools are only as good as the data that goes into them; faulty data leads to ill-informed decisions. The ramifications range from ticked-off customers to misled investors to testy regulators. Executives can face jail time under the Sarbanes-Oxley Act if they don't have financial data in order. Bad data can even increase the cost and time involved in completing mergers by making it more difficult to integrate operations and combine customer lists.
The problem is getting harder to manage as the amount of data generated and maintained by many businesses doubles every 12 to 18 months. And as more businesses share information with outside partners and customers, more bad data is being exposed to others. Lax quality is familiar to anyone with a mailbox: Consumers get credit card pitches from issuers with which they already have cards, mailings from charities in triplicate with slightly different name spellings, and warranty extension offers from auto dealers for cars they no longer own.
Occasional inconvenience for consumers aside, low-quality data is foremost a problem for the company holding it. Bad data can be an embarrassment--companies are loath to talk openly about internal data disasters. Businesses may be legally bound to share information about security breaches that result in consumers' personal information being compromised, but that's not the case with bad data. As a result, tales of mishaps are hard to come by, even as the problem persists.
The biggest obstacle to fixing the mess is that business managers view data quality as a technical problem, when business processes are really what's broken. IT has little control over the sales rep who gets a customer address wrong on an order or the manufacturing manager who enters an incorrect part number in an inventory database. A Gartner survey of 600 executives in November found that 49% think the IT department is responsible for their organizations' data quality; much smaller numbers say responsibility lies with top execs, data quality teams, line-of-business managers, and others.
"Business has to accept the fact that it has primary responsibility for data quality. Data is a business asset," says Nigel Turner, who as project lead manager for data quality programs at BT Group (formerly British Telecom) in the late '90s helped get that company's data cleanup efforts off the ground.
Gartner estimates that more than 25% of critical data within large businesses is somehow inaccurate or incomplete. And that imprecise data is wreaking havoc. Fifty-three percent of the 750 IT professionals and business executives surveyed by the Data Warehousing Institute late last year said their companies had experienced problems and suffered losses or increased costs because of poor-quality data, up from 44% in a similar survey in 2001.
While IT managers may not own the processes that spew bad data, they can make the business case to change those processes to improve data quality. Moreover, they can provide the technology to support those improved processes and, since no process is perfect, operate the tools needed to automate the downstream steps of identifying and correcting bad data.
Turner, then in BT's corporate strategy division, recognized that the telecommunications company was spending a great deal of effort correcting data. Rather than create a top-down, companywide program, Turner targeted line-of-business operations and identified a data quality "champion" in each to lead an information management forum. The groups targeted specific projects with demonstrable returns on investment, such as improving names and addresses in marketing data to reduce the number of letters sent to the wrong people and improving private-line inventory record keeping to increase the number of disconnected circuits returned to stock for reuse.
"We had to prove to BT that these things were worth doing," Turner says. "Data quality isn't very sexy." The original budget for the data quality efforts was a measly $30,000. As the project expanded, Turner's group developed a data quality methodology incorporating best practices gleaned from inside the company and from outside experts, and centralized data quality management. Recognizing that errors will creep into databases despite its best efforts, BT uses data profiling and cleansing tools from Trillium to identify and remove errant data.
The efforts have paid off: BT has realized as much as $800 million in aggregate savings by improving inventory management, boosting productivity through improved automated interactions with suppliers and customers, and reducing revenue leakage through more accurate billing. BT has parlayed its data quality know-how into a consulting business headed by Turner.
Still, data quality problems are legion and seem to exist to some degree at all manner of companies that manage large quantities of information. Darren Cunningham, product marketing director at Business Objects, shares the story of a consumer technology manufacturer that routinely sent half of its catalogs to the wrong addresses until a manager pointed out the high number of catalog returns and customer complaints. Taking steps to correct the problem saved the company $12 million a year, Cunningham says.
Data quality initiatives can be part of broader data governance programs. Data governance, a relatively new concept, applies best practices to how information is managed, secured, and used across an organization. It requires establishing a formal set of business processes and policies to ensure that data is handled in a prescribed fashion. Data governance includes standard definitions for data elements to be used throughout a company--just what a "lost customer" is, for example--and metrics for measuring data quality, says Terry Haas, director of the enterprise data management practice at PricewaterhouseCoopers. Data governance also defines the data management roles and responsibilities of managers and employees and limits the ability to change data to designated "data stewards."
There's no standard way of measuring data quality. Bank of America and Cintas use Six Sigma as a yardstick. (Six Sigma is a methodology for measuring and removing defects from everything from data to manufactured products.) Hilton Hotels uses the probability of correctness indicator, or PCI, which assigns data a rank of one through nine based on its trustworthiness. Hilton rates 95% of its customer data at the high end, in the one through four categories. But, reflecting an emphasis on measuring data quality projects by their ROI, BT's Turner says the one metric that matters is money.
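To make the Six Sigma yardstick concrete, here is a minimal Python sketch of how a defect rate might be computed for a data set. It is an illustration only, not the method Bank of America or Cintas uses. The function names, and the choice to count each field in a record as one "opportunity" for a defect, are assumptions. The sketch relies on the conventional Six Sigma benchmark of 3.4 defects per million opportunities, which includes the customary 1.5-sigma shift.

```python
from statistics import NormalDist


def dpmo(defects: int, records: int, fields_per_record: int) -> float:
    """Defects per million opportunities (hypothetical helper).

    Assumption: every field in every record is one opportunity for a defect.
    """
    opportunities = records * fields_per_record
    if opportunities <= 0:
        raise ValueError("need at least one record and one field")
    return defects * 1_000_000 / opportunities


def sigma_level(dpmo_value: float) -> float:
    """Convert DPMO to a sigma level using the customary 1.5-sigma shift."""
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("DPMO must be strictly between 0 and 1,000,000")
    defect_free_share = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(defect_free_share) + 1.5


if __name__ == "__main__":
    rate = dpmo(defects=25, records=1000, fields_per_record=5)
    print(rate, round(sigma_level(rate), 2))
```

Under these assumptions, 25 bad fields found in 1,000 five-field records works out to 5,000 defects per million opportunities, a little over 4 sigma and well short of the Six Sigma benchmark.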
Bank of America designated data stewards in business units and the IT department, and some with companywide responsibility. Data quality managers meet monthly to resolve problems. The bank uses commercial and custom-built data profiling and matching tools to examine and, when necessary, correct data sent to its data warehouse. Today, in addition to regulatory compliance, the bank's data quality efforts are driven by its risk management practices, the need to manage customer data from multiple channels, and cross-selling efforts.
Integrating data from multiple business operations has also been a challenge at Cintas, which created new divisions as it expanded beyond its core employee uniform business into areas such as providing businesses with cleaning supplies and document storage and shredding services. That has resulted in customer data silos throughout the company, database marketing manager Becki Wessel says.
To help with cross-selling, data from all divisions is collected in a data warehouse, but the information is sometimes duplicated with slight variations. Some customers are listed in multiple databases but with enough variation in name or address to be identified as different people. Those discrepancies have sometimes led to existing customers being identified as new prospects--an embarrassing situation when a sales rep shows up. An added danger is that sales reps could begin to distrust leads provided by marketing, Wessel says. Or two customers could be close enough in spelling to be tagged as the same customer, costing the company a sales opportunity.
As part of a project to overhaul its data warehouse, Cintas has been installing quality management software from Dataflux that will identify duplicate customer records and standardize customer data collected monthly from each division's database. The system is expected to be fully functional by next month, but a pilot project already has improved the company's ability to match customer names.
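The two steps described here, standardizing records and then flagging likely duplicates, can be sketched in a few lines of Python. This is not how the Dataflux software works; it is a simplified illustration using only the standard library. The abbreviation table, the function names, and the 0.9 similarity threshold are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would use a far larger one.
ABBREVIATIONS = {
    "corp": "corporation",
    "inc": "incorporated",
    "co": "company",
    "st": "street",
    "ave": "avenue",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, expand abbreviations, collapse spaces."""
    words = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag two values as likely the same once both are standardized."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    print(is_probable_duplicate("Acme Corp.", "ACME  Corporation"))
    print(is_probable_duplicate("Jon Smith", "John Smith"))
    print(is_probable_duplicate("Acme Corp.", "Zenith Uniform Supply"))
```

The threshold embodies the trade-off Wessel describes. Set it too high and the same customer shows up as a new prospect; set it too low and two different customers with similar names get merged into one.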
While Cintas is integrating customer data on a monthly batch basis, other companies do so in real or near-real time, which makes data quality even more difficult. More companies also are adding third-party data that may be erroneous or inconsistent. Bank of America's Carlson notes that the globalization of business--and data sources--further complicates the problem.
But it's difficult to sort out which category to put all the other guests in when they make a reservation. Hilton uses a combination of custom-built tools and software from Group 1 (owned by Pitney Bowes) to match a guest's name and address with information already in the database. That includes a Soundex algorithm that matches names based on how they sound rather than how they're spelled. Only 40% of all customers are matched with an existing profile, and new profiles are created for the rest, Scoggins says.
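For readers unfamiliar with Soundex, the following Python sketch implements the standard American Soundex rules: keep the first letter, encode the remaining consonants as digits, skip vowels, and collapse adjacent letters that share a code. It is a textbook version offered for illustration. The article does not say which Soundex variant Hilton or Group 1 uses, and the function name is an assumption.

```python
# Consonant groups and their Soundex digits; vowels, H, W, and Y have no code.
_GROUPS = (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
)
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name ("" if no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run, so a repeat is coded again
    return (result + "000")[:4]  # pad with zeros or truncate to four characters


if __name__ == "__main__":
    for guest in ("Smith", "Smyth", "Robert", "Rupert"):
        print(guest, soundex(guest))
```

Because "Smith" and "Smyth" both encode to S530, a reservation taken under either spelling could be matched to the same profile. The same property also explains false matches: "Robert" and "Rupert" share the code R163.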
Master data management projects tied to CRM, ERP, and supply chain management systems are some of the biggest drivers of data quality programs, AMR Research analyst Bill Swanton says. Master data management involves using a centrally managed database of customer names, product numbers, and other critical data. "Typically, we see people getting the data quality religion because they implement a big, expensive IT project and it doesn't work," Swanton says.
At BMW Group Canada, customer data is generated by retail and call center operations, company-sponsored events, and direct mail and Internet marketing campaigns. Since June, the company has been centralizing all that customer data in a Siebel CRM system and using data matching software from Trillium to eliminate duplicate customer records, standardize names and addresses, and fill out incomplete records. Before the system was installed, dealers, BMW financial services, and other operations had their own customer databases, and customers complained that they had no central point of contact, marketing services manager Kelly Lam says. The company is also saving on mailing costs by complying with address format standards set by Canada Post, he says.
But the ultimate goal of data quality improvement is to catch errors at the point of entry or, even better, prevent errors from occurring at all, says Philip Russom, senior manager of research and services at the Data Warehousing Institute.
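Catching errors at the point of entry usually means validating a record before it is saved, rather than cleaning it up later. The Python sketch below shows the idea under stated assumptions: the field names are hypothetical, and the postal code check is a simplified version of the Canadian letter-digit-letter, digit-letter-digit pattern that ignores the letters Canada Post does not use.

```python
import re

# Simplified Canadian postal code pattern (A1A 1A1); an assumption for the demo.
POSTAL_CODE = re.compile(r"^[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]$")

# Hypothetical required fields for a customer record.
REQUIRED_FIELDS = ("name", "street", "postal_code")


def validate_customer(record: dict) -> list:
    """Return a list of problems; an empty list means the record may be saved."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            errors.append(f"missing {field}")
    code = str(record.get("postal_code", "")).strip().upper()
    if code and not POSTAL_CODE.match(code):
        errors.append("postal_code is not in A1A 1A1 form")
    return errors


if __name__ == "__main__":
    print(validate_customer(
        {"name": "K. Lam", "street": "1 Main St", "postal_code": "l4b 3k2"}
    ))
    print(validate_customer({"street": "1 Main St", "postal_code": "12345"}))
```

An entry screen that refuses to save a record until this list is empty puts the correction in the hands of the person who made the error, while the right answer is still at hand.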
Some companies already are thinking along those lines. Cintas is considering using the data matching capabilities in the Dataflux system to correct data on the fly as divisional employees enter it into the system, rather than when it enters the data warehouse. Accurate data starts at the beginning--and the work required to keep it clean never ends. Says Wessel: "As long as you're fixing things on the back end, you're not correcting the problem."
Illustration by Jay Montgomery