Getting Quality Data Now

We ask a group of vendors specializing in helping companies with data quality problems to offer advice on how to avoid "bad data in, bad data out" syndrome.
As anyone who has ever tried to generate any sort of report from a data repository knows, bad data in equals bad data out. The problem is, it's often unclear that data is "bad" or unusable -- until you need to use it.

Getting good, usable data out of your business intelligence systems takes some planning. That's because ensuring data quality requires different areas of an organization to communicate and come to consensus on certain issues. But it'll be worth the effort: The Data Warehousing Institute estimates that low data quality costs companies $611 billion annually. And PricewaterhouseCoopers has stated in previous reports that low data quality leads to 75 percent of general budget overruns. There's no better time to start cleaning up your act than today -- but it won't be easy.

"Execution is complicated and complex and generally affects every part of [the] organization," says Sam Barclay, vice president of business development at customer relationship management (CRM) firm StayInFront Inc. "Various departments have to agree on formats. The key thing about data quality is getting accurate information in the first place. The problem is that while information has become easier to collect in the last 25 years, that [situation] has not resulted in better data."

StayInFront provides CRM applications, decision support tools and e-business systems. The company combines its technologies with an extensive implementation and support infrastructure, and offers customers the option of either using one or some of its tools, or of implementing a complete solution.

If you are wondering how to start cleaning up your process, first make sure you have a clear understanding of your business goals. Experts suggest determining what you will need to know from your data; that way, you can make sure the data you input is accurate and complete. Obviously, different departments have different needs. It's not easy to build a consensus among diverse users, but if all those concerns are addressed before a single piece of data is input, you will save yourself a world of hurt later.

"Quality is difficult to define, but the user is the one ultimately deciding whether the data is of quality," notes Barclay.

But what if it's too late? What if you've inherited mounds of data that don't seem to correlate, or, worse, that make no sense? Again, you need a plan. What exactly do you need from the information? Full customer names? Phone numbers? Product preferences? Once you have an action plan, you know where to direct your efforts in order to clean and analyze your data. For example, do you need software that can recognize that customer Bob Schwartz of Newtown, CT is the same as customer Robert Schwartz of Newtown, Conn.? The process can be painstaking: "We found that 25 percent of [our clients'] time is spent clarifying bad data," says Tim Furey, CTO of consulting firm Conversion Services.
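The Bob/Robert Schwartz question above comes down to normalizing each record before comparing it. Here is a minimal Python sketch of that idea. The lookup tables and the function names (normalize_customer, same_customer) are hypothetical and invented for illustration; a commercial matching tool would rely on far larger reference data and fuzzier comparison rules.

```python
import re

# Hypothetical lookup tables, for illustration only.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}
STATES = {"ct": "CT", "conn": "CT", "connecticut": "CT"}


def normalize_customer(name, city, state):
    """Build a comparison key from a customer's name, city and state."""
    # Lowercase the name and drop punctuation, then split into words.
    parts = re.sub(r"[^a-z\s]", "", name.lower()).split()
    if parts:
        # Map a known nickname to its formal first name.
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    # Reduce the state to letters only so "Conn." and "conn" look alike.
    state_key = re.sub(r"[^a-z]", "", state.lower())
    state_code = STATES.get(state_key, state.strip().upper())
    return (" ".join(parts), city.strip().lower(), state_code)


def same_customer(a, b):
    """Treat two (name, city, state) records as one customer if their keys match."""
    return normalize_customer(*a) == normalize_customer(*b)
```

Calling same_customer(("Bob Schwartz", "Newtown", "CT"), ("Robert Schwartz", "Newtown", "Conn.")) treats the two records as one customer, because both reduce to the same key. Exact matching on a normalized key is the simplest possible rule; it will miss misspellings, which is where dedicated matching software earns its keep.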

Next comes the integration stage. Here, you -- or a service you've contracted -- will check the information. For example: Are your phone numbers valid? Are they all input in a consistent manner?
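As a rough illustration of the phone-number checks just described, the Python sketch below validates a number and rewrites it in one consistent format. It assumes U.S.-style 10-digit numbers, and the function name normalize_phone and the NNN-NNN-NNNN output format are choices made for this example, not anything the vendors quoted here prescribe.

```python
import re


def normalize_phone(raw):
    """Return a U.S. phone number as NNN-NNN-NNNN, or None if it looks invalid."""
    # Keep only the digits, so "(203) 555-0142" and "203.555.0142" look alike.
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code of 1 if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    # In the North American Numbering Plan, neither the area code
    # nor the exchange code may begin with 0 or 1.
    if digits[0] in "01" or digits[3] in "01":
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

A record that comes back as None is a candidate for manual review rather than automatic deletion. Numbers with extensions or international formats would need additional rules that this sketch deliberately leaves out.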

"[Studies have found that] five percent of customer master files have data that is wrong," says Furey. With the cost for each file ranging from $100 to $1,000 annually, those are expensive mistakes to maintain.

Once the revisions are complete, you are ready for the augmentation process. Here, you add relevant information to the data. For example, if the goal identified in stage one is to complete customer records so that each has a phone number, then that's the data that is searched for and added.
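The augmentation step can be pictured as filling gaps from a second, trusted source. The Python sketch below assumes customer records are dictionaries with "id" and "phone" fields and that a reference lookup keyed by customer ID is available; those field names and the function name augment_phones are hypothetical.

```python
def augment_phones(customers, reference):
    """Fill in missing 'phone' fields from a reference lookup keyed by customer id."""
    augmented = []
    for record in customers:
        updated = dict(record)  # copy, so the original records are left untouched
        if not updated.get("phone"):
            found = reference.get(updated["id"])
            if found:
                updated["phone"] = found
        augmented.append(updated)
    return augmented
```

Existing phone numbers are never overwritten here; only empty fields are filled. Whether the reference source should win when the two disagree is a business decision that belongs in the action plan.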

Finally, once the data is complete, consistently input and accurate, monitor it daily. "Only that will tell you if you are meeting your action plan," says Barclay.
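One simple way to picture daily monitoring is a completeness score checked against a target from the action plan. The Python sketch below is only an illustration: the function names and the 95 percent default target are assumptions, not figures from the experts quoted here.

```python
def completeness(records, field):
    """Return the share of records (0.0 to 1.0) with a non-empty value for field."""
    if not records:
        return 1.0  # nothing to check, so nothing is missing
    filled = sum(1 for record in records if record.get(field))
    return filled / len(records)


def meets_target(records, field, target=0.95):
    """Check whether completeness for field reaches the (assumed) target."""
    return completeness(records, field) >= target
```

Run daily against the customer file, a check like this shows whether completeness is holding steady or slipping. Accuracy and consistency would need their own measures.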

It is a far easier -- and less expensive -- venture to put processes into place initially to ensure only clean data enters the system. Unfortunately, many companies are forced to implement a "passive" approach (i.e., extracting the data from the system and fixing it afterward) because they've inherited faulty information. "The cost of a passive approach is 200 percent higher than an active one," says Furey. His company, Conversion Services, offers consulting services focused on data warehousing, business intelligence and data management solutions. Because of the tremendous costs involved, "senior management looks at this now as a strategic issue," he adds.

The impact of poor-quality data on the bottom line has shaken companies from top to bottom. "Increasingly," says Garry Moroney, CEO of Similarity Systems, "there are more C-level execs involved, particularly in compliance." Similarity Systems offers two products that help organizations identify and correct data quality problems: Axio for profiling, and Athanor for data quality management. Moroney notes that while the data quality process is typically driven by IT, it's crucial to get support at the top.

"The approach we take allows our users to [improve quality] on a gradual basis: Start with auditing data by the BI guys, and then work through," he says. "Increasingly, we see data quality groups with senior-level management sponsorship."

No matter what platform or type of software you start with, the most important concept to remember is that data quality is a process, not an event. Ensuring good, usable information is a process that happens every single day.

"Most important is to make the data trusted," says Furey. "The customer needs to have trust in the data -- or they won't use it."

And what's the good of collecting all that information if you can't use it?

Jennifer Bosavage is a freelance writer based in Huntington, N.Y.