Rajan Chandras

Is Data Quality Overrated?

MetLife and search-and-rescue teams prove that sometimes as-is data is good enough for serving customers and saving lives.

Am I being a heretic here? IT professionals swear by the mantra of data quality. Business is forever hand-wringing over it and is willing to invest in multi-million-dollar initiatives to improve it (and if the last one didn't do too well, they're always game to sponsor another go at it).

Vendors have made -- and are still making -- a killing selling data quality tools, to the extent that Gartner now has a magic quadrant devoted to this segment. The challenge of gaining and maintaining enterprise data quality has given rise to a whole new discipline, which some call Data Governance (DG). And DG, in turn, has led to the emergence of a whole new set of tools, of course.

So what madness has come upon me that I seek to storm this formidable fortress?

Relax, I'm doing nothing of the kind. It's true that I have some strong feelings about after-the-fact measures for improving data quality. And yes, the term "data governance" continues to amuse and bemuse me. But what has me excited are refreshing case studies demonstrating a relatively unheralded truth: not every value proposition that involves data requires rigorous and pricey up-front investments in data quality improvement.

Case in point: a recent breakthrough project in which MetLife put together a consolidated customer view using data from more than 70 systems, moving from pilot to rollout in 90 days.

[ Want more on quick, easy data integration? Read MetLife Uses NoSQL For Customer Service Breakthrough. ]

Sure, there will be limitations to what must necessarily be, in some respects, a quick-and-dirty solution. For example, customer data was integrated without using the kind of sophisticated customer matching algorithms found in professional-grade master data management (MDM) tools (although MetLife does have a separate MDM effort under way). Also, I imagine that data cleansing and standardization along the way was minimal at best.

But what's not to like about an inspired three-month initiative involving 70 systems that reduces some customer-service processes from 40 clicks down to one click and as many as 15 different screens to one? The plan is to roll this out to 3,000 call center and research staff within six months. These are compelling numbers.

In a different sort of example, consider a report about crisis-mapping technologies that can help humanitarian organizations deliver assistance to victims of civil conflicts and natural disasters. These systems receive and process eyewitness reports submitted via email, text message, and social media, and then build interactive geospatial maps, all in real time.

One such open-source solution, called Ushahidi, was used to crowdsource a live crisis map of the 2010 earthquake in Haiti. The map helped the U.S. Marine Corps locate victims lying under the rubble of collapsed buildings and helped save hundreds of lives.

There was a caveat when the application was deployed: it did not support automated categorizing and geo-tagging of incoming text, so that work had to be done manually. Manual processing can only go so far. The Japanese earthquake and tsunami in 2011 generated more than 300,000 Tweets every minute. And when Hurricane Sandy hit the U.S. eastern seaboard in 2012, there were more than 20 million Tweets -- hardly something that can be processed by hand.

The inherent limitations of this approach spurred the author of the article, Patrick Meier, and his team to enhance Ushahidi with a set of Twitter classifiers -- algorithms that could automatically identify Tweets that were relevant and informative to the crisis at hand. For example, classifiers automatically categorize eyewitness reports, infrastructure-damage assessments, casualties, humanitarian needs, offers of help and so on.
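The article doesn't describe how Ushahidi's classifiers actually work; they were trained on labeled crisis data. But as a purely illustrative sketch of the idea of automatic categorization, a keyword-based classifier (with invented categories and keywords) might look like this:

```python
# Minimal, hypothetical sketch of categorizing crisis-related tweets.
# The categories and keyword lists are invented for illustration and are
# NOT Ushahidi's actual classifiers, which are trained on labeled data.

CATEGORY_KEYWORDS = {
    "infrastructure_damage": {"collapsed", "bridge", "road", "blocked", "rubble"},
    "casualties": {"injured", "trapped", "dead", "wounded"},
    "humanitarian_needs": {"water", "food", "shelter", "medicine"},
    "offers_of_help": {"volunteer", "donate", "offering", "supplies"},
}

def classify_tweet(text: str) -> list:
    """Return every category whose keywords appear in the tweet text."""
    words = set(text.lower().split())
    return [cat for cat, keys in CATEGORY_KEYWORDS.items() if words & keys]

print(classify_tweet("Bridge collapsed near the port, several people trapped"))
# -> ['infrastructure_damage', 'casualties']
```

A real classifier would use machine-learned models rather than fixed keyword lists, which is exactly why Meier's team could keep raising accuracy by retraining on more labeled messages.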

But given the quality of incoming data -- terse text with an emphasis on emotion rather than nicety of speech -- what results can we expect? Not too bad, as it turns out; initial accuracy rates range between 70% and 90%. Meier and his team are now working on developing more sophisticated algorithms that can be trained to better interpret incoming messages, leading to continued improvements in accuracy.

Both the above examples demonstrate, albeit in slightly different ways, a simple maxim: Sometimes "as is" data quality serves the purpose.

In the first case, the data in the underlying systems has clearly enabled integration to a sufficient extent -- a common customer key, for example, that has migrated to multiple systems (demonstrating that some good can come out of point-to-point integration, too). Equally importantly, the information is being consumed by humans -- call center operators, for example -- so that data-quality issues can be identified as they surface, giving MetLife an opportunity to clean up its information and tie together formerly disparate records.
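To make the idea concrete (this is a hypothetical sketch, not MetLife's actual implementation), consolidating "as is" records from multiple systems on a shared customer key needs nothing more sophisticated than a keyed merge -- no cleansing, no probabilistic matching:

```python
# Hypothetical sketch: build a consolidated customer view by merging
# records from several source systems on a customer key that already
# exists in all of them. Data is taken "as is" -- no cleansing step.

from collections import defaultdict

policy_system = [{"cust_id": "C100", "policy": "AUTO-77"}]
claims_system = [{"cust_id": "C100", "claim": "CLM-9", "status": "open"}]
billing_system = [{"cust_id": "C100", "balance": 125.50}]

def consolidate(*systems):
    """Group every record by its customer key into one unified view."""
    view = defaultdict(dict)
    for records in systems:
        for rec in records:
            view[rec["cust_id"]].update(rec)  # later systems fill in fields
    return dict(view)

unified = consolidate(policy_system, claims_system, billing_system)
print(unified["C100"])
# -> {'cust_id': 'C100', 'policy': 'AUTO-77', 'claim': 'CLM-9',
#     'status': 'open', 'balance': 125.5}
```

Records whose keys never migrated across systems simply stay unmatched -- which is the quality gap a later MDM effort would close.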

In the second case, substantial value was derived despite the free-form, low-quality textual data. This is to be expected, as it's precisely the purpose of techniques such as pattern recognition, natural language processing and sentiment analysis.

There is a slew of use cases where granular data quality doesn't matter much. Typical examples include summary-level and statistical reporting/analytics. If a trucking company is looking to identify its most frequently used or most profitable routes, for example, individual discrepancies in transportation records don't really matter.

So, does data quality matter? Of course it does. The problem isn't that we are too obsessed with data quality; the problem is that we (still) aren't taking it seriously enough. Data quality continues to be an after-thought, addressed through ad-hoc and localized measures.

However, not everything needs to wait upon big-bang data quality initiatives. It's not a bad idea to take the occasional step back and ask yourself what business value can be obtained from data as is. Sometimes "good enough" data quality is just that.

EB Quinn
User Rank: Apprentice
6/5/2013 | 9:43:21 AM
re: Is Data Quality Overrated?
The #1 obstacle is UNDERSTANDING data, which may or may not lead to recognizing data quality issues.
User Rank: Author
5/30/2013 | 8:49:33 PM
re: Is Data Quality Overrated?
Data quality can be an obstacle if IT waits to have data perfect and complete before it puts data in employees' hands. IT can't anticipate every use case and know what data's most valuable. The case I've heard from more than one CIO is that people aren't motivated to get data totally accurate and complete until they know it's being used -- until the business case is made. When IT asks for sales per store in the Middle East in order to fill out a data model, it draws yawns. When the CEO asks "why don't we have that data?" it gets done.
User Rank: Apprentice
5/30/2013 | 2:49:00 PM
re: Is Data Quality Overrated?
Rajan, great article! Data quality can be a slippery slope for some organizations. No system is perfect, and trying to achieve perfection all the time doesn't make business sense. But I think you're asking two separate questions.

First, is data quality overrated? I would only say that we are in a business world obsessed with data. Data rules in today's business environment (think Big Data), and its value will always be gauged on a spectrum of acceptability. But is data quality overrated? Not likely. I am a firm believer that organizations should improve the quality of the specific data that meets the needs of their business. Only business people, with specific subject matter expertise, can know how data needs to be prepared for their business processes. Beyond that, perhaps there is no need. However, once you establish the state the data needs to be in for the business to operate efficiently, effectively, and profitably, then I would say you can't overrate data quality -- otherwise the business fails.

The second question is: What defines "good enough"? The answer to that question, in my mind, is: it depends. Sometimes, but not always, data needs to be perfect or as close to perfect as possible. Let's use your disaster relief example. No, the data in this specific instance didn't have to be 100% to be effective. (I digress, but I noticed that you did say the vendor selling the system was trying to improve it in order to process more data. Why would they do that if their product adequately met the needs of the market?) But what if you're the insurance company that covers that same area? Don't you want to know 100% of the households your organization covers that were affected by the disaster... and to what extent their damages are going to need to be covered... and to what extent that is going to affect your bottom line? To me, those answers need to be as close to 100% as possible; otherwise I am the customer on the Today show telling Matt Lauer that my insurance company doesn't care about me.

There will always be examples of how data quality doesn't have to be perfect, and in those instances, let the needs of the business drive the equation. But where it is needed and the technology of the times can provide a solution (not a tool), then many organizations are going to want the data "as good as they can get it." Lastly, I would also agree with you that any vendor selling perfect data all the time should be shown the door.
Tony Kontzer
User Rank: Apprentice
5/30/2013 | 7:09:54 AM
re: Is Data Quality Overrated?
There's a more general lesson here about the technology treadmill that so many companies are on. It seems to me that insisting on improving data quality isn't unlike replacing networking gear simply because something new has hit the market rather than out of an actual need. As the old saying goes, if it ain't broke, don't fix it.

Tony Kontzer
InformationWeek Contributor