Commentary
5/11/2010
11:46 AM
Seth Grimes

How to Define Accuracy in Analytics for Business

For information retrieval and analytics purposes, we need a broad definition of "accuracy."

Esteban Kolsky, a customer-strategies research analyst, has blogged for semantic-technologies vendor Attensity on "How Accuracy in Analytics Matters for Businesses." It's a thought-provoking article, yet one of his central statements calls for exploration: "The only way to measure accuracy is by comparing the results of the computer analysis to similar analysis done by humans." There's more to accuracy, and more to computer analysis, than you might think.

Kolsky's topic is social-media analytics. His focus is the subjective content of online text: feelings rather than facts. Subjective content -- attitude, opinion, and even emotion -- is different from objective facts. Subjectivity is uniquely human, often situational, and culturally linked. We all know that no two people will agree on every matter of opinion or attitude. No two people "pick the same" (in Kolsky's words) 100% of the time, even when it comes to a classification as seemingly simple as positive/negative/neutral/mixed sentiment polarity. Scientific studies and practical tests I've seen suggest that people agree at an 80%-90% rate when it comes to sentiment classification.
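
To make the idea of a human-human agreement rate concrete, here is a minimal Python sketch of how such a rate might be computed. The two annotators and their labels are hypothetical, made up for illustration and not drawn from any study. The function names are my own, and Cohen's kappa is a commonly used chance-corrected measure that I am adding as an assumption; Kolsky's article does not mention it.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labeled independently at their own observed label rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentiment labels for ten posts (illustrative only).
annotator_a = ["pos", "pos", "neg", "neu", "pos", "neg", "mixed", "neu", "pos", "neg"]
annotator_b = ["pos", "neu", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg"]

print(percent_agreement(annotator_a, annotator_b))       # 0.8
print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.71
```

In this toy data the annotators agree on 8 of 10 posts, the low end of the 80%-90% range, yet the chance-corrected figure is lower still. That gap is one reason "which human?" is a fair question.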

Given measured rates of human-human non-agreement, and with the age of intelligent(-seeming) machines looming, is "Did the computer pick the same a human would've picked?" -- which human? -- the only, or even the best, accuracy criterion? Surely there's much to be learned in comparing, or even working from the consensus of, different machine methods, in contrasting and compiling machine-machine results.
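
One simple way to work from the consensus of different machine methods is a majority vote. The sketch below assumes three hypothetical classifiers whose outputs are hard-coded; the method names, labels, and the "no_consensus" tie marker are all illustrative assumptions, not anything from Kolsky's article or a particular product.

```python
from collections import Counter


def consensus_label(votes, tie_label="no_consensus"):
    """Return the label most methods chose, or tie_label if the top spot is shared."""
    if not votes:
        raise ValueError("need at least one vote")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


# Hypothetical outputs of three machine methods on four posts.
lexicon_method = ["positive", "negative", "neutral", "positive"]
statistical_method = ["positive", "negative", "negative", "neutral"]
rule_based_method = ["positive", "neutral", "negative", "negative"]

per_post_votes = zip(lexicon_method, statistical_method, rule_based_method)
consensus = [consensus_label(list(votes)) for votes in per_post_votes]
print(consensus)
```

Posts where the methods split three ways come back as "no_consensus", which marks them as candidates for human review rather than forcing a machine answer.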

Further, implicit in Kolsky's analysis, in my reading, is an incomplete understanding of "accuracy." Precision -- in this context the same as "correctness" -- is just one of three components of accuracy, and any definition that looks primarily at precision falls short.

Kolsky focuses on the task of determining the sentiment of "a specific word or combination of words," on "the computer's perception that a tweet or blog post has positive or negative inclination." His definition would cover very discrete tasks adequately -- taking the SAT, reading single blogs or tweets -- but competitive businesses cannot afford over-focused insularity. They must concern themselves with a huge swathe of social and news media. So what of the other two components of information retrieval-analysis accuracy, recall and relevance?

"Recall" is the proportion of pertinent material that is retrieved. On the recall front, there's no contest: machines can operate 24/7, they can parse material in and across multiple human languages (where no one person can handle more than a handful), and they can sift through vast volumes of material very quickly. The machines win hands-down.

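The difference between precision and recall is easy to see with a small calculation. This is a minimal sketch: the document IDs and both retrieval sets are hypothetical, chosen only to show how a careful but limited reader and a broad but imperfect machine can score very differently on the two measures.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are pertinent.
    Recall: share of pertinent items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: eight posts are actually pertinent to the business.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# A human reads only two posts and judges both correctly.
human_retrieved = {"d1", "d2"}

# A machine flags eight posts, two of them wrongly.
machine_retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d9", "d10"}

print(precision_recall(human_retrieved, relevant))    # (1.0, 0.25)
print(precision_recall(machine_retrieved, relevant))  # (0.75, 0.75)
```

In this made-up case the human is perfectly precise but finds only a quarter of the pertinent material, while the machine gives up some precision and recovers three times as much.
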
As for relevance, well, I won't argue that machines perform better than humans in rank-ordering lists to respond to differing business or other criteria. I will argue, however, that machines can outperform humans in discovering obscure or even hidden relationships in large volumes of data. This ability is what data mining is all about: fitting models to data for predictive purposes. Those models may be hard to understand -- they lack explanatory transparency -- but we use them nonetheless because they work. Relationships are key to social-network analysis, as are measure-driven models for quantities such as impact, velocity, and authority. These quantities may factor into relevance. And relevance matters -- alongside precision and recall -- to a complete accuracy picture.

Finally, Kolsky's focus on the link between accuracy (however defined) and the business bottom line is spot-on. He recommends removing biases from analytics, improving accuracy, and looking at multiple customer-data sources and cross-referencing them. These are important steps that can quantifiably contribute to meeting cost, efficiency, profitability, satisfaction, and other business goals. Accuracy in analytics does indeed matter for businesses.


If you'd like to further explore information retrieval and analytics methods and applications, consider attending the 6th annual Text Analytics Summit, slated for May 25-26 in Boston. I'll reprise my role as chair and teach a pre-summit Introduction to Text Analytics the afternoon of May 24.
