Commentary
Seth Grimes
12/16/2010

Five Ways To Fool A Twitter Sentiment Tool

Do sentiment-analysis tools pass the accuracy test? Here are five tests along with results using freely available products.



Like many social-media analyses, the idea for this one originated on a social platform. In this case it was Twitter, the most accessible and free-wheeling of them all and hence a great place to exchange information and opinions.

Tweets (and other social and online postings) have immense business value, in particular as a "voice of the customer" information source about products and services and also about politics, family, and just about every other aspect of our daily lives. Their subjectivity -- they voice opinions and not just facts -- is what makes them particularly valuable.

How do we get at this business value, at sentiment especially, and how do we make sure we're doing it reasonably well? You do need software to get a complete picture of the online universe. There are dozens of tools on the market, some very good, some not so strong. Accuracy, as measured by precision, recall, and relevance, is essential. I'll suggest a few simple tests that can help you assess the precision (the proportion of a tool's sentiment ratings that are correct) of tools you may be considering, focusing on Twitter sentiment analysis.
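To make that concrete, here's a minimal Python sketch of the kind of precision check I have in mind; the gold labels and tool labels are invented for illustration.

```python
# Toy precision check: compare a tool's sentiment calls against hand-coded
# "gold" labels. All data here is invented for illustration.
gold = ["negative", "positive", "neutral", "negative", "positive"]
tool = ["positive", "positive", "neutral", "negative", "negative"]

correct = sum(1 for g, t in zip(gold, tool) if g == t)
precision = correct / len(tool)  # share of the tool's ratings that are right
print(f"precision: {precision:.0%}")  # -> precision: 60%
```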

One of my tweeple, @Pythonner (a.k.a. David Nadeau), posted a pointer: Twitter + sentiment analysis, nicely integrated: http://www.tweetfeel.com/. The TweetFeel tool from Conversition does make it easy to view Twitter-message sentiment via what the company characterizes as "real-time Twitter search with feelings using insanely complex sentiment analysis."

Accurate automated sentiment analysis is insanely difficult given the complexity of human language and expression. As expert systems pioneer Edward A. Feigenbaum observed, "Reading from text in general is a hard problem, because it involves all of common sense knowledge."

Text is full of subjectivity. There is no text-analysis problem harder than correctly parsing attitudes, opinions, feelings and emotions.

Five Tests

My first test is sentiment classification for someone who has recently died. Tools trip up easily on that one. TweetFeel is no exception, as seen in a search on Elizabeth Edwards.

The tweet, "ummm I feel totally @#$%!... elizabeth edwards lost her battle with cancer??? omg!!!!! how sad" was NOT rated negative by TweetFeel.

The folks from Conversition did make clear, "TweetFeel is intended as a quick, fun tool. It's a 'give back to the community' application."

Prompted by another of my tweeple, @yehaskel (a.k.a. David Yehaskel), I'll offer four more tests: quick and easy ways to trip up (most) Twitter sentiment tools, but also good starting points for evaluating the more capable tools on the market, which generally aren't available for open trial.

Test two is a polysemous word (a word with multiple meanings) where one meaning is a sentiment indicator. Try "kind" with TweetFeel and you'll see the issue: It uniformly keys on "kind" as a sign of positive feelings, even though "kind" is very often used in the sense of "type" or "variety."

How do you disambiguate usage to fix this confusion? One basic step is to look at surrounding words. "A kind," "the kind," and "what kind" point to likely "type" or "variety" use.
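Here's a minimal Python sketch of that surrounding-word heuristic; the cue list is my own illustrative starting point, not how TweetFeel or any other tool actually works.

```python
import re

# Cue words that, immediately before "kind", signal the "type/variety" sense
# rather than the "considerate" sense. The list is illustrative, not complete.
TYPE_CUES = {"a", "the", "what", "this", "that", "some", "any", "every"}

def kind_is_sentiment(text):
    """Return True only when 'kind' plausibly expresses positive feeling."""
    lowered = text.lower()
    for match in re.finditer(r"\bkind\b", lowered):
        preceding = lowered[:match.start()].split()
        prev = preceding[-1] if preceding else ""
        rest = lowered[match.end():].lstrip()
        if prev in TYPE_CUES or re.match(r"of\b", rest):
            continue  # "a kind", "what kind", "kind of" -> type/variety use
        return True   # e.g., "she was so kind to me"
    return False

print(kind_is_sentiment("What kind of phone is that?"))  # False
print(kind_is_sentiment("She was so kind to me"))        # True
```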

Test three is sentiment analysis of messages with multiple opinion holders. Try a search on words such as "said" or, on Twitter, "RT" ("retweet"). Here's an example of a tweet that was, indeed, misclassified:

"RT @ShayIzKilla: Im hating RT @ChocolateWast3d: Oxtails stew on deck.< #Oxtail wait deh where my plate at" Tough stuff, and I don't mean oxtail, which is tender if you cook it long enough. Here, the original poster implicitly likes oxtail stew given that he or she is about to eat some: "Oxtails stew on deck." The response "Im hating" is negative but not so strongly, the equivalent of "yuck." That response elicits another, a positive one, "#Oxtail wait deh where my plate at."

Folks, this is how people communicate on social networks; it's "natural language." If you claim to do sentiment analysis, you have to handle it. Send the tweet + RT-response + RT-response to a Twitter sentiment engine. The freebie toy tools may get tripped up by the language, and regardless, they likely won't distinguish the three opinion holders and their three opinions. They'll give an overall sentiment rating which, whether correct or incorrect, is wrong given that what seems a single message is really three.

Test four involves the inability to correctly resolve the sentiment object. Here's a tweet rated negative by a socialmention search on "oxtail":

"Damn. I could eat jerk & oxtail a few times per wk RT @IMSTAIN: Had Jerk Chicken Once...#Lowkey Threw Up RT (cont) http://tl.gd/2jra75" I searched on "oxtail," so I expect the sentiment rating to be for "oxtail," reflecting the text "I could eat jerk & oxtail a few times per wk" with the intensifier "Damn." (I added the poster's name to the search, in order to bring up the particular message I was looking for, only after observing the incorrect classification.) Instead, socialmention either incorrectly keyed on "Damn" as a sign of negative sentiment -- although any tool that analyzes slang-filled social media should know better -- or is fooled by the "Threw Up" associated with jerk chicken.

The key to passing tests three and four is the ability to break messages into appropriate chunks, which may be phrases, quoted strings, or retweets within a longer message. For example, the folks from Conversition say that "Tweetfeel is meant to measure a subset of data around nouns," which I take to mean that it does focus sentiment analyses on searched-on terms.
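As an illustration of that kind of chunking, here's a rough Python sketch that splits a message at retweet markers and attributes each fragment to a holder; the splitting rule is my own guess at a starting point, not any vendor's method.

```python
import re

def split_by_holder(tweet, outer_author="outermost poster"):
    """Split a message into (opinion holder, fragment) chunks at 'RT @user:' markers.

    Text before the first marker is attributed to the outermost poster; each
    'RT @user:' marker starts a fragment voiced by that user. Illustrative only.
    """
    parts = re.split(r"RT @(\w+):", tweet)
    chunks = []
    if parts[0].strip():
        chunks.append((outer_author, parts[0].strip()))
    # After the split, parts alternate: username, their text, username, text, ...
    for user, text in zip(parts[1::2], parts[2::2]):
        if text.strip():
            chunks.append(("@" + user, text.strip()))
    return chunks

msg = ("RT @ShayIzKilla: Im hating RT @ChocolateWast3d: "
       "Oxtails stew on deck.< #Oxtail wait deh where my plate at")
for holder, fragment in split_by_holder(msg):
    print(f"{holder}: {fragment}")
# @ShayIzKilla: Im hating
# @ChocolateWast3d: Oxtails stew on deck.< #Oxtail wait deh where my plate at
```

Even the oxtail example trips this naive rule: the outermost poster's own comment, "#Oxtail wait deh where my plate at," lands in @ChocolateWast3d's chunk because it trails the quoted text. Getting that right takes more than a regex, which is exactly the point. For test four, the same chunking helps: score only the chunk containing the searched-on term.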

Test five explores a (widespread) inability to distinguish the opinion holder from the sentiment object. I can't offer you an example from any of the free Twitter sentiment analyzers because none handles this important distinction, so far as I know. They analyze at message level, which is obviously insufficient. Instead, consider Appinions (a rebranding of Jodange's opinion-extraction tools).

Check out the results for this search on Barack Obama: The Appinions WCubed tool distinguishes -- and lets you explore results according to -- opinion holder, associated topics, and keywords. As a plus, you can filter results by sentiment polarity (positive, negative, neutral) and type of information source.
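For a sense of what holder/object separation involves at the simplest level, here's a toy Python pattern-matcher that pulls an opinion holder and an opinion clause out of reported-speech text. Real opinion-extraction systems such as Appinions rely on far deeper linguistic analysis; this sketch is purely illustrative.

```python
import re

# Toy reported-speech matcher: "<Holder> said/says/thinks/feels (that) <clause>".
# Illustrative only; not how Appinions or any other product actually works.
HOLDER_PATTERN = re.compile(
    r"(?P<holder>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
    r" (?:said|says|thinks|feels|believes)(?: that)? "
    r"(?P<clause>.+)")

def extract_opinion(text):
    """Return (opinion holder, opinion clause), or None if no pattern matches."""
    m = HOLDER_PATTERN.search(text)
    return (m.group("holder"), m.group("clause")) if m else None

print(extract_opinion("Barack Obama said that the recovery is on track."))
# -> ('Barack Obama', 'the recovery is on track.')
```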

Going Beyond

Clearly, many tools that automate sentiment analysis have a long way to go, judging from the free Twitter-analysis tools, to reach accuracy levels that will make them reliable and useful for business decision making. Still, I'm encouraged by progress and also by the market availability of stronger tools, which typically carry out deeper linguistic analyses of social messages (and enterprise feedback).

Unfortunately, most of the more sophisticated tools/vendors -- Attensity360, Clarabridge, IBM, Lithium Technologies, OpenAmplify, Open Text (disclosure: a consulting client), Radian6, SAS Social Media Analytics, Sysomos -- don't have open, online demo interfaces like Appinions', which is a shame. (Lexalytics does have an online demo, but it was offline for maintenance when I was writing this article.)

Count on continued progress, not only in accuracy but also in other areas that affect the usability and usefulness of automated sentiment analysis. These areas include:

Ability to go beyond polarity, to classify sentiment not only as positive/negative/neutral but also according to task-relevant emotion categories such as happy/sad/angry and satisfied/dissatisfied.

Ability to discern intent signals such as purchase plans and the kind of service dissatisfaction that precedes cancellation.

Contextual advertising, the ability, for instance, to ensure that a car ad isn't displayed next to a news report on a vehicle crash.

With a bit of progress, soon even freebie Twitter sentiment tools will pass basic tests. Attention will then shift from accuracy debates to the really hard problem of creating business intelligence from online, social sources.

Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation and chair of the Sentiment Analysis Symposium.
