4/14/2014
04:47 PM

Twitter Posts Betray Illness

Tweets reveal whether you have influenza, according to Penn State researchers.

In 2008, when Google began tracking flu-related search terms as a way to estimate flu infections, researchers were optimistic about the potential of the Internet as a medium for data mining. Since then, Google Flu Trends hasn't performed as well as hoped.

Now it's Twitter's turn. Scientists from Pennsylvania State University claim to have developed a way to identify Twitter posts that are viral in the medical sense of the word.

In a recently published paper, "On the Ground Validation of Online Diagnosis with Twitter and Medical Records," Penn State researchers say they have created "a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data."

The researchers obtained information from Penn State University's Health Services about 104 individuals who had been diagnosed with influenza by a medical professional during the 2012 through 2013 flu season. They also obtained data about 122 people who had not been diagnosed with the flu during this period. After discarding the data of a handful of individuals for a variety of reasons, the researchers set out to analyze the tweets from both groups in their study to determine whether they could diagnose influenza from Twitter posts.

The researchers demonstrated that they could indeed make that determination, with greater than 99% accuracy, by combining text analysis, anomaly detection, and social network analysis.
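The article does not describe how the paper implements any of these three components. As a rough illustration of the anomaly-detection idea alone, and not the researchers' actual method, the sketch below flags days on which one account's tweet volume departs sharply from its own recent baseline. The function name, the 14-day baseline, and the z-score threshold of 2.0 are all assumptions made for this example.

```python
from statistics import mean, stdev


def volume_anomalies(daily_counts, baseline_days=14, threshold=2.0):
    """Return indices of days whose tweet count deviates from the baseline.

    daily_counts: tweets per day for one account (hypothetical input).
    The first `baseline_days` entries define "normal" behavior; later days
    are flagged when their z-score magnitude reaches `threshold`.
    """
    baseline = daily_counts[:baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points

    flagged = []
    for day, count in enumerate(daily_counts[baseline_days:], start=baseline_days):
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as unusual.
            unusual = count != mu
        else:
            unusual = abs((count - mu) / sigma) >= threshold
        if unusual:
            flagged.append(day)
    return flagged


if __name__ == "__main__":
    # Made-up data: two quiet days after two weeks of steady posting.
    counts = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 12, 11, 10, 11, 2, 1, 10]
    print(volume_anomalies(counts))
```

A signal like this says only that behavior changed, not why. A diagnosis would presumably require combining it with the text and social network signals mentioned above.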

There are related projects underway: The Parkinson's Voice Initiative, for example, is an effort to detect Parkinson's symptoms from voice analysis. But voice analysis involves active user participation; Twitter data is published and awaiting data miners.

The implications from a healthcare perspective are promising, as the Penn State research suggests a further method to complement traditional epidemiological data collection.

The implications from a privacy perspective, however, are rather chilling: "It would seem that simply avoiding discussing an illness is not enough to hide one's health in the age of big data," the researchers conclude.

The Penn State researchers note that although they focused on remotely reconstructing a confidential diagnosis of influenza, this technique could be used to identify diseases associated with greater social stigma, such as HIV. Social media now clearly has a potential social cost.

At the same time, awareness of this technique could undermine it. That was part of the problem with Google Flu Trends -- news reports about influenza and about the way researchers were trying to correlate Google search queries with influenza cases made Google Flu Trends less accurate. There was more to it than that, however.

Reports in Nature in 2013 and Science in 2014 took issue with the accuracy of Google Flu Trends data during the 2011-2012 and 2012-2013 flu seasons. The paper that appeared in Science, "The Parable of Google Flu: Traps in Big Data Analysis," cited problems with Google's algorithm and what the paper's authors called "big data hubris," the assumption that online data collection can replace, rather than augment, traditional data collection methods.

Google has been taking steps to improve Flu Trends, but the authors of the Science paper, David Lazer and Gary King of Harvard, Ryan Kennedy of the University of Houston, and Alessandro Vespignani of Northeastern University, claim in a separate paper, "Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season," that the issues identified with Google Flu Trends have gotten worse.

Despite some positive effects from Google's effort to dampen anomalous data spikes, the researchers say a major issue is Google's lack of transparency and lack of communication with researchers, who want access to Google's data to check its results. "[Google Flu Trends] has not been very forthcoming with [its data] in the past, going so far as to release misleading example search terms in previous publications."

"We review the Flu Trends model each year to determine how we can improve. We welcome feedback on how we can refine Flu Trends to help estimate flu levels and complement existing surveillance systems," a Google spokesperson said via email.

Social media data mining might provide unprecedented insight into undisclosed medical conditions, but it also provides ample opportunity for errors and raises profound privacy questions.

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful ...

Comments
Kristin Burnham,
User Rank: Author
4/15/2014 | 2:31:09 PM
Re: Accuracy
I agree, Whoopty. (Plus, the last thing on my mind when I have the flu is tweeting about it.)
Susan Fourtané,
User Rank: Ninja
4/15/2014 | 10:19:11 AM
Re: Tipoffs?
Thomas, 

"Actually, they dropped individuals who, for example, didn't have Twitter accounts."

Well, of course. There is no point in analyzing tweets of non-existing accounts. Yet, the whole thing doesn't make too much sense to me. 

Do you believe this research is accurate, or useful in any way? 

-Susan
Thomas Claburn,
User Rank: Author
4/15/2014 | 9:54:35 AM
Re: Tipoffs?
>In other words, they discarded the individuals who hadn't given any clue about their flu in their tweets. :D

Actually, they dropped individuals who, for example, didn't have Twitter accounts.
Whoopty,
User Rank: Ninja
4/15/2014 | 7:09:32 AM
Accuracy
Considering there is absolutely no way to verify any of the information collected this way without somehow having access to that person's medical records, AND they would have had to have visited a medical professional to confirm it themselves, this seems like an entirely redundant exercise.
Susan Fourtané,
User Rank: Ninja
4/15/2014 | 3:07:11 AM
Re: Tipoffs?
Thomas, 

"After discarding the data of a handful of individuals for a variety of reasons, the researchers set out to analyze the tweets from both groups in their study to determine whether they could diagnose influenza from Twitter posts."

In other words, they discarded the individuals who hadn't given any clue about their flu in their tweets. :D

-Susan
Susan Fourtané,
User Rank: Ninja
4/15/2014 | 3:04:03 AM
Re: Tipoffs?
Laurianne, 

You don't have to be a data scientist to read my tweets from last week and conclude that I had the flu. Between the text analysis of my own tweets and the replies I got, it was pretty obvious.

Penn University should occupy its data scientists in something more productive. 

-Susan
Charlie Babcock,
User Rank: Author
4/14/2014 | 5:41:33 PM
Crowd awareness corrupts social net analysis
My diagnosis: Crowd awareness of Google Flu Trends corrupts the trend results. If you have the flu and know Google is watching your searches, you may modify your keyword choices to maintain a little privacy.
Thomas Claburn,
User Rank: Author
4/14/2014 | 5:02:09 PM
Re: Tipoffs?
Not really. It was a combination of keyword analysis and other data.
Laurianne,
User Rank: Author
4/14/2014 | 4:56:54 PM
Tipoffs?
Did the researchers share any sample Twitter post tipoffs that you had the flu? Was it people saying they were tired, for instance, or was it that someone's typical Twitter volume went down?