10/25/2013
03:25 PM

How Google Flu Trends Blew It

Last year, Google Flu Trends made a mountain out of a molehill by overestimating the incidence of influenza. Blame the media.

Google made its name by counting online links as votes for the most relevant answer to search queries. In response, Internet users began gaming Google's election count by voting early and often -- creating extra links to make their websites rank higher in Google's index -- and Google was forced to take countermeasures to defend against manipulation.

Yet the company had to learn this lesson again with its Flu Trends website. Google created Flu Trends in 2008, based on the insight that searches about the flu have some correlation with the number of people dealing with the flu.

"[I]f we tally each day's flu-related search queries, we can estimate how many people have a flu-like illness," the company said in 2008 when it launched the service.

Google's laudable goal was to provide people with more timely information about the spread of the flu than traditional epidemiological surveillance data compiled by the Centers for Disease Control. But the company had to revise its approach to ensure that its data, in addition to being timely, is accurate.

During the 2012-13 flu season, Google Flu Trends got it wrong. As the company documents in a recently published analysis of its approach to disease tracking, Google overestimated the incidence of flu in the U.S. by more than six percentage points, almost six times higher than the highest estimation error seen since the site launched. In the week of Jan. 13, 2013, Google put the incidence of flu in the U.S. at 10.56% of the population. The CDC put the number at only 4.52%.
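
To make the size of that miss concrete, the short Python sketch below works through the arithmetic using only the two figures quoted above. The function and variable names are illustrative and do not come from Google's analysis.

```python
# Figures quoted in the article for the week of Jan. 13, 2013.
google_estimate_pct = 10.56  # Google Flu Trends estimate, % of U.S. population
cdc_estimate_pct = 4.52      # CDC figure, % of U.S. population


def overestimate(google_pct, cdc_pct):
    """Return (gap in percentage points, ratio of Google's estimate to the CDC's)."""
    gap_points = google_pct - cdc_pct
    ratio = google_pct / cdc_pct
    return gap_points, ratio


gap_points, ratio = overestimate(google_estimate_pct, cdc_estimate_pct)
print(f"Gap: {gap_points:.2f} percentage points")
print(f"Google's estimate was {ratio:.2f}x the CDC figure")
```

The gap works out to 6.04 percentage points, which means Google's estimate was more than double the CDC's number for that week.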

What went wrong? Two words: The media. Google says it has concluded that its disease-detection algorithms "were susceptible to heightened media coverage."

This probably wasn't a difficult conclusion to reach because Google has been aware of the problem since it launched Flu Trends. Following the website's debut in 2008, the New York Times published an article about Google Flu Trends and included an example query that Google was actually monitoring in its flu prediction model. As a result, many Internet users tried that search term, driving up query volume and skewing Google's results.

The lesson here is rich with irony: To effectively assess data from a public source, the algorithm must remain private, or someone will attempt to introduce bias.

Google has been relying on "spike detectors" to compensate for surges of "inorganic" search traffic. But it turns out that Google underestimated the influence of the media. The company anticipated that search query spikes would last three days to a week. During the 2012-13 flu season, they lasted for months.
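
Google has not published the code behind its spike detectors, so the Python sketch below is only a toy stand-in: it flags any day whose query volume exceeds a multiple of the median of the preceding days. The function name, the seven-day window, the 2x threshold, and the query counts are all assumptions made for illustration. It shows why a detector tuned for short bursts can struggle with a surge that persists.

```python
from statistics import median


def flag_spikes(volumes, window=7, threshold=2.0):
    """Flag indices whose volume exceeds threshold x the median of the previous `window` values.

    Hypothetical detector for illustration; not Google's actual method.
    """
    flagged = []
    for i in range(window, len(volumes)):
        baseline = median(volumes[i - window:i])
        if baseline > 0 and volumes[i] > threshold * baseline:
            flagged.append(i)
    return flagged


# Made-up daily query counts: steady at 100, then a surge to 300 that does not fade.
volumes = [100] * 10 + [300] * 10
spike_days = flag_spikes(volumes)
print(spike_days)
```

In this toy series only the first four days of the surge are flagged. After that, the elevated counts dominate the trailing window and become the new baseline, so a months-long, media-driven surge would pass as ordinary traffic.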

Google also notes in its analysis that it did not update its flu prediction model annually because the one built in 2009 had been performing well.

So to make its flu forecast more accurate, Google adjusted its spike detection algorithm to better assess the influence of the media. It also modified its algorithm by applying a statistical method called Elastic Net. Using these techniques, the gap between Google Flu Trends and CDC data last season would have been only about one percentage point.
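
For readers unfamiliar with the term, Elastic Net is a form of regularized regression that penalizes large coefficients using a blend of an L1 (lasso) term and an L2 (ridge) term. The Python sketch below computes that penalty and the resulting objective under one common parameterization. The names `lam` and `l1_ratio` and the toy data are assumptions for illustration; the article does not say how Google configured the method.

```python
def elastic_net_penalty(coefs, lam=1.0, l1_ratio=0.5):
    """Blend of L1 and squared-L2 penalties on the coefficients.

    lam scales the whole penalty; l1_ratio=1.0 is pure lasso, 0.0 is pure ridge.
    """
    l1 = sum(abs(c) for c in coefs)
    l2_squared = sum(c * c for c in coefs)
    return lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


def elastic_net_objective(X, y, coefs, lam=1.0, l1_ratio=0.5):
    """Halved mean squared error plus the elastic net penalty (the quantity a fit minimizes)."""
    n = len(y)
    rss = 0.0
    for row, target in zip(X, y):
        prediction = sum(c * x for c, x in zip(coefs, row))
        rss += (target - prediction) ** 2
    return rss / (2 * n) + elastic_net_penalty(coefs, lam, l1_ratio)


# Toy data (made up): two predictors that the coefficients fit exactly,
# so the objective reduces to the penalty alone.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -2.0]
coefs = [1.0, -2.0]
objective = elastic_net_objective(X, y, coefs)
print(objective)
```

The L1 term tends to drop weak predictors entirely while the L2 term keeps groups of correlated predictors stable, which is a plausible fit for a model built on many overlapping search queries.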

Google Flu Trends is likely to remain a useful complement to traditional epidemiological surveying. But Google and other companies looking to leverage data harvested from the Internet might need to start treating what they gather not as low-hanging fruit but as something already poisoned.

Comments
MedicalQuack,
User Rank: Moderator
10/30/2013 | 5:16:32 PM
re: How Google Flu Trends Blew It
Google doesn't always get it right, they suspended my account earlier this year as they said not name compliant...machine had not learned enough yet and took my real surname of Duck and read me as a real duck...funny story and then I had to submit a website to where they could verify me..and I did. right on Google Blogger for six years:) duh? I often wondered if the Crows and the Beavers folks got the same treatment:)

http://ducknetweb.blogspot.com...
Joe Stanganelli,
User Rank: Ninja
10/29/2013 | 7:44:20 AM
re: How Google Flu Trends Blew It
On a related note, I wonder how much Google Flu Trends drove down flu incidences by making people more aware of the spread of flu and their health, encouraging flu-stopping habits (vaccinations, hand-washing, etc.).
Joe Stanganelli,
User Rank: Ninja
10/29/2013 | 7:43:09 AM
re: How Google Flu Trends Blew It
Still, it's a PR success for Google, even if they never do get the algorithm right. The tech community doesn't tend to remember when Google has failed (you rarely see discussions about Google Wave and Google Buzz anymore) -- only that Google is "innovative."
David F. Carr,
User Rank: Author
10/29/2013 | 1:46:17 AM
re: How Google Flu Trends Blew It
I also wonder about the incidence of people who think they have the flu, when they actually have some other sort of bug. Maybe it should be called the "people feeling crummy" algorithm.
majenkins,
User Rank: Ninja
10/28/2013 | 5:13:22 PM
re: How Google Flu Trends Blew It
. . . might need to start treating what they gather not as low-hanging fruit but as something already poisoned.
I like that and agree that much of that fruit is already poisoned.