How Google Flu Trends Blew It - InformationWeek

How Google Flu Trends Blew It

Last year, Google Flu Trends made a mountain out of a molehill by overestimating the incidence of influenza. Blame the media.

Google made its name by counting online links as votes for the most relevant answer to search queries. In response, Internet users began gaming Google's election count by voting early and often -- creating extra links to make their websites rank higher in Google's index -- and Google was forced to take countermeasures to defend against manipulation.

Yet the company had to learn this lesson again with its Flu Trends website. Google created Flu Trends in 2008, based on the insight that searches about the flu have some correlation with the number of people dealing with the flu.

"[I]f we tally each day's flu-related search queries, we can estimate how many people have a flu-like illness," the company said in 2008 when it launched the service.

Google's laudable goal was to give people more timely information about the spread of the flu than they could get from the traditional epidemiological surveillance data compiled by the Centers for Disease Control and Prevention. But the company had to revise its approach to ensure that its data, in addition to being timely, is accurate.

During the 2012-13 flu season, Google Flu Trends got it wrong. As the company documents in a recently published analysis of its approach to disease tracking, Google overestimated the incidence of flu in the U.S. by more than six percentage points, almost six times higher than the highest estimation error seen since the site launched. In the week of Jan. 13, 2013, Google put the incidence of flu in the U.S. at 10.56% of the population. The CDC put the number at only 4.52%.
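
To make the size of that miss concrete, the short calculation below works through the two figures quoted above. Only the two percentages come from the article; the variable names are illustrative.

```python
# Figures quoted for the week of Jan. 13, 2013 (percent of U.S. population).
google_estimate_pct = 10.56
cdc_estimate_pct = 4.52

# Absolute gap, in percentage points.
gap_points = google_estimate_pct - cdc_estimate_pct

# Relative size of Google's estimate compared with the CDC figure.
ratio = google_estimate_pct / cdc_estimate_pct

print(f"gap: {gap_points:.2f} percentage points")
print(f"Google's estimate was {ratio:.2f}x the CDC figure")
```

In other words, the estimate was off by about six percentage points and was more than double the CDC number.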

What went wrong? Two words: The media. Google says it has concluded that its disease-detection algorithms "were susceptible to heightened media coverage."

This probably wasn't a difficult conclusion to reach because Google has been aware of the problem since it launched Flu Trends. Following the website's debut in 2008, the New York Times published an article about Google Flu Trends and included an example query that Google was actually monitoring in its flu prediction model. As a result, many Internet users tried that search term, driving up query volume and skewing Google's results.

The lesson here is rich with irony: To effectively assess data from a public source, the algorithm must remain private, or someone will attempt to introduce bias.

Google has been relying on "spike detectors" to compensate for surges of "inorganic" search traffic. But it turns out that Google underestimated the influence of the media. The company anticipated that search query spikes would last three days to a week. During the 2012-13 flu season, they lasted for months.
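
Google's analysis is not quoted in enough detail here to reproduce its spike detector, so what follows is only a minimal sketch of one generic approach: flag a day when query volume exceeds a multiple of the trailing median. The function name `flag_spikes`, the seven-day window, the 1.5x threshold, and the query counts are all assumptions made up for illustration. The sketch shows why a detector tuned for spikes of a week or less can lose track of a surge that lasts much longer: once the surge fills the window, it becomes the new baseline.

```python
from statistics import median

def flag_spikes(volumes, window=7, ratio=1.5):
    """Flag each day whose volume exceeds ratio * median of the prior `window` days.

    Hypothetical detector for illustration only; not Google's algorithm.
    """
    flags = []
    for i, volume in enumerate(volumes):
        history = volumes[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge yet
            continue
        flags.append(volume > ratio * median(history))
    return flags

# Synthetic daily query counts (made-up numbers).
baseline = [100] * 10
short_surge = baseline + [300] * 3 + [100] * 5   # three-day burst of coverage
long_surge = baseline + [300] * 20               # coverage that does not let up

short_flags = flag_spikes(short_surge)
long_flags = flag_spikes(long_surge)

print("short surge days flagged:", sum(short_flags), "of 3")
print("long surge days flagged:", sum(long_flags), "of 20")
```

With these inputs, every day of the short burst is flagged, but only the first few days of the long surge are. After that the elevated traffic looks normal to the detector and flows into the estimate unfiltered.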

Google also notes in its analysis that it did not update its flu prediction model annually because the one built in 2009 had been performing well.

So to make its flu forecast more accurate, Google adjusted its spike detection algorithm to better assess the influence of the media. It also modified its algorithm by applying a statistical method called Elastic Net. Using these techniques, the gap between Google Flu Trends and CDC data last season would have been only about one percentage point.
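
The article names Elastic Net but does not describe how Google configured it. As a rough guide, Elastic Net is a regression technique that penalizes model weights with a blend of an L1 term (which pushes weights to exactly zero) and an L2 term (which shrinks large weights). The sketch below computes only that penalty, using a parameterization commonly seen in statistics libraries; the names `alpha` and `l1_ratio` and the example weights are assumptions, not values from Google's model.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Blend of L1 and L2 penalties on a list of model weights.

    alpha scales the whole penalty; l1_ratio=1.0 is pure L1 (lasso-style),
    l1_ratio=0.0 is pure L2 (ridge-style). Illustrative only.
    """
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1_term + 0.5 * (1.0 - l1_ratio) * l2_term)

# Hypothetical weights for three search-query features.
example_weights = [1.0, -2.0, 0.0]

blended = elastic_net_penalty(example_weights)                 # half L1, half L2
pure_l1 = elastic_net_penalty(example_weights, l1_ratio=1.0)   # lasso-style
pure_l2 = elastic_net_penalty(example_weights, l1_ratio=0.0)   # ridge-style

print(blended, pure_l1, pure_l2)
```

A model fit under this kind of penalty is discouraged from leaning heavily on any single query, which is one plausible reason it would be less sensitive to a media-driven surge in a few search terms.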

Google Flu Trends is likely to remain a useful complement to traditional epidemiological surveying. But Google and other companies looking to leverage data harvested from the Internet might need to start treating what they gather not as low-hanging fruit but as something already poisoned.

David F. Carr
User Rank: Author
10/29/2013 | 1:46:17 AM
re: How Google Flu Trends Blew It
I also wonder about the incidence of people who think they have the flu, when they actually have some other sort of bug. Maybe it should be called the "people feeling crummy" algorithm.
Joe Stanganelli
User Rank: Author
10/29/2013 | 7:44:20 AM
re: How Google Flu Trends Blew It
On a related note, I wonder how much Google Flu Trends drove down flu incidence by making people more aware of the spread of flu and their health, encouraging flu-stopping habits (vaccinations, hand-washing, etc.).
Joe Stanganelli
User Rank: Author
10/29/2013 | 7:43:09 AM
re: How Google Flu Trends Blew It
Still, it's a PR success for Google, even if they never do get the algorithm right. The tech community doesn't tend to remember when Google has failed (you rarely see discussions about Google Wave and Google Buzz anymore) -- only that Google is "innovative."