It wasn't your typical data breach. In an effort to help researchers interested in search queries, AOL last month shared about 20 million search words and phrases used by 658,000 of its subscribers. But the move backfired as AOL got a harsh lesson in just how revealing search terms can be.
The AOL cache, supposedly stripped of personally identifiable information and posted online, laid bare eye-opening details of nameless subscribers. Social Security numbers, names, dates of birth, cell phone numbers, hometown stores and hospitals, graphic sex acts--all available for public viewing on the Web.
AOL apologized for the error and withdrew the site, but the damage was done. In a startling example of connecting the dots, The New York Times tracked down 62-year-old Thelma Arnold, a resident of Lilburn, Ga., based on her AOL searches. Although AOL wiped all of the data from public-facing sites, fleet-fingered third parties had already copied the AOL database and made it searchable elsewhere on the Web.
The gaffe demonstrates how much personal information can be gleaned from searches and how quickly it can spread; the resulting uproar could catalyze data protection efforts by some lawmakers. "We must stop companies from unnecessarily storing the building blocks of American citizens' private lives," Rep. Edward J. Markey, D-Mass., huffed last week in a statement. Earlier this year, Markey introduced a bill to stop companies from warehousing certain types of search data. A spokesman says Markey hopes for a renewed push to pass the bill when legislators return from summer break.
Markey had better be prepared for a hard slog. His bill has languished in subcommittee since February. The federal government uses gobs of data, including personal data, for Homeland Security data mining and other efforts. The Bush administration earlier this year demanded keywords, URLs, and other information from AOL, Google, MSN, Yahoo, and 30 other companies in a quest to prove the necessity of the 1998 Child Online Protection Act. And the Justice Department is pushing for laws that would force Internet service providers and some other companies to retain Web-usage data for specified periods to ensure that it's available if needed for law enforcement.
Technology And Policy
Search data is valuable to companies like Google, which use it for their own analysis, to provide personalized searches, and to deliver targeted advertising. Data retention also is a regulatory requirement in some quarters. Lauren Weinstein, a privacy advocate with People for Internet Responsibility, says the opposing viewpoints miss the important middle ground. "We need to have a new path for dealing with this data that involves both technology and policy. We're not doing that work now," he says. Companies like Google must develop policies that make Web-usage data even more anonymous than it is today, Weinstein says.
Google won praise from privacy advocates this year for fighting the government subpoena for search data, though in the end a court forced it to turn over a list of 50,000 URLs. AOL has said its compliance with that government mandate was limited to aggregate lists and anonymous search terms. Of course, the data AOL volunteered to researchers in July might be described in the same way.
Google CEO Eric Schmidt, speaking at an industry conference last week, called AOL's release of data "a terrible thing." Schmidt said Google would continue to store search histories indefinitely and reassured users that their data would stay secure, though he qualified his vow with "never say never."