At the same time that regulators in the European Union are pressing search engine leaders Google, Microsoft, and Yahoo to change their policies for retaining personal data, scientists at China's Taiyuan University of Technology are researching new ways to collect and correlate data about Web surfers to deliver more precise search results.
The Taiyuan University of Technology researchers are testing software agents that crawl through any search engine, looking not only for results matching the searched keywords but also for any personal data that has been collected about the surfer. The goal is to combine information about the surfer's background or interests with information about their past searches to filter out irrelevant results.
"The Working Party will deal with search engines in general and scrutinize their activities from a data protection point of view, because this issue affects an ever growing number of users," the E.U.'s Article 29 Data Protection Working Party said in a statement.
Google is the first major search engine provider to offer some visibility into its data retention policies, but the Article 29 Working Party wants the search leader to go further. In May, Google provided the group with information about how long it stores server-log data. The company's policy is to "anonymize" server logs after 18 to 24 months, a practice that the group said, in a letter to Google Privacy Counsel Peter Fleischer, "does not seem to meet the requirements of the European legal data protection framework." Further, Google hasn't specified to the group's satisfaction the purposes for which server logs are kept.
The group welcomes Google's plans to rely more on anonymized data, but notes that even "anonymized" data can still contain the user's network prefix, the leading portion of an IP address that identifies the network the user connects from. There are also concerns that Google could reverse the anonymization process whenever it wants more information about a surfer. The group has pointed out that even though Google is based in the United States, it is legally obligated to comply with European privacy laws. The same applies to Google's competitors in the search market, including Microsoft and Yahoo, neither of which has specified any time limits on the user data it holds.
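The article does not say exactly how Google anonymizes its logs. As a minimal sketch, assuming a scheme that simply zeroes the final octet of an IPv4 address (a common approach, used here only for illustration), the snippet below shows why the Working Party's concern holds: the network prefix survives, so two users on the same network still map to the same value. The function name `truncate_ipv4` and the sample addresses (drawn from reserved documentation ranges) are hypothetical.

```python
import ipaddress

def truncate_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Hypothetical anonymizer: keep only the first keep_bits bits of an IPv4 address.

    With keep_bits=24 this zeroes the last octet, leaving the network prefix intact.
    """
    # strict=False lets us pass a host address and get its enclosing network back.
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)

# Hypothetical log entries: the first two come from the same /24 network.
logs = ["203.0.113.42", "203.0.113.7", "198.51.100.200"]
for ip in logs:
    print(ip, "->", truncate_ipv4(ip))
# 203.0.113.42 -> 203.0.113.0
# 203.0.113.7 -> 203.0.113.0
# 198.51.100.200 -> 198.51.100.0
```

Dropping more bits (for example, `keep_bits=16`) blurs the network further, but any truncation still leaves a prefix that can point to a particular ISP or organization.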
More than half of all searches are conducted on Google, while Yahoo handles about 21% and Microsoft's MSN/Windows Live Search about 8%, according to the Nielsen//NetRatings MegaView Search report for June 2007.
Search data privacy concerns are likely to be perceived differently depending on the surfer's age, said Mark Lobel in an interview. Lobel is a partner in PricewaterhouseCoopers' Advisory Services practice who focuses on information privacy and security. In general, baby boomers have a greater expectation that a Web site or search engine will keep their information confidential unless the user explicitly gives permission to share it. "Generation Y is much more willing to trade their privacy for value," he told InformationWeek.
Indeed, user demographics are likely to play an important role in the future of Web privacy where permission-based data sharing is involved. The E.U. is more concerned with the subtle aggregation and sale of search data, and it's going to keep pressing search engine companies until they come clean.