Following the Sept. 11, 2001, terrorist attack on the United States, there was a lot of talk about the failure to connect the dots, to recognize the terrorist network by the ties between the individuals involved in the plot.
Almost eight years later, connecting the dots across two or more social networks turns out to be fairly straightforward, a finding with significant privacy implications.
Two researchers from the University of Texas at Austin have demonstrated that anonymous users of Twitter who participate in other social networks can be identified with a low error rate by analyzing the network of connections between people across different services.
In a research paper, Arvind Narayanan, a Ph.D. candidate at the University of Texas at Austin, and Vitaly Shmatikov, a computer science professor at the university, explain, "We give a concrete demonstration of how our de-anonymization algorithm works by applying it to Flickr and Twitter, two large, real-world online social networks. We show that a third of the users who are verifiable members of both Flickr and Twitter can be recognized in the completely anonymous Twitter graph with only 12% error rate, even though the overlap in the relationships for these members is less than 15%."
While Flickr and Twitter are the two social networking services used for the study, the authors state that the technique can be applied to any set of social networks where some real-life information is exposed at the edges.
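To make the idea of matching users by their connections concrete, here is a minimal Python sketch. It is not the researchers' algorithm, only a toy illustration under stated assumptions: two small hypothetical graphs, a couple of already-known "seed" matches, and a greedy rule that accepts a match only when one candidate clearly shares the most already-identified neighbors. The function name `propagate`, the `min_score` threshold, and all user names are invented for this example.

```python
def propagate(anon, aux, seeds, min_score=2):
    """Greedily extend a seed mapping from anonymous nodes to named nodes.

    anon, aux: dicts mapping each node to the set of its neighbors.
    seeds: dict of anonymous node -> named node, assumed known in advance.
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        used = set(mapping.values())
        for node in sorted(anon):
            if node in mapping:
                continue
            # Named counterparts of this node's already-identified neighbors.
            mapped_nbrs = {mapping[n] for n in anon[node] if n in mapping}
            # Score each unused named node by how many of those it is tied to.
            scores = {
                cand: len(mapped_nbrs & aux[cand])
                for cand in aux
                if cand not in used
            }
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Accept only a clear winner; ambiguous nodes stay anonymous.
            if best_score >= min_score and best_score > runner_up:
                mapping[node] = best
                used.add(best)
                changed = True
    return mapping


# Hypothetical "named" graph (stand-in for a service exposing real identities).
aux_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"carol", "dave"},
}

# Hypothetical anonymized graph with the same shape but opaque IDs.
anon_graph = {
    "u1": {"u2", "u3"},
    "u2": {"u1", "u3", "u4"},
    "u3": {"u1", "u2", "u4", "u5"},
    "u4": {"u2", "u3", "u5"},
    "u5": {"u3", "u4"},
}

# Assume two identities are already known by some other means.
result = propagate(anon_graph, aux_graph, {"u1": "alice", "u2": "bob"})
print(result)
```

On this toy input, two seed matches are enough for the sketch to label the remaining three anonymous nodes. Real networks overlap only partially and are vastly larger, so a practical approach would need far more careful scoring and error handling than this greedy rule provides.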
The notion that unknown individuals can be identified through connections with other Internet services isn't surprising. When the contents of the Yahoo Mail account of former vice presidential candidate Sarah Palin were posted online last year, the account name associated with the posting, "rubico," was quickly linked to college student David Kernell through the e-mail address "email@example.com." (Earlier this month, Kernell pleaded not guilty to a four-count indictment arising from the incident.)
Nonetheless, being able to recognize such connections through automated means with a high degree of accuracy suggests that privacy controls on social networks create a false sense of security.
Narayanan and Shmatikov observe that their work suggests several possible attack scenarios.
"The strongest adversary is a government-level agency interested in global surveillance," they explain in a FAQ that accompanies their paper. "Its objective is large-scale collection of detailed information about as many individuals as possible. Another attack scenario involves abusive marketing. If an unethical company were able to de-anonymize the graph using publicly available data, it could engage in abusive marketing aimed at specific individuals."
They also speculate that their de-anonymization technique could be exploited by phishers, spammers, stalkers, investigators, nosy colleagues, employers, and neighbors.
The researchers conclude that the distinction between personally identifiable and non-personally identifiable information is a fiction and should be dropped from privacy policies. This would make it clear that in the context of social networks, any information can potentially be used to identify someone.
They also support disclosure by social networks of any information sharing, rather than disclosure only when information is deemed to be personally identifiable, in order to give affected users the opportunity to opt out.