Researchers at universities in Australia, Spain, and the US, in conjunction with UNICEF, have found that Twitter posts can be mined to measure unemployment.
In a study published through Cornell's arXiv.org, researchers Alejandro Llorente, Manuel Garcia-Herranz, Manuel Cebrian, and Esteban Moro "demonstrate that behavioral features related to unemployment can be recovered from the digital exhaust left by the microblogging network Twitter."
"Digital exhaust" is a curious choice of words, because it implies that social media data is a worthless byproduct of online interaction, something to be cast aside. Yet the researchers' findings suggest the very opposite: Social media exhaust is the primary product. It's gold, rather than garbage, and social media users don't realize the value they're throwing away. The social media realm is a charity to benefit businesses.
Three decades ago, the technologist and writer Stewart Brand famously observed the tension inherent in how we value information.
"Information wants to be free," Brand said in one of his several variations on this theme. "Information also wants to be expensive. Information wants to be free because it has become so cheap to distribute, copy, and recombine -- too cheap to meter. It wants to be expensive because it can be immeasurably valuable to the recipient. That tension will not go away."
Facebook, Google, Twitter, and the rest of the social media and advertising industry want people to believe that their online work -- their posts, pictures, and associated data -- isn't worth anything, in order to capture the full value of this free bounty of insight. Pollute the world with your digital exhaust; we'll clean up, all the way to the bank.
Llorente and his colleagues used a data set of 19.6 million geolocated Twitter messages in Spain from Nov. 29, 2012, to June 30, 2013, and a data set detailing unemployment across various regions of the country to uncover a relationship between economic metrics and social behavior. Interestingly, they noted a correlation between misspellings in tweets, which they take as a proxy for education level, and unemployment.
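To make that last point concrete, here is a minimal sketch of how a misspelling rate per region could be compared with an unemployment figure. It is not the researchers' method: the word list, the three regions, their tweets, and their unemployment numbers are all invented for illustration, and the helper names misspelling_rate and pearson are hypothetical. The study itself worked with millions of geolocated tweets and official regional statistics.

```python
from math import sqrt

def misspelling_rate(tweets, vocabulary):
    """Fraction of alphabetic words in the tweets that are absent from the vocabulary."""
    words = [w.strip(".,!?;:\"'").lower() for t in tweets for w in t.split()]
    words = [w for w in words if w.isalpha()]
    if not words:
        return 0.0
    return sum(w not in vocabulary for w in words) / len(words)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy inputs, invented for illustration only (not data from the study).
vocabulary = {"i", "need", "a", "job", "today", "work", "is", "good"}
regions = {
    "A": (["I need a job", "work is good"], 10.0),       # (tweets, unemployment %)
    "B": (["I ned a job today", "work is good"], 20.0),
    "C": (["I ned a jobb", "wrk is gud"], 30.0),
}

rates = [misspelling_rate(tweets, vocabulary) for tweets, _ in regions.values()]
unemployment = [pct for _, pct in regions.values()]
r = pearson(rates, unemployment)
print(f"misspelling rates: {rates}")
print(f"correlation with unemployment: {r:.2f}")
```

A dictionary lookup is a crude stand-in for spell checking, and a correlation across three made-up regions proves nothing. The point is only that the measurement is simple to compute once the tweets are grouped by location.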
The researchers consider their success in using Twitter posts to assess employment status to be "a proof of concept for how a wide range of behavioral features linked to socioeconomic behavior can be inferred from the digital traces that are left by publicly available social media." And they argue that governments might be able to adapt social media surveillance as an alternative to more costly traditional data gathering methods related to public policy.
"The immediacy of social media may also allow governments to better measure and understand the effect of policies, social changes, natural or man-made disasters in the economical status of cities in almost real-time," the researchers state.
Others have already reached this conclusion and are monitoring social media for threats to the powers that be, at home and abroad. Intelligence agencies have been wise to the value of social media exhaust for years, both for the overtly expressed sentiment and for the web of personal relationships exposed through social network links.
In fact, data mining to assess aspects of society has become commonplace. Google began using search queries to assess flu infections in 2008. This year, researchers have demonstrated that Twitter posts can be used to infer whether the person posting is ill. And web pages are laden with tracking scripts to gather data.
Interpreting that data correctly, however, remains a challenge. Google's flu tracking system turned out to be inaccurate. Worse, it would not be difficult to construct a social media study to support a predetermined conclusion for political or economic gain. So any use of Twitter to measure unemployment should be assessed with caution.
Twitter has long been wise to the value of its data. In its IPO filing, it disclosed that it had made $47.5 million selling data to other companies in 2012, up from $28.6 million the year before.
Companies want information to be free, so they can sell it at great cost.