The volume of digital information will nearly double every two years between today and 2020, reaching 40,000 exabytes, or 40 trillion gigabytes, in just seven years. But for organizations to glean valuable insights from their growing stockpiles of data, they'll have to do a much better job of analyzing it.
In fact, only 3% of data today is tagged and a scant 0.5% is analyzed, according to a new study by research firm IDC. The report, "Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East," was sponsored by data storage giant EMC. It explores many facets of the emerging big data phenomenon, including the type of information collected, the cost required to maintain it and growing security concerns, particularly in emerging markets.
The study's authors, IDC analysts John Gantz and David Reinsel, define the digital universe as one that contains a vast assortment of unstructured data: YouTube videos and photos from mobile phones; HD movies; ATM banking data; security footage; VoIP calls; and even subatomic impacts recorded by CERN's Large Hadron Collider.
And that's just the beginning.
"The digital universe lives increasingly in a computing cloud, above terra firma of vast hardware datacenters linked to billions of distributed devices, all governed and defined by increasingly intelligent software," Gantz and Reinsel write. But despite a growing interest in the digital universe, the study finds that very little of its data is analyzed.
"If we're exploring for digital oil, it is early days indeed," said Chuck Hollis, EMC vice president and global marketing CTO, in a phone interview with InformationWeek. "We're only tagging 3% and analyzing 0.5% -- and we're struggling to do that as well. Boy, a lot of work we've got to do."
The types of data "ripe" for analysis include: surveillance footage; data from embedded and medical devices, including sensors implanted in the body; entertainment and social media content; and consumer images, which would benefit from advanced tagging algorithms that analyze images in real time, the study says.
The rise of big data brings security concerns as well. A third of the data in the digital universe requires some sort of protection, either to prevent snooping or theft, or to adhere to regulations, IDC reports. But today only about 20% of this information is adequately protected, with data in emerging markets being the least secure.
"Therefore, like our own physical universe, the digital universe is rapidly expanding and incredibly diverse, with vast regions that are unexplored and some that are, frankly, scary," write Gantz and Reinsel.
EMC's Hollis points to one potentially controversial study finding: IDC predicts that only 13% of data will be stored in the cloud by 2020.
"Conventional wisdom is, 'Oh, it's all going to the cloud,'" said Hollis, who added that IDC's 13% estimate "is a much smaller proportion than a lot of industry pundits would estimate."
Other study findings include:
-- Most of the data in the digital universe -- 68% in 2012 -- is created and consumed by consumers.
-- By 2020, up to 33% of the digital universe will contain information that might be useful if analyzed, versus 25% today.
-- Emerging markets will boom: Between 2012 and 2020, their share of the digital universe will grow from 36% to 62%.
-- IT spending on the infrastructure of the digital universe, including hardware, software, services and staff, will grow by 40% between 2012 and 2020.
-- Machine-generated data is projected to increase 15-fold by 2020.
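For readers who want to sanity-check the headline numbers, here is a rough back-of-envelope sketch in Python. It is not part of the IDC study. The function names are made up for illustration, the units are assumed to be decimal (1 exabyte = 1 billion gigabytes), and the growth is assumed to be an exact doubling every two years from 2012 to 2020, whereas the study says only "nearly double."

```python
# Back-of-envelope checks on figures quoted in the article.
# All helper names are hypothetical; units are assumed decimal (SI).

EXABYTE_IN_GIGABYTES = 10**9  # assumption: 1 EB = 10^9 GB


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * EXABYTE_IN_GIGABYTES


def implied_baseline(final_size, years, doubling_period_years=2):
    """Starting size implied by an exact doubling every period (an idealization)."""
    doublings = years / doubling_period_years
    return final_size / 2 ** doublings


def protected_share_of_total(needs_protection, adequately_protected):
    """Fraction of all data that both needs protection and is adequately protected."""
    return needs_protection * adequately_protected


total_gb = exabytes_to_gigabytes(40_000)
baseline_eb = implied_baseline(40_000, years=8)   # assumed window: 2012 to 2020
protected = protected_share_of_total(1 / 3, 0.20)

print(f"{total_gb:,} GB")              # 40,000,000,000,000 GB, i.e. 40 trillion
print(f"{baseline_eb:,.0f} EB in 2012")  # 2,500 EB under exact doubling
print(f"{protected:.1%} of all data")    # 6.7% of all data
```

Because the real growth rate is slightly below a full doubling, the actual 2012 starting point would be somewhat higher than the 2,500 exabytes this idealized calculation produces. The last figure combines two of the study's numbers: if a third of all data needs protection and about 20% of that is adequately protected, then roughly 7% of the digital universe is both sensitive and properly secured.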