Big data, at least as we know it today, is largely inaccessible to the masses. Governments, corporations, and other well-funded organizations can afford teams of data scientists and pricey storage systems to glean insights from petabytes pulled from a variety of sources. Everyone else -- not so much.
Perhaps the solution is a "public cloud," a repository of research data accessible not only to powerful institutions, but to everyday folks as well. As CloudSigma CEO Robert Jenkins sees it, this "cloud-based data discovery model" would allow greater access to big data from government and science agencies, as well as create new business opportunities and even enable citizens to conduct their own research.
Founded in 2009, CloudSigma is a cloud infrastructure-as-a-service (IaaS) provider based in Zurich, Switzerland. In a phone interview with InformationWeek, Jenkins said the company's vision for a democratized cloud grew from its collaboration with the European Space Agency (ESA), which works with CloudSigma on a variety of projects.
"One of the big challenges over the last few years has been making public data accessible to users," said Jenkins. "There's a certain moral satisfaction in democratizing access to public data, given that taxpayers paid for its creation in the first place."
For instance, massive data sets, such as those generated by ESA satellites, can prove invaluable to a variety of users ranging from oil companies to local municipalities. "ESA has radar coverage of the world -- essentially 100% of the world, [and] almost everybody is covered," Jenkins said. "It refreshes every nine days, which in logical terms is real-time."
How is this data valuable? Here's one example: By providing a satellite view of soil moisture, ESA's radar maps can help local governments detect landslide risks. "That's incredibly useful to municipalities in terms of avoiding landslides," Jenkins noted.
Unfortunately, the amount of information generated by ESA satellites -- as well as countless other sources of big data -- is too unwieldy for individuals and tight-budgeted organizations to utilize. An ESA satellite, for instance, might generate a terabyte of data every day. "It's incredibly valuable data, but most of us don't have access to the kind of computing and storage you need to feasibly do something useful [with it]," Jenkins said.
Data sets this large have incredible inertia, he added. "If you want to transfer them from one server farm to another -- from data center to data center -- it's going to take days, if not weeks, and that's on a very big, 10-gig line, which most of us don't have access to."
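To put Jenkins' "days, if not weeks" estimate in rough numbers, here is a back-of-the-envelope sketch in Python. The one-terabyte figure comes from the daily satellite output mentioned above; the one-petabyte archive size and the 50% link efficiency are illustrative assumptions, not figures from ESA or CloudSigma.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at a
    given fraction (efficiency) of its nominal line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


TB = 10**12             # one terabyte, in bytes (decimal units)
PB = 10**15             # one petabyte: a hypothetical archive size
TEN_GIG = 10 * 10**9    # a 10-gigabit-per-second line

if __name__ == "__main__":
    # One day of satellite output at full line rate, in minutes
    print(f"1 TB at line rate: {transfer_days(TB, TEN_GIG) * 24 * 60:.1f} minutes")
    # A petabyte-scale archive at full line rate
    print(f"1 PB at line rate: {transfer_days(PB, TEN_GIG):.1f} days")
    # Assumed 50% effective throughput, since real links rarely sustain line rate
    print(f"1 PB at 50% rate:  {transfer_days(PB, TEN_GIG, 0.5):.1f} days")
```

Under these assumptions, a single day's terabyte moves in about 13 minutes, but a petabyte-scale archive needs more than nine days at full line rate and roughly 18 days at half that, which is consistent with the range Jenkins describes.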
Which brings us to the public cloud concept. Jenkins explained, "Our vision is: Let's get that data, put it in a public cloud, make it accessible, and it's in the right place because you have computing, networking, and access. It democratizes it."
An early example of this is Helix Nebula, a new European science cloud initiative that partners the continent's IT providers, including CloudSigma, with three of its leading research centers: ESA, CERN (European Organization for Nuclear Research), and EMBL (European Molecular Biology Laboratory). The pilot project, however, is initially designed to meet the research needs of big science, not citizens and small businesses.
For cloud providers, democratizing big data could prove profitable. "We don't need the public institution to pay us to store that data, if the data has even a reasonable level of interest from end users," Jenkins said.
Value-added service providers could provide a "usability layer" on top of this data. "This is a great opportunity for innovation for entrepreneurs," Jenkins pointed out. "They can get access to these huge data sets that previously had been very elitist because of the infrastructure requirements to engage with them."
There's a degree of self-interest here, of course, as Jenkins runs a cloud provider that could benefit greatly from this data-for-the-masses business model. Still, it's an intriguing vision worth a closer look.
"We can release a wave of innovation around this data that's kind of locked up," Jenkins added. "People talk about big data a lot, but nobody's really thinking about the practicalities of it. Of course, Google can do big data analysis... but we don't get mass innovation until we all have some sort of meaningful access."