Open data advocates tout free government data as one of the world's great resources for fueling innovation and economic growth. However, despite some early successes in helping entrepreneurs turn data into dollars, White House efforts to jump-start an open data economy are taking longer than administration officials had hoped. And some experts worry that not enough attention is being paid to who will pay for hosting and sustaining the coming flood of data.
Those were some of the sobering messages to come out of the Data Innovation forum hosted last week by the Information Technology & Innovation Foundation's Center for Data Innovation.
The federal government collects a vast amount of data. The problem, according to US deputy CTO Nick Sinai, is that too much of it remains trapped in formats and systems that lie beyond the public's reach. The National Oceanic and Atmospheric Administration estimates that only 10% of its data is available online, Sinai said during a keynote speech at the forum.
All the data collected by the US is "a valuable national asset and should be open and available to the public wherever possible," Sinai said. But even when the public can access the data, it's often too difficult to find, understand, or reuse.
That belief, and the prospects for creating a new economic growth engine, led the White House to issue what some regard as a watershed policy in May. President Obama's Open Data Policy declared that agencies must treat the data they generate as a public asset. It directed agency executives to begin inventorying those assets. The president also directed agencies to make the data they produce available in machine-readable formats that businesses and the public can readily use.
But Francine Berman, a Rensselaer Polytechnic Institute computer science professor, said making government data more accessible addresses only part of the open data challenge.
Berman chairs the US Research Data Alliance and directs the Center for a Digital Society. She raised concerns about the future of the data economy. In particular, she worries about the risks of data blackouts -- when data feeds become unavailable, as happened during the recent government shutdown -- and the long-term sustainability of research data as business and funding models evolve.
"Research data has to live somewhere. The issue is who pays the data bill," Berman said during one of the forum's discussion panels. "If you think about who hosts the data, who uses the data, and who pays for the data, there are important groups of stakeholders" that must be taken into account.
She also argued that open data proponents overlook the fact that large amounts of health and science research data come from projects that depend on funding vehicles that can end suddenly. "What happens to the data when the funding runs out? You get a stakeholder version of Hot Potato, where someone else is expected to pay for the data. Whose responsibility is it?"
The problem can't be solved by government, industry, or academia alone, Berman said. It crosses every sector. "It's absolutely critical to think about who will host and pay the mortgage" on a wide range of data if the nation is to sustain the engine for data innovation.
Capitalizing on open data
The White House has made some progress in priming the pump for open data. In addition to setting the Open Data Policy, officials overhauled Data.gov, the portal for accessing free government data, to make it easier to find and download any of more than 85,000 data sets.
The administration also seized on the work federal CTO Todd Park had begun at the Department of Health and Human Services in 2010. Park had gathered leaders from the White House, federal agencies, academia, social sectors, public health communities, IT vendors, and major companies to catalyze the use of government data to improve healthcare. Those health events became the model for the White House, which has since hosted a string of Datapaloozas (including two this month) in the fields of education, energy, and public safety.
Getting entrepreneurs and large companies to invest in open government data takes more than inviting executives to the White House. The biggest challenge is setting aside the money to go after information ore that must be mined, refined, and brought to market.
Monsanto's $1 billion acquisition of Climate Corp. in October was a wake-up call for investors. Climate Corp., founded in 2006 by two former Google executives, built its business using government data to help farmers lock in profits in the face of adverse weather.
Joel Gurin, senior adviser at New York University's GovLab and author of the book Open Data Now, has identified 500 US companies whose business models rely significantly on government data. "What's interesting is that it's hard to predict how government data will ultimately be used," Gurin told attendees at the data forum in Washington last week. The Reagan administration released GPS data to the public 10 years before commercial devices became widely available to use the data, and it took another 10 years before smartphones made GPS data a ubiquitous service. "It would not have been possible to predict how GPS would be used back then. That's true for open government today."
Despite all the talk about liberating government data, the task of making data usable amid competing interests continues to stand in the way of faster open data exploitation. Cash-strapped local governments debate whether it makes sense to give data away for free or find ways to generate revenue streams from it.
Berman said that in some disciplines (particularly health sciences), data provides a competitive advantage that keeps it from being shared freely. One model to overcome that hurdle is a policy instituted by the National Institutes of Health, which requires researchers accepting grant money to deposit their data in an NIH repository, making it available to others.
The flip side of open data, according to Berman, is deciding which data is valuable and worth keeping. Though the cost of storage has become almost negligible, "we don't have the storage to keep all the data we're collecting. We really have to make decisions on what to keep and how to instrument the systems to help manage the daily deluge of new data. It's a social challenge as much as a technical issue."