NASA has been publishing a lot of data for a long time, but getting it into standard machine-readable formats is still a herculean task.
NASA has been an open data operation since the passage of the National Aeronautics and Space Act of 1958, in the very earliest days of the Space Race after Sputnik. The agency has always published untold volumes of scientific data.
Yet the kind of standardized, machine-readable data demanded by the Obama Administration's Open Government Initiative remains a challenge.
"That made more complicated -- or, you might say, made wonderful -- the job we were already doing," NASA open innovation program manager Beth Beck said in an interview. "Big data is NASA -- that's what we have -- but taking all that data and making it machine readable, that's a big job." Most of the data is already digital and readable by some internal applications created by NASA and its network of contractors. The challenge is finding it in a sprawling, decentralized organization and putting it in a form that others can use. Some important data is locked up in the form of PDFs of scientific articles, when a data analyst would much prefer structured XML or even a comma-delimited download of tabular data.
That job rests on Jason Duley, whom Beck introduces as her "data emperor." His real title is lead software developer for the open innovation team. NASA has established the data.nasa.gov website, which feeds into the centralized data.gov catalogue established by the White House. NASA's open data effort has intensified since February or March, when the White House Office of Management and Budget began pushing the open government policy harder, Duley said. A year ago, NASA had 25 published open-data sets. Within a few months, Duley expects to have more than 4,000 available. However, that is still only scratching the surface.
"The biggest challenge is finding the data, because NASA is so large, and the field centers all operate like different corporations, and they all have a different governance model," Duley said. There is no guarantee that important data even resides on a NASA server -- sometimes data storage is delegated to a contractor, and there might be no way of knowing that if you weren't part of the project that created it.
Mars data from data.nasa.gov.
"There's not a clear process where someone initiates a workflow to open their data," Duley said. Beginning to create that workflow and a standardized information architecture is one of his current goals. "There should be a notification scheme that will allow us to discover this data, rather than have to hunt for it. I want to delegate responsibility for modeling the data to the people who own it."
The process has to be efficient, Beck said, because the open data initiative is essentially an unfunded mandate, or at least there is very little additional money available to make it work. "It's not our mission to be the data-on-demand agency, but that's one of the mandates. It is dizzying, very hard to keep up with what new mandates are. Even if we put a process in place, the mandates could change."
In particular, the OMB is now asking NASA to identify the top five users of its data from outside government and how they are using that data -- something the agency has never tracked before, or not in a systematic way. The agency is supposed to be able to show "what are they using the data for, what's the financial benefit they get from it, and why do they want the data," she said.
Duley said one possibility for tracking usage more accurately might be to make data available through an application programming interface, rather than straight data downloads. "That would give us traceability into who is using our data, to what extent, and what services they're accessing."
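As a rough illustration of the traceability Duley describes, the Python sketch below tallies requests per API key so that the heaviest users, and the services they call, can be listed. It is not NASA's implementation: the `UsageTracker` class, the key names, and the endpoint paths are all hypothetical, and a real service would persist this log rather than hold it in memory.

```python
from collections import Counter, defaultdict


class UsageTracker:
    """Hypothetical in-memory log of API calls, keyed by API key."""

    def __init__(self):
        self.calls = Counter()            # api_key -> number of requests
        self.bytes_served = Counter()     # api_key -> total bytes returned
        self.services = defaultdict(set)  # api_key -> endpoints accessed

    def record(self, api_key, service, n_bytes):
        """Log one request: who called, which service, how much data."""
        self.calls[api_key] += 1
        self.bytes_served[api_key] += n_bytes
        self.services[api_key].add(service)

    def top_users(self, n=5):
        """Return the n keys with the most requests, with their usage."""
        return [
            (key, count, self.bytes_served[key], sorted(self.services[key]))
            for key, count in self.calls.most_common(n)
        ]


# Made-up keys and endpoints, for illustration only.
tracker = UsageTracker()
tracker.record("university-lab", "/mars/images", 2048)
tracker.record("university-lab", "/mars/weather", 512)
tracker.record("startup-x", "/mars/images", 4096)

for key, count, total_bytes, services in tracker.top_users(5):
    print(key, count, total_bytes, services)
```

A report like this answers who is using the data and to what extent. It does not capture why they want it or what financial benefit they derive, which the OMB request also asks for.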
API access to NASA data might also be a way of bringing open data and big data together more, allowing outsiders to access and manipulate the data without necessarily downloading it in bulk. Almost by definition, true big data sets are too large to download easily.
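To make the idea concrete, here is a minimal Python sketch of the pattern: the filter runs where the data lives, and the caller receives only the matching records, a page at a time. The function name `query_pages`, the record fields, and the sample values are assumptions invented for illustration; they do not describe any actual NASA interface or measurement.

```python
from typing import Callable, Iterable, Iterator


def query_pages(
    records: Iterable[dict],
    keep: Callable[[dict], bool],
    page_size: int = 2,
) -> Iterator[list]:
    """Yield pages of records that pass the filter, page_size at a time."""
    page = []
    for rec in records:
        if keep(rec):
            page.append(rec)
            if len(page) == page_size:
                yield page
                page = []
    if page:  # final, partially filled page
        yield page


# Made-up sample records standing in for a much larger data set.
SAMPLE = [
    {"sol": 1, "min_temp_c": -80},
    {"sol": 2, "min_temp_c": -70},
    {"sol": 3, "min_temp_c": -78},
    {"sol": 4, "min_temp_c": -76},
    {"sol": 5, "min_temp_c": -60},
]

# The caller asks only for the cold days and never sees the rest.
for page in query_pages(SAMPLE, lambda r: r["min_temp_c"] < -75):
    print([r["sol"] for r in page])
```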
Though Beck's group is working to set up a repository for scientific publications, in most cases it will not host the data itself. Instead,
David F. Carr oversees InformationWeek's coverage of government and healthcare IT. He previously led coverage of social business and education technologies and continues to contribute in those areas. He is the editor of Social Collaboration for Dummies (Wiley, Oct. 2013) and ...