11/21/2013
08:06 AM

How NASA Manages Big Data

Unmanned space missions can generate hundreds of terabytes of data every hour. What's a space agency to do?

NASA's LADEE Moon Mission: 5 Goals

NASA has dozens of missions active at any given time: Robotic spacecraft beaming high-resolution images and other data from great distances; Earth-based projects surveying polar ice or examining global climate change. As you might imagine, the volume of data generated by these multiple efforts is staggering.

For Chris Mattmann, a principal investigator for the big-data initiative at NASA Jet Propulsion Laboratory (JPL) in Pasadena, Calif., the term "terabytes of data" is hardly daunting.

"NASA in total is probably managing several hundred petabytes, approaching an exabyte, especially if you look across all of the domain sciences and disciplines, and planetary and space science," Mattmann told InformationWeek in a phone interview. "It's certainly not out of the realm of the ordinary nowadays for missions, individual projects, to collect hundreds of terabytes of information."

Not surprisingly, massive volumes of data bring formidable challenges, including the enduring big data question: What should we keep?

Not all bits need to be preserved for eternity, of course, and the trick is to determine which to save for the archives, and which to mine for insights but ultimately discard.

At NASA, the goal of some big data projects is to archive information, which means "keeping the bits around and doing data stewardship," says Mattmann.

Data from NASA's Earth Observing System (EOS) satellites and field measurement programs, for instance, is stored in the agency's Distributed Active Archive Center (DAAC) facilities, which process, archive, and distribute the information.

"Their responsibility… is to be the stewards and preservers of the information. It's a fairly large project, and their job is to ensure the bits are preserved, that they hang around."

Some big data projects, however, hinge more on analysis than stewardship. One radio astronomy example is the planned Square Kilometre Array (SKA), which will include thousands of telescopes in Australia and South Africa to explore early galaxy formation, the origins of the universe, and other "cosmic dawn" mysteries.

"In that particular case, there are a lot of active analytics and analysis problems that [researchers] are more interested in than necessarily keeping the data around."

Another example is the US National Climate Assessment, a federal climate-change research project that Mattmann participates in. Its primary role is "to produce better measurements of snow-covered areas, and measurements of snow in areas where dust, black carbon, and other pollutants typically impact the way that satellites see snow," says Mattmann.

"That's an example, on the Earth side, of where it's mainly a big data analytics problem and not a preservation problem."

NASA must manage hundreds of petabytes of information generated by missions and analyses.

JPL's big data operations use a lot of open-source software, most notably Hadoop, a development that suits Mattmann and his team of 24 data scientists just fine.

Here's why: Since 2005, Mattmann has been a major contributor to the Apache Software Foundation (ASF)'s big-data efforts.

"I was one of the people who helped invent the Hadoop technology," said Mattmann, who was on the project management committee for a large-scale search engine "that Hadoop kind of got spun out of."

Today, Mattmann sits on the ASF's board of directors.

Open-source projects are "really useful in the context of government, and in terms of us wanting to save money."

NASA, he pointed out, also makes good use of Apache Tika, an open-source tool for detecting file types and extracting metadata and structured text from documents, to decipher the 18,000 to 50,000 file formats available online.

"For us, file formats are where all the scientific observations, metadata, and information about the data are stored," said Mattmann. "We have to reach into files, crack them open, and pull this information out, because a lot of it feeds algorithms, analytics, and visualizations."
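
To make the idea of "cracking open" a file concrete, here is a minimal sketch of content-based format detection, the first step a tool such as Tika performs before it extracts metadata. This is not Tika's API. The function names `detect_type` and `detect_file`, the short signature table, and the MIME-style labels are assumptions chosen for illustration; a real detector knows far more formats and also weighs file names and deeper content checks.

```python
from typing import Optional

# Illustrative only: a tiny "magic number" sniffer, NOT Apache Tika's API.
# Each entry pairs the leading bytes of a format with a conventional
# MIME-style label. The table is a small, assumed sample of formats.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "application/x-hdf5"),   # HDF5
    (b"CDF\x01", "application/x-netcdf"),           # NetCDF classic
    (b"CDF\x02", "application/x-netcdf"),           # NetCDF 64-bit offset
    (b"SIMPLE  =", "application/fits"),             # FITS primary header
    (b"\x89PNG\r\n\x1a\n", "image/png"),            # PNG
    (b"%PDF-", "application/pdf"),                  # PDF
]


def detect_type(header: bytes) -> Optional[str]:
    """Return a label for the first signature that matches, else None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def detect_file(path: str, probe_size: int = 16) -> Optional[str]:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return detect_type(handle.read(probe_size))
```

Calling `detect_file` on a path returns a label such as `application/x-hdf5`, or `None` when no signature matches. Only after this step would a format-specific parser be chosen to pull out the observations and metadata Mattmann describes.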

Comments
Susan Fourtané,
User Rank: Ninja
11/23/2013 | 11:38:07 AM
The art of mining fascinating data
Jeff, 

Nice, and interesting article. :) 

NASA must be one of the biggest data collectors on the planet. 

I believe answering the question "What should we keep?" cannot be easy: in such a massive volume of data, some data that is not of immediate value might still be valuable for a space exploration data archive.

Deciding on the value of such fascinating data is most likely not an easy task.

-Susan
Brian.Dean,
User Rank: Ninja
11/23/2013 | 7:26:20 AM
Re: Data scientists
WKash, you made an excellent point about government becoming a fertile breeding ground for big data scientists. I think data science is like any other profession, but the fact that data scientists cost something in the range of 120k to 300k illustrates that they are in short supply at the moment. Hence it is nice to see the White House taking an interest in big data, as data scientists are badly needed throughout the national and global economy.

I understand that it is becoming a trend to have a major economic crisis once every 50 years or so, from the Dutch tulip trade to the 2008 housing crisis. Some say that the Bitcoin market is next, while others say that it is a new form of currency. Speculation aside, student debt will be the next crisis if students do not get real jobs soon, a situation that I think could have been avoided if big data had been used when giving out student loans.
WKash,
User Rank: Author
11/22/2013 | 7:11:42 PM
Re: Data scientists
Brian, thanks for weighing in. It would seem that NASA, NOAA, Commerce, the IRS (not to mention the CIA) and other federal agencies handling big data sets will become fertile breeding grounds for the emerging field of data scientists, and hopefully for the tools that can support senior leaders who aren't data scientists but still need the insights big data analytics can provide.

 
Brian.Dean,
User Rank: Ninja
11/22/2013 | 3:43:52 PM
Re: Data scientists
Good point, and I am sure that this number of 24 is already understated. I guess they are counting only the people who have math and programming skills as data scientists, but right beside them there would also be people with math and astronomy skills, or math and climatology skills, who are needed for the same job to function at all.

It requires a lot of money, but at the end of the day, if they manage to mine data to produce valuable insights like the "Hubble Deep Field" or cosmic inflation, which change entire perspectives, then I think it is money well spent.
WKash,
User Rank: Author
11/21/2013 | 5:23:09 PM
Data scientists
Interesting to note that JPL has 24 data scientists on board.  At the rate NASA/JPL are accumulating new data, it will be interesting to see how that number grows along with the volume of data.
Shepy,
User Rank: Apprentice
11/21/2013 | 9:14:15 AM
no budget
"Open-source projects are 'really useful in the context of government, and in terms of us wanting to save money.'"

Considering how minuscule the NASA budget is, I bet they can't get enough of FOSS.