Government Toils To Create Big Data Infrastructure - InformationWeek




Government is slowly puzzling out how to extract knowledge from more data than it has ever had available.

Federal agencies devote substantial resources each year to gathering, processing, and disseminating data. A recent Commerce Department report estimates that this data contributes from $24 billion to $221 billion annually to private sector revenues.

But big data differs qualitatively as well as quantitatively from merely large amounts of data. It is characterized not only by its volume, but by its complexity, the fact that it consists of both structured and unstructured data, and that it often is distributed among a variety of storage facilities, sometimes located far apart. This makes it difficult to use traditional data processing on big data. It requires not only more computing power, but new ways to store, search, access, transport, analyze, and visualize it.

Brand Niemann, founder of the Federal Big Data Working Group and former senior enterprise architect and data scientist at the Environmental Protection Agency, advises agencies, "The best way to deal with big data is to start small." Learn how to use distributed structured and unstructured data, and then scale up, he says.

Given the potential value of big data being assembled in government and the private sector, and the challenges of using it, the Obama administration in 2012 announced a Big Data Initiative, with more than $200 million committed from six agencies to foster research and development on how to better extract useful information from these large masses of data. Within 18 months of launching the initiative, the National Science Foundation announced about $150 million worth of grants for projects ranging from cancer genomics and human-language processing to data storage. Also investing in big data programs were the Defense Advanced Research Projects Agency (DARPA), NASA, the U.S. Geological Survey, the National Institutes of Health, and the Energy Department.

Exciting times
The DOE projects are focusing on data management and indexing techniques for large and complex datasets. One effort is the Federated Earth System Grid (ESG), created to provide the climate research community -- both in and out of government -- with access to hundreds of petabytes of simulation data. It is a federated architecture with multiple portals, means of access, and delivery mechanisms. The framework has three tiers: Metadata services for search and discovery, data gateways that act as brokers handling data requests, and ESG nodes with the data holdings and metadata accessing services.
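The three tiers can be pictured schematically. The sketch below is purely illustrative; the class and method names are hypothetical stand-ins, not the actual ESG interfaces:

```python
# Hypothetical sketch of a three-tier federated data architecture,
# loosely modeled on the tiers described for the Earth System Grid:
# a metadata tier for discovery, gateways that broker requests, and
# nodes that hold the actual data. Names are illustrative only.

class DataNode:
    """Bottom tier: holds datasets and serves them on request."""
    def __init__(self, name, holdings):
        self.name = name
        self.holdings = holdings  # maps dataset id -> data payload

    def fetch(self, dataset_id):
        return self.holdings.get(dataset_id)


class MetadataService:
    """Top tier: search and discovery across every node's catalog."""
    def __init__(self, nodes):
        # Index which node holds which dataset.
        self.index = {ds: node for node in nodes for ds in node.holdings}

    def search(self, keyword):
        return [ds for ds in self.index if keyword in ds]


class Gateway:
    """Middle tier: brokers a request to whichever node holds the data."""
    def __init__(self, metadata):
        self.metadata = metadata

    def request(self, dataset_id):
        node = self.metadata.index.get(dataset_id)
        return node.fetch(dataset_id) if node else None


# Federation in miniature: multiple nodes, one searchable catalog,
# brokered access, regardless of where a dataset physically lives.
nodes = [
    DataNode("site_a", {"cmip5.tas.monthly": "<temperature data>"}),
    DataNode("site_b", {"cmip5.pr.monthly": "<precipitation data>"}),
]
meta = MetadataService(nodes)
gateway = Gateway(meta)

found = meta.search("tas")                   # discovery via metadata tier
payload = gateway.request("cmip5.pr.monthly")  # delivery via the broker
```

The point of the federation is visible in the last two lines: a researcher searches one catalog and requests one gateway, without needing to know which portal or storage site actually holds the data.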

Another Energy Department effort is Toolkit for Extreme Climate Analysis (TECA), for visualizing multi-terabyte climate simulations intended to further understanding of global climate change. The sheer size of datasets being analyzed presents challenges, and they are growing exponentially with improvements in the models and the speed of the computers running the simulations. Conventional visualization and analytical tools use a serial execution model, limiting their use with very large datasets. TECA is a step toward a parallel model for analysis.
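The serial-versus-parallel distinction can be seen in a minimal sketch. This is a stand-in for illustration, not TECA's actual analysis code, and the tiny lists stand in for multi-terabyte simulation output split into chunks:

```python
# Minimal illustration of serial vs. parallel analysis of chunked data.
# Real climate analyses are far more involved; the shape of the
# computation, analyze each chunk independently and then reduce the
# results, is what parallel frameworks like TECA exploit.

from multiprocessing import Pool


def chunk_max(chunk):
    """Analyze one slice of the simulation output independently."""
    return max(chunk)


def serial_max(chunks):
    # Conventional serial model: one core walks every chunk in turn.
    return max(chunk_max(c) for c in chunks)


def parallel_max(chunks, workers=4):
    # Parallel model: chunks are analyzed concurrently, then reduced.
    with Pool(workers) as pool:
        return max(pool.map(chunk_max, chunks))


if __name__ == "__main__":
    chunks = [[14.2, 15.1, 13.8], [16.4, 12.9], [15.7, 17.3, 16.0]]
    # Both models agree on the answer; only the execution model differs.
    assert serial_max(chunks) == parallel_max(chunks, workers=2)
```

Because each chunk is independent, adding workers scales the analysis with the data, which is exactly what a serial tool cannot do once datasets reach multiple terabytes.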

These efforts are advancing the specialized area of climate research, Wehner said. Today there are only five groups in the world running these types of simulations; in five years there will be 12. "It's an extremely exciting time for this kind of science," he said.

One of the primary responsibilities of Energy Department labs is overseeing the nation's nuclear arsenal, which has relied on computer simulations since live nuclear testing was halted. Grider, who "owns" the high-performance computing center at Los Alamos, has been working with big data since 1971, "building datasets that were far bigger than would fit in anybody's memory."

Over the years, the fidelity required to understand what is going on in nuclear weapons has grown to the point that new storage solutions are required, and the lab will be installing two to three petabytes of memory in the next year, part of a system that eventually will store from 200 to 500 petabytes.

But "it's bigger than just storage," Grider said. Dealing with these volumes of data requires not only capacity, but the bandwidth to access the data and tools to manage it.

Los Alamos will be using Scality's Ring software-defined storage system. The system provides massively scalable object storage that is hardware agnostic, so the lab can use any medium it wants in the system. That was one of the attractions of the Scality Ring system, said Leo Leung, Scality's head of corporate marketing. "They wanted to choose the hardware later. They wanted to separate that decision."

Scality's Ring software provides centralized management of distributed storage, without bottlenecks or a central point of failure. It uses erasure coding, a technique used in cloud storage to protect data and make it recoverable even when individual drives or nodes fail.


William Jackson is a writer with the Tech Writers Bureau, with more than 35 years' experience reporting for daily, business, and technical publications, including two decades covering information ...
