Data Management

'Exaflop' Supercomputer Planning Begins

Backed by $7.4 million in funding, computer scientists aim to narrow the gap between theoretical peak performance and actual performance through new architectures.

Thomas Claburn, Editor at Large, Enterprise Mobility

February 22, 2008

3 Min Read

Researchers at Sandia and Oak Ridge National Laboratories are preparing for the challenges of developing an exascale computer at the new Institute for Advanced Architectures.

Through the IAA, scientists plan to conduct the basic research required to create a computer capable of performing a million trillion calculations per second, otherwise known as an exaflop. That's a million times faster than today's teraflop computers and a thousand times faster than the petaflop barrier, which was broken in 2006.

Sandia's ASCI Red became the world's first teraflop computer in late 1996.

Backed by $7.4 million in funding, computer scientists aim to narrow the gap between theoretical peak performance and actual performance through new architectures.

"We're actually not building an exaflop supercomputer," said Sandia project lead Sudip Dosanjh. Rather, he said, the U.S. Department of Energy and the National Security Agency have made it clear that they expect to have need for exaflop computing around 2018. The anticipated applications, he said, include large-scale prediction, such as global climate change predictions, materials science analysis, fusion research, and national security problems that he could not discuss.

To meet those requirements, "there are a number of research challenges we need to get to work on," said Dosanjh. "We really need to do that in collaboration with industry and academia. We want to do R&D that will impact real systems in the next decade."

One such challenge is power consumption. "An exaflop supercomputer might need 100 megawatts of power, which is a significant portion of a power plant," said Dosanjh. "We need to do some research to get that down. Otherwise no one will be able to power one."

Then there's the issue of reliability, which tends to decline as the parts count increases. Given that an exascale computer might have a million hundred-core processors, Dosanjh speculated that such a machine might run for 10 minutes before suffering a failure. To manage a machine with so many parts, new fault-tolerance schemes need to be developed.

Data movement is also a critical concern, said Dosanjh. "The rate of memory access has not kept up with the ability of these processors to do floating point operations," he said.

And in addition to the hardware engineering challenges, programmers have to be educated to write code for such massively parallel systems. "As far as the industry is concerned, there needs to be an education effort as well to get people trained to write software at this scale," said Dosanjh.

Just such an effort is already under way. Last October, Google and IBM launched an educational initiative to teach programmers at several universities how to code for large-scale distributed computing systems.

The IAA had its initial meeting in January, attended by almost 50 representatives from government, academia, and industry. The topic of discussion was memory in high-performance computing. At the organization's next meeting, Dosanjh said researchers will discuss interconnects, the networks inside supercomputers.

About the Author

Thomas Claburn

Editor at Large, Enterprise Mobility

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful master's degree in film production. He wrote the original treatment for 3DO's Killing Time, a short story that appeared in On Spec, and the screenplay for an independent film called The Hanged Man, which he would later direct. He's the author of a science fiction novel, Reflecting Fires, and a sadly neglected blog, Lot 49. His iPhone game, Blocfall, is available through the iTunes App Store. His wife is a talented jazz singer; he does not sing, which is for the best.

See more from Thomas Claburn

Related Topics

Recent in Leadership

Related Topics

Recent in Resilience

Related Topics

Recent in ML & AI

Related Topics

Recent in Data

Related Topics

Recent in Sustainability

Related Topics

Recent in Infrastructure

Related Topics

Recent in Software

Related Topics

'Exaflop' Supercomputer Planning Begins

About the Author

Editor's Choice

Related Topics

Recent in Leadership

Related Topics

Recent in Resilience

Related Topics

Recent in ML & AI

Related Topics

Recent in Data

Related Topics

Recent in Sustainability

Related Topics

Recent in Infrastructure

Related Topics

Recent in Software

Related Topics

<span class="ArticleBase-LargeTitle">'Exaflop' Supercomputer Planning Begins</span>'Exaflop' Supercomputer Planning Begins

About the Author

Editor's Choice

'Exaflop' Supercomputer Planning Begins