
'Exaflop' Supercomputer Planning Begins

Backed by $7.4 million in funding, computer scientists aim to narrow the gap between theoretical peak performance and actual performance through new architectures.

Researchers at Sandia and Oak Ridge National Laboratories are preparing for the challenges of developing an exascale computer at the new Institute for Advanced Architectures.

Through the IAA, scientists plan to conduct the basic research required to create a computer capable of performing a million trillion calculations per second, otherwise known as an exaflop. That's a million times faster than today's teraflop computers and a thousand times faster than a petaflop machine.
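The ratios above follow directly from the decimal SI prefixes. The short Python sketch below simply checks that arithmetic; the dictionary and function names are illustrative and not taken from any library.

```python
# Decimal SI scales for floating-point operations per second (FLOPS).
SCALES = {
    "teraflop": 10**12,
    "petaflop": 10**15,
    "exaflop": 10**18,
}

def speedup(target: str, baseline: str) -> int:
    """Return how many times faster `target` is than `baseline`."""
    return SCALES[target] // SCALES[baseline]

# "A million trillion" calculations per second is 10**6 * 10**12 = 10**18.
assert SCALES["exaflop"] == 10**6 * 10**12

print(speedup("exaflop", "teraflop"))  # 1000000
print(speedup("exaflop", "petaflop"))  # 1000
```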

Sandia's ASCI Red became the world's first teraflop computer in late 1996.


"We're actually not building an exaflop supercomputer," said Sandia project lead Sudip Dosanjh. Rather, he said, the U.S. Department of Energy and the National Security Agency have made it clear that they expect to need exaflop computing around 2018. The anticipated applications, he said, include large-scale prediction such as forecasting global climate change, materials science analysis, fusion research, and national security problems that he could not discuss.

To meet those requirements, "there are a number of research challenges we need to get to work on," said Dosanjh. "We really need to do that in collaboration with industry and academia. We want to do R&D that will impact real systems in the next decade."

One such challenge is power consumption. "An exaflop supercomputer might need 100 megawatts of power, which is a significant portion of a power plant," said Dosanjh. "We need to do some research to get that down. Otherwise no one will be able to power one."
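Dosanjh's 100-megawatt figure translates into an energy-efficiency target. The back-of-the-envelope sketch below assumes the machine sustains a full exaflop while drawing that power (an idealization; the article gives only the power figure). It converts a power budget into the billions of floating-point operations per second per watt (GFLOPS/W) the hardware would need, and into the energy used over a year of continuous operation. The function names and the 20-megawatt comparison budget are hypothetical.

```python
EXAFLOP = 1e18  # floating-point operations per second

def required_gflops_per_watt(power_megawatts: float, flops: float = EXAFLOP) -> float:
    """Efficiency needed to deliver `flops` within a given power budget."""
    watts = power_megawatts * 1e6
    return flops / watts / 1e9  # FLOPS per watt -> GFLOPS per watt

def annual_energy_gwh(power_megawatts: float) -> float:
    """Energy consumed by running continuously for one (non-leap) year."""
    hours_per_year = 24 * 365
    return power_megawatts * hours_per_year / 1000  # MWh -> GWh

print(required_gflops_per_watt(100))  # 10.0 GFLOPS/W at 100 MW
print(required_gflops_per_watt(20))   # 50.0 GFLOPS/W at a hypothetical 20 MW budget
print(annual_energy_gwh(100))         # 876.0 GWh per year
```

Required efficiency is inversely proportional to the power budget, so every reduction in allowable power raises the bar for the hardware by the same factor.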

Then there's reliability, which tends to decline as the parts count increases. Given that an exascale computer might contain a million 100-core processors, Dosanjh speculated that such a machine might run for just 10 minutes before suffering a failure. Managing a machine with so many parts will require new fault-tolerance schemes.
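The 10-minute estimate reflects how failure rates add up across components. The sketch below assumes independent components with constant (exponentially distributed) failure rates, a common simplification that the article does not state. Under that assumption, the system's mean time between failures (MTBF) is the per-component MTBF divided by the number of components. The function names are illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

def system_mtbf(component_mtbf: float, n_components: int) -> float:
    """MTBF of a system that fails when any one of n identical, independent
    components fails (constant failure rates assumed)."""
    return component_mtbf / n_components

def component_mtbf_needed(system_mtbf_target: float, n_components: int) -> float:
    """Per-component MTBF required to reach a system-level MTBF target."""
    return system_mtbf_target * n_components

# A million processors, ten minutes between system failures:
needed_minutes = component_mtbf_needed(10, 1_000_000)
print(needed_minutes / MINUTES_PER_YEAR)  # about 19.03 years per processor
```

In other words, even if each processor ran about 19 years on average without failing, a million of them together would suffer a failure roughly every 10 minutes.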

Data movement is also a critical concern, said Dosanjh. "The rate of memory access has not kept up with the ability of these processors to do floating point operations," he said.
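One common way to reason about this gap between peak and delivered performance is the roofline model, which the article does not mention: a kernel's attainable speed is the lesser of the processor's peak floating-point rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte moved). The peak and bandwidth figures below are hypothetical, chosen only to illustrate the effect.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, intensity: float) -> float:
    """Roofline estimate: performance is capped either by peak compute or by
    memory bandwidth times arithmetic intensity (flops per byte moved)."""
    return min(peak_gflops, bandwidth_gb_s * intensity)

PEAK = 100.0  # hypothetical processor peak, GFLOP/s
BW = 20.0     # hypothetical memory bandwidth, GB/s

for intensity in (0.25, 1.0, 5.0, 10.0):
    perf = attainable_gflops(PEAK, BW, intensity)
    print(f"{intensity:5.2f} flop/byte -> {perf:6.1f} GFLOP/s ({perf / PEAK:.0%} of peak)")
```

With these numbers, any kernel doing fewer than 5 floating-point operations per byte is limited by memory rather than arithmetic; at 0.25 flop/byte it reaches only 5 GFLOP/s, or 5% of peak.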

Beyond the hardware engineering challenges, programmers will have to be trained to write code for such massively parallel systems. "As far as the industry is concerned, there needs to be an education effort as well to get people trained to write software at this scale," said Dosanjh.

Just such an effort is already under way. Last October, Google and IBM launched an educational initiative to teach programmers at several universities how to code for large-scale distributed computing systems.

The IAA had its initial meeting in January, attended by almost 50 representatives from government, academia, and industry. The topic of discussion was memory in high-performance computing. At the organization's next meeting, Dosanjh said researchers will discuss interconnects, the networks inside supercomputers.
