All it took was President Barack Obama's signature to activate the National Strategic Computing Initiative (NSCI) late last month. The move frames a continuing partnership between IBM and the US Department of Energy to develop and deploy the latest generation of supercomputers.
But rather than throwing more processing power at big computational problems, IBM is taking a broader, systemic view of how data flows through a system and where computation can be placed to process that data as it moves.
The concept is known as data-centric supercomputing.
"Measuring a supercomputer by floating point [operations] is an archaic idea," said David Turek, IBM's vice president for Technical Computing and a 25-year veteran in the supercomputing field.
Instead, one has to look at "exa-scale" computing, not measured exclusively in "exa-flops," but in multiple dimensions, Turek explained. It means up-scaling input/output, memory bandwidth, memory footprint, and other parameters. "It's based on how much faster you can cause your application to run," he said. "Floating point may be unrelated to making the app move faster."
At first glance, exa-scale looks like exabytes of data being crunched by exaflops of computing power. "The movement of data creates delay," Turek noted. The system's architecture has to change so data is processed where it resides or as it flows, rather than shuttled to a parallel processing array.
"We are using infrastructure to deal with the problem in totality," Turek said. That might mean doing basic analysis of data while it is in storage or doing MapReduce in the network as the data is flowing.
How It Works
To illustrate the concept, Turek offered as an example past work he did with an oil company. Oil exploration relies on seismic data, which is voluminous to begin with. Supercomputers are a necessity here to run the algorithms needed to sort and analyze the data.
Adding more racks of processors would have improved solution times by only 2% to 3%, Turek said. "Workflow is so wrapped up in managing the data" that focusing on the processing power alone "was ill-founded as a computational strategy," he explained.
President Obama's NSCI announcement provides some additional strategic context framing IBM's $325 million contract with the DOE to deliver two data-centric supercomputers in 2017.
The first will go to the Lawrence Livermore National Laboratory in California, which is tasked with the strategic maintenance of the US nuclear arsenal. (The last physical test of a nuclear warhead by the United States was in 1992. All subsequent "tests" have been computer simulations of warhead designs.)
The second supercomputer, known as Summit, will go to the Oak Ridge National Laboratory in Tennessee, where work is already being done on smart electrical grids and carbon sequestration but could branch out to computational chemistry and life sciences.
Summit will replace the Titan Cray XK7 that has been chugging along at peak speeds of 27 petaflops since 2012. Summit will boast "40 teraflops per node vs. 1.5 teraflops per node for Titan," explained Buddy Bland, the project director for the DOE's Oak Ridge Leadership Computing Facility. But Summit will pack only 3,500 nodes versus about 18,000 for Titan, he added, and each of Summit's nodes will have about half a terabyte of memory.
"Our users really like that," Bland added.
"Today on Titan, the GPUs and CPUs do not have shared memory," Bland continued. Users have to move data from one to the other. Summit will allow CPUs and GPUs to access the same data from a shared memory. This will result in a "dramatic improvement in performance," Bland added. "It makes it easier for the user not to worry about data motion."
Currently, the DOE is working with IBM and Nvidia to craft the code needed to use Summit. "You have to get the data laid out in the system to take advantage of the memory bandwidth," said Bland. This is yet another refinement of computing practice, in which data is moved from slower memory to high-speed memory as needed.
Another tweak Summit adds is "scratch pad memory." A cache automatically decides where data resides, but it "does not know what you will need next," Bland said. Scratch pad memory enables users to allocate data to fast memory and keep it there until they explicitly command it to move elsewhere.
For IBM, data-centric supercomputing also serves as a "proof of concept" that can be scaled down to meet the more humble needs of university departments or sold to large corporations handling big data analytics.
National Security And Business Use
Turek admits that the market for highly advanced supercomputers is limited to a few institutions operating at the extreme fringes of computer science. The power requirement alone for the two DOE systems comes in at roughly 10 megawatts each. (One megawatt of electricity can power roughly 1,000 homes.)
But it's not just computer science for the sake of it.
The current record holder as the world's "most powerful" supercomputer is China's Tianhe-2, which performs at 33.86 petaflops on the Linpack benchmark, roughly twice the 17.59-petaflop Linpack result of the Titan supercomputer now housed at Oak Ridge.
Tianhe-2 is operated by China's National University of Defense Technology at Guangzhou and is one of four supercomputer centers designated by the United States as acting against American security interests. Chinese news agency Xinhua stated that Tianhe-2 is used for genetic analysis, drug development, and aerodynamic analysis for aircraft and high-speed trains.
Under Obama's NSCI, the US will build and deploy a system that will operate at 1 exaflop -- or 1,000 petaflops. But be mindful of "Turek's Asterisk": processing power alone is no longer the sole measure of how powerful a supercomputer is.