Joao-Pierre S. Ruth
April 11, 2023
9 Min Read
The Frontier supercomputer at the Department of Energy’s Oak Ridge National Laboratory. (Source: Oak Ridge National Laboratory)
The biggest, fastest supercomputers in the world, such as Frontier and Sierra, are capable of astounding processing speeds, yet they must deliver them within reasonable power consumption limits.
The energy needed to run supercomputers, as well as to keep them cool, is substantial -- in some cases it rivals the power demands of entire neighborhoods. Those who oversee these machines continue to find ways to use energy efficiently as the world tries to be a bit greener while also supporting productivity.
Tasks given to supercomputers with extraordinary compute power are often of significant importance, for instance national defense or virus research. With such vital work to accomplish, balancing energy consumption and sustainability is often integral to maintaining steady operation of these machines.
Green Supercomputing for Herculean Tasks
The supercomputer known as Sierra, one of the fastest in the world, is used for predictive applications to oversee the reliability of the nation’s nuclear weapons.
“We have the mission of strengthening the United States’ security through the development and application of world-class science,” says Anna Maria Bailey, high-performance computing (HPC) chief engineer and co-chair of the energy-efficient HPC working group at Lawrence Livermore National Laboratory, home of Sierra. The lab’s major goal, she says, is to enhance the nation's defense and reduce any global threats via supercomputing. Sierra runs on IBM Power9 CPUs and Nvidia Tesla V100 GPUs. At peak performance, its processing speed hits 125 petaflops; the machine has 1.38 petabytes of memory and consumes 11 megawatts of power.
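A quick back-of-the-envelope check of those figures, expressed as a short Python sketch (the function name is ours, purely illustrative), shows what they mean in efficiency terms:

```python
def gflops_per_watt(peak_flops: float, power_watts: float) -> float:
    """Efficiency yardstick: billions of floating-point operations
    per second delivered per watt of power drawn."""
    return peak_flops / power_watts / 1e9

# Sierra's figures as quoted: 125 petaflops at 11 megawatts.
sierra = gflops_per_watt(125e15, 11e6)
print(f"Sierra: ~{sierra:.1f} gigaflops per watt")  # ~11.4
```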
Sierra's Master Plan for a Sustainable Supercomputer
The lab looked into a master plan for sustainability in 2008, Bailey says, which included eventually relocating from facilities that were 30 to 60 years old to newer buildings that could offer more efficiency at the power levels needed for a more advanced machine. “That actually became kind of the basis of our sustainability plan,” she says. “It was an iterative process where we were identifying gaps.”
That process included exploring emerging cooling technologies, Bailey says, including possibilities of liquid cooling in combination with air cooling as those resources evolved on the market. “We have gone from full air to about 90% liquid cooling, 10% air,” she says. “We’re still going to be looking at future technologies for emerging cooling.”
Air cooling puts its own burden on energy loads as air chillers typically must be powered to bring down ambient temperatures. Augmenting that with liquid cooling resources can reduce the need to make the air frigid in rooms that house supercomputers. “We were running the room at 58 degrees F,” Bailey says. “You could hang meat in there.” The lab makes use of local campus water to feed its liquid cooling machines, she says, as they try to get away from using chillers.
The combination of the shift to newer facilities where feasible, dynamic monitoring and control, and consolidation of resources has furthered the lab’s efficiency. “We actually have a lot of energy savings because the campus is so small,” Bailey says. “It’s one square mile, and all the utilities are in the same space.”
As supercomputer vendors develop more powerful machines, she expects changes to come in cooling options to accommodate such escalation. “We just can’t keep adding power,” Bailey says. “At some point, there needs to be a breakthrough in technology. That’s kind of what we’re hoping for.”
Breaking Moore's Law
With ongoing advances in supercomputing, further miniaturization and complexity of microchips, and other factors, some have posited the slowing or end of Moore’s Law, which observed that the number of transistors in an integrated circuit doubles roughly every two years.
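To make that doubling cadence concrete, here is a minimal illustration (idealized, assuming a strict two-year doubling period) of how quickly transistor counts compound:

```python
def moore_growth(years: float, doubling_period: float = 2.0) -> float:
    """Growth factor under an idealized Moore's Law doubling cadence."""
    return 2 ** (years / doubling_period)

# Over two decades, a strict two-year doubling compounds about a thousandfold.
print(f"{moore_growth(20):.0f}x over 20 years")  # 1024x
```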
Developments in Infrastructure and Hardware
Zachary Smith, global head of edge infrastructure services for Equinix, says, “We’re just having a much faster evolution in terms of technology sets.” The cycle of technology improvements for high-end computing is moving very quickly, he says. The consumer market might see devices recycled steadily back into the system, but supercomputers and other large datacenters tend not to see that pace of replacement of machinery.
On the hardware development side, there is a desire to build machines more efficiently as well as make use of renewable energy resources such as hydroelectric in the manufacture and operation of supercomputers, says Mike Woodacre, CTO of HPC and AI at Hewlett Packard Enterprise. “We really have driven up the efficiency of delivering power to electronics over the last decade or so,” he says. “We’re trying to make sure we minimize loss as you go from the input to the data center to the input to the electronic components.”
Is This Really the End of Moore's Law?
Woodacre also says there may be significant challenges with the changes in scaling and the slowdown in Moore’s Law.
Excitement around the Frontier supercomputer, the current fastest in the world, was not just that it was the first to surpass the exascale barrier of one quintillion calculations per second, but that it did so at over 50 gigaflops per watt. “A big breakthrough in energy efficiency,” Woodacre says. He expects future supercomputer architecture to combine high-performance computing and AI. “Basically, using AI technology to accelerate the efficiency of the HPC programs you’re running.”
HPE Frontier supercomputer at Oak Ridge National Laboratory. (Source: Oak Ridge National Laboratory)
With Great Power Comes Responsibility for Sustainability
The most powerful supercomputers are capable of rather monumental tasks, according to Bronson Messer, director of science for the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory. “Supercomputers at the scale that we typically field them have a lot more in common with things like the James Webb Space Telescope or the Large Hadron Collider at CERN than they do with somebody’s data farm that’s in a suburban office building. They are unique scientific instruments.”
However, unlike scientific instruments such as the Large Hadron Collider, supercomputing can be brought to bear on almost any scientific or engineering discipline, he says. “There are questions that can only be answered effectively by computation in fields as varied as climate modeling, nuclear physics, computational engineering, stellar astrophysics, and biomedical investigation,” Messer says. “We sort of cover the full panoply of scientific inquiry.”
It takes a substantial amount of energy to support such feats. About 40 megawatts of power is fed into Oak Ridge’s building that holds its multiple data centers, he says. “We have moved beyond the scale of a single suburban power substation. We’re now at the scale of sort of two of those.”
Frontier is the largest machine at Oak Ridge, and when running at full bore it needs about 29 megawatts of power, Messer says. More than a decade ago, he says, the Department of Energy set a goal to see a supercomputer that can operate at Frontier’s scale -- 1.6 exaflops at peak -- with energy efficiency an essential element in that objective. Building such a computer in 2012 would have taken far too much electricity to run it, he says. “The efficiency for the machines simply wasn’t there,” Messer says. “A lot of effort was expended to improve the energy efficiency of high-end computing hardware to the point where it’s now a viable thing to think about.”
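Using only the figures quoted here -- 1.6 exaflops at peak and roughly 29 megawatts at full bore -- a quick calculation shows Frontier clearing the 50-gigaflops-per-watt mark that makes exascale operation viable:

```python
def gflops_per_watt(flops: float, watts: float) -> float:
    """Billions of floating-point operations per second per watt."""
    return flops / watts / 1e9

# Frontier as quoted: 1.6 exaflops peak, ~29 megawatts at full bore.
frontier = gflops_per_watt(1.6e18, 29e6)
print(f"Frontier: ~{frontier:.0f} gigaflops per watt")  # ~55
```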
Energy Efficient GPUs
The introduction of the first hybrid CPU-GPU supercomputer was a crucial step in realizing that efficiency goal for higher processing speeds, he says, thanks to the energy efficiency of GPUs compared with CPUs. “There’s no free lunch, however, and what I always say about GPUs is they’re terrific, they’re really, really fast, but they’re abysmally stupid,” Messer says. “They do the same thing over and over and over again, and they are not particularly easy to program.”
Now, a decade later, he says hybrid CPU-GPU computing has become the accepted paradigm for high-end supercomputing. “I think everybody sort of realized it’s the only way to get to where you need to be,” Messer says, “both in computational efficiency and in pure unmitigated computational power.”
Hot Water Cooling
In addition to leveraging GPUs in supercomputers for efficiency, Oak Ridge uses hot water cooling for Frontier. “Water comes in on the ‘cold side’ at 92 degrees F and leaves well over a hundred,” he says. “That's a significant savings in electricity for us because we don’t have to run chillers except in the very, very hottest days of July or August in East Tennessee.”
That might sound counterintuitive, using water at such a high starting temperature, but Messer says that by not chilling the water below 90 degrees, the lab can still cool the supercomputer without refrigerants or air chillers to bring the water temperature down.
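A minimal sketch of the physics makes the point: the heat a water loop carries off depends on flow rate and temperature rise, not on how cold the water starts. The heat load and temperature rise below are illustrative assumptions, not Oak Ridge figures:

```python
C_WATER = 4186.0  # specific heat of water, J/(kg*K)

def required_flow_kg_s(heat_watts: float, delta_t_kelvin: float) -> float:
    """Mass flow of water needed to carry away a given heat load
    for a given inlet-to-outlet temperature rise (Q = m * c * dT)."""
    return heat_watts / (C_WATER * delta_t_kelvin)

# Assume ~29 MW of heat and a 10 degF (~5.6 K) rise from the 92 F inlet.
flow = required_flow_kg_s(29e6, 5.6)
print(f"~{flow:.0f} kg/s of water")  # on the order of 1,200 kg/s
```

Because the return water only needs to shed that same temperature rise back to the outside air, pumps and cooling towers suffice on all but the hottest days.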
“It’s such a huge benefit not to have to run those chillers because that’s essentially sort of reverse air conditioning, which costs a lot of money to power,” he says. “Whereas evaporative cooling, all I have to do is run pumps to sort of cascade the water over cooling fins, and that’s it. And let the atmosphere do the work.”
Oak Ridge is also home to other supercomputers, including Summit, which like Sierra, was built with IBM CPUs and Nvidia GPUs. Summit was part of a supercomputing consortium leveraged to model very large data sets to assist in understanding and combating COVID-19.
Frontier, from Hewlett Packard Enterprise, runs on AMD CPUs and GPUs and has achieved exaflop speeds, meaning it can process more than one quintillion operations per second. Frontier also ranks among the most energy-efficient supercomputers.
Supercomputers of the future will continue to need to be efficient as their power needs inevitably grow. It might not be far-fetched to imagine that more exotic, sustainable energy sources might also come into play. “The thing that might be the closest thing that's not utterly science fiction is small modular nuclear reactors,” Messer says. “Is it impossible to think that we could have a small modular reactor powering a supercomputing center? I think it's not too far-fetched at all. And then of course there's also the promise of fusion power.”
About the Author(s)
Joao-Pierre S. Ruth covers tech policy, including ethics, privacy, legislation, and risk; fintech; code strategy; and cloud & edge computing for InformationWeek. He has been a journalist for more than 25 years, reporting on business and technology first in New Jersey, then covering the New York tech startup community, and later as a freelancer for such outlets as TheStreet, Investopedia, and Street Fight. Follow him on Twitter: @jpruth.