In Depth: Supercomputers Get A Speed Boost From Specialized Chips
Computer engineers are increasingly using hardware accelerators to break through the limitations of all-purpose microprocessors.
They're far outside the computing mainstream. One CIO refers to their work as a "future conversation," and a senior Microsoft technologist has called them "lunatics." But a handful of computer engineers, laboring in geeky backwaters of the industry, are applying obscure chip designs in an effort to reclaim the leaps in computing speed that defined the IT market for most of its history.
Computer scientists at IBM, Cray, Sun Microsystems, and U.S. and Japanese universities and research labs are experimenting with specialized computer chips built just for number crunching that can give big boosts to software apps without taxing microprocessor I/O or adding much in the way of heat or costs. The techniques come in response to a sharp slowdown in the performance increases of traditional microprocessors over the past several years.
Design for a range of uses, Tokyo Tech's Matsuoka says.
So far, the biggest computer chipmakers--Intel, Advanced Micro Devices, Sun Microsystems, and IBM--have addressed problems of rising heat and leveling speed by designing chips with two or more CPUs on a single silicon die. But performance barriers may await the multicore approach, and some technologists are betting on an alternative. They're using specialized chips such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), graphics processors, and even chips designed for video games inside computers that tackle a range of scientific and business applications.
The Tokyo Institute of Technology last month unveiled the world's seventh-fastest supercomputer, an NEC-designed behemoth of 665 Sun Microsystems servers housing 10,480 of AMD's Opteron chips. The system crunched 38.2 teraflops (trillion floating point operations per second) on a benchmark test. The institute will add 360 boards featuring hardware accelerator chips from ClearSpeed Technology, which makes customized ASICs that can boost application performance. That could add 5 to 10 teraflops to the supercomputer's number.
A key design point for the system was to balance the general-purpose Opterons with the special-purpose ClearSpeed chips, exploiting the low cost of x86 technology to bring supercomputing to a large number of users. "We wanted the best of both worlds," says Satoshi Matsuoka, a professor responsible for computing infrastructure at Tokyo Tech. Since the supercomputer runs a variety of programs, including analyzing protein molecules, studying hurricanes and typhoons, simulating blood flow in the brain, and studying the effects of planets' magnetic fields, "we couldn't really push the specialized portion too far," he adds. But building future machines that exceed the petaflops mark--an industry goal of computing at 1,000 teraflops--will almost certainly require accelerator chips. "That's where the future trend is," Matsuoka says.
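To put the Tokyo Tech figures in perspective, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function names are illustrative, and it assumes, for simplicity, that the accelerator estimate simply adds to the benchmark result and that work is spread evenly across chips and boards. Neither assumption comes from the institute.

```python
# Figures quoted in the article; everything derived from them is rough arithmetic.
OPTERON_CHIPS = 10_480                  # Opteron chips in the system
BENCHMARK_TFLOPS = 38.2                 # measured benchmark result, in teraflops
ACCELERATOR_BOARDS = 360                # ClearSpeed boards to be added
ACCELERATOR_GAIN_TFLOPS = (5.0, 10.0)   # estimated gain, low and high
PETAFLOPS_IN_TFLOPS = 1_000.0           # the petaflops mark as defined in the article


def gflops_per_unit(total_tflops: float, units: int) -> float:
    """Average gigaflops per unit, assuming an even split (an assumption)."""
    return total_tflops * 1_000.0 / units


def projected_range(base_tflops: float, gain_tflops: tuple) -> tuple:
    """Projected total, assuming the gain adds directly to the base (an assumption)."""
    low, high = gain_tflops
    return base_tflops + low, base_tflops + high


def factor_to_petaflops(tflops: float) -> float:
    """How many times faster a system must get to reach 1,000 teraflops."""
    return PETAFLOPS_IN_TFLOPS / tflops


if __name__ == "__main__":
    per_chip = gflops_per_unit(BENCHMARK_TFLOPS, OPTERON_CHIPS)
    per_board_low = gflops_per_unit(ACCELERATOR_GAIN_TFLOPS[0], ACCELERATOR_BOARDS)
    per_board_high = gflops_per_unit(ACCELERATOR_GAIN_TFLOPS[1], ACCELERATOR_BOARDS)
    total_low, total_high = projected_range(BENCHMARK_TFLOPS, ACCELERATOR_GAIN_TFLOPS)

    print(f"Per Opteron chip: {per_chip:.2f} gigaflops")
    print(f"Per accelerator board: {per_board_low:.1f} to {per_board_high:.1f} gigaflops")
    print(f"Projected total: {total_low:.1f} to {total_high:.1f} teraflops")
    print(f"Gap to a petaflops: {factor_to_petaflops(BENCHMARK_TFLOPS):.1f}x")
```

On these assumptions, each accelerator board would contribute several times what a single Opteron delivered on the benchmark, yet the full machine would still sit more than an order of magnitude short of the petaflops mark.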
Need For Speed
Hardware acceleration shows up today in video game systems and high-end entertainment PCs, as well as in appliances such as TCP/IP and Secure Sockets Layer accelerators. Accelerators also played a role in IBM's Deep Blue chess computer that beat world champion Garry Kasparov in a 1997 rematch between the grand master and the machine. And FPGAs, whose circuitry can be reprogrammed after they're manufactured, have long been popular as test chips for prototypes.
What's different is the desire to extend accelerators to more applications. Instead of simply off-loading work to a specialized processor, companies want to integrate the accelerators on the same piece of silicon with the CPUs. They're trading the ability to run any application and easier programming for blazing speed.
Multicore chips are no panacea, says IBM's Turek.
Accelerators aren't appropriate for every job; they're difficult to program, and apps need to be tuned for the hardware. "When it clicks, it runs fast. When it doesn't, it's slow as a dog," Matsuoka says. Potential uses include digital video processing, computer animation, financial analysis, seismic image processing in the oil industry, and biotech. So far, real-world deployments can be counted on one hand. But if technologists can overcome programming and other challenges, hardware accelerators could be the next evolution in computing performance, following transitions from vector chips in the '80s to RISC-based machines in the '90s to clusters this decade. "This is sort of the wild, wild West," says Dave Turek, VP of deep computing at IBM. "But the stars are probably aligned for its success."
Why now? For decades, chipmakers relied on increasing clock speeds--the number of instruction-executing ticks per second--as an engine of industry growth. As clock rates doubled every 18 months, chip speeds followed suit, leading to increases in application performance of 50% to 60% a year without adding more chips to a system. Shrinking transistor sizes, the ability to queue up more work in a chip's pipeline, and the regular introduction of more complex microarchitectures led to steady demand for new products.
Now, chip performance for widely used single-threaded software has slowed to gains of less than 20% each year. "The single-threaded juggernaut is basically over," says Steve Scott, CTO at Cray, the supercomputer company whose customers include Boeing and the federal government.
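The gap between those growth rates compounds quickly. The Python sketch below works through the arithmetic, assuming the annual gains hold steady from year to year. The function names are illustrative, and 20% is used as an upper bound for the current rate, since the figure cited is "less than 20%."

```python
import math


def compounded_speedup(annual_gain: float, years: int) -> float:
    """Total speedup after `years` of steady gains (0.5 means 50% a year)."""
    return (1.0 + annual_gain) ** years


def years_to_reach(target_speedup: float, annual_gain: float) -> float:
    """Years of steady gains needed to reach a given overall speedup."""
    return math.log(target_speedup) / math.log(1.0 + annual_gain)


if __name__ == "__main__":
    # 50% and 60% are the historical range cited; 20% is an upper bound for today.
    for gain in (0.60, 0.50, 0.20):
        five_year = compounded_speedup(gain, 5)
        tenfold = years_to_reach(10.0, gain)
        print(f"{gain:.0%} a year: {five_year:.1f}x after 5 years, "
              f"10x after about {tenfold:.1f} years")
```

At 50% a year, performance grows more than sevenfold in five years. At 20% a year, the same five years yield less than a 2.5-fold gain, and a tenfold speedup takes more than 12 years instead of fewer than six.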