The superfast input-output speeds of cluster file systems could change the way companies approach storage
Building powerful supercomputers from off-the-shelf PCs running the open-source Linux operating system, linked with commodity disk drives and Ethernet cables, has become more than a way to get high-performance computing on the cheap. Those clusters have upended the market for large systems over the last decade. But the ability to shuttle data between the computers and disks hasn't kept pace with advances in microprocessor and memory speeds, adding time and cost to important projects. Now an emerging class of file-system software for clusters stands to change the way companies buy storage.
Cluster file systems, including the open-source Lustre technology developed by the Department of Energy and commercially backed by Hewlett-Packard, speed input-output operations. The technology already is making a difference at universities, national labs, and supercomputing research centers, and it could make inroads into general business computing in coming years.
"In terms of raw performance, it's incredibly fast," says Scott Studham, chief technology officer for the National Center For Computational Sciences at Oak Ridge National Laboratory and president of a Lustre user group. With Lustre, I/O speeds range from hundreds of megabytes of data per second to or from disk to 2 Gbytes per second per computer. And since results increase nearly in lockstep with the number of workstations attached, aggregate speeds in a cluster can reach dozens of gigabytes per second while reading from disk.
"Enterprise-class file systems won't do this," says Greg Brandeau, VP of technology at Pixar Animation Studios, which runs a cluster file system from startup Ibrix Inc. The system serves up 240 billion data requests a day from Pixar's 2,400-CPU rendering farm for the computer-animated film Cars, due next year. Pixar is for the first time using "ray tracing" techniques that lend its characters reflective chrome and more realistic shadows, but which place massive demands on CPUs and networks. "We've realized over the past six months that we're not doing enterprise-class computing anymore--we're a high-performance computing shop," Brandeau says.
This week, HP plans to release a second version of its Scalable File Share, a server and software package launched in December that uses Lustre to spread file serving across a cluster, much as IT shops have been doing with computing servers for the better part of a decade. Scalable File Share lets Linux machines in a cluster read data at up to 35 Gbytes per second and allows for up to 512 terabytes of total storage, double its previous capacity. "One of the keys is you now build the storage system using cluster technology," says Kent Koeninger, HP's high-performance computing products marketing manager.
Problems with scaling up traditional file systems have to do with the way computers manage data on disk. Instead of being cohesive wholes, computer files consist of blocks of data scattered across disks. File systems keep track of the blocks, assigning free ones to files as they need more space. When multiple computers vie for access to data, most file systems will lock a block in use by one computer, even if others are requesting it. When that machine is done, the block again becomes available to other nodes in the cluster. But as organizations add more machines to a cluster--sometimes hundreds or thousands--managing those data blocks takes up more of the system's CPU and networking bandwidth.
"At the end of the day, it translates into less application performance," says David Freund, an analyst at IT research firm Illuminata. "You've got a scaling problem." Lustre solves this problem by letting hundreds or thousands of servers share a file system by spreading management of blocks of data over multiple computers. Even though dozens of machines may be handling I/O chores, they look like one file server to the rest of the cluster. That translates into much higher I/O speeds than are possible using business-computing standards such as storage area networks or network-attached storage.
"Lustre solves a technology hurdle happening in the high-performance computing market that will happen in normal markets: Disk drives aren't getting faster at the rate that CPU and memory bandwidth are going up," Studham says. As users deploy their applications across many CPUs in clusters, reading data from disk, or writing it there, chokes performance. The problem has become so bad, he says, that his discussions with storage vendors focus on data speed, not size. "For the past 10 years, we've been negotiating dollar per gigabyte from our storage vendor," he says. "This year and next, it will be more about cost per bandwidth. This is the first time I've bought storage and said, 'I don't care how much you give me; I care about dollar per gigabyte per second.' We've just met that inflection point."
Clusters are becoming more important in science and business. According to a closely watched list of the world's 500 fastest supercomputers released in November by the University of Tennessee and the University of Mannheim in Germany, 296 of those systems are clusters. Storage also is getting more attention inside businesses, as federal regulations meant to prevent fraud are compelling companies to save more data. Sun Microsystems earlier this month said it would acquire Storage Technology Corp. for $4.1 billion in cash in a move meant to help it capitalize on that trend. If the emergence of Lustre and competing technologies at universities, national labs, and a small number of ambitious businesses takes hold more broadly, it could change the storage-buying equation.
"Lustre has gotten a remarkable amount of traction," says Chuck Seitz, CEO and chief technology officer at Myricom Inc., a maker of specialized networking equipment for clusters. The technology's speed and low cost have helped it carve a niche at sites such as Lawrence Livermore National Laboratory, Pacific Northwest National Laboratory, and the National Center for Supercomputing Applications.
The NCSA runs Lustre on its 1,240-node, 9.8-teraflop cluster called Tungsten, which runs programs for atmospheric science, astronomy, and other applications. "You don't want an $8 million computer sitting there on an I/O wait," says Michelle Butler, a technical program manager for storage at the supercomputing center. Keeping wait times short also means scientists working on grants from the National Science Foundation get charged for less computing time. "With apps five or 10 years ago, no one ever did I/O because of the wait times," she says. "Now, data is everything."