A new model of integrated computing may let companies tap resources on demand
At Pratt & Whitney, the calculus that shapes decisions can get pretty precise. The $7.7 billion aerospace-engineering division of the manufacturing conglomerate United Technologies Corp. employs hundreds of engineers to run complex computations that simulate airflow through jet engines and test stress on materials. Pete Bradley, the company's associate fellow for high-intensity computing, isn't about to ask his team to line up to access centralized computers. Instead, engineers solve problems on deadline by running jobs on a computational "grid" comprising 8,000 computer chips inside 5,000 workstations across three cities. "We no longer talk about things like, 'We saved X number of dollars'--it's just part of our business," Bradley says. "We couldn't live without it."
Pratt & Whitney's customers are using grid computing, too. NASA designers have been logging time on the Information Power Grid, a transcontinental computer system that links hundreds of servers and workstations at space administration research facilities from California to Virginia. Software at NASA Ames Research Center in Moffett Field, Calif., that simulates wing design can be linked with engine-simulation software at the Glenn Research Center in Cleveland, with the results delivered to one computer. It's a more holistic way of working, and it lets scientists create tests that weren't possible before.
"You can't put these simulations" on a single computer, says Bill Johnston, a department head at Lawrence Berkeley National Laboratory working with NASA on the project. "Sometimes they're too big, sometimes they're specialized code that needs to live near the experts. The grid gives you this highly dynamic view of resources."
In Bellevue, Wash., Boeing Co. researchers have written grid-computing software that runs statistical analysis jobs for air- and spacecraft design across an SGI supercomputer, a 16-CPU Linux cluster, and two Sun Microsystems servers--no mean feat. Up next: Dispatching jobs across data centers in California, Missouri, and Washington to help engineers meet deadlines. "For 10 years, we've been talking about distributed objects," says platform computing program manager Chuck Klabunde. "Grid computing looks at taking scientific and engineering applications, and treating them as distributed objects."
Enabled by still-dizzying advances in microprocessor speed, even faster increases in network bandwidth, and new software that provides a standard way for applications to tap remote computing resources, grid computing links far-flung computers, databases, and scientific instruments over the public Internet or a virtual private network and promises IT power on demand. The approach encompasses peer-to-peer networking, in which workstations and servers pool CPU cycles to run parallel jobs. But its reach is broader. Software provides a common interface for sharing CPU cycles, files, and data over a wide area network, giving developers a standardized layer to write to without knowing the specs of each machine that might be brought online. All a user has to do is submit a calculation to a network of computers linked by grid-computing middleware. The middleware polls a directory of available machines to see which have the capacity to handle the request fastest.
Grid computing also reflects changing ideas about the nature of software, who owns it, and where it resides. Instead of relying on a set of instructions that executes on one machine or on a small group designed to work in tandem, grid computing treats software as a service. It's closely related to vendors' distributed computing strategies, including IBM's WebSphere, Microsoft's .Net, Sun's Sun One, and the work derived from Hewlett-Packard's E-Speak Services.
"It's kind of the Holy Grail of computer science to get the sum of all the power of all the computers working together," Microsoft chairman Bill Gates says. Figuring out which programs can be parsed so that data can be shuttled across computers "is an issue that some of our smarter researchers are working on," he says.
Microsoft is developing grid-computing software for use with its products, as are HP, IBM, and Sun. One day, the vendors say, customers could lease computing power from hosted grids, saving money by paying for only the IT they need. Rick Hayes-Roth, HP's chief technology officer for software, says his company can "probably save customers 50% of their capital outlay" if they lease power from a hosted grid.
"The vision is very compelling," Gartner Dataquest analyst Jim Cassell says. Intel servers only operate at about 5% to 20% of capacity during the workday--and more or less zero at night. But Gartner Dataquest expects that by 2006, just 5% of companies that routinely use supercomputers will turn to grid computing for applications that aren't security-sensitive.
That's partly because the infrastructure for corporate grid computing isn't widely available. Products such as IBM's Enterprise Workload Manager, HP's Utility Data Center, and Sun's N1 suite, which let IT shops shift computers and disks among apps in virtual pools, are just starting to come to market. Businesses on the forefront of grid computing, such as BMW, Charles Schwab, General Motors, GlaxoSmithKline, J.P. Morgan Chase, Novartis, and Unilever, use homegrown software or software from startups such as Avaki, Entropia, Platform Computing, and United Devices. GM, for example, uses workload management tools from Platform and in-house job-routing software to chain hundreds of workstations together to run supercomputer-class problems in aerodynamics, fluid flow in engines, and heat dispersion.
The current tools deal mostly with dispatching jobs inside a single data center--not across several--and while symmetric multiprocessing systems can deliver data at about 1 Gbyte per second, Internet transmission speeds are far slower. "Grid products aim at a small number of users and a fixed pool of machines," says Pratt & Whitney's Bradley. What's more, he says, commercial grid software requires that users delete files and free up memory and processors when jobs get killed. But Pratt's engineers don't have the knowledge or the inclination for such things.
Despite grid computing's limitations, paying for more computing power during periods of peak demand may appeal to IT managers who can no longer stock up on hardware and software. Business investment in IT was down nearly 8% in the first quarter compared with a year ago, according to the Commerce Department's Bureau of Economic Analysis.
"What are my options to buy more horsepower?" asks Rich Vissa, an executive director who oversees IT for research and development at Bristol-Myers Squibb Co., a $19.4 billion New York pharmaceutical company. A $500,000 Linux cluster might deliver three times the current performance. Or, he says, "I could spend several million dollars to get a high-end SMP system from IBM or Sun. That might buy me a fivefold to tenfold boost. But if I can tap into the power of several thousand PCs, we can get a 100-fold performance increase. That was the carrot for us." The cost to build a grid varies, but it can run about $200 per PC for software licenses, plus a couple of dedicated administrators, and the price of getting apps to run in parallel. For a 500-node grid, that's about a quarter of a million dollars.
Last fall, Bristol-Myers began two grid-computing pilots, running software that analyzes the efficacy of potential drug compounds using the spare computing cycles of more than 400 desktop PCs. By the end of July, Bristol-Myers' R&D centers plan to deploy grid software across several thousand PCs to speed drugs to market. It's an especially critical move for the company, which is struggling to find new medicines to fill its development pipeline, has lost sales to generic drugmakers, and last year agreed to pay up to $2 billion for the rights to a cancer drug that ended up squelched by federal regulators. So far, however, business grid-computing experiments are mostly limited to projects that live behind company firewalls--useful, but not in the same league as the wide-area projects under way at supercomputer research centers and national labs. "Will we need that kind of power? I don't know yet," Vissa says. "This isn't a panacea to replace all high-end servers."
What grid-computing software does best--balancing IT supply and demand by letting users specify their jobs' CPU and memory requirements, then finding available machines on a network to meet those specs--isn't necessarily an advantage for business-computing tasks such as managing the flow of raw materials and finished goods in a supply chain or selling products through an E-commerce Web site. That means grids are unlikely to replace big symmetric multiprocessing systems for running apps dependent on serial logic and large databases, such as those from Siebel Systems Inc. and SAP. GM still needs to run its crash-test simulations on IBM and HP supercomputers with fast throughput among processors--something grids don't yet afford.
"Grid computing is good for large computations with small data transfer that don't require shared memory," says Manuel Peitsch, head of informatics and knowledge management at Novartis, a Basel, Switzerland, pharmaceutical company whose brands, including Gerber, Maalox, and Theraflu, generate $41 billion in annual sales. "It's not good when you have someone waiting behind a Web page for an answer."
Novartis has used grid-computing software from Entropia and United Devices across as many as 600 PCs to analyze how potential drug compounds might bind to target proteins, yielding new medicines. The company plans to load grid software on all its research staffers' PCs and replace more lab analysis with computer models, for less money than it would cost to buy new hardware. But Peitsch discounts the approach for jobs that involve collaboration or customer service. "What you don't have is predictable, 100% throughput," he says. "Grid computing is not for E-business."
Not yet, anyway. Computer scientists have dreamed for decades of large-scale computer resource-sharing. A classic 1960 paper, "Man-Computer Symbiosis," by MIT computer scientist J.C.R. Licklider, addressed the idea, as did '60s time-sharing experiments at MIT and elsewhere. In the '80s, DuPont investigated how it could link Cray supercomputers and scientific instruments to connect staff in its Wilmington, Del., headquarters with factory systems in Singapore. But microprocessors and networks weren't fast enough to pull it off, says former DuPont researcher David Dixon, now an associate director for theory, modeling, and simulations at Pacific Northwest National Laboratory in Richland, Wash. "Now it's 14 or 15 years later, and that might be possible."
The advent of high-speed gigabit networks in the early 1990s got computer scientists thinking about new applications. In 1992, Ian Foster, a senior scientist at Argonne National Lab and professor at the University of Chicago, and Carl Kesselman, a professor at the University of Southern California, began work on the Globus Project, a software-development effort funded by the National Science Foundation and the Department of Defense. The Globus Toolkit (version 3 was released in test form last month and should be finalized by year's end) has emerged as the de facto standard for connecting computers, databases, and instruments at universities and supercomputing centers, in part because it lets software developers deal with just one layer of APIs to execute jobs.
Sounds great, right? The only problem is, the grid is about where the Web was in 1994: academically interesting, attractive to progressive companies, but still missing the key ingredients necessary to achieve critical mass. Even Foster acknowledges that it's unlikely businesses will lease resources from large grids built by IT vendors anytime soon, given concerns about manageability, information sharing, and cross-platform security. Grids built with Globus authenticate users by issuing electronic certificates, but there's no guarantee businesses will want to advertise their computing resources on a meganetwork. Instead of linking computers at its bicoastal research sites over the public Internet to simulate protein behavior, San Diego biotech company Structural Bioinformatics Inc. plans to build a virtual private network--not exactly what info-utopian grid theorists have in mind. "We live and die by intellectual-property issues," says Kal Ramnarayan, VP and chief scientific officer at the company.
Another problem is that the field is populated with startup vendors, and businesses can't bank on their survival. "A lot of the vendors that have good products tend to be small companies," says Priyesh Amin, a principal consultant at GlaxoSmithKline, the $29.8 billion-a-year British pharmaceutical company. "We come across a lot of bugs."
Several factors could help grid computing become more mainstream. Version 3 of the Globus Toolkit will include a software layer called the Open Grid Services Architecture, backed by an industry consortium called the Global Grid Forum that includes HP, IBM, Microsoft, Platform Computing, and Sun. The protocol would let applications call "grid services" to find computers, balance workloads, and transfer jobs using standard Web Services Description Language; more details will be hammered out at the forum's meeting next month. IBM says its DB2 Universal Database and WebSphere suite will accept calls written in OGSA by next year.
Last August, IBM ported the Globus Toolkit to its four computing platforms. This fall, IBM will begin installing hardware for the National Science Foundation's TeraGrid, a massive cluster of Intel Itanium 2 computers spread across four sites stretching from the National Center for Supercomputing Applications in Urbana-Champaign, Ill., to the San Diego Supercomputer Center at the University of California, and capable of a peak speed of 13.6 trillion floating point operations per second. IBM says grid computing also has applications for its life sciences, Wall Street, and oil-exploration customers, and can eventually be used to more efficiently deliver computing in "utility" pricing arrangements. "The next stage of the Internet is to turn it into a computing platform," Irving Wladawsky-Berger, IBM's VP of technology and strategy, said at a conference in San Francisco last month.
HP in April began integrating the Globus and OGSA protocols with its experimental Utility Data Center, which lets companies assign servers and disk drives to apps as needed. Sun this month released version 5.3 of its Grid Engine software, which lets IT managers assign policies and scheduling to one-site grids. And Microsoft has spent $1 million to port the Globus Toolkit to Windows. All of which is good news for companies ready to take the next step in high-end computing.
Ford Motor Co.'s Formula 1 Jaguar Racing unit last year began leasing cycles on HP machines from an off-site design house to beef up its simulations. "It's like using my bank--I don't want to know about finance or interest rates or investments," says Steve Nevey, a computer-aided engineering manager in Milton Keynes, England. "I just want to give someone my money and get a return. This is the same thing."