Wall Street-Style Power

With high-performance computing strategic and costly, companies push efficiency

Citigroup is in the middle of a major shift in its approach to high-performance computing, moving from clustered servers to a large-scale grid. Wachovia has built a service-oriented approach that delivers high-performance computing based on the revenue potential of a job, among other factors. These are two examples of how Wall Street is trying to become more efficient at delivering the supercomputing power that has become critical to the industry.

Citigroup has about 10,000 CPUs globally in clusters doing high-performance computing. "We drew the line two years ago and said we weren't going to continue that route," says John van Uden, senior VP of capital markets and banking technology. The problem is that those clusters, across more than 50 data centers, aren't being used by everyone, so the setup doesn't improve server utilization. The grid project is driven by the need to reduce costs but still perform the same calculations.

Citigroup built its first grid--based on Platform Computing software and Hewlett-Packard servers--two years ago in a test that was deemed successful enough that it was expanded to other uses, primarily evaluating the risk of complex financial products such as collateralized debt obligations. Citigroup has 11 projects in various stages on the grid. About 2,000 cores operate on the grid today, and Citi hopes to have 7,000 by the end of the year, in two main sites in Texas and London.

What's hard about grids? "The emotional and political angles of trying to do this far outweigh the technical impact of trying to do it," van Uden says. "Pitching 1,000 to 4,000 boxes together is not difficult, as long as you have the space and the power. But trying to get a group of 25 applications to stop server hogging is demanding."

With monitoring tools in place, van Uden and Andrew Dolan, head of grid computing, find the grid is easier to manage than clusters. Getting the tools used to manage all the company's servers to "manage 500 servers at one time" was difficult, says Dolan. "But now that we have those tools in place, we'll forget we're managing 500 servers. It should be like we're managing two--a grid infrastructure and the actual compute nodes."


With such a high demand for high-performance computing in financial services, sometimes prioritization is just as important as clock speeds. That's why Wachovia built a service-oriented architecture to manage its high-performance computing.

It was built using IBM WebSphere DataPower SOA Appliances for routing and integration, along with Tibco software. The grid is managed using DataSynapse software, with computers based in New York; Charlotte, N.C.; and Philadelphia. It also uses high-end server appliances such as Azul Systems' Compute Appliances and TiGi's data throughput accelerators. To keep power costs down, in addition to using small, powerful appliances, Wachovia uses Verari Systems' BladeRacks, which have a patented vertical cooling rack and allow a direct power connection, so there's no need for a power distribution module.

A usage reporting tool from Evident details who has used how much computing capacity, for how long, and for what purposes, as well as the effects on network bandwidth, storage, and system-level resources.

Wachovia has developed methodologies for splitting work into daytime and nighttime shifts, allocating capacity for critical work, and measuring value to the business before allocating resources. That means a Java executable determined to be revenue-generating might be moved to an Azul server appliance, which provides very fast--and expensive--processing.

What if it's a task that doesn't generate revenue and is costly, yet important, such as analyzing risk? "Welcome to our world of defining priorities," says Tony Bishop, a Wachovia senior VP. That's why the Wachovia team built a service management function based on the discipline of the Information Technology Infrastructure Library (ITIL), under which it creates a service contract based on the strategic drivers of the business. Because managing risk is critical to Wachovia's business, risk analysis processes receive the highest level of computing performance.
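The allocation policy described above can be sketched in a few lines of Python. This is only an illustration, not Wachovia's implementation: the tier names, the `Job` class, the `allocate` function, and the core counts are all hypothetical. The only things taken from the article are that risk analysis gets the highest priority and that revenue-generating work can be steered to faster, more expensive hardware.

```python
from dataclasses import dataclass

# Hypothetical priority tiers (lower number = served first). The article says
# only that risk analysis receives the highest level of computing performance.
TIER_PRIORITY = {"risk": 0, "revenue": 1, "other": 2}


@dataclass
class Job:
    name: str
    category: str  # "risk", "revenue", or "other"
    cores: int     # cores the job needs


def allocate(jobs, premium_cores, standard_cores):
    """Place each job on "premium" or "standard" capacity, or defer it.

    Jobs are considered in priority order. Risk and revenue jobs try the
    premium pool first; anything that does not fit is deferred (for
    example, to a nighttime shift).
    """
    placements = {}
    for job in sorted(jobs, key=lambda j: TIER_PRIORITY[j.category]):
        if job.category in ("risk", "revenue") and job.cores <= premium_cores:
            premium_cores -= job.cores
            placements[job.name] = "premium"
        elif job.cores <= standard_cores:
            standard_cores -= job.cores
            placements[job.name] = "standard"
        else:
            placements[job.name] = "deferred"
    return placements


jobs = [
    Job("report", "other", 4),
    Job("pricing", "revenue", 8),
    Job("cdo_risk", "risk", 8),
]
result = allocate(jobs, premium_cores=8, standard_cores=8)
print(result)
```

With only eight premium cores available, the risk job claims them, the revenue job falls back to standard capacity, and the low-priority report waits.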
