AMD Designed Bulldozer For Large Virtual Machine Loads

BlueLock, which provides infrastructure services from its data centers, likes AMD cores for computing in highly virtualized environments.
AMD's new Bulldozer chip is a set of pragmatic trade-offs that get a new processor to market before the Intel juggernaut can launch its next-generation Ivy Bridge chip and lock up more of the server market. Whether Bulldozer can clear AMD's path back to prosperity remains to be seen, but it's a strong attempt that may find favor with those running heavily virtualized environments.

Bulldozer has been criticized for falling short of expected clock speeds and for reviving an approach to data pipeline design that previously failed in the Pentium. And the way AMD achieved 16 cores, by glomming two eight-core chips together and eliminating one of the floating point engines while keeping the two integer engines, will strike some as inelegant. Nevertheless, it seems clear to me that AMD dispensed with some chip design barriers to go for the sweet spot in today's server market--maximizing a server's capabilities to serve as a virtual machine host.

To test that notion, I turned to Aaron Branham, director of infrastructure at BlueLock, a supplier of infrastructure as a service to startups and enterprises. His clients run lots of ESX Server virtual machines in his data center--BlueLock is one of a handful of VMware vCloud partners. His is a heavily virtualized environment; he routinely runs 100 virtual machines on his multi-tenant servers.

Right now those servers are HP DL 585s with 12-core Opteron processors. With four CPUs per server, Branham has 48 cores per machine with his pre-Bulldozer generation of Opterons. He sees a direct correlation between cores and virtual machine concentration.

"With VMware virtual machines, the more cores and memory you have, the greater efficiencies you'll be driving," he said in an interview from his Indianapolis data center. According to the metrics in his shop, a virtual CPU consumes one quarter of an Opteron core. Even though he's running 100 VMs per server, he's only using 40% of his available CPU cycles. So even as heavily virtualized as his environment is, he's still got plenty of headroom.

The resource that's in short supply is memory. Branham's servers pack in 128 GB of RAM per CPU, and his stats tell him that on average he's using 80% of it. So even though he's got CPU cycles to spare, he's close to his ceiling on memory use and can't increase his virtual machine count at this point.
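The headroom argument above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the article (100 VMs, 40% of CPU cycles, 80% of memory). The function names are hypothetical, and the model assumes that resource use scales linearly with the number of virtual machines, which is a simplification rather than anything BlueLock has stated.

```python
# Figures quoted in the article for one of BlueLock's current servers.
VMS_RUNNING = 100
CPU_USED_PCT = 40   # share of CPU cycles in use
MEM_USED_PCT = 80   # share of installed RAM in use


def vm_ceiling(vms_running: int, used_pct: int) -> int:
    """VMs a resource could carry at 100% use, assuming linear scaling."""
    return vms_running * 100 // used_pct


def binding_resource(vms_running: int, cpu_used_pct: int, mem_used_pct: int):
    """Return the resource that runs out first and each resource's ceiling."""
    ceilings = {
        "cpu": vm_ceiling(vms_running, cpu_used_pct),
        "memory": vm_ceiling(vms_running, mem_used_pct),
    }
    limiting = min(ceilings, key=ceilings.get)
    return limiting, ceilings


limiting, ceilings = binding_resource(VMS_RUNNING, CPU_USED_PCT, MEM_USED_PCT)
print(f"CPU alone would allow about {ceilings['cpu']} VMs")
print(f"Memory alone would allow about {ceilings['memory']} VMs")
print(f"Binding resource: {limiting}")
```

Under that linear assumption, memory caps the server at roughly 125 VMs while CPU would not run out until about 250. No operator would run memory to 100%, so the practical ceiling sits lower still, which fits Branham's remark that he can't add virtual machines at this point.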

Part of cloud economics is to operate efficient, low-cost infrastructure. BlueLock kept its costs down as it built out its infrastructure over the last two years by buying servers with the most cost-effective memory mix available at the time. "We bought servers with a mix of 8 GB and 16 GB (dual in-line memory modules, or DIMMs)--16-GB DIMMs cost twice as much per gigabyte as 8-GB DIMMs," he said. Since his purchase, the price of 16-GB memory has dropped to the level of an 8-GB module. With all those untapped CPU cycles, Branham can upgrade his infrastructure by swapping out 8-GB memory modules for 16-GB modules, instead of buying new servers.

But for the same reason, the AMD Bulldozer chip will be interesting to BlueLock in the not too distant future. Bulldozer (whose 16-core server version AMD code-named Interlagos) has strengths on both sides of the virtual-machine demand curve. The new chip will offer 16 cores per CPU, or 64 per four-socket server, and each CPU can support up to 384 GB of memory. That combination would allow BlueLock to run a lot more virtual machines without expanding its data center floor space, cooling, or electricity use.
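To put the two sides of that comparison next to each other, here is a minimal sketch that totals cores and memory per four-socket server from the per-CPU figures in the article. The `ServerSpec` class is hypothetical, introduced only for illustration. Note that the comparison is not like-for-like: 128 GB per CPU is what BlueLock has installed today, while 384 GB per CPU is the maximum the new chip is said to support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Per-server totals derived from per-CPU figures (illustrative only)."""
    sockets: int
    cores_per_cpu: int
    mem_gb_per_cpu: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_cpu

    @property
    def mem_gb(self) -> int:
        return self.sockets * self.mem_gb_per_cpu


# Current servers: 12-core Opterons with 128 GB installed per CPU.
current = ServerSpec(sockets=4, cores_per_cpu=12, mem_gb_per_cpu=128)
# Bulldozer-based server at its stated maximum of 384 GB per CPU.
bulldozer = ServerSpec(sockets=4, cores_per_cpu=16, mem_gb_per_cpu=384)

# Integer arithmetic keeps the results exact; the percentage is rounded down.
core_gain_pct = (bulldozer.cores - current.cores) * 100 // current.cores
mem_multiple = bulldozer.mem_gb // current.mem_gb

print(f"Cores per server: {current.cores} -> {bulldozer.cores} (+{core_gain_pct}%)")
print(f"Memory per server: {current.mem_gb} GB -> {bulldozer.mem_gb} GB ({mem_multiple}x)")
```

On these numbers the core count rises by about a third, but the memory ceiling triples. Since memory is the resource BlueLock is running short of, the larger memory ceiling matters more to a shop like this than the extra cores.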

Branham cautions that to get to 384 GB of memory in today's market, a server would need to be loaded with 12 DIMMs carrying 32 GB each, the most expensive memory on the market. To put that amount in an HP server today would result in a bill of more than $96,000 (12 x $8,039 per DIMM) just for memory. It's not BlueLock's goal to load up its data center with the most expensive servers but to use the most cost-effective servers for its heavily virtualized operation.
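The memory bill is easy to reproduce from the quoted price. This sketch multiplies out the 12 DIMMs at $8,039 each and derives a per-gigabyte cost, which the article does not state directly. The constant and function names are hypothetical.

```python
# Figures quoted in the article.
DIMM_COUNT = 12
DIMM_SIZE_GB = 32
PRICE_PER_DIMM_USD = 8039


def memory_bill(dimm_count: int, price_per_dimm_usd: int) -> int:
    """Total cost in dollars of a set of identical DIMMs."""
    return dimm_count * price_per_dimm_usd


total_gb = DIMM_COUNT * DIMM_SIZE_GB
total_usd = memory_bill(DIMM_COUNT, PRICE_PER_DIMM_USD)
usd_per_gb = total_usd / total_gb

print(f"{total_gb} GB costs ${total_usd:,}, or about ${usd_per_gb:.2f} per GB")
```

The exact product is $96,468, a little over the rounded $96,000 figure, and it works out to roughly $251 per gigabyte.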

Branham says BlueLock's business is expanding rapidly. At some point, he knows he can buy new servers with a mix of expensive 32-GB memory and less expensive 16-GB memory, and start hosting more than 100 VMs per server. As memory costs continue to decline, that 384-GB ceiling per CPU looks inviting: it gives him room to keep expanding his business without frequent server upgrades.

Intel is by no means idle in this picture and is bringing its own strengths to virtualized hosts. Intel's next generation, Ivy Bridge, will run faster than AMD Bulldozer and may offer stronger floating point processing per core. It will be shipping by the end of the year, according to Tom's Hardware.

But for BlueLock's business of intensively virtualized, multi-tenant servers, AMD has held a consistent edge, Branham says. By moving to 16 cores ahead of Intel, even as Intel brings other strengths to market (Intel has shrunk circuits to 22 nanometers; AMD is still on 32 nanometers), AMD has stayed in the microprocessor competition by upping the ante on cores, data handling within the chip, and addressable memory.

Chips in the x86 family often serve a dual purpose: they can be used in servers or in power users' desktops. But intensive desktop applications won't thrive under Bulldozer. In this latest design, AMD abandoned some of the features that make an x86 chip good for end user machines. In a review on Ars Technica a month ago, Chris Foreman said, "Anyone building or buying a new PC has little reason to even consider Bulldozer."

By rearranging the components on the chip real estate in favor of virtualization, AMD has again stolen a march on Intel in the server competition, as it did with the original Opteron in 2003. Whether its pragmatic trade-offs are enough to hold that position for long is hard to say. But advanced implementors of virtualization, like BlueLock, are showing what can be done, once the chip and server architecture are optimized for this task. It's a lot more than what's been accomplished in the first phase of virtualization.

Charles Babcock is an editor-at-large for InformationWeek.