Virtualization Beyond Consolidation

As these case studies attest, virtualization may come easily at first, even leading to surprise gains -- but beware the pitfalls.
A funny thing has happened on the path to virtualization Nirvana: We've stopped, or at least greatly slowed, our progress toward highly virtualized data centers. Gartner says that just 16% of data center loads are virtualized, and our own survey shows ambitions for virtualization are actually backtracking. We set out to find some real-world virtualization "success stories," just to remind ourselves why we went down this road in the first place. Capital equipment savings and greater operational efficiencies have been the promise, but are they being achieved?

The answers, when answers are available, are varied. Some companies haven't slowed down their implementations long enough to measure the results they're getting. "We don't know what the savings are. We just know they're there," is a common response.

At the same time, our InformationWeek Analytics survey found that 35% of respondents say they expect to virtualize less than 25% of their data centers by 2011. That finding reflects either a less optimistic or more realistic assessment than survey respondents exhibited last year, when only 22% took that stance.

The reasons are legion. The ease of generating virtual machines tends to lead to a willingness to create more, and soon the IT manager finds virtual machine sprawl on his hands. As VM density builds, performance and management problems emerge. Monitoring systems need to check not only whether a virtual machine is running, but also whether the resources allocated to it match its needs. In some cases, overallocation to one VM shortchanges operations elsewhere.
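A minimal sketch of the monitoring check described above might compare each VM's actual resource use against its allocation and flag outliers. The VM names, memory figures, and thresholds here are illustrative, not from the article:

```python
# Hypothetical monitoring check: flag VMs whose allocated memory is badly
# mismatched to what they actually use. Names and numbers are illustrative.

def flag_mismatched_vms(vms, low=0.20, high=0.90):
    """Return (vm_name, ratio) pairs where used/allocated falls outside bounds."""
    flagged = []
    for vm in vms:
        ratio = vm["used_gb"] / vm["allocated_gb"]
        if ratio < low or ratio > high:
            flagged.append((vm["name"], round(ratio, 2)))
    return flagged

inventory = [
    {"name": "web01", "allocated_gb": 10, "used_gb": 1},   # overallocated
    {"name": "db01",  "allocated_gb": 16, "used_gb": 15},  # near its ceiling
    {"name": "app01", "allocated_gb": 4,  "used_gb": 2},   # healthy
]

print(flag_mismatched_vms(inventory))  # web01 wastes memory; db01 is starved
```

A real monitoring tool would watch CPU, memory, and I/O over time, but the principle is the same: a running VM isn't necessarily a correctly sized VM.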

Then as the number of virtual machines per host server increases, I/O problems start to develop. I/O is the next chokepoint in virtualization. Cisco, working with VMware, built a network fabric to address this issue and entered the server market in 2009 with its Unified Computing System. Hewlett-Packard responded with the BladeSystem Matrix. Meantime, third parties had already spotted the issue. Xsigo, with its I/O Director, and others seek to virtualize the I/O and move it out of the hypervisor's virtual switch onto a hardware device, where packets are separated into their respective storage and network destinations, relieving the host server of work.

As we shall see, management tools are paramount once the virtual environment is generated. Our four case study examples ignore the 25% ceiling revealed in our survey; they're for the edification of those seeking much higher levels of virtualization in the data center and demanding a greater return on their virtualization investment.

Orchard Supply Hardware

When Moon Son, director of IT infrastructure at Orchard Supply Hardware, a California chain of 91 stores, became head of the company's data center in 2006, he realized immediately he would have to rebuild from the ground up around virtualization. His new employer had a shopping list of 33 projects it wanted him to undertake on top of an aging infrastructure, such as establishing two new financial systems and a PCI compliance system. For starters, he chose to phase out 30 servers, replacing them with new standalone and rack-mount models from Dell. He virtualized 13 host servers, and in the end, tripled the number of production systems to meet expanded company goals.

Orchard Supply had 45 physical servers, most of them near their end of life, Son recalls. His team rebuilt with Dell PowerEdge 2950 two-way servers, PowerEdge R710 rack-mount servers, and an Enterasys network switching fabric better able to handle the traffic. The team virtualized end user applications on Citrix XenApp Server, giving employees in stores Wyse Technology thin clients. Son then led a big push to "virtualize everything."

Orchard Supply is still short of that goal but is running 125 virtual machines on 13 servers. In the previous data center, that application count would have consumed 125 physical servers, each running a single application. Son has maintained the same total of 45 physical servers while tripling the number of system instances in production compared with three years ago. The two-way quad-core servers give the company lots of CPU cycles, so in many cases the IT infrastructure team has packed the servers with 32 GB or 48 GB of memory, increasing their purchase price from $7,000 to between $10,000 and $11,000. To realize the gains of virtualization, you have to buy memory, Son says. Even so, he has spent $130,000 on Orchard Supply's 13 virtualized hosts versus the $875,000 it would have cost to buy 125 cheaper servers for standalone apps.

One way Son produced big savings was by moving to Microsoft per-CPU licensing for virtualized hosts. For example, he has spent $40,794 for Windows Server 2003 and 2008 licensing on his 13 virtualized hosts, each using two CPUs. If he had stuck to Microsoft Enterprise server licensing for 125 servers, he'd have spent $192,250.

Unlike many virtualization users, Son also put the company's Microsoft SQL Server databases into VMware virtual machines, then shifted from an enterprise/per-server license to per-CPU licensing, while at the same time reducing the number of SQL Server instances his team runs from 14 to eight. The switch put increased workloads on the company's eight virtualized database servers, but Son found after extensive testing that they could handle it. The savings: $22,500 ($30,000 for SQL Server licensing versus $52,500).

Son paid an average of $5,000 per host for Orchard Supply's 13 hosts running VMware's ESX Server under vSphere 4, adding an expense of $65,000, but overall, he says the company's virtualized environment cost a total of $265,794 for hardware and software versus $1,119,750 for a similar, nonvirtualized infrastructure. Orchard Supply's total savings after 3-1/2 years of intensive virtualization: $853,956.
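The cost figures Son cites add up exactly. A quick worked check, using only the numbers reported in the article:

```python
# Verify the article's cost arithmetic for Orchard Supply's two scenarios.
virtualized = {
    "hardware (13 hosts)":   130_000,
    "Windows licensing":      40_794,
    "SQL Server licensing":   30_000,
    "VMware vSphere":         65_000,
}
nonvirtualized = {
    "hardware (125 servers)": 875_000,
    "Windows licensing":      192_250,
    "SQL Server licensing":    52_500,
}

v_total = sum(virtualized.values())     # $265,794
n_total = sum(nonvirtualized.values())  # $1,119,750
print(f"virtualized: ${v_total:,}")
print(f"nonvirtualized: ${n_total:,}")
print(f"savings: ${n_total - v_total:,}")  # $853,956
```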

"Now we can spin up a standard server in 10 minutes instead of three to four weeks," Son says. "It's worked out really well. Virtualization has taken the guesswork and human error out of server configuration."

Son wanted greater availability for Orchard Supply's time and attendance scheduling application, as any of the company's 5,000 employees "needs to be able to punch in or request time off at any time," he says. "This is one of our highest-availability applications. It has to be up 100% of the time."

In the former environment, 100% uptime wasn't possible, as the different servers developed their unique glitches and a small data center operations staff struggled to keep everything running. In the virtualized environment, VMware's vSphere 4 with VMware Service Manager provides availability management mapped to ITIL standards. Son's team can manage 13 of 45 physical servers through a vCenter console in "a centralized, single-pane view" of VM operations and resources.

Uptime improved to 99.51% last year. This year, the environment has sustained 99.75% for two consecutive months, and Son thinks his team can eventually get it to four nines, a big improvement over the pre-virtualized data center (though no metrics from that era are available).

Orchard Supply has experienced no VM failures on any host, says Son, who attributes the high uptime to the automated configuration of virtual machines via vSphere 4 tools. Son's team defines server profiles, then VMs are configured only to those specifications. For example, not all of the company's Dell servers were bought at the same time, so Son's team identifies groups with like characteristics, then defines policies that govern what can run on them. A database server needs to be both CPU- and I/O-intensive, and the team can assign database VMs to the physical servers that are the best matches.

Likewise, the live migration of VMs needs to occur between identical chipsets. Within the x86 instruction set, there are slight variations even within the same generation of chips--say, the Xeon line--and moving VMs between slightly dissimilar iterations risks failure. Under the vCenter console, a vMotion command is reviewed to make sure the operator is moving like to like. It will show an alert if the operator tries to do otherwise. Son has closely documented the nature of each server, including the physical CPU, and defined policies saying what can be moved where. That may mean less flexibility than he'd prefer, but it also means many fewer interruptions and alerts, he says.
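The "like to like" rule can be sketched as a simple policy check: a live migration is allowed only when source and destination hosts carry an identical documented CPU profile. The host names and chip profiles below are hypothetical, and this is a toy stand-in for vCenter's actual compatibility checks:

```python
# Toy version of a vMotion "like to like" policy check. A migration is
# permitted only between hosts with identical documented CPU profiles.

def can_vmotion(source, dest, inventory):
    """True only if both hosts share an identical CPU profile."""
    return inventory[source] == inventory[dest]

# Hypothetical documented inventory: (family, model, stepping) per host.
hosts = {
    "esx01": ("Xeon", "E5450", "stepping-6"),
    "esx02": ("Xeon", "E5450", "stepping-6"),
    "esx03": ("Xeon", "X5550", "stepping-5"),
}

print(can_vmotion("esx01", "esx02", hosts))  # True: identical chips, allowed
print(can_vmotion("esx01", "esx03", hosts))  # False: dissimilar, blocked
```

In practice vCenter compares CPU feature flags rather than model strings, but the effect Son describes is the same: documented hardware plus a policy gate means fewer failed migrations and fewer alerts.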

Virtualization helps reliability and uptime in another major way, Son says. Virtualizing servers not only allows for those machines to be consolidated, he notes, but it also allows for network interfaces to be consolidated, as fewer cables are needed to tie a group of servers to the network.

At Orchard Supply, an average virtualized host has 11 network connections, including two redundant connections to the iSCSI SAN, a vMotion connection, two dedicated backup connections, two service console management network connections, and two connections allowing communications between VMs on the server. That means 143 cables need to be connected for the 13 virtualized hosts and 125 virtual servers.

In the pre-virtualization days, 125 standalone servers would require 375 cables. A single virtualized host, on the other hand, runs 20 to 30 servers, which share the cabling of network interface cards and host bus adapters to network switches. Fewer cables reduce the chance of downtime resulting from a cable getting bumped or dislodged under more crowded conditions.
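The cabling arithmetic above checks out, assuming the three cables per standalone server implied by the article's 375 total:

```python
# Check of the article's cabling arithmetic.
virtualized_cables = 13 * 11   # 13 hosts x 11 connections each = 143
standalone_cables = 125 * 3    # 125 servers x 3 cables apiece = 375
print(virtualized_cables, standalone_cables,
      standalone_cables - virtualized_cables)  # 143 375 232
```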

More significantly, Son's team is rapidly expanding business services without increasing the server count. The environment is experiencing less downtime, and it has room to expand. "We oversized our virtualized infrastructure," Son says. "We can grow without buying more hardware or software."

To a seasonal business like Orchard Supply, he says, "virtualization makes our data center suitable for a flexible, cyclical enterprise."

Roswell Park Cancer Institute

Server I/O was a big consideration for Tom Vaughan, director of IT infrastructure, at Roswell Park, the nation's oldest cancer research institute, which serves 26,000 patients and supports $81 million in research grants in a seven-block campus in Buffalo, N.Y. Its applications include Lawson Software financials, a Cerner lab management system, 5,000 Microsoft Exchange mailboxes, and an extensive electronic medical records system that gathers 10 TB of medical images and research data a year.

Avoiding outages and supplying fast response times were challenges for Vaughan and his staff of six. "We're pretty Spartan here," he says.

Like Orchard Supply, Roswell Park turned to VMware's vSphere 4 for help in managing virtualized parts of the data center. Vaughan found that heavy e-mail traffic, the need to juggle medical images and electronic patient records (Roswell doesn't keep paper patient records), and patient billing and tracking presented heavy I/O demands for a traditional virtualized setting.

Part of the management challenge was outside Roswell's virtualized x86 servers. The institute still has a variety of systems and applications: its EMR system runs on an IBM AIX server; Exchange and other Windows applications run on multiple types of x86 servers; the Cerner Lab Management system runs on HP-UX; and the Lawson financials run on HP's OpenVMS. So in addition to deploying VMware virtualization, Vaughan is acquiring a more uniform environment with strong I/O characteristics that in the long run will be easier to manage. With two data centers on campus capable of backing each other up, Roswell invested in two HP BladeSystem Matrixes. Each blade is built with the same set of components; a patch to the operating system of one can become a patch used throughout the Matrix.

Vaughan has moved payroll onto the Matrix and will soon add the Exchange servers with their 5,000 accounts. However, many of the legacy non-x86 applications will take far longer to transition.

The Matrix systems consist of two c7000 blade enclosures, each holding 14 blades and virtualized as VMware environments. Vaughan's team arranged the hardware, networking, and storage so that they're distributed across the two campus data centers. Half of the blade servers run in one data center; an identical set runs in the other. Likewise, identical SAN units are running in each location and linked together. A failure in one location can be recovered and operations continued in the other.

Vaughan will eventually have 200 VMs running under 22 ESX Server hypervisors on the Matrix enclosures. "The goal is to automate everything as much as possible," Vaughan says, rather than administer systems manually one at a time.

Storage, networking, and servers are linked logically in the BladeSystem's management interface. A systems architect can design a system through a graphical user interface, moving icons around and setting parameters to create a template system. Those templates can then be activated as virtual servers "in six minutes to six hours, instead of six weeks," Vaughan says. Bandwidth for different types of networks can be assigned from the management tool.
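The template workflow described above can be sketched in a few lines: a template captures a server's parameters once, and each activation stamps out a server from it. The field names and values below are illustrative inventions, not HP's actual API:

```python
# Hypothetical sketch of template-driven provisioning: define once,
# activate many. Field names are illustrative, not BladeSystem's API.

def activate(template, name):
    """Instantiate a server record from a template, overriding only the name."""
    server = dict(template)
    server["name"] = name
    return server

# A template a systems architect might define once in the design tool.
db_template = {
    "cpu_cores": 8,
    "memory_gb": 32,
    "ethernet_gbps": 4,     # bandwidth assigned per network type
    "fc_storage_gbps": 4,
}

servers = [activate(db_template, f"db{i:02d}") for i in range(1, 4)]
print([s["name"] for s in servers])  # ['db01', 'db02', 'db03']
```

The payoff Vaughan cites comes from this repeatability: every activation is identical to the template, so configuration drift and per-server hand work disappear.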

The activation of the Matrixes as production systems will save Vaughan's small staff time and administrative headaches. The Matrix systems automatically produce auditable server logs and reports that help meet compliance requirements. To some extent they're self-regulating, instead of IT needing to painfully reconstruct server events to illustrate compliance in an audit.

Although Vaughan is concerned about the concentration of VMs that Roswell will eventually achieve on 28 blades, he says the BladeSystem is equipped to deal with heavy I/O from VMs thanks to HP's Virtual Connect switch, which passes combined network and storage traffic from VMs off the server to switching devices outside the blade chassis. It's a "smart" pass-through, able to aggregate traffic from multiple virtual servers and move it to the appropriate network devices, whether Ethernet communications or Fibre Channel storage. A virtual infrastructure administrator can assign 10-Gb Ethernet or Fibre Channel capacity to each virtual machine in 100-Mb increments, without needing to attach multiple cables to each physical server.

The Matrix can supply each blade with six network connections for its VMs while needing only two cables attached to the blade. Cabling multiple communications and storage networks to Roswell's rack-mount servers was previously a major headache, Vaughan says. As the number of VMs per server increased, so did the cabling. With his planned concentration and automation of VMs, "that wasn't going to work for us," he says. Instead of juggling cables, he's now "using a mouse" to establish network connections.

Vaughan is seeing a potential new value to virtualization that he hadn't anticipated. He's thinking of putting Roswell's SQL Server databases into VMs because their I/O capabilities will no longer be an issue. After a database stress test on the Matrix, he says: "From the I/O it does, SQL Server will have no problem running on the Matrix." Meantime, he'll get the automated compliance, load balancing, redundancy, and failover features that now characterize his other VMs.

Using virtualization and BladeSystem Matrixes, Vaughan estimates Roswell will save $2 million in operating costs over three years and avoid hiring staff he would otherwise have had to add.

Rooms To Go

Another "virtualize everything" proponent is Jason Hall, IT director of Rooms To Go, a retailer of coordinated furniture sets through 150 stores in Texas and the Southeast. Founded in 1991, it's now the fourth-largest independent furniture seller in the U.S., the company says.

As Rooms To Go grew, Hall expanded the data center, eventually filling five 42U racks with 165 rack-mount servers. Three years ago, he had planned to add three more racks, and once those were filled, move to a new data center. Instead, Hall put 200 virtual servers on one chassis holding nine HP blades, which reclaimed space. The chassis is only half full, leaving Rooms To Go plenty of room for expansion. Meanwhile, the data center's electricity consumption, including rooftop air conditioning, has decreased by 20% to 25%, he says.

Unlike Windows-only shops, Rooms To Go also had 19 IBM AIX servers running point-of-sale, warehousing, distribution, finance, and customer service systems. Hall's team virtualized them as well, consolidating to two AIX servers. Now the problem, for an IT staff of six, is monitoring all those VMs and tracking each new one as it's created. The management system for VMware doesn't extend to the AIX VMs.

"We're at a point where we're 90% to 95% virtualized. We're moving toward being 100% virtualized. We won't make it," Hall says. "There's always one more application that won't be supported in a VMware environment." Some application vendors still refuse to extend support of their software running in a virtualized environment.

To get to such a high level of virtualization, Rooms To Go has had to make good use of VMware tools, such as Distributed Resource Scheduler, which shifts resources among running VMs or shifts the VMs themselves to servers with more resources. As a result, it has gained much greater flexibility in meeting business demands.

End user applications have been virtualized on Citrix XenApp, running in 23 VMs on the new HP blades. Likewise, Exchange e-mail and SQL Server databases run in VMs. I/O for the VMs has subsequently become more of an issue, and Hall says his staff is studying how best to upgrade the underlying SAN.

Monitoring and managing both the IBM AIX and VMware/HP blade virtualized environments is "our biggest problem," Hall says. In addition to VMware monitoring tools, he has added CA's Virtual Assurance Pro for Infrastructure Management, capable of viewing both environments.

The monitoring tool ties into CA Spectrum Infrastructure Manager (for which an enterprise license starts at $200,000) and can provide root cause analysis of problems, whether a fault is occurring in a hardware component or software. It helps diagnose potential problems before they bring systems down, Hall says. As a result of virtualization and Hall's chosen toolset, Rooms To Go is achieving 50% to 60% CPU utilization rates, compared with much lower rates--under 15%--when each server ran one application.

Hall would like to push utilization higher, but he sees 80% as a ceiling he shouldn't cross. When a server's CPU utilization approaches 80%, a policy tells VMware's Distributed Resource Scheduler to kick in and start relieving the server by moving workloads. "Monitoring and automation in the virtualized environment help us run the data center," he says. "It makes it much easier to manage."
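The 80% policy Hall describes amounts to a threshold-triggered rebalance: when a host runs hot, move load to the coolest host. This is a toy stand-in for DRS, not VMware's actual logic, and the host names and the 10%-per-migration assumption are invented for illustration:

```python
# Toy threshold-triggered rebalancer, illustrating Hall's 80% policy.
# Not VMware DRS; loads and the per-migration shift are assumptions.

THRESHOLD = 0.80

def rebalance(hosts):
    """Shift load from any host above THRESHOLD to the least-loaded host."""
    moves = []
    for name, load in sorted(hosts.items(), key=lambda kv: -kv[1]):
        if load > THRESHOLD:
            target = min(hosts, key=hosts.get)       # coolest host right now
            shift = 0.10  # assume migrating one VM frees ~10% of a host's CPU
            hosts[name] -= shift
            hosts[target] += shift
            moves.append((name, target))
    return moves

cluster = {"hp-blade1": 0.85, "hp-blade2": 0.55, "hp-blade3": 0.40}
print(rebalance(cluster))  # load moves off the hot blade
print(cluster)
```

Real DRS weighs many more signals (memory pressure, affinity rules, migration cost), but the shape of the automation is the same: a policy threshold, a trigger, and an automatic workload move that no administrator has to perform by hand.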

Owen Bird Law

Stephen Bakerman started virtualizing servers two years ago at Vancouver's Owen Bird Law, where he's the IT manager. He wanted to consolidate servers and gain new flexibility in meeting end user needs. Bakerman virtualized applications under Virtual Iron's hypervisor and delivered application services to remote users via Windows Terminal Services. His endpoint device of choice is the HP t5545 thin client. On a hot day last summer, he found out just how valuable a move it was.

A cooling tower on top of the building died, robbing the data center of air conditioning for two days. Threatened with the need to shut the data center down, Bakerman's staff instead turned off all unnecessary applications and consolidated 13 VMs onto a single host. It was an emergency consolidation, one Bakerman would rather have avoided on a server with only 16 GB of memory, but that server had to serve all the firm's attorneys and staffers.

But it was the key move that cut the data center's heat output from 17 physical servers to just four. The temperature still rose to an uncomfortable level but remained low enough for the firm to continue operating.

"Without virtualization, we would have had to shut down the entire network and tell everybody to go home," Bakerman says. Savings through plain old server consolidation had been the original goal; business continuity was an unanticipated benefit.

Still, Bakerman's plans to "virtualize everything" have hit a bump in the road. He initially built out Owen Bird's virtualized environment using Virtual Iron's hypervisor and tools, as they were half the price of VMware's and could do many of the same things at the time. Then in May 2009, Oracle acquired Virtual Iron.

Bakerman is waiting for Oracle VM Version 3.0, which is supposed to incorporate the Virtual Iron product line's characteristics, to see how compatible it will be with his current environment. He knows he's in for some disruption. In a Windows-oriented data center, Oracle VM would have to run on Unbreakable Linux, Oracle's Linux distribution, and Bakerman doesn't know that much about Linux.

Bakerman says that after the acquisition, he was cut off from any further Virtual Iron licenses while select other customers weren't. Oracle has suggested it will provide a migration path from Virtual Iron to Oracle VM 3.0, "but they can't guarantee anything," he says. "They tell us, 'We'll do our best.'" Bakerman had expected Oracle VM 3.0 in May; Oracle now says the beta version will be available in December.

Bakerman could switch to VMware or Microsoft's Hyper-V, but he isn't quite ready to go in those directions either. "I want to try out Oracle VM 3.0 before I make a final decision," he says, "but I haven't been happy with the way they've done things. I'm now leaning toward VMware."

Value in virtualization may come easily at first, even leading to surprise gains in business continuity and agility. But there are also unexpected chokepoints and pitfalls. Bakerman has experienced both sides.
