Although the ability to leverage virtualized servers for disaster recovery has always been a core part of enterprise virtualization platforms, until recently it was relegated to the "oh, by the way" section of most vendors' feature lists in favor of sexier benefits such as reducing server sprawl, cutting capital costs, and, of course, the ever-present green effect. To be clear, we're talking disaster with a capital D: fire, flood, or pestilence, not operational recovery from user error or file deletion. Most of us have daily data protection plans in place. Few have the mandates, funds, and spunky SAS 70 auditors to make sure our disaster-recovery (DR) strategies involve more than leftover hardware, sketchy testing, and a stale run book.
Now, better DR through virtualization has moved front and center. We're seeing specialized offerings from established players, like VMware's Site Recovery Manager. Citrix XenServer Enterprise and Platinum include Live Migration, and our tests of Microsoft Hyper-V show that, given sufficient memory, cutting a virtual machine over to new hardware is relatively painless. Other vendors have launched products or retooled existing applications with an eye to catching the virtualized DR wave. Novell's PlateSpin Forge and Double-Take Software's eponymous offering are examples.
What wave, you say? A review of our 50 most recent virtualization projects found that nearly 90% of clients that have virtualized their main systems have rolled some level of the technology into their DR plans.
One interesting trend: folks who virtualize their disaster-recovery environments but not their primary networks. Virtualization enthusiasts may scoff, but this makes sense on a few levels. It's much less expensive than building a standard DR site. It's a great way to introduce virtualization and build up internal skills without affecting the production network. And it skirts the pesky issue of vendors that don't officially support virtualized versions of their applications.
A midsize private school in Rhode Island took this route. After a disaster befell the institution, IT opted to fill a hole in its DR plan using a virtualization appliance rather than virtualizing the production network. The maker of an accounting application critical to the school doesn't (yet) support running the app in a virtualized environment. We proved that the software would run in a VM and set it up at the DR site. When the vendor eventually adds VM support, the school will be a step ahead.
For organizations that have already virtualized their servers, the challenge is sizing the DR site configuration and setting a failover level. If you've just implemented virtualization, you've likely got a bit of a budget windfall (which will disappear quickly once the CFO figures out what's going on) and some unused but perfectly functional gear. So if your production environment is 275 virtual servers running on 30 physical hosts with 10 TB of data on a SAN, do you need the same capacity for your DR site? Or can you whittle that down to fewer servers to save money?
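The arithmetic itself is trivial; the inputs are the hard part. As an illustration only, here's a minimal Python sketch using the numbers above. The 40% share of workloads deemed critical and the 1.5x VM-density factor at the DR site are hypothetical placeholders you'd have to supply, not figures from any vendor.

```python
# Back-of-the-envelope DR sizing, using the example production numbers.
# CRITICAL_FRACTION and DR_DENSITY_FACTOR are hypothetical inputs, not
# recommendations; the business, not IT, decides what must stay running.
import math

PROD_VMS = 275        # virtual servers in production
PROD_HOSTS = 30       # physical hosts in production
PROD_STORAGE_TB = 10  # SAN capacity in use, in terabytes

CRITICAL_FRACTION = 0.40  # hypothetical: share of VMs that must run in a disaster
DR_DENSITY_FACTOR = 1.5   # hypothetical: tolerate 1.5x normal VM density at the DR site

prod_density = PROD_VMS / PROD_HOSTS                 # ~9.2 VMs per host today
dr_vms = round(PROD_VMS * CRITICAL_FRACTION)         # 110 VMs to bring up
dr_hosts = math.ceil(dr_vms / (prod_density * DR_DENSITY_FACTOR))  # 8 hosts
dr_storage_tb = PROD_STORAGE_TB * CRITICAL_FRACTION  # ~4 TB, before replica overhead

print(f"DR target: {dr_vms} VMs on ~{dr_hosts} hosts with ~{dr_storage_tb:.0f} TB of SAN")
```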
How do you decide? You don't. This is one place IT must get business leaders involved. Push the DR plan back to the CEO and COO for clarification on anticipated usage. That will be a major factor in decisions on bandwidth, host servers, and storage configuration.
What happens if you undersize your DR site? A major benefit of virtualization and SAN usage is the ability to quickly expand the supporting hardware infrastructure. Say you have a 10-node host server cluster at the home office, and your DR site holds five servers that would support the main office in the event of a disaster. Your plan can include a provision that if a failover lasts more than 48 hours, you add host servers to improve performance and spread the load.
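To make that provision concrete, here's a minimal sketch, again in Python, of the escalation rule as a run-book check. The host counts and the 48-hour threshold come from the example above; the function name and the expansion step are placeholders for whatever your plan actually specifies.

```python
from datetime import datetime, timedelta

HOME_OFFICE_HOSTS = 10              # production cluster, from the example above
DR_HOSTS = 5                        # hosts standing by at the DR site
EXPAND_AFTER = timedelta(hours=48)  # threshold written into the DR plan

def failover_posture(started: datetime, now: datetime) -> str:
    """Return the run-book posture for a failover that began at `started`."""
    load_factor = HOME_OFFICE_HOSTS / DR_HOSTS  # each DR host carries ~2x its normal load
    if now - started < EXPAND_AFTER:
        return f"Run degraded on {DR_HOSTS} hosts (~{load_factor:.0f}x normal load per host)"
    return "Past the threshold: execute the expansion step and add host servers"

# Example: 60 hours into a failover, the plan calls for more hardware.
began = datetime(2024, 3, 1, 6, 0)
print(failover_posture(began, began + timedelta(hours=60)))
```

The code isn't the point; the point is that the trigger and the threshold get written into the plan before the disaster, not argued about during it.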