DR-As-A-Service Moves Closer To Reality

New cloud-aware disaster recovery automation software like that from VirtualSharp could free IT from complex runbooks and expensive recoverability exercises.
Virtualization and cloud services are rapidly changing the calculus of business continuity and disaster recovery planning. But just having virtualized applications and an Amazon Web Services account doesn't automatically buy you recoverability. Today's multitier enterprise applications, which often pull information from multiple storage repositories and databases, are complex beasts. Being able to replicate data to an S3 bucket or EBS volume and spin up some virtual machines in the cloud still leaves a lot of work integrating all the pieces before the application is actually usable--witness runbooks that might run hundreds of pages, cover scores of individual systems, and require dozens of IT staffers, each a specialist in a different area, to execute.

Runbook automation software is designed to address this complexity (for more details, see my research report on IT automation), but these systems never really spread beyond the realm of Fortune 500-size enterprises doing DR in dedicated data centers. Runbook automation software is complex, expensive, and designed for captive data centers, not cloud services. It's supremely ill-suited to SMBs moving more of their infrastructures to shared services.

A new generation of cloud-aware DR automation and recovery software aims to pick up where traditional runbook automation leaves off. One of the first examples of cloud-aware DR automation comes from VirtualSharp, with its ReliableDR product. The premise is simple, according to VirtualSharp co-founder and CEO Carlos Escapa, and boils down to providing automation and testing to ensure that "whatever is running in cloud A is also running in cloud B."

If only it were that simple.

To see why it's not, Escapa points to how DR exercises have traditionally been conducted: typically weekend-long marathons that resemble IT's version of a war game, involving a rented data center from a BC specialist like SunGard, lots of people toting laptops stuffed with complex runbooks and hauling disks and tapes to the temporary locale, dozens if not hundreds of people on call, and ultimately a lot of money. Escapa says it's not uncommon for a large enterprise to spend $1 million or more on a single test. In fact, he knows of one global financial services company that burned through $10 million testing recovery from a worst-case, “smoking hole” scenario. Clearly, in this context, DR testing is a once-a-year event.

Enter VM-based DR automation and the cloud. Right off the top, the cloud means you don't need to rent a data center to conduct a recovery test. Next, because VMs are nothing more than software-configurable servers, IT can automate server instantiation, application dependencies, and other runbook tasks. Escapa says VirtualSharp's product bootstraps the automation process by including templates and rules for many popular enterprise applications, including Exchange, SharePoint, and Oracle servers, along with the ability to read existing scripts written in Windows PowerShell.
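To make the idea of automating "application dependencies" concrete, here is a minimal sketch of how a runbook's start order could be derived from a dependency map rather than written out by hand. It is not how ReliableDR works internally; the VM names, the `DEPENDENCIES` map, and the `boot_order` and `boot_waves` helpers are all hypothetical, and the only library used is Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical multitier application. Each key is a VM; its value is the
# set of VMs that must already be running before it can start.
DEPENDENCIES = {
    "database": set(),
    "storage-gateway": set(),
    "app-server": {"database", "storage-gateway"},
    "web-frontend": {"app-server"},
}


def boot_order(dependencies):
    """Return one valid start order: every VM appears after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def boot_waves(dependencies):
    """Group VMs into waves; VMs in the same wave can be started in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

In this toy example the database and storage gateway start first, then the application server, then the web front end. A real product would also have to wait for each tier to pass a health check before starting the next wave, which this sketch leaves out.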

Automating application failover using a standard configuration also enables the software to test and verify compliance with any number of usage scenarios. Furthermore, each failover test can be captured and saved to serve as a snapshot should IT wish to roll back application state to an earlier version, say after a software update gone awry. The upshot is that a DR exercise that once took a weekend can now be done and validated in a couple of hours. Escapa says it's not uncommon for companies with four-hour application recovery time objectives to perform recovery tests every two hours and checkpoints every three.

Automation doesn't just save time, either. Escapa claims that for a midsize all-VMware environment with about 100 machines, a single recovery test once cost $50,000; with automation, the same test can be run every day for an entire year for little more than half that amount.
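Taking Escapa's figures at face value, a rough back-of-the-envelope calculation shows what that claim implies per test. The exact annual figure isn't given, so the sketch below assumes exactly half of $50,000 as a lower bound; the constant and function names are illustrative only.

```python
# Figure quoted by Escapa for one traditional recovery test.
TRADITIONAL_TEST_COST = 50_000

# Assumption: "little more than half that" is taken as exactly half,
# so the per-test result below is a lower bound, not a quoted price.
ASSUMED_ANNUAL_AUTOMATED_COST = TRADITIONAL_TEST_COST / 2

# One automated test per day for a year.
TESTS_PER_YEAR = 365


def cost_per_test(annual_cost, tests_per_year):
    """Average cost of one test when the annual cost is spread evenly."""
    return annual_cost / tests_per_year


per_test = cost_per_test(ASSUMED_ANNUAL_AUTOMATED_COST, TESTS_PER_YEAR)
ratio = TRADITIONAL_TEST_COST / per_test

if __name__ == "__main__":
    print(f"Implied cost per automated test: about ${per_test:.2f}")
    print(f"One traditional test costs roughly {ratio:.0f}x as much")
```

Under that assumption each automated test works out to a little under $70, or several hundred times cheaper than the one-off exercise it replaces.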

So where does the cloud come in? So far we've just been extolling the virtues of virtualization and software automation. Although VirtualSharp's product was developed for large enterprises with multiple data centers, Escapa sees SMBs using cloud services as the next big DR software market, and his company would like to be the arms merchant for a new generation of recovery-as-a-service, or RaaS, products that automate DR to the cloud. So far it's signed up one service provider in Spain, but Escapa expects others, including U.S. companies, to follow later this year.

BC/DR automation software like that from VirtualSharp, IBM Tivoli, Sanovi, or VMware, coupled with virtualized applications, can greatly reduce recovery times and expense. However, these products are far from being fully canned SaaS applications. You still need to configure storage arrays to replicate data to a cloud service, set up scripts or load balancers to remap DNS to an alternate site, and develop automation scripts. But the promise of DR-as-a-service is certainly getting closer to reality.

In our next column, we'll look at another piece of the DR-to-the-cloud puzzle--using hypervisors to virtualize data replication itself.