Hurricane Sandy Lesson: VM Migration Can Stop Outages
When a hurricane or other disaster threatens, why not just move critical systems out of the way? It can be done -- but not at the last minute.
Datapipe prepared its two data centers in Somerset, N.J., for the storm, making sure to top off the fuel tanks of the diesel backup generators and going the extra length of calling in a diesel fuel tank truck from its private contractor and parking it on premises. Daniel Newton, senior VP of operations, said other preparations on staffing and communications had been made and the data centers rode out the storm, experiencing only a couple of minor leaks from wind-driven rain.
The site kept a bank of emergency generators running as other sites lost their power supplies and the Somerset utility power showed fluctuations. Newton thought nothing of using a little diesel fuel. He had plenty to spare. Then "an unforeseen circumstance occurred" as his backup supply tanker fired up its engine and drove off. Its owner had been ordered to deliver fuel to hospitals, nursing homes and convalescent centers instead of standing by at Datapipe. The sheer scale of the storm had undermined the plan.
Newton said the site never suffered a power outage, so the issue became moot. But he had discovered a hole in the plan. Datapipe immediately plugged it "by procuring our own fuel truck."
In New York, the solution wasn't as simple. The unexpected happened and a storm surge washed over three blocks of lower Manhattan from Battery Park. Caught in that surge was 75 Broad Street, with Peer 1 Hosting and Internap data centers in the building. Steve Orchard, Internap's senior VP of development and operations, knew the building's fuel supply system was stocked up and his backup generators were on the second floor, well above any conceivable flooding. But he hadn't allowed for the reserve fuel tank's vent pipe, which lets air enter the tank as fuel is pumped out. The vent opening was two feet above the ground, outside the building.
When the storm surge hit the neighborhood, it flooded the basement, disrupting the redundant pumping system's electrical supply and shutting down the pumps. That would have been a relatively simple problem to fix: bring in a new pump and move fuel from ground level to the second floor. But salt water had been able to enter and flow down the vent pipe into the building's reserve diesel supply and 10,000 gallons of precious fuel was contaminated. Orchard had two major issues to overcome with the building's engineers. They did so, rigging a new pump, fuel supply and "creative fabrication of piping and hoses" to start moving fuel to the second floor.
Orchard said there were many different workmen involved, figuring out how to disengage the fuel line from its current linkages and apply new fittings to allow it to connect to a fuel truck. They had to locate a small generator to provide power to let them do the work. Internap had to shut down late in the morning Oct. 30. It was up and running again before midnight, having been out of commission for less than 12 hours, thanks to "the creativity and resiliency" of the Internap staff and 75 Broad building engineers.
One thing that might have gone wrong didn't. When salt water got into the vent pipe, the redundant system of pumps in the basement stopped working at the same time, so the contamination didn't spread in the line. Instead of needing to flush the line and perhaps repair generators, Orchard only needed to connect the new source.
Even the best-laid plans at many data centers saw some aspect of disaster recovery go off the rails. The collected experiences might become an argument for shifting disaster preparedness away from trying to guarantee the physical integrity and continuous operation of a given data center and toward system transfer -- migrating mission-critical virtual machines out of the data center to another one, outside of harm's way. That approach would be useful not only for hurricanes but also for fires, floods and earthquakes.
The ability to move virtual machines, which started with VMware's VMotion capability, used to be limited to a new location in the same data center rack. It gained the capability to move across racks in one data center, then between data centers.
Internap, Datapipe and many other data center service providers now have high-level disaster recovery services that allow the movement of critical systems from one location to another. But SunGard, another provider of the services, warns you can't wait until the last minute to invoke them.
"If you don't have a subscription (for disaster recovery) with SunGard, we don't allow you to sign up on the fly. The owner can't call the insurance company to write a policy when the building's on fire," said Walter Dearing, VP of recovery services at the SunGard Availability Services unit.
In fact, such recovery systems still take forethought and planning to implement. The key issue is not whether you can place virtual machine duplicates in some other location. That's a cinch. The main problem is getting a synchronized and up-to-date data flow into those systems to allow them to keep running.
Few companies keep a hot mirrored system running in a remote location, receiving a real-time stream of data and ready to pick up where another leaves off in a few milliseconds. There are many ways to recover systems that are less expensive than a complete live duplicate, and each customer decides what level of recovery it must have.
Do they want to recover by digging week-old tapes out of a vault somewhere? Do they have snapshot backups that are only a day old? Do they have all the server logs they need to reconstruct transactions up until a point that is only a few minutes or a few seconds short of the point of failure?
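The trade-off behind those questions can be sketched in a few lines of Python. This is only an illustration: the failure time, the three recovery points and the function names are hypothetical values chosen for the example, not figures from any provider.

```python
from datetime import datetime, timedelta


def data_loss_window(failure_time, last_recovery_point):
    """Return how much recent data is lost when restoring to last_recovery_point."""
    if last_recovery_point > failure_time:
        raise ValueError("recovery point cannot be later than the failure")
    return failure_time - last_recovery_point


def rank_by_data_loss(failure_time, points):
    """Sort recovery options from smallest to largest data-loss window."""
    return sorted(
        ((name, data_loss_window(failure_time, t)) for name, t in points.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical failure time and recovery points, for illustration only.
failure = datetime(2012, 10, 30, 11, 0, 0)
recovery_points = {
    "week-old tape": failure - timedelta(days=7),
    "daily snapshot": failure - timedelta(days=1),
    "server logs": failure - timedelta(seconds=30),
}

for name, window in rank_by_data_loss(failure, recovery_points):
    print(f"{name}: up to {window} of data lost")
```

The smaller the window, the more infrastructure it generally takes to maintain it, which is why each customer has to decide how much data loss is tolerable.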
"Tape is the cheapest backup. It's also the most error prone in terms of physically accessing the tapes and the data on the tapes," Dearing said, especially during weather like a hurricane. But even a site that is taking frequent snapshots of its data and replicating them to two or more remote locations will need data recovery systems in place to maintain data integrity and restart systems.
"Recovery facilities do not have a cookie cutter similarity," he noted. SunGard's facilities in Carlstadt, N.J., and Philadelphia are prepared to handle recovery of much larger, more complex systems than its facility in Arizona, he said.
Even a virtual machine recovery system must be tested frequently and rigorously, something many customers find hard to fit into busy schedules. Seamless failover based on virtual machines is possible today between sites, Internap, Datapipe and SunGard executives all agree. "But you can't do it without the proper due diligence," Dearing said.