Eucalyptus 3 includes high availability as a feature, not an engineering project, with an architecture designed to eliminate single points of failure in the private cloud, said Eucalyptus CEO Marten Mickos. The update adds mechanisms for automatic recovery from component failures, whether the failed part is a storage drive or a host server.
The Eucalyptus cloud was early out of the blocks with compatibility with Amazon APIs, so that a call to Eucalyptus storage can be redirected to Amazon's cloud, where it performs the same function against Amazon Web Services' S3 storage. Critics said Eucalyptus was too hurried in seeking Amazon compatibility, but so far it's hard to see how that's been a bad bet. Amazon remains the dominant supplier of infrastructure as a service, and there's been no revolt against its insistence that workloads sent to it arrive in its own Amazon Machine Image virtual machine format.
If anything, Amazon compatibility is suspect mainly when Amazon shoots itself in the foot, such as when it suffered a widespread service failure in its Northern Virginia data center April 22-24, after a mistaken network change froze up its Elastic Block Store and Relational Database Service.
As if sensing a change in mood, Eucalyptus has shifted its emphasis from full Amazon compatibility to sustainability of an Amazon-like private environment. If customers are building out private clouds in hopes of using them in conjunction with EC2, high availability is a sort of upfront reassurance that the hybrid approach won't go haywire, even if the public cloud momentarily does.
High availability "is now a standard request from all over our customer base. Nobody else has delivered this to market yet," said Mickos, with the air of certainty he once used in driving acceptance of the MySQL open source database.
In effect, when Eucalyptus 3 generates a virtual server, it also creates a hot spare on some other physical machine. A component failure, such as a CPU burning out, can take a virtual server down in a fraction of a second. But a node controller in the Eucalyptus cloud watches for such failures. When it detects one, the state of operation and active data are snatched away from the failed component and plugged into the hot spare, explained Rich Wolski, CTO of Eucalyptus, in an interview. Wolski is the head of the University of California at Santa Barbara project that produces open source versions of Amazon APIs.
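The primary/spare arrangement Wolski describes can be sketched as a toy model. This is an illustration only, not Eucalyptus code; the class and method names below are invented for the example.

```python
# Toy model of hot-spare failover: a node controller watches a primary
# virtual server and promotes its hot spare when the primary fails.
# All names here are invented for illustration; they are not Eucalyptus APIs.

class VirtualServer:
    def __init__(self, host):
        self.host = host
        self.state = {}      # running state of the instance
        self.healthy = True

class NodeController:
    """Watches a primary VM and promotes its hot spare on failure."""
    def __init__(self, primary, spare):
        self.primary = primary
        self.spare = spare

    def heartbeat(self):
        # In a real cloud this would be a periodic health probe;
        # here we simply check a flag on the primary.
        if not self.primary.healthy:
            self.failover()
        return self.primary

    def failover(self):
        # Hand the last-known state to the spare and promote it.
        self.spare.state = dict(self.primary.state)
        self.primary, self.spare = self.spare, self.primary

# Usage: primary on one rack, spare on a neighboring rack.
primary = VirtualServer(host="rack-a-node-1")
spare = VirtualServer(host="rack-b-node-7")
nc = NodeController(primary, spare)

primary.state["sessions"] = 42
primary.healthy = False          # simulate a CPU burning out
active = nc.heartbeat()
print(active.host, active.state)  # the spare takes over with the same state
```

The key design point is that the spare lives on different hardware than the primary, so the failure that kills one cannot take out both.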
Can such an approach lead to four nines or five nines of guaranteed uptime, meaning 99.99% or 99.999% of the time?
Eucalyptus 3 gives its implementer the option of establishing more than one hot spare. A further guarantee of continuous operation would be to establish two, one of them in a geographic location more distant from the primary data center than the first hot spare. That way, if a fire or other calamity destroyed two running copies of the virtual server, a third would pick up its work.
"The more redundancy you have, the more availability you have," said Mickos. The base configuration is to establish one hot spare for each virtual server per rack, since the rack's power supply is a possible point of component failure. In that scenario, the spare would run on a neighboring rack. Likewise, another possible point of failure is the rack's network switch. A hot spare on a neighboring rack offers redundancy for it as well.
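Mickos's point can be put in rough numbers. Assuming replica failures are independent (a simplification; correlated failures are exactly why spares go on separate racks and sites), a service with n replicas is down only when all n fail at once:

```python
# Back-of-the-envelope availability math. With n independent replicas,
# each available a fraction `a` of the time, the service fails only when
# all n fail simultaneously, so combined availability is 1 - (1 - a)**n.
# Real-world failures are rarely fully independent, so treat these
# figures as an upper bound.

def combined_availability(a: float, n: int) -> float:
    return 1 - (1 - a) ** n

single = 0.99  # a single server with "two nines" of availability
for n in (1, 2, 3):
    print(n, combined_availability(single, n))
# one replica stays at ~0.99; two reach ~0.9999 (four nines);
# three reach ~0.999999 (six nines)
```

This is why adding even one hot spare moves a modest server a long way toward the four-nines and five-nines territory the question above raises.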
Several other young companies, such as Nimbula, are working on private cloud architectures that avoid Amazon's constraints and implement more general-purpose APIs and the ability to work with multiple cloud hypervisors. But instead of moving away from Amazon, Eucalyptus 3 moves a little closer.
It also can use information derived through Amazon's Identity and Access Management API so that user privileges are the same in the public cloud and on premises. And Eucalyptus 3 can boot a virtual machine from an image stored in Amazon's Elastic Block Store. By default, an EC2 instance starts from a fresh image built from scratch, with nothing retained from the server's previous run. By booting from EBS, customers don't need to redo the configurations and adjustments they made to get the virtual server running the way they wanted it the last time.
"Customers want the image to persist in the state in which it was closed down" to maintain compatibility with their legacy applications. "That adds a lot of convenience," Mickos said.
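The contrast between the two boot modes can be shown with a small sketch. This is a toy model of the behavior described above, not Amazon or Eucalyptus code; every name in it is invented for illustration.

```python
# Toy contrast between an instance-store boot (pristine image every time)
# and an EBS-backed boot (root volume persists between runs).
# All names are invented for illustration, not Amazon or Eucalyptus APIs.

BASE_IMAGE = {"packages": ["httpd"], "config": {}}

def boot_instance_store():
    # Every boot starts from a fresh copy of the base image.
    return {"packages": list(BASE_IMAGE["packages"]),
            "config": dict(BASE_IMAGE["config"])}

def boot_from_ebs(volume):
    # The EBS volume carries whatever state the server had when it stopped.
    return volume

# First run: the admin tunes the running server, then stops it.
server = boot_instance_store()
server["config"]["tuned"] = True   # adjustment made on the running server
ebs_volume = server                # an EBS-backed root volume keeps this state

# An instance-store reboot loses the adjustment; an EBS-backed one keeps it.
print(boot_instance_store()["config"])      # fresh image: no adjustments
print(boot_from_ebs(ebs_volume)["config"])  # persisted state survives
```

That persistence is the "convenience" Mickos refers to: the image comes back in the state in which it was closed down.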