News

Amazon Cloud Outage Proves Importance Of Failover Planning

Charles Babcock
Editor At Large, InformationWeek

Contingency plans kept Bizo and Mashery up and running during the Amazon service outage, offering lessons to other cloud-based businesses.

In the aftermath of the Amazon cloud service outage last week, two San Francisco businesses that depend on Amazon's EC2, Bizo and Mashery, say it's possible to survive such a mishap without business disruption.

But in both cases, they had taken steps to protect their businesses. Bizo resorted to a practice that many observers were left wondering why Amazon itself hadn't adopted--the ability of a system in one data center to be shifted to another in a separate, geographic location.


More Cloud Insights

Webcasts

More >>

White Papers

More >>

Reports

More >>

Amazon's recommendation is for a customer to generate an instance of a server running a workload in one availability zone of its data center to have a carbon copy, perhaps running at the same time, in another. An availability zone has never been precisely defined by Amazon, but they are distinct operating sections within a data center. One zone is believed to have power and telecommunications services separate from other zones.

The best protection against an outage, according to Amazon guidance, is to establish a mirrored instance, running the same logic and data as the original. But doing so adds to the cost of cloud computing. You're paying for two server instances instead of one. You must also pay by the gigabyte to move data from one availability zone to another.

Those who incur these charges believe they have set up protection for themselves in the event of an outage in their primary zone. But in the early morning hours of April 21, as the Amazon Elastic Block Store (EBS) and Relational Database Service (RDS) began to fail in one availability zone of Amazon's Northern Virginia U.S. East-1 data center, they faltered and also began to fail in the three others.

Oren Michels, CEO of Mashery, and Donnie Flood, VP of engineering at Bizo, know all about that set of failures. They had taken Amazon's recommended steps, but fortunately they were also able to take additional steps beyond Amazon's recommendations.

Flood said Bizo's Web-based business marketing platform uses both U.S. East-1 and Amazon's second North American data center in Northern California. As a matter of fact, Bizo uses two availability zones in each center to protect against an outage.

On April 21, Flood was on a trip and asleep in Denver when his phone started issuing alerts around 2:30 a.m. Rocky Mountain Time. Thirty-five minutes earlier, the RDS and EBS services that power the Bizo applications in U.S. East-1 had started having problems and the AWS Services Health Dashboard was about to issue its first notice of something going awry.

Flood couldn't at first believe that one set of failures was serious but the alerts continued to pour in with disturbing regularity. U.S. East-1 is an important data center to Bizo because it hosts more traffic there than in Northern California. As best as Flood could tell in the middle of the night, the problem that started in one of the data center's availability zones was spreading, impairing Bizo's operations.

"U.S. East is our main region. I was surprised by the spread of trouble into the additional zones. That goes against what is expected," said Flood in an interview.

Page 2: Rapid Failover Is Essential To Continuity
 1 | 2  | Next Page » 

Related Reading


Informationweek Discussions

Start the Discussion


InformationWeek encourages readers to engage in spirited, healthy debate, including taking us to task. However, InformationWeek moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing/SPAM. InformationWeek further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
Subscribe to RSS

Resource Links