Amazon Cloud Outage Proves Importance Of Failover Planning - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Cloud // Infrastructure as a Service
10:13 PM
Connect Directly

Amazon Cloud Outage Proves Importance Of Failover Planning

Contingency plans kept Bizo and Mashery up and running during the Amazon service outage, offering lessons to other cloud-based businesses.

In the aftermath of the Amazon cloud service outage last week, two San Francisco businesses that depend on Amazon's EC2, Bizo and Mashery, say it's possible to survive such a mishap without business disruption.

But in both cases, they had taken steps to protect their businesses. Bizo resorted to a practice that many observers were left wondering why Amazon itself hadn't adopted--the ability of a system in one data center to be shifted to another in a separate, geographic location.

Amazon's recommendation is for a customer to generate an instance of a server running a workload in one availability zone of its data center to have a carbon copy, perhaps running at the same time, in another. An availability zone has never been precisely defined by Amazon, but they are distinct operating sections within a data center. One zone is believed to have power and telecommunications services separate from other zones.

The best protection against an outage, according to Amazon guidance, is to establish a mirrored instance, running the same logic and data as the original. But doing so adds to the cost of cloud computing. You're paying for two server instances instead of one. You must also pay by the gigabyte to move data from one availability zone to another.

Those who incur these charges believe they have set up protection for themselves in the event of an outage in their primary zone. But in the early morning hours of April 21, as the Amazon Elastic Block Store (EBS) and Relational Database Service (RDS) began to fail in one availability zone of Amazon's Northern Virginia U.S. East-1 data center, they faltered and also began to fail in the three others.

Oren Michels, CEO of Mashery, and Donnie Flood, VP of engineering at Bizo, know all about that set of failures. They had taken Amazon's recommended steps, but fortunately they were also able to take additional steps beyond Amazon's recommendations.

Flood said Bizo's Web-based business marketing platform uses both U.S. East-1 and Amazon's second North American data center in Northern California. As a matter of fact, Bizo uses two availability zones in each center to protect against an outage.

On April 21, Flood was on a trip and asleep in Denver when his phone started issuing alerts around 2:30 a.m. Rocky Mountain Time. Thirty-five minutes earlier, the RDS and EBS services that power the Bizo applications in U.S. East-1 had started having problems and the AWS Services Health Dashboard was about to issue its first notice of something going awry.

Flood couldn't at first believe that one set of failures was serious but the alerts continued to pour in with disturbing regularity. U.S. East-1 is an important data center to Bizo because it hosts more traffic there than in Northern California. As best as Flood could tell in the middle of the night, the problem that started in one of the data center's availability zones was spreading, impairing Bizo's operations.

"U.S. East is our main region. I was surprised by the spread of trouble into the additional zones. That goes against what is expected," said Flood in an interview.

We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
1 of 2
Comment  | 
Print  | 
More Insights
InformationWeek Is Getting an Upgrade!

Find out more about our plans to improve the look, functionality, and performance of the InformationWeek site in the coming months.

Why IT Leaders Should Make Cloud Training a Top Priority
John Edwards, Technology Journalist & Author,  4/14/2021
10 Things Your Artificial Intelligence Initiative Needs to Succeed
Lisa Morgan, Freelance Writer,  4/20/2021
Lessons I've Learned From My Career in Technology
Guest Commentary, Guest Commentary,  5/4/2021
White Papers
Register for InformationWeek Newsletters
Current Issue
Planning Your Digital Transformation Roadmap
Download this report to learn about the latest technologies and best practices or ensuring a successful transition from outdated business transformation tactics.
Flash Poll