Cloud // Infrastructure as a Service
10:13 PM
Connect Directly
IoT & Insurance: How Big Data May Affect Your Privacy & Premiums
May 06, 2016
Gwenn Bezard is co-founder and research director at Aite Group, where he leads the insurance pract ...Read More>>

Amazon Cloud Outage Proves Importance Of Failover Planning

Contingency plans kept Bizo and Mashery up and running during the Amazon service outage, offering lessons to other cloud-based businesses.

In the aftermath of the Amazon cloud service outage last week, two San Francisco businesses that depend on Amazon's EC2, Bizo and Mashery, say it's possible to survive such a mishap without business disruption.

But in both cases, they had taken steps to protect their businesses. Bizo resorted to a practice that many observers were left wondering why Amazon itself hadn't adopted--the ability of a system in one data center to be shifted to another in a separate, geographic location.

Amazon's recommendation is for a customer to generate an instance of a server running a workload in one availability zone of its data center to have a carbon copy, perhaps running at the same time, in another. An availability zone has never been precisely defined by Amazon, but they are distinct operating sections within a data center. One zone is believed to have power and telecommunications services separate from other zones.

The best protection against an outage, according to Amazon guidance, is to establish a mirrored instance, running the same logic and data as the original. But doing so adds to the cost of cloud computing. You're paying for two server instances instead of one. You must also pay by the gigabyte to move data from one availability zone to another.

Those who incur these charges believe they have set up protection for themselves in the event of an outage in their primary zone. But in the early morning hours of April 21, as the Amazon Elastic Block Store (EBS) and Relational Database Service (RDS) began to fail in one availability zone of Amazon's Northern Virginia U.S. East-1 data center, they faltered and also began to fail in the three others.

Oren Michels, CEO of Mashery, and Donnie Flood, VP of engineering at Bizo, know all about that set of failures. They had taken Amazon's recommended steps, but fortunately they were also able to take additional steps beyond Amazon's recommendations.

Flood said Bizo's Web-based business marketing platform uses both U.S. East-1 and Amazon's second North American data center in Northern California. As a matter of fact, Bizo uses two availability zones in each center to protect against an outage.

On April 21, Flood was on a trip and asleep in Denver when his phone started issuing alerts around 2:30 a.m. Rocky Mountain Time. Thirty-five minutes earlier, the RDS and EBS services that power the Bizo applications in U.S. East-1 had started having problems and the AWS Services Health Dashboard was about to issue its first notice of something going awry.

Flood couldn't at first believe that one set of failures was serious but the alerts continued to pour in with disturbing regularity. U.S. East-1 is an important data center to Bizo because it hosts more traffic there than in Northern California. As best as Flood could tell in the middle of the night, the problem that started in one of the data center's availability zones was spreading, impairing Bizo's operations.

"U.S. East is our main region. I was surprised by the spread of trouble into the additional zones. That goes against what is expected," said Flood in an interview.

1 of 2
Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
Multicloud Infrastructure & Application Management
Multicloud Infrastructure & Application Management
Enterprise cloud adoption has evolved to the point where hybrid public/private cloud designs and use of multiple providers is common. Who among us has mastered provisioning resources in different clouds; allocating the right resources to each application; assigning applications to the "best" cloud provider based on performance or reliability requirements.
Register for InformationWeek Newsletters
White Papers
Current Issue
4 Trends Shaping Digital Transformation in Insurance
Insurers no longer have a choice about digital adoption if they want to remain relevant. A comprehensive enterprise-wide digital strategy is fundamental to doing business today.
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on for the week of April 24, 2016. We'll be talking with the editors and correspondents who brought you the top stories of the week!
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.