10/23/2012
12:17 PM

Amazon Outage: Multiple Zones A Smart Strategy

Amazon Web Services loses one availability zone Monday at its most heavily used site.

Traffic in Amazon Web Services' most heavily used data center complex, U.S. East-1 in Northern Virginia, was tied up by an outage in one of its availability zones Monday morning. Damage control got underway immediately, but the effects of the outage were felt throughout the day.

Customers were affected shortly after noon Eastern Time, when they were unable to access Amazon's Elastic Beanstalk scaling service and its Elastic Block Store (EBS) service, which holds frequently accessed data used by hosted applications such as Salesforce.com's Heroku cloud platform, Pinterest, and news aggregator Reddit. Netflix, GitHub, Minecraft, Airbnb, Fast Company, and Foursquare also reported that they had been affected.

"We are currently experiencing degraded performance for EBS volumes in a single Availability Zone in the US-EAST-1 Region. New launches for EBS backed instances are failing and instances using affected EBS volumes will experience degraded performance," Amazon's Service Health Dashboard reported at 11:26 a.m. Monday.

Other services, such as Amazon's Relational Database Service, depend heavily on EBS.

Teacher forum and education site Edmodo.com noted in a Twitter posting at 2:20 p.m. that its servers were unavailable: "Update: The site is still down. This is a server issue related to Amazon and we will update as soon as we have more info."


Sites that operate on a strict budget often take advantage of the minimal infrastructure costs associated with Amazon cloud services and operate in only one availability zone. But an outage in one zone can sometimes affect the availability of some services in others, as seen in the Easter weekend outage in April 2011.

Savvy customers, such as Netflix, who've made a major investment in use of Amazon's EC2, can sometimes avoid service interruptions by using multiple zones. But as reported by NBC News, some Netflix regional services were affected by Monday's outage.

The outage started as a slowdown in response times and an increase in error rates in the Elastic Block Store service in one availability zone. The site hosts five different zones, or virtual data centers, each with independent telecommunications, power, and backup power. Some customers keep recovery copies of their systems in a second zone to provide a failover mechanism if one availability zone goes down.
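
The failover idea can be sketched in a few lines of Python. This is only an illustration, not Amazon's mechanism or any customer's actual setup: the zone names, the health map, and the `pick_zone` helper are all hypothetical, and a real system would get its health signal from monitoring rather than a hard-coded dictionary.

```python
from typing import Mapping, Optional, Sequence


def pick_zone(preferred: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first zone, in preference order, that is reported healthy."""
    for zone in preferred:
        # A zone missing from the health map is treated as unhealthy.
        if healthy.get(zone, False):
            return zone
    return None  # no zone is available


# Hypothetical zone names and health states, for illustration only.
zones = ["zone-a", "zone-b"]
status = {"zone-a": False, "zone-b": True}
active = pick_zone(zones, status)
```

With the primary zone marked unhealthy, the helper falls through to the second zone, which is where the recovery copy would take over.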

Okta, an Amazon EC2-based identity management service, uses all five zones to hedge against outages. "If there's a sixth zone tomorrow, you can bet we'll be in it within a few days. We make use of every possible zone. We need to be up at all times," said Adam D'Amico, Okta's director of technical operations. Netflix service architect Adrian Cockcroft and others have advocated in public forums that customers use more than one zone for their own protection.
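
A rough Python sketch shows why spreading across every zone limits the damage from losing one. The instance counts, zone labels, and the `spread` and `surviving` helpers are assumptions made for illustration; they do not describe how Okta or Netflix actually place their servers.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, List


def spread(instance_count: int, zones: List[str]) -> Dict[str, int]:
    """Assign instances to zones round-robin and return the count per zone."""
    return dict(Counter(islice(cycle(zones), instance_count)))


def surviving(placement: Dict[str, int], failed_zone: str) -> int:
    """Count the instances still running if one zone fails."""
    return sum(n for zone, n in placement.items() if zone != failed_zone)


# Hypothetical zone labels; ten instances spread over five zones.
five_zones = ["a", "b", "c", "d", "e"]
multi = spread(10, five_zones)
single = spread(10, ["a"])
```

Under these assumptions, losing zone "a" leaves the five-zone placement with 8 of its 10 instances, while the single-zone placement is left with none.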

The trouble for Amazon persisted through the day. At 9:30 p.m. Eastern, its Service Health Dashboard reported, "We are seeing elevated errors rates on APIs related to describing and associating EIP addresses. We are working to resolve these errors. In addition, ELB is experiencing elevated latencies recovering affected load balancers and making changes to existing load balancers. These delays… will improve when that issue is resolved."

At 10:36 p.m. Eastern, it added, "…we expect ELB to recover more quickly now." Most problems were cleared up by 1:30 a.m. Tuesday.


Comments
cmclennan452,
10/30/2012 | 9:37:55 PM
re: Amazon Outage: Multiple Zones A Smart Strategy
Hi Charles, great insight. At Ilesfay (cloud based replication startup) we've never gone down even though we've been using AWS (all regions) since 2009. FYI: Here are some of our key principles for building resilient cloud applications: http://www.ilesfay.com/cms/def...