12/27/2012
11:10 AM

Amazon Outage Scrooges Netflix, Heroku

Amazon Web Services suffers a second holiday services disruption and the fourth outage of 2012.

What is it with Amazon Web Services (AWS) and holidays? First came the massive Easter weekend service outage of 2011. Now it's the Christmas Eve outage of 2012, which left millions of Netflix customers unable to stream video on a high-demand night for movie viewing.

Netflix customer complaints lit up social networks on Christmas Eve, but the video service could only point a finger of blame at AWS, its cloud services provider. Amazon offered little explanation, but a "status history" report for 12/24 on the Amazon Service Health Dashboard shows "performance issues" affected Amazon's Northern Virginia data center.

The outage hit Netflix viewers from Canada to Brazil. It also affected Amazon's own Amazon Prime video-streaming service and Salesforce.com's Heroku cloud platform, which served up HTTP errors and ssl:endpoint unavailability messages during the outage.

Netflix reported that it was able to restore service to most affected customers by late Christmas Eve, but only through a workaround that involved manually reassigning capacity to other Amazon data centers. Amazon reported that it took until the afternoon of Christmas Day to fix the problems at its Northern Virginia data center.


The three specific AWS services affected were Amazon CloudWatch, EC2 and Elastic Beanstalk. CloudWatch provides monitoring for AWS cloud services and apps. EC2 is the Elastic Compute Cloud that provides on-demand compute capacity. Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling and application health monitoring. AWS offers built-in redundancy for all these services by way of multiple data centers and availability zones around the globe, but it's clear that provisions for automatic failover went down along with the CloudWatch and Beanstalk services.
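To make the failover idea concrete, here is a minimal sketch of choosing a healthy region from an ordered preference list. It is an illustration only, not how Netflix or AWS implement failover: the function name `pick_region`, the `is_healthy` probe and the status values are all hypothetical. The point it shows is that the health check is supplied by the caller, so the decision need not depend on monitoring that runs inside the failing region.

```python
from typing import Callable, Optional, Sequence


def pick_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region whose health probe passes, or None if none do.

    `regions` is ordered by preference; `is_healthy` is a caller-supplied
    (hypothetical) probe, so it can run outside the region being checked.
    """
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A probe that raises is treated the same as an unhealthy region.
            continue
    return None


# Made-up status values for illustration: the preferred region is degraded.
status = {"us-east-1": False, "us-west-2": True, "eu-west-1": True}
chosen = pick_region(["us-east-1", "us-west-2", "eu-west-1"], lambda r: status[r])
print(chosen)
```

In this sketch the caller falls through to the next region in the list; a real deployment would also have to move capacity and traffic, which is the part Netflix reportedly did by hand.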

The latest incident marks the fourth AWS outage in 2012. Disruptions on June 14 and June 29 were tied to power outages, while a less serious October 22 incident involved the vendor's Elastic Block Store (EBS) service.

Amazon's Easter outage of 2011 still ranks as one of the service provider's worst disruptions, as multiple availability zones went down and some customers took days to recover. The outage was ultimately blamed on human error.

Comments
PJS880,
User Rank: Apprentice
1/6/2013 | 12:25:39 AM
re: Amazon Outage Scrooges Netflix, Heroku
It is pretty cheesy that Netflix did not fess up and hold themselves accountable for the outage, instead of scapegoating AWS. On the other hand, how many more times does this have to happen before AWS stops letting it happen again? Yeah, sure, they can hold some of the responsibility, but Netflix should have made a better judgment call instead of refusing to accept the blame themselves.

Paul Sprague
InformationWeek Contributor
KLC,
User Rank: Apprentice
1/2/2013 | 7:25:07 PM
re: Amazon Outage Scrooges Netflix, Heroku
The article states: "Netflix customer complaints lit up social networks on Christmas Eve, but the video service could only point a finger of blame at AWS, its cloud services provider."

I disagree that Netflix can blame AWS. The big public cloud providers do not include a foolproof backup/high availability guarantee. It is up to the customers to architect, design and plan an effective backup solution. Granted, AWS misleads its clients with its "Availability Zones", but, as the repeated outages last year have shown, those are no guarantee. Netflix customers pay Netflix, NOT AWS. Netflix can only blame themselves for putting their business-critical functions in a public cloud, especially given the track record of public cloud.

On a completely different track, why are the big cloud providers having such struggles? Could it be that they have allowed themselves to become too big and unwieldy, so that the complexity has become their enemy? Maybe they should COMPLETELY isolate those availability zones, including all of the ancillary support mechanisms, possibly even assigning dedicated support staffing to each. From what I have seen, most of the outages that have been publicized and explained have been examples of shooting themselves in the foot. Even the severe thunderstorm outage, where, for some reason, they didn't switch to their diesel backup generators BEFORE the storm hit. Are these public cloud providers truly ready for prime time? Not in my book. I believe this is why there is so much interest in private clouds.
ANON1242240099751,
User Rank: Apprentice
12/28/2012 | 3:47:39 PM
re: Amazon Outage Scrooges Netflix, Heroku
Thanks for pointing out those mistakes (no more eggnog for the copyeditors). They've been fixed.

Paul Travis
Managing Editor
InformationWeek.com
Darr247,
User Rank: Apprentice
12/27/2012 | 8:35:11 PM
re: Amazon Outage Scrooges Netflix, Heroku
The service was "affected" not 'effected' (and impacted ain't thy bestest alt-text, neither).