7/2/2012
04:14 PM

Amazon Outage Hits Netflix, Heroku, Pinterest, Instagram

Amazon Web Services data center loses power because of violent electrical storms, knocking out many website customers.

Severe thunderstorms rolled through northern Virginia Friday evening, and Amazon Web Services' U.S. East data center, its largest, lost power at 8:40 p.m. The power was restored about nine minutes later, but Netflix, Instagram, Pinterest, Heroku, and other companies that depend on its infrastructure were affected.

"We're currently experiencing technical difficulties and we're working to correct the issues. Thanks for your patience," said Instagram via Twitter at 8:16 p.m. Pacific Friday. Instagram is the mobile phone photo sharing service recently acquired by Facebook for $1 billion.

Heroku told its users two minutes later: "Our automated systems have detected potential platform errors. We are investigating."

Amazon's Service Health Dashboard posted its first notice at 8:21 p.m. Pacific, saying AWS was "investigating connectivity issues for a number of instances" in its northern Virginia data center.

[ Amazon experienced a power outage two weeks before this one. See Amazon Defended After June 14 Power Outage. ]

Cedexis, the cloud monitoring service, said service was affected by the power outage beginning at 8 p.m. Pacific (or 3 a.m. Saturday Greenwich Mean Time). The outage hit availability zone 1B of Amazon's Elastic Compute Cloud service. 1B is one of what are believed to be four zones in Amazon's U.S. East-1 data center (Amazon doesn't state how many zones are in each facility). Compute service in zone 1B was completely down an hour later, said a Cedexis spokesman.
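The Pacific-to-GMT conversion that Cedexis cites can be checked with a short calculation. This is a minimal sketch; it assumes the Friday in question was June 29, 2012 (inferred from the article's publication date) and that Pacific Daylight Time, seven hours behind GMT, was in effect. The function name `pacific_to_utc` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time (UTC-7) applied on Friday, June 29, 2012.
PDT = timezone(timedelta(hours=-7), name="PDT")


def pacific_to_utc(local: datetime) -> datetime:
    """Attach the PDT offset to a naive local time and convert it to UTC."""
    return local.replace(tzinfo=PDT).astimezone(timezone.utc)


# 8 p.m. Pacific on Friday, the start time reported by Cedexis.
outage_start = pacific_to_utc(datetime(2012, 6, 29, 20, 0))
print(outage_start.isoformat())
```

Under those assumptions the result falls at 03:00 on Saturday, June 30, which matches the time given by Cedexis.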

Basic EC2 service was out in the one zone for about two hours, according to Cedexis, although individual virtual server instances and some Elastic Block Store volumes were out for a longer period.

At 8:40 p.m. Amazon's dashboard acknowledged to customers that "a single Availability Zone has lost power due to electrical storms in the area. We are actively working to restore power." At 9:01 p.m. Pacific, Amazon acknowledged that the loss of an availability zone in U.S. East-1 had affected its Elastic MapReduce service as well.

Netflix relies primarily on Amazon infrastructure for its film delivery service. At 10:15 p.m. Friday, Netflix tweeted: "We're sorry for the outage and working to get your Friday streaming back to normal as quickly as possible. Thank you for bearing with us." Netflix uses more than one availability zone, and has recovery plans in case of the loss of an availability zone. Nevertheless, the power outage affected some customers.

That tweet went out in response to user comments that Netflix wasn't able to deliver its service. One user commented that his Netflix service had been interrupted in the midst of a cliffhanger scene. "I was watching my favorite show & you guys screwed up," tweeted Amar Chugg.

"I hope it's fixed by tomorrow. I use Netflix everyday and I'm not a happy customer right now," tweeted Brian Morin @Brianm123.

At 11:03 p.m. Pacific time, Mrs. Terrell tweeted, "Hurry up!"

At 1:11 a.m. Pacific time Saturday, Netflix was able to tweet: "Everyone should be back up shortly, if you aren't already. Thanks again for being patient. And awesome."

Heroku posted a notice at 9:33 p.m. Pacific that its engineers had moved customers out of the AWS infrastructure onto new servers. At 10:35 p.m. Pacific, its status page posted the following update: "We've restored the majority of internal services and are seeing a reduction in error rates, but many applications and databases remain offline. We are continuing to work to restore processes and databases."

Amazon's dashboard warned that the Elastic Block Store storage service had also been affected by the outage. Although the service continued running, affected EC2 customers who thought their instances were writing to storage might find later that this wasn't the case, or that data updated in one EBS volume had not been updated in another.

Due to the power outage, "some EBS volumes may have inconsistent data. As we bring volumes back online, any affected volumes will have their status in the 'Status Checks' column in the Volume list in the AWS console listed as 'Impaired.' If your instances or volumes are not available, please login to the AWS Management Console and perform" six steps to recover the data.
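Customers with many volumes could look for the "Impaired" status programmatically rather than scanning the console list. The sketch below is illustrative only: it assumes a response dictionary shaped like the output of EC2's DescribeVolumeStatus call, and the function name `impaired_volume_ids`, the sample data, and the volume IDs are all hypothetical. It does not contact AWS.

```python
from typing import Any


def impaired_volume_ids(response: dict[str, Any]) -> list[str]:
    """Return the IDs of volumes whose reported status is 'impaired'.

    Assumption: `response` follows the shape of an EC2 DescribeVolumeStatus
    result, with a "VolumeStatuses" list of per-volume entries.
    """
    return [
        entry["VolumeId"]
        for entry in response.get("VolumeStatuses", [])
        if entry.get("VolumeStatus", {}).get("Status") == "impaired"
    ]


# Hypothetical sample data; the volume IDs are made up for illustration.
sample_response = {
    "VolumeStatuses": [
        {"VolumeId": "vol-11111111", "VolumeStatus": {"Status": "ok"}},
        {"VolumeId": "vol-22222222", "VolumeStatus": {"Status": "impaired"}},
        {"VolumeId": "vol-33333333", "VolumeStatus": {"Status": "insufficient-data"}},
    ]
}

print(impaired_volume_ids(sample_response))
```

A filter like this only identifies which volumes need attention; the recovery steps themselves are the ones Amazon describes.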

Amazon's last post occurred at 8:38 a.m. Pacific Saturday. "We are continuing our recovery efforts for the remaining EC2 instances and EBS volumes. We are beginning to successfully provision additional Elastic Load Balancers," it said as it neared the end of its recovery process.


Comments
Kerry Lebel,
User Rank: Apprentice
7/3/2012 | 4:13:02 PM
re: Amazon Outage Hits Netflix, Heroku, Pinterest, Instagram
The outage at EC2 is neither the first nor the last. Amazon is a leader in reliability and if it happens to them, it can and will happen to any service provider. The point is that enterprises need to take responsibility for planning their contingencies and workarounds. As enterprises get more serious about higher-level workflow automation, they will spend less time bemoaning outages and more time abstracting their processes from specific infrastructures and application environments.