Amazon EC2 Outage Hobbles Websites - InformationWeek


4/21/2011
03:32 PM

Amazon EC2 Outage Hobbles Websites

Engine Yard, Foursquare, Hootsuite, Heroku, Quora, and Reddit were among the websites that suffered from slowed or disabled access.

Amazon Web Services' Elastic Compute Cloud, which offers computation as a service to thousands of businesses, and its Relational Database Service began experiencing errors shortly before 2 a.m. PDT on Thursday at Amazon's US-EAST data center in Virginia. The service interruption has now lasted more than nine hours.

The technical problems have slowed or disabled access to the websites of customers utilizing AWS US-East resources, including Engine Yard, Foursquare, Hootsuite, Heroku, Quora, and Reddit, to name a few.

Shortly before noon PDT on Thursday, Reddit displayed a notice saying the discussion site "is in 'emergency read-only mode' right now because Amazon is experiencing a degradation. They are working on it but we are still waiting for them to get to our volumes."

Hootsuite, Foursquare, and Quora displayed similar messages, while Heroku was inaccessible.

Amazon did not respond to a request for comment.

Engine Yard, a Ruby on Rails cloud service provider, was affected by the AWS outage, but Mike Piech, VP product management and marketing, said in an interview that the company weathered the storm because its business revolves around adding value to Amazon's cloud. As a service provider itself, the company has been working to limit the impact of a possible outage on clients by utilizing multiple Amazon data centers.

Engine Yard has been running EC2 instances exclusively out of Amazon's US-EAST facility, but the company has been beta testing multi-region availability to mitigate the risk of an outage. The goal is to host EC2 instances out of AWS facilities on the West Coast, in Europe, and Asia. As a result of the outage, Engine Yard accelerated its availability in other regions to help affected clients.

Piech insisted that hardware problems happen and the incident has not affected his company's interest in working with AWS.

The outage lit up the AWS customer support forum. An individual posting under the name "elephantdrive," which also is the name of a cloud storage service running atop AWS, echoed the frustration expressed by many other forum users that communication about the outage has been inadequate.

"We certainly understand that no operational infrastructure will be immune from downtime," said the person posting under the name elephantdrive. "We just want some estimate as to when the issue will be resolved. The Health page describes a problem and steps to resolution, but provides no estimates. We need some information to try to make business decisions."

Indeed, the sentiment expressed by many AWS customers is that the issue isn't so much about downtime, which happens, as it is about inadequate communication about the downtime.

Yet not everyone was so sanguine about the cloud. Jimmy Tam, general manager of Peer Software, a data backup and enterprise collaboration company, argued in an interview that outsourcing IT infrastructure to cloud service providers isn't the right choice for a lot of enterprise customers.

He cited global network performance as a major issue. "The cloud can be good for offices that have great bandwidth, but a lot of areas in the world don't have that," he said.

Tam pointed to one of his company's clients, a swimwear company that creates its designs in Los Angeles and runs its production in China. Getting design files uploaded and downloaded can take hours, he said, owing to the large file sizes and poor network bandwidth. "The cloud doesn't have sophisticated design software," he said. "You design on the desktop."

Outages like the one experienced by AWS present problems too. "If the file is local, I'm not worried about lost Internet connectivity," he said. "If you have an outage, that means everybody who is connected to the cloud can't have access." And he also insisted that data loss remains a possibility.

He also pointed to the risk that cloud service providers may choose to discontinue certain services, as Iron Mountain recently did. That leaves IT teams scrambling to come up with alternatives. "The cloud in theory is great," he said. "But I don't think any cloud provider has solved all of these issues."

As of 10:35 a.m. PDT, Amazon finally had some good news to share. "We are making progress on restoring access and IO latencies for affected RDS instances," the company said. "We recommend that you do not attempt to recover using Reboot or Restore database instance APIs or try to create a new user snapshot for your RDS instance--currently those requests are not being processed."

However, the outage looks as if it will trigger a service credit under Amazon's 99.95% Service Level Agreement. With 8,760 hours in a year, AWS can be inaccessible for 4.38 hours annually under that agreement.
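The downtime figure above follows directly from the availability percentage, and the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the year is assumed to have 365 days, and the comparison treats the SLA as a simple annual downtime budget. It does not model how Amazon actually measures unavailability or calculates credits.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period at the given availability."""
    return period_hours * (100 - availability_pct) / 100


def exceeds_budget(outage_hours, availability_pct):
    """True if a single outage alone uses up more than the annual budget.

    Simplifying assumption: the SLA is treated as a plain yearly allowance.
    """
    return outage_hours > allowed_downtime_hours(availability_pct)


budget = allowed_downtime_hours(99.95)
print(f"Annual downtime budget at 99.95%: {budget:.2f} hours")
print("Nine-hour outage exceeds budget:", exceeds_budget(9, 99.95))
```

Under these assumptions, an interruption of more than nine hours is roughly double the annual allowance implied by a 99.95% commitment.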

AWS's S3 service suffered an eight-hour failure back in July 2008. At the time, the company said that "any downtime is unacceptable and we won't be satisfied until [AWS] is perfect."
