Amazon Web Services Apologizes, Explains Outage

The company said it will provide a 10-day service credit to customers using its storage and database services in the affected region, following the multi-day outage.

Eight days after Amazon Web Services (AWS) experienced a major multi-day service outage in its East Coast region, the on-demand computing infrastructure company has published a detailed post-mortem and apologized.

The outage, AWS said, was triggered by "a network configuration change," which presumably means a specific manual mistake during some network adjustment. Human error, in other words.



The configuration change was part of an effort to upgrade network capacity and involved shifting network traffic off one of the redundant routers in the primary Elastic Block Store (EBS) network. That shift, AWS explained, "was executed incorrectly." This left many EBS volumes cut off from their replicas and unable to find places to replicate their data, which created a kind of endless loop, a "re-mirroring storm" as AWS put it.

For its failure, AWS apologized. "We know how critical our services are to our customers' businesses and we will do everything we can to learn from this event and use it to drive improvement across our services," the company said.

It also acknowledged that communication about such incidents can be improved, and promised to address communication gaps through increased staffing and better tools.

AWS has decided to award a 10-day service credit to all customers using EBS or the Amazon Relational Database Service (RDS) in the affected region, whether their operations were interrupted or not.

The outage affected a subset of customers using Amazon Elastic Compute Cloud (EC2, which provides on-demand computational processing), specifically those using certain Amazon EBS volumes associated with a group of data centers referred to as US East. Some RDS customers were also affected because RDS relies on EBS to store log files and database files.

Service problems began early Thursday, April 21, and were resolved by Monday, April 25, except for the data loss: On Monday, AWS acknowledged that 0.07% of the EBS volumes in the US East region were unrecoverable.

The outage slowed or shut down a significant number of prominent Internet businesses, including Engine Yard, Foursquare, Hootsuite, Heroku, Quora, and Reddit. Beyond that, it renewed doubts about the viability of cloud computing among skeptics.

Among those already sold on the cloud, the incident at least forced a re-evaluation of the risks of outsourced infrastructure and prompted further thought about disaster recovery planning.

In a blog post on Friday, Gartner Research VP Andrea Di Maio suggested that the outage makes clear that customers have to plan for imperfection. "While it is important to maintain pressure on service providers to improve their reliability footprint, the onus of developing or contracting reliable system stays with their clients, and there won't be any miraculous cloud that provides 100% uptime or that does not risk to fail meeting its own SLAs," he wrote.
