8/22/2013
00:28 AM

Amazon Outage Leaves Latency Mystery

A 49-minute outage at Amazon's retail operation appears to have slowed AWS services in Dublin, Ireland, and sped up others, according to a monitoring service.

The outage that beset Amazon.com's retail home page lasted longer than some observers first believed. Reports have put the outage variously at 15 minutes, 25 minutes or "just under a half an hour," as Forbes reported soon after the incident.

In fact, it lasted 49 minutes, according to monitoring by Compuware, the owner of the CloudSleuth cloud service monitor and the Gomez Web application performance monitoring system, now part of the Compuware APM service. Despite inquiries, Amazon.com spokesmen have remained silent on the cause and duration of Monday's incident. News reports have been sketchy.

Compuware staff double-checked Monday's incident, in which the Amazon.com home page went dark around noon Pacific time, showing customers an "Oops" message, and did not become available again for 49 minutes. Only North American users appear to have lost service; Europe and other parts of the world were unaffected, contrary to an earlier report.

Amazon Web Services continued as usual, suggesting that something went wrong with Amazon.com's ecommerce software. Retail operations depend on the same infrastructure that Amazon Web Services provides to cloud users, although the two were once separate; both emanate from the same data centers. One AWS service, the management console for customers, became inaccessible at about the same time as the Amazon.com home site. According to Amazon's own Service Health Dashboard, the console kept working for customers who were already logged in, but those not logged in were denied access during a 47-minute period between 11:45 a.m. and 12:32 p.m., a near match for Compuware's observation of the retail Amazon.com outage. But unlike the retail site, the management console was inaccessible to users in Europe, Asia-Pacific and South America as well as North America, according to the AWS health dashboard.

One of the people who noticed it was inaccessible was Forrester Research's lead cloud analyst, James Staten, who tweeted: "can't manage #AWS from the console -- outage. 12:58 p.m. Pacific."

With no explanation forthcoming from Amazon, it's hard to know what these seemingly unrelated events mean. But another interesting set of facts comes from a second online cloud monitoring service, Cedexis Radar.


Cedexis Radar observed increased latency, or slower response times, at Amazon Web Services' primary European site in Dublin, Ireland. The slowdown built up to about 60 milliseconds of added response time: not crippling, but a noticeable and unwanted increase for most cloud services.

At the same time, Cedexis Radar recorded a speedup in AWS operations at all the other Amazon data center sites, such as Amazon West in northern California and Oregon and Amazon South America. The Dublin slowdown started at about the time of the Amazon.com outage in North America, built to its peak seven hours later, then tapered off over the next five hours, ending at midnight. The curve tracing this latency buildup closely matched the curves showing faster responses at the other Amazon sites.

How could Dublin slow down when response times appear to have improved at all the other sites? Cedexis did not list every Amazon site, such as Hong Kong, Australia and Japan; it lumped them together as Asia Pacific. But the pattern held for the major regions listed: the latency improvements at Amazon sites other than Dublin show a more muted curve, but one of similar length. Those sites show their shortest latencies at about 7 p.m. Pacific, then return to normal by midnight.

Comments
ChrisMurphy
User Rank: Author
8/23/2013 | 1:45:09 PM
re: Amazon Outage Leaves Latency Mystery
It's a stark reminder of the challenges of ecommerce and customer-facing systems, which more business-to-business companies see as key to their future as digital businesses. I read a great nugget of advice from a CIO yesterday for an upcoming article we're working on, about a big ecommerce initiative: "Don't approach it as something that you'll get done quickly and go on to your next project."
cbabcock
User Rank: Strategist
8/23/2013 | 1:11:59 AM
re: Amazon Outage Leaves Latency Mystery
For the record, InformationWeek's initial presentation of this piece suggested in the subhead that AWS "data centers" slowed with the outage. On the contrary, they sped up slightly, with about a 10-millisecond reduction in response times. Only one site, Amazon's Dublin, Ireland, site, showed increased latencies, of 60 milliseconds, according to Cedexis. Just to be clear.
cbabcock
User Rank: Strategist
8/23/2013 | 1:05:05 AM
re: Amazon Outage Leaves Latency Mystery
Amazon.com would have to be careful in formulating an explanation if the outage caused downtime for Fruit of the Loom and 499 other web stores that it hosts, as Engineer Veteran suggests. Amazon's statement would bear on how much responsibility it's willing to assume for their downtime. Hmmm...
Engineer Veteran
User Rank: Apprentice
8/22/2013 | 10:54:26 PM
re: Amazon Outage Leaves Latency Mystery
One point I think folks are missing: this wasn't just Amazon.com. Their entire web ecommerce processing was down, so it impacted 500+ sites that are hosted by Amazon Webstore (MTV, Fruit of the Loom, Fiskars, etc.). Their ecommerce platform is sold as a cloud service.
OtherJimDonahue
User Rank: Apprentice
8/22/2013 | 4:55:16 PM
re: Amazon Outage Leaves Latency Mystery
It seems crazy to me that Amazon wouldn't at least be forthcoming on the time it was out. Agree with Lorna that it's probably still investigating the root cause--but not being open about the period of time is just going to fuel speculation.
MarciaNWC
User Rank: Author
8/22/2013 | 3:08:50 PM
re: Amazon Outage Leaves Latency Mystery
Amazon's silence is frustrating; even if it's still trying to figure out the root cause it seems the company could offer up some details about the outage and provide some advice to enterprise IT managers.
Laurianne
User Rank: Author
8/22/2013 | 2:57:44 PM
re: Amazon Outage Leaves Latency Mystery
When was the last time Amazon's retail operation was down for that long?
Lorna Garey
User Rank: Author
8/22/2013 | 2:18:44 PM
re: Amazon Outage Leaves Latency Mystery
How likely is it that Amazon just doesn't yet know for sure what the root cause was? In such a fantastically complex infrastructure, such analysis must surely take time.