8/22/2013
00:28 AM

Amazon Outage Leaves Latency Mystery

A 49-minute outage at Amazon's retail operation appears to have slowed AWS services in Dublin, Ireland, and sped up others, according to a monitoring service.

Amazon has declined to comment on these operational details, so observers are left to speculate. One possible explanation is that Dublin serves as a backup site to Amazon's Ashburn, Va., service site. Under that scenario, a problem in Northern Virginia, Amazon's most heavily trafficked site, would have led to work being shifted east to Dublin, with the impact showing up in Dublin's AWS cloud services, such as EC2 and S3. Those services kept running but slowed under the higher latencies. Meanwhile, all of Amazon's U.S. sites, including Ashburn, Va., and the two U.S. West sites, showed a slight speedup during the Amazon.com North American outage. So did other Amazon sites around the world.

Part of the explanation has to be the most obvious fact: With Amazon.com retail down, the firm's data centers were freed of one of their major workloads -- retail operations -- and applied more networking and processing power to the remaining cloud services work.

The exception, of course, is Dublin, where the cloud work slowed as the retail trouble developed. That fact suggests Dublin shares in load balancing with Ashburn, or possibly is the primary backup if something goes awry with services in Ashburn. That's a hunch, not a conclusion or anything clearly established by the facts.

But one thing does seem clear. There appears to be a relationship between the efficiency of AWS cloud services and the health of Amazon.com retail. When there's trouble with Amazon retail, that relationship might make the cloud services faster or slower, depending on which data centers are backing each other up or are in other ways dependent on each other's operations.

At first glance, the 49-minute outage of Amazon.com retail Monday would appear to be completely unrelated to the higher latencies in Dublin that rose and fell over a 12-hour period. But as Amazon's 2011 Easter outage showed, once something goes wrong in a cloud data center, automated corrective actions kick in that in themselves impose a heavy processing burden. What was termed "a re-mirroring storm," meant to fix the seeming disappearance of customer data sets, tied up systems and crippled services far longer than did the human error that set off the storm in the first place.

Some similar event, less drastic in nature, caused Amazon's all-important retail portal to go dark for 49 minutes. For unexplained reasons, that appears to have affected Amazon's Dublin operations by imposing a latency penalty, which slowed its cloud services.

On such slender evidence, enterprise IT managers are trying to make decisions on the safest ways to deploy their workloads to the Amazon cloud. A forthright explanation by Amazon of the outage, now three days old, would help them with that task.

Comments
ChrisMurphy
User Rank: Author
8/23/2013 | 1:45:09 PM
re: Amazon Outage Leaves Latency Mystery
It's a stark reminder of the challenges of ecommerce and customer-facing systems, which more business-to-business companies see as key to their future as digital businesses. I read a great nugget of advice from a CIO yesterday for an upcoming article we're working on, about a big ecommerce initiative: "Don't approach it as something that you'll get done quickly and go on to your next project."
cbabcock
User Rank: Strategist
8/23/2013 | 1:11:59 AM
re: Amazon Outage Leaves Latency Mystery
For the record, InformationWeek's initial presentation of this piece suggested in the subhead that AWS "data centers" slowed with the outage. On the contrary, they sped up slightly, about a 10 millisecond reduction in their response times. Only one showed increased latencies of 60 milliseconds, Amazon's Dublin, Ireland, site, according to Cedexis. Just to be clear.
cbabcock
User Rank: Strategist
8/23/2013 | 1:05:05 AM
re: Amazon Outage Leaves Latency Mystery
Amazon.com would be careful in formulating an explanation if it's caused downtime for Fruit of the Loom and 499 other web stores that it hosts, as Engineer Veteran suggests. Amazon's statement would bear on the amount of responsibility it's willing to assume for their downtime. Hmmm..
Engineer Veteran
User Rank: Apprentice
8/22/2013 | 10:54:26 PM
re: Amazon Outage Leaves Latency Mystery
One point I think folks are missing. This wasn't just Amazon.com. Their entire web ecommerce processing was down, so it impacted 500+ sites that are hosted by Amazon Webstore (MTV, Fruit of the Loom, Fiskars, etc.). Their ecommerce platform is sold as a cloud service.
OtherJimDonahue
User Rank: Apprentice
8/22/2013 | 4:55:16 PM
re: Amazon Outage Leaves Latency Mystery
It seems crazy to me that Amazon wouldn't at least be forthcoming on the time it was out. Agree with Lorna that it's probably still investigating the root cause--but not being open about the period of time is just going to fuel speculation.
MarciaNWC
User Rank: Author
8/22/2013 | 3:08:50 PM
re: Amazon Outage Leaves Latency Mystery
Amazon's silence is frustrating; even if it's still trying to figure out the root cause, it seems the company could offer up some details about the outage and provide some advice to enterprise IT managers.
Laurianne
User Rank: Author
8/22/2013 | 2:57:44 PM
re: Amazon Outage Leaves Latency Mystery
When was the last time Amazon's retail operation was down for that long?
Lorna Garey
User Rank: Author
8/22/2013 | 2:18:44 PM
re: Amazon Outage Leaves Latency Mystery
How likely is it that Amazon just doesn't yet know for sure what the root cause was? In such a fantastically complex infrastructure, such analysis must surely take time.