News
8/22/2013
00:28 AM

Amazon Outage Leaves Latency Mystery

A 49-minute outage at Amazon's retail operation appears to have slowed AWS services in Dublin, Ireland, and sped up others, according to a monitoring service.

Amazon has declined to comment on these operational details, so observers are left to speculate. One possible explanation is that Dublin serves as a backup site to Amazon's Ashburn, Va., service site. Under that theory, a problem in Northern Virginia, Amazon's most heavily trafficked site, would lead to work being shifted east to Dublin, and the impact would show up in Dublin's AWS cloud services, such as EC2 and S3. Those services remained running but slowed, with higher latencies. Meanwhile, all of Amazon's U.S. sites, including Ashburn, Va., and the two U.S. West sites, showed a slight speedup during the Amazon.com North American outage. So did other Amazon sites around the world.

Part of the explanation has to be the most obvious fact: With Amazon.com retail down, the firm's data centers were freed of one of their major workloads, retail operations, and could apply more networking and processing power to the remaining cloud services work.

The exception, of course, is Dublin, where the cloud work slowed as the retail trouble developed. That suggests Dublin shares load balancing with Ashburn, or possibly serves as the primary backup if something goes awry with services in Ashburn. That's a hunch, not a conclusion clearly established by the facts.

But one thing does seem clear. There appears to be a relationship between the efficiency of AWS cloud services and the health of Amazon.com retail. When there's trouble with Amazon retail, that relationship might make the cloud services faster or slower, depending on which data centers back each other up or otherwise depend on each other's operations.

At first glance, the 49-minute outage of Amazon.com retail Monday would appear to be completely unrelated to the higher latencies in Dublin that rose and fell over a 12-hour period. But as Amazon's 2011 Easter outage showed, once something goes wrong in a cloud data center, automated corrective actions kick in that in themselves impose a heavy processing burden. What was termed "a re-mirroring storm," meant to fix the seeming disappearance of customer data sets, tied up systems and crippled services far longer than did the human error that set off the storm in the first place.

Some similar event, less drastic in nature, caused Amazon's all-important retail portal to go dark for 49 minutes. For unexplained reasons, that appears to have affected Amazon's Dublin operations by imposing a latency penalty, which slowed its cloud services.

On such slender evidence, enterprise IT managers are trying to decide the safest ways to deploy their workloads to the Amazon cloud. A forthright explanation from Amazon of the outage, which is now three days old, would help them with that task.

Comments
Lorna Garey,
User Rank: Author
8/22/2013 | 2:18:44 PM
re: Amazon Outage Leaves Latency Mystery
How likely is it that Amazon just doesn't yet know for sure what the root cause was? In such a fantastically complex infrastructure, such analysis must surely take time.
Laurianne,
User Rank: Author
8/22/2013 | 2:57:44 PM
re: Amazon Outage Leaves Latency Mystery
When was the last time Amazon's retail operation was down for that long?
MarciaNWC,
User Rank: Author
8/22/2013 | 3:08:50 PM
re: Amazon Outage Leaves Latency Mystery
Amazon's silence is frustrating; even if it's still trying to figure out the root cause, it seems the company could offer up some details about the outage and provide some advice to enterprise IT managers.
OtherJimDonahue,
User Rank: Apprentice
8/22/2013 | 4:55:16 PM
re: Amazon Outage Leaves Latency Mystery
It seems crazy to me that Amazon wouldn't at least be forthcoming on the time it was out. Agree with Lorna that it's probably still investigating the root cause--but not being open about the period of time is just going to fuel speculation.
Engineer Veteran,
User Rank: Apprentice
8/22/2013 | 10:54:26 PM
re: Amazon Outage Leaves Latency Mystery
One point I think folks are missing: This wasn't just Amazon.com. Their entire web ecommerce processing was down, so it impacted 500+ sites that are hosted by Amazon Webstore (MTV, Fruit of the Loom, Fiskars, etc.). Their ecommerce platform is sold as a cloud service.
cbabcock,
User Rank: Strategist
8/23/2013 | 1:05:05 AM
re: Amazon Outage Leaves Latency Mystery
Amazon.com would be careful in formulating an explanation if the outage caused downtime for Fruit of the Loom and 499 other web stores that it hosts, as Engineer Veteran suggests. Amazon's statement would bear on the amount of responsibility it's willing to assume for their downtime. Hmmm..
cbabcock,
User Rank: Strategist
8/23/2013 | 1:11:59 AM
re: Amazon Outage Leaves Latency Mystery
For the record, InformationWeek's initial presentation of this piece suggested in the subhead that AWS "data centers" slowed with the outage. On the contrary, they sped up slightly, with about a 10-millisecond reduction in their response times. Only one, Amazon's Dublin, Ireland, site, showed increased latencies, of 60 milliseconds, according to Cedexis. Just to be clear.
ChrisMurphy,
User Rank: Author
8/23/2013 | 1:45:09 PM
re: Amazon Outage Leaves Latency Mystery
It's a stark reminder of the challenges of ecommerce and customer-facing systems, which more business-to-business companies see as key to their future as digital businesses. I read a great nugget of advice from a CIO yesterday for an upcoming article we're working on, about a big ecommerce initiative: "Don't approach it as something that you'll get done quickly and go on to your next project."