8/22/2013
00:28 AM

Amazon Outage Leaves Latency Mystery

A 49-minute outage at Amazon's retail operation appears to have slowed AWS services in Dublin, Ireland, and sped up others, according to a monitoring service.

Amazon has declined to comment on these operational details, so observers are left to speculate. One possible explanation is that Dublin serves as a backup site to Amazon's Ashburn, Va., service site. Under that scenario, a problem in Northern Virginia, Amazon's most heavily trafficked site, led to work being shifted east to Dublin, and the impact showed up in Dublin's AWS cloud services, such as EC2 and S3. They remained running but slowed with the higher latencies. Meanwhile, all of Amazon's U.S. sites, including Ashburn, Va., and the two U.S. West sites, showed a slight speedup during the Amazon.com North American outage. So did other Amazon sites around the world.

Part of the explanation has to be the most obvious fact: With Amazon.com retail down, the firm's data centers were freed of one of their major workloads -- retail operations -- and applied more networking and processing power to the remaining cloud services work.

The exception, of course, is Dublin, where the cloud work slowed as the retail trouble developed. That fact suggests Dublin shares in load balancing with Ashburn, or possibly is the primary backup if something goes awry with services in Ashburn. That's a hunch, not a conclusion or anything clearly established by the facts.

But one thing does seem clear. There appears to be a relationship between the efficiency of AWS cloud services and the health of Amazon.com retail. When there's trouble with Amazon retail, that relationship might make the cloud services faster or slower, depending on which data centers are backing each other up or in other ways dependent on each other's operations.

At first glance, the 49-minute outage of Amazon.com retail Monday would appear to be completely unrelated to the higher latencies in Dublin that rose and fell over a 12-hour period. But as Amazon's 2011 Easter outage showed, once something goes wrong in a cloud data center, automated corrective actions kick in that in themselves impose a heavy processing burden. What was termed "a re-mirroring storm," meant to fix the seeming disappearance of customer data sets, tied up systems and crippled services far longer than did the human error that set off the storm in the first place.

Some similar event, less drastic in nature, caused Amazon's all-important retail portal to go dark for 49 minutes. For unexplained reasons, that appears to have affected Amazon's Dublin operations by imposing a latency penalty, which slowed its cloud services.

On such slender evidence, enterprise IT managers are trying to make decisions on the safest ways to deploy their workloads to the Amazon cloud. A forthright explanation by Amazon of the outage, now three days old, would help them with that task.

Comments
Lorna Garey,
User Rank: Author
8/22/2013 | 2:18:44 PM
re: Amazon Outage Leaves Latency Mystery
How likely is it that Amazon just doesn't yet know for sure what the root cause was? In such a fantastically complex infrastructure, such analysis must surely take time.
Laurianne,
User Rank: Author
8/22/2013 | 2:57:44 PM
re: Amazon Outage Leaves Latency Mystery
When was the last time Amazon's retail operation was down for that long?
MarciaNWC,
User Rank: Author
8/22/2013 | 3:08:50 PM
re: Amazon Outage Leaves Latency Mystery
Amazon's silence is frustrating; even if it's still trying to figure out the root cause, it seems the company could offer up some details about the outage and provide some advice to enterprise IT managers.
OtherJimDonahue,
User Rank: Apprentice
8/22/2013 | 4:55:16 PM
re: Amazon Outage Leaves Latency Mystery
It seems crazy to me that Amazon wouldn't at least be forthcoming on the time it was out. Agree with Lorna that it's probably still investigating the root cause--but not being open about the period of time is just going to fuel speculation.
Engineer Veteran,
User Rank: Apprentice
8/22/2013 | 10:54:26 PM
re: Amazon Outage Leaves Latency Mystery
One point I think folks are missing: this wasn't just Amazon.com. Their entire web ecommerce processing was down, so it impacted 500+ sites that are hosted by Amazon Webstore (MTV, Fruit of the Loom, Fiskars, etc.). Their ecommerce platform is sold as a cloud service.
cbabcock,
User Rank: Author
8/23/2013 | 1:05:05 AM
re: Amazon Outage Leaves Latency Mystery
Amazon.com would be careful in formulating an explanation if it's caused downtime for Fruit of the Loom and 499 other web stores that it hosts, as Engineer Veteran suggests. Amazon's statement would bear on the amount of responsibility it's willing to assume for their downtime. Hmmm..
cbabcock,
User Rank: Author
8/23/2013 | 1:11:59 AM
re: Amazon Outage Leaves Latency Mystery
For the record, InformationWeek's initial presentation of this piece suggested in the subhead that AWS "data centers" slowed with the outage. On the contrary, they sped up slightly, about a 10 millisecond reduction in their response times. Only one showed increased latencies of 60 milliseconds, Amazon's Dublin, Ireland, site, according to Cedexis. Just to be clear.
ChrisMurphy,
User Rank: Author
8/23/2013 | 1:45:09 PM
re: Amazon Outage Leaves Latency Mystery
It's a stark reminder of the challenges of ecommerce and customer-facing systems, which more business-to-business companies see as key to their future as digital businesses. I read a great nugget of advice from a CIO yesterday for an upcoming article we're working on, about a big ecommerce initiative: "Don't approach it as something that you'll get done quickly and go on to your next project."