12/14/2009
08:07 PM

Amazon IDs Cause Of Data Center Outage

The failure of two power components at a Virginia data center affected some EC2 operations on December 9th, Amazon Web Services says.

Apparent Networks set up the monitoring service because it wanted to illustrate what its PathView Cloud could do for companies making use of cloud computing. It said it maintains 20 accounts in the data center that experienced the outage, and that six of them went down. Apparent Networks spokesmen were careful to say they had no way of knowing whether their experience applied to the data center as a whole.

By using a network path to monitor the data center, Apparent Networks can see something that Hyperic's systems management service, CloudStatus, cannot. Apparent tracked its own ping and command traffic to a router in Northern Virginia, where the traffic stopped short of the virtual server that Apparent was running there. Amazon is known to operate a data center near McLean, Va., but company officials don't name specific locations in communications. Likewise, the Amazon Service Health Dashboard avoids naming locations beyond a region, in which it might have several data centers. In this case it referred only to the US-East-1 region.
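The idea of following probe traffic hop by hop until it stops answering can be sketched in a few lines of Python. This is only an illustration of the general technique, not Apparent Networks' implementation: the function names, the hop labels, and the data shape (a list of hop/response pairs) are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical data shape: one (hop_label, responded) pair per hop,
# ordered from the monitoring probe toward the target virtual server.
Hop = Tuple[str, bool]


def last_responsive_hop(path: List[Hop]) -> Optional[str]:
    """Return the label of the last hop that answered before the first silent hop."""
    last = None
    for label, responded in path:
        if not responded:
            break  # traffic stopped here; everything beyond is unverified
        last = label
    return last


def target_reachable(path: List[Hop]) -> bool:
    """The target counts as reachable only if every hop, including the last, answered."""
    return bool(path) and all(responded for _, responded in path)


# Illustrative path only; the labels are invented, not taken from the incident.
sample_path = [
    ("probe-gateway", True),
    ("transit-router", True),
    ("edge-router-northern-virginia", True),
    ("virtual-server", False),
]

print(last_responsive_hop(sample_path))
print(target_reachable(sample_path))
```

A result like this tells the observer only where the path went quiet, not why, which is consistent with Apparent's caution about generalizing from its own accounts.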

If a user of Apparent Networks' PathView Cloud found evidence of a service outage, that user could match up that information with Amazon's own CloudWatch service or Hyperic's CloudStatus to see how his individual virtual machines were performing and learn more, noted Javier Soltero, CTO of management products at SpringSource, a unit of VMware.

"On the whole, Amazon is extremely consistent," said Soltero. That consistency lies not simply in operating data centers but in the company's willingness to report incidents to customers through the service dashboard. In this instance, however, "we saw a gap between the actual outage" and when the service notices started to appear. If Apparent Networks' outage times are right, the gap was 34 minutes, which is either a short time or an unbearably long one. For an EC2 customer in the affected data center, the view of the gap depends on whether the workloads running there were time-sensitive.

Amazon's incident notice language is also not location-specific. Customers can't tell from the notices whether they have a virtual machine running where the incident is taking place. They must subscribe either to Amazon's CloudWatch or to a third-party service, such as PathView Cloud or CloudStatus, that looks at the cloud from the outside.
