Amazon Cloud Outage Causes Customer To Leave

Online dating site dumps EC2 following June outages, but other customers say they can architect against failure in the Amazon infrastructure.

"Build it and they will come. Build it right, and they will stay."

That's the slogan of Okta, an identity management service built on top of Amazon Web Services' EC2 cloud. Part of EC2 went down during a power outage that affected its Ashburn, Va., data center June 29. But Okta built its Amazon-based service across more than one availability zone, and it didn't experience any downtime.

In the aftermath of the outage, an online dating service also pondered those words. It had tried to build its application right, using two availability zones as Amazon advised, and it still faced an onslaught of customer complaints in the midst of the outage. "We received nearly a thousand complaints," a level of disruption that the site had never seen before, said CEO Brandon Wade in an interview.

Instagram, Quora, Heroku, Pinterest, Hootsuite, and Netflix customers also complained of their service not being available in Twitter posts and online forums.

[ Learn more about Amazon Web Services' June 29 outage. See Amazon Outage Hits Netflix, Heroku, Pinterest, Instagram. ]

For the dating site, the two hours of downtime proved to be the last straw. It had just experienced two hours of downtime in a preceding AWS outage June 14. The two hours that it lost during the later Friday evening event fell during a normal period of strong user activity. Singles who frequent the site were setting up dates for the weekend. In many cases, they needed to access a phone number that becomes available when two parties agree to meet--and that crucial information disappeared at the worst possible time.

Wade didn't wait for Amazon's post-mortem or an in-depth analysis from consultants. He moved his systems, which had been running on 10 virtual servers in Amazon's infrastructure, onto equipment his firm bought and installed at a co-location service in the firm's hometown of Las Vegas. "Amazon is a very reputable company ... But we can't have all of these outages. For us, it's a big deal."

Wade will seek a second co-lo site in Las Vegas so that the loss of his servers at one site will not result in his website going off the air. It's a physical implementation of the logical architecture he tried to achieve on Amazon.

His firm used two availability zones in the Ashburn data center. Amazon is terse in its definition of an availability zone, but users understand them as distinct, logical data centers. Each zone has its own communications and power supply. One zone can go down and the others are supposed to continue operating.

Wade doesn't have a technical explanation for what happened to his website. Part of it depended on zone 1B, and Cedexis, a cloud monitoring service, identified 1B as the zone that suffered the outage as a result of violent electrical storms in the northern Virginia area. But his operations also ran in a second zone, and he knows they became inoperative when zone 1B went down.

"During the outage, my IT manager was not able to launch new instances in the availability zone that was supposedly not affected by the outage. Also, during the outage, he was unable to move any of the read-replicas of the master database ... So while only one zone may have been impacted, it appears other issues related to the AWS software led to our entire website being offline for the duration," said Wade in an email responding to follow-up questions from InformationWeek.

Both June outages "rendered a total blackout of our website," he wrote.

Contrast this experience with that of Okta, which continued running through Amazon's June 14 and 29 outages. Okta advertises that its online identity management service is continuously available through what it calls its zero downtime architecture. Eric Berg, Okta's VP of products, says in a blog post that the service is architected so that "any individual component can fail at any time and will simply be routed around to one of several other active systems."

That's necessary, Berg said in his blog post, Own Your Own Availability, because "downtime for our customers is unacceptable. As a result, we have made the software and operational investments necessary to provide a reliable service on top of AWS," he wrote.

It's not that Amazon's infrastructure is foolproof. It can and will fail, he warned. "Service providers need to make software and operational investments that allow their services to continue to run. Those investments are the responsibility of the vendor," Berg lectured in the blog.
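The "routed around" behavior Berg describes can be illustrated with a minimal sketch. This is not Okta's implementation, which the article does not detail. It assumes only that a request can be retried against a second zone when the first is unreachable. The names `Zone`, `route`, `AllZonesDown`, and the zone labels are hypothetical and are not part of any AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class AllZonesDown(Exception):
    """Raised when no zone could serve the request."""


@dataclass
class Zone:
    name: str                      # hypothetical label, e.g. "zone-a"
    handler: Callable[[str], str]  # stand-in for a real backend call


def route(request: str, zones: Sequence[Zone]) -> str:
    """Try each zone in order and return the first successful response."""
    errors = []
    for zone in zones:
        try:
            return zone.handler(request)
        except ConnectionError as exc:
            # This zone is unreachable; remember why and try the next one.
            errors.append(f"{zone.name}: {exc}")
    raise AllZonesDown("; ".join(errors))


def failed_zone(request: str) -> str:
    # Simulates a zone that has lost power.
    raise ConnectionError("power outage")


def healthy_zone(request: str) -> str:
    # Simulates a zone that is still serving traffic.
    return f"served {request}"


zones = [Zone("zone-a", failed_zone), Zone("zone-b", healthy_zone)]
result = route("login", zones)  # falls through zone-a and is served by zone-b
```

A sketch like this only helps if capacity is already running in the surviving zone. Wade's account suggests his team could not launch new instances or move read-replicas during the outage, so a design that depends on starting resources after a failure would not have been enough in that case.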

Wade, who earned his bachelor's degree and MBA at MIT and once served as a Booz Allen consultant, uses the same reasoning to reach a different conclusion.

"While you can watch a movie tomorrow if you miss it today, dating is all about the serendipity of meeting the right person at the right time. If an online dating service is not available, a user may lose the chance to meet his or her soulmate forever." His company will no longer use EC2 because of "Amazon's unpredictable data center issues," he said.
