Online dating site Whatsyourprice.com dumps EC2 following June outages, but other customers say they can architect against failure in the Amazon infrastructure.
"Build it and they will come. Build it right, and they will stay."
That's the slogan of Okta, an identity management service built on top of Amazon Web Services' EC2 cloud. Part of EC2 went down during a power outage that affected its Ashburn, Va., data center June 29. But Okta built its Amazon-based service across more than one availability zone, and it didn't experience any downtime.
In the aftermath of the outage, Whatsyourprice.com, an online dating service, also pondered those words. It had tried to build its application right, using two availability zones as Amazon advised, and it still faced an onslaught of customer complaints in the midst of the outage. "We received nearly a thousand complaints," a level of disruption that Whatsyourprice.com had never seen before, said CEO Brandon Wade in an interview.
Customers of Instagram, Quora, Heroku, Pinterest, Hootsuite, and Netflix also complained in Twitter posts and online forums that those services were unavailable.
For Whatsyourprice.com, the two hours of downtime proved to be the last straw. It had just experienced two hours of downtime in a preceding AWS outage June 14. The two hours that Whatsyourprice.com lost in the later outage, on a Friday evening, came during a period of normally strong user activity. Singles who frequent the site were setting up dates for the weekend. In many cases, they needed to access a phone number that becomes available when two parties agree to meet--and that crucial information disappeared at the worst possible time.
Wade didn't wait for Amazon's post-mortem or an in-depth analysis from consultants. He moved his systems, running on 10 virtual servers in Amazon's infrastructure, onto equipment his firm bought and installed at a co-location service in the firm's hometown of Las Vegas. "Amazon is a very reputable company ... But we can't have all of these outages. For us, it's a big deal," he said.
Wade will seek a second co-lo site in Las Vegas so that the loss of his servers at one site will not result in his website going off the air. It's a physical implementation of the logical architecture he tried to achieve on Amazon.
His firm used two availability zones in the Ashburn data center. Amazon is terse in its definition of an availability zone, but users understand the zones as distinct, logical data centers. Each zone has its own communications and power supply. One zone can go down and the others are supposed to continue operating.
Wade doesn't have a technical explanation for what happened to his website. Part of it depended on Zone 1B, and Cedexis, a cloud monitoring service, identified 1B as the zone that suffered the outage as a result of violent electrical storms in the northern Virginia area. But his operations were also in a second zone, and he knows they became inoperative as zone 1B went down.
"During the outage, my IT manager was not able to launch new instances in the availability zone that was supposedly not affected by the outage. Also, during the outage, he was unable to move any of the read-replicas of the master database ... So while only one zone may have been impacted, it appears other issues related to the AWS software led to our entire website being offline for the duration," said Wade in an email responding to follow-up questions from InformationWeek.
Both June outages "rendered a total blackout of our website," he wrote.
Okta's experience was different: its service kept running through Amazon's June 14 and 29 outages. The company advertises that its online identity management service is continuously available through what it calls its zero downtime architecture. Eric Berg, Okta's VP of products, says in a blog post that the service is architected so that "any individual component can fail at any time and will simply be routed around to one of several other active systems."
That's necessary, Berg said in his blog post, Own Your Own Availability, because "downtime for our customers is unacceptable. As a result, we have made the software and operational investments necessary to provide a reliable service on top of AWS," he wrote.
It's not that Amazon's infrastructure is foolproof. It can and will fail, he warned. "Service providers need to make software and operational investments that allow their services to continue to run. Those investments are the responsibility of the vendor," Berg lectured in the blog.
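The "routed around" behavior Berg describes can be illustrated with a minimal sketch. This is not Okta's implementation, which the blog post does not detail; the names Replica, route, failed_zone, and healthy_zone, and the zone labels, are hypothetical and exist only for illustration. The idea is simply to try each active replica in turn and return the first successful response.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class AllReplicasDown(Exception):
    """Raised when every replica in the list has failed."""


@dataclass
class Replica:
    zone: str                      # hypothetical zone label, e.g. "zone-a"
    handler: Callable[[str], str]  # stands in for a real network call


def route(request: str, replicas: Sequence[Replica]) -> str:
    """Try each replica in order, routing around any that raise an error."""
    errors = []
    for replica in replicas:
        try:
            return replica.handler(request)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{replica.zone}: {exc}")
    raise AllReplicasDown("; ".join(errors))


def failed_zone(request: str) -> str:
    # Simulates a replica in a zone that has lost power.
    raise ConnectionError("zone unreachable")


def healthy_zone(request: str) -> str:
    # Simulates a replica in a zone that is still operating.
    return f"served {request}"


replicas = [Replica("zone-a", failed_zone), Replica("zone-b", healthy_zone)]
result = route("login", replicas)
print(result)
```

The sketch assumes the fallback replicas are already running when the failure occurs. As Wade's account suggests, launching new capacity in a second zone during an outage may not be possible, so a design along these lines would likely need standby systems active in every zone beforehand.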
Wade, who earned his bachelor's degree and MBA at MIT and once served as a Booz Allen consultant, uses the same reasoning to reach a different conclusion.
"While you can watch a movie tomorrow if you miss it today, dating is all about the serendipity of meeting the right person at the right time. If an online dating service is not available, a user may lose the chance to meet his or her soulmate forever."
Whatsyourprice.com will no longer use EC2 because of "Amazon's unpredictable data center issues," he said.