Amazon Defended After June 14 Cloud Outage
Latest Amazon Web Services outage prompts complaints from critics, customers. But Amazon supporters say incident simply shows need to use multiple availability zones.
Heroku, Quora, Parse, and Pinterest were among the sites affected by the outage, along with many small companies that rely on Amazon as their source of compute power instead of a traditional data center. The latter prospect drew a wry comment from Laurie Voss, technical lead at San Francisco social media analytics firm awe.sm, who tweeted: "With AWS having an outage, thousands of startups with literally dozens of customers are affected."
Amazon itself kept to a few cryptic comments on its Service Health Dashboard. It cited a power outage, but didn't say whether the cause lay with an electricity supplier or a failed component inside its facilities. It did say the outage affected part of one availability zone. Some U.S. East customers have said that four availability zones are offered at the Northern Virginia site. Each availability zone has separate power and communications facilities, so that an outage in one doesn't spread to the others.
Amazon recommends that customers who wish to avoid an outage run applications in two availability zones as a high-availability best practice. That would have protected customers in the June 14 outage. But it proved less than a bulletproof guarantee in Amazon's bigger Easter weekend outage in April 2011. In that incident, later termed "a remirroring storm," service freeze-ups in one zone affected the availability of those services in other zones, according to Donnie Flood, VP of engineering at Bizo, a business information site caught in the service collapse.
Nevertheless, keeping an active backup copy in a second availability zone worked this time for Control Group, a New York custom application building firm that hosts its customers' apps on AWS EC2. "There's a little bit of overhead to that," conceded Dave Rocamora, VP of DevOps (development/operations) at the firm.
But Control Group embeds the automatic deployment of an active backup system in its new applications. Upon deployment to EC2, the backup will be established in a different availability zone unless the customer turns it off.
Rocamora said his firm is building production systems for 20 different customers, including e-commerce transaction, video distribution, and HIPAA-compliant health care applications. He estimated "90% of them are active/active," using two availability zones, and said this practice saw all his customers through the June 14 outage.
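The default-on placement that Control Group describes can be illustrated with a minimal sketch. This is not the firm's tooling and it calls no AWS API; the function names, the `backup_enabled` flag, and the zone labels are hypothetical, chosen only to show the idea of putting the backup in a different availability zone unless the customer opts out.

```python
from typing import Optional, Sequence


def pick_backup_zone(primary: str, zones: Sequence[str],
                     backup_enabled: bool = True) -> Optional[str]:
    """Return a zone for the active backup, or None if the customer opted out.

    All names here are illustrative assumptions, not a real deployment API.
    """
    if not backup_enabled:
        return None
    # Only zones other than the primary give protection from a one-zone outage.
    candidates = [zone for zone in zones if zone != primary]
    if not candidates:
        raise ValueError("no second availability zone available for the backup")
    return candidates[0]


def still_serving(primary: str, backup: Optional[str], failed_zone: str) -> bool:
    """True if at least one copy of the application sits outside the failed zone."""
    copies = [primary] if backup is None else [primary, backup]
    return any(zone != failed_zone for zone in copies)


# Hypothetical zone labels for illustration.
zones = ["zone-a", "zone-b", "zone-c"]
backup = pick_backup_zone("zone-a", zones)
print(backup, still_serving("zone-a", backup, failed_zone="zone-a"))
```

The sketch covers only the failure of a single zone, the case seen on June 14. It says nothing about incidents like the April 2011 outage, in which trouble in one zone affected services in others.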
The Amazon incident prompted Piston Cloud, a seller of software for on-premises private clouds, to engage in a bit of one-upmanship: "These very public 'glitches' underscore the fact that private cloud is best--in terms of cost, security, scalability and innovation--every time," said Joshua McKenty, CEO of Piston, which produces on-premises software based on OpenStack open source code.
But that statement and a blog post on the Piston site June 15 prompted some rejoinders. Jeff Sussna, principal at IT service consultancy Ingineering.IT, tweeted: "Amazon rivals should not throw stones. FUD hurts the public cloud industry, not just the vendor."
Netflix chief cloud architect, Adrian Cockcroft, whose company is one of Amazon's largest users, tweeted June 15: "What part of 'Availability Zone' do people not understand? Exactly the failure mode we expect and plan for..."