How Netflix, Zynga Beat Amazon Cloud Failure

When Amazon Web Services crashed in April, Netflix and Zynga kept operating because they designed their systems to accommodate that possibility.
"You can architect for failure. Individual servers, storage arrays can fail and you can still stay up," said Selipsky. When asked what lessons Amazon had learned from the Easter outage, he said it had introduced more separation of systems to prevent a failure of one service from interfering with others.

But he stubbornly maintained that the service tie-up in Northern Virginia was not as great an incident as had been reported and was not an EC2 cloud outage. "The incidents you're talking about were fairly contained. We had five regions worldwide. One availability zone in that one region was affected," he said.

Then Selipsky added: "We've taken steps to separate systems, to decouple things that were coupled to each other. We've repaired software glitches and taken other steps to make sure it doesn't happen again."

Another speaker, Allan Leinwand, CTO of infrastructure engineering at Zynga, said firms must understand how to fit the Amazon infrastructure into what they want to do. In Zynga's case, it has built a private cloud--the Z cloud--that is similar to EC2, and it can move the operation of its games between the two. Zynga launched CityVille in the Amazon cloud in November 2010, allowed it to pick up steam, then, as growth slowed, brought it back inside to its Z cloud. Its most recent offering, CastleVille, was launched the same way.

"We love the public cloud. Amazon has done an exceptional job," said Leinwand. "But Amazon is a four-door sedan. I love four-door sedans. I drive one. But maybe your application needs a fast sports car or a Winnebago or an 18-wheeler. In the Amazon cloud, a four-door is what you get."

The key, Leinwand continued, is understanding the needs of your application. Zynga has games that may grow slowly for several weeks after launch, then reach a critical mass that causes them to add millions of users in a short time. The scalable Amazon infrastructure is good for hosting that expansion.

But Zynga can gear its own infrastructure to share services across games and employ other cost savings in the private cloud that can't be matched by a single game running in the public cloud. "I think the public cloud is something I would absolutely leverage," he advised. "But after you've established your application in the cloud, think about how you can change the op-ex into cap-ex"--build out your own infrastructure in a way that becomes the most efficient way to run the application, he said.

Amazon's Selipsky was put on the spot when asked whether Oracle CEO Larry Ellison was right when he charged that Salesforce.com's applications are unsafe because they run on Salesforce's multi-tenant infrastructure. Was that the case, he was asked by Matt Marshall, editor in chief of VentureBeat, or was that Oracle FUD?

Amazon has numerous Oracle offerings in its product catalogue, such as the Oracle database and Oracle applications, but they too run in Amazon's multi-tenant architecture. Selipsky paused, then said: "Oracle is a great part of our offerings. ... All of the Oracle suite runs on AWS. I'll make that positive statement."