Zynga's hybrid cloud strategy helps it cope with the risk of unpredictable demand for new online games.
Most of that activity happens inside the data center space Zynga controls, its Z Cloud. Companies are comfortable with private clouds inside their own data centers. Public clouds make them nervous, and those jitters were only amplified by the 12-hour outage of Amazon's EC2 on April 21 (see story, "When Amazon's Cloud Turned On Itself").
Zynga is a big user of EC2 and hasn't spoken about what impact the outage had on its operations, or what particular steps it took to keep its online games from going down. "We are very big fans of Amazon Web Services and think highly of EC2 services," Leinwand says.
Still, as Leinwand watched well-known services and websites go down, he says, "I was surprised how much trust people put into an architecture they didn't understand."
Customers of a public cloud service, regardless of the supplier, "have to get as much insight as possible into the architecture and understand where its weaknesses are," he advises. "They need to find the spots that need more redundancy."
Zynga, as a supplier of social games, is in a different position from an equities trading firm, for example, where even a few seconds of downtime can be devastating. But it still has plenty of incentive to keep its services up. Its players are nothing if not passionate, and they're not afraid to light up the community forums if a feature's not working as promised.
"On a given day, if we need to we can deploy a thousand servers."
-- Allan Leinwand, Zynga's CTO of infrastructure engineering
Unlike Amazon, Zynga has linked its internal data centers in different geographic locations together, so they can back one another up. Amazon does failover between different availability zones inside its data centers, but not across its data centers. A public cloud customer that understands this, and recognizes that Amazon's availability zones inside a data center are not entirely insulated from one another (a failure of a service in one may lead to failures in all), is better equipped to craft a failover plan that gets it out of a catastrophic event and into an alternative data center.
Leinwand won't disclose Zynga's failover plans in detail, but it appears its strategy is to be able to take its systems off EC2 and revive them elsewhere if a failure ever threatens operations. Because it leases space in various wholesale data centers, Zynga may have the option of keeping space and servers in reserve, or of leasing additional servers on short notice from another occupant of the data center by private agreement. Wholesale data center suppliers are sometimes managed service suppliers as well, with the ability to shift capacity toward a large and valued customer. Leinwand wouldn't specify which route Zynga took.
Couldn't that alternative simply be another Amazon geographic location? Yes, Leinwand says, but moving only the part of an application that's caught in a failure somewhere else, while leaving the parts that are still running in the original location, might not work either.
"For certain applications, the latencies of operating back and forth across the U.S. would have ill effects," he says. "A Web service talking to a database service or an analytics service -- you don't want to spread them apart" to the extent required by using data centers on either side of the country, such as Amazon's Virginia and California data centers.
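Leinwand's latency point can be made concrete with back-of-the-envelope arithmetic. The sketch below is a minimal illustration, not a description of Zynga's or Amazon's systems: the round-trip times, the fixed local processing time, and the assumption of ten sequential, blocking database calls per page view are all hypothetical figures chosen for illustration, as is the function name `request_latency_ms`.

```python
# Hypothetical round-trip times in milliseconds. These are assumed,
# illustrative figures, not measurements of any particular provider.
SAME_REGION_RTT_MS = 1.0
CROSS_COUNTRY_RTT_MS = 70.0


def request_latency_ms(sequential_calls: int, rtt_ms: float,
                       local_work_ms: float = 20.0) -> float:
    """Estimate the latency of one web request that makes
    `sequential_calls` blocking round trips to a backend service
    (e.g., a database), plus a fixed amount of local processing.
    Each blocking call pays the full network round trip."""
    return local_work_ms + sequential_calls * rtt_ms


if __name__ == "__main__":
    calls = 10  # assumed: ten back-to-back database queries per page view
    near = request_latency_ms(calls, SAME_REGION_RTT_MS)
    far = request_latency_ms(calls, CROSS_COUNTRY_RTT_MS)
    print(f"co-located: {near:.0f} ms, split across the country: {far:.0f} ms")
```

Under these assumptions, a page that takes about 30 ms with the Web and database tiers side by side takes about 720 ms when every query has to cross the country, because the round-trip cost is multiplied by the number of sequential calls rather than paid once.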
Leinwand watched the effects of the April 21 outage with great interest, and instead of sounding like he has all the answers, he says, "I'm trying to learn from it." Still, because Zynga came through the incident unscathed, he can say, "I'm pretty proud of how we use the public cloud and Z Cloud."