Zynga found it could do the same amount of work in its private cloud as it had been doing on Amazon EC2--but with only one-third the number of servers. It's a startling statistic, and how they did it bears explanation.
The comparison is apples to apples, given that Zynga runs one virtual machine per physical server, whether that server sits in Amazon's EC2 or in Zynga's zCloud. Unlike Amazon, however, Zynga engineered its zCloud servers to be optimized for specific roles within its gaming software infrastructure--database access, Web serving, or game logic execution. Of necessity, Amazon EC2 servers are general-purpose machines, designed to run a wide variety of workloads rather than do one job supremely well.
As recently as a year ago, Zynga, the producer of such popular online games as Farmville, Mafia Wars, and most recently CastleVille, was heavily dependent on Amazon Web Services for servers to host its players' activity. Eighty percent of player activity still took place on Amazon servers in January 2011. By January 2012, the figures had flipped, with 80% of game activity taking place in-house and 20% on Amazon.
"In mid-2010 we realized we were renting what we could own," recounted Allan Leinwand, Zynga's infrastructure CTO, in his keynote Wednesday at Cloud Connect, a UBM TechWeb event in Santa Clara, Calif. Up until then, Zynga had found it difficult to project what its data center needs would be, given the rapid launch of successive games. The launch of Farmville added 25 million active users to the Zynga roster in five months. Rather than build data centers ahead of demand and having them sit idle until the demand materialized, it shifted more and more of its operations through 2009 and 2010 onto Amazon.
Zynga architects conceived of zCloud as an Amazon-like infrastructure in Zynga-owned or -leased data centers, governed by one management interface. It took just six months from conception to execution to get zCloud up and running, Leinwand told a nearly full auditorium at the Santa Clara Convention Center.
Zynga's Nov. 14 launch of CastleVille was its first in-house game launch in several years; launching games had become primarily an Amazon-hosted event. CastleVille, in which players build a castle fantasyland, proved to be another success. "CastleVille was launched solely in the zCloud, and it reached five million users in six days," recalled Leinwand in an interview before his address.
Leinwand was one of several CTOs tapped by Facebook to serve on its Open Compute Project to establish specifications for energy-efficient servers. He said zCloud servers follow the recommendations of the project, including its server cooling innovations, without precisely matching its design. Zynga doesn't build its own servers the way Facebook and Google do; it buys them through OEMs, which produce the exact types of servers it wants.
With only one VM per physical server, Zynga can tailor CPU, memory, and I/O to the type of task each server will undertake, then combine the various optimized sets of servers in its zCloud. Zynga is unusual in being able to do this because its games have many elements in common; in some instances, different games share the same underlying application logic, even though their features vary. In one sense, it's an enterprise with one application, and its data center has been geared to run that application.
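The idea of matching hardware shape to server role can be sketched as a simple profile table. This is a hypothetical illustration only: the role names, resource figures, and code are assumptions for clarity, not Zynga's published specifications or tooling.

```python
# Hypothetical sketch: role-tuned profiles for single-VM physical hosts.
# All numbers are illustrative; the article does not publish Zynga's specs.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerProfile:
    """Resource shape for a physical server running exactly one VM."""
    vcpus: int
    memory_gb: int
    iops: int

PROFILES = {
    "db":         ServerProfile(vcpus=16, memory_gb=128, iops=50_000),  # storage-heavy
    "web":        ServerProfile(vcpus=8,  memory_gb=32,  iops=5_000),   # network-heavy
    "game_logic": ServerProfile(vcpus=32, memory_gb=64,  iops=10_000),  # CPU-heavy
}

def profile_for(role: str) -> ServerProfile:
    """Pick the hardware shape for a server's single role."""
    return PROFILES[role]
```

A general-purpose cloud instance, by contrast, has to serve all three rows of such a table at once, which is the efficiency gap the article describes.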
Leinwand said Zynga had carefully studied its existing operations and measured server performance to find where constraints lay before undertaking zCloud. "Our efficiencies weren't quite there as we started doing these game rollouts," he admitted. His team developed tools to measure what was going on in different elements of the game stack--its PHP execution, its memory mappings, CPU usage by game function, storage I/O rates, delivery of network packets per second. What they found was eye opening.
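The kind of measurement described above typically reduces to sampling monotonically increasing counters and converting deltas into per-second rates. The sketch below is a minimal, hypothetical illustration of that technique; the counter names and figures are assumptions, not Zynga's actual tools or data.

```python
# Hedged sketch: derive per-second rates from two snapshots of
# monotonically increasing counters (packets, disk ops, PHP requests).
# Field names and numbers are illustrative only.

def rate_per_second(old: dict, new: dict, elapsed_s: float) -> dict:
    """Turn counter deltas into per-second rates for each metric."""
    return {k: (new[k] - old[k]) / elapsed_s for k in old}

# Example: counters sampled 10 seconds apart on one game server.
t0 = {"net_packets": 1_200_000, "disk_ops": 40_000, "php_requests": 9_000}
t1 = {"net_packets": 1_500_000, "disk_ops": 55_000, "php_requests": 12_000}

rates = rate_per_second(t0, t1, elapsed_s=10.0)
# rates["net_packets"] -> 30000.0 packets per second, and so on.
```

Collecting such rates per game function across the stack--PHP execution, storage I/O, network delivery--is what lets a team see where the real constraints lie before resizing hardware.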
"We thought the main flow of traffic through the data center was from east to west; it turned out to be from north to south. We found lots of areas where we could improve," Leinwand told the crowd.