Netflix is showing that you can replace your data center with the cloud. Last year, the video delivery company moved most of its production systems, except the initial capture of customer credit card data, into Amazon's EC2.
"We are just about 100% in the cloud," Adrian Cockcroft, cloud architect at Netflix, said Monday during a workshop session on managing application performance in the cloud at the opening of Cloud Connect 2011, a UBM Techweb event.
Netflix decided to move into the Amazon Web Services EC2 cloud because it saw its growth accelerating so rapidly that it faced a staggering task in building data centers to keep up. Cockcroft flashed a scene from an unidentified movie showing two men fleeing a building that was exploding.
"That's what was about to happen to our data center if we continued managing it ourselves," he said to laughs from a crowd that filled a ballroom at the Santa Clara, Calif., Convention Center.
The company has seen explosive growth in Web site traffic and new customers, thanks to a free Netflix iPhone app that launched in 2010 and to the ability, added last year, to download movies and TV shows to popular game consoles such as the Wii, the Xbox 360, and the PS3 media server.
In the last quarter, Netflix's business grew by 37%, or more than 5 million customers, Cockcroft said. "We've stopped building our own data centers. We couldn't predict where we were going to be" by the time a new generation of data centers was built.
"We want to use clouds. We don't have time to build them," he said.
Netflix gained experience in the cloud by using it in 2009 to encode the digital versions of movies that it was adding to its library. It used thousands of instances, or virtual machines, in EC2 to execute the task as a batch job "and get more movies online," he said. Netflix is also a user of petabytes of Amazon's S3 storage.
It also began moving its Web site into EC2, a process that is largely completed, he said.
At no time did Cockcroft express dissatisfaction with Amazon as a cloud service provider, although he did acknowledge several challenges. Netflix chose Amazon over other cloud providers based on its size, ability to grow its services quickly, and its cloud feature set.
Among the challenges was EC2's Elastic Load Balancer, which he described as having "too many limits" for the scale at which Netflix wished to operate.
In addition, SimpleDB is good for dozens or hundreds of gigabytes of data, but Netflix needs it to handle terabytes at a time. That need "was beyond its sweet spot," although Netflix found SimpleDB living up to its name as an easy-to-use system.
Netflix built "a large tier" of data cache management in front of SimpleDB to handle more data. It used the open source system memcached to do so.
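The idea of a cache tier in front of a slower data store can be illustrated with a minimal read-through cache. This is only a sketch and not Netflix's implementation: the `ReadThroughCache` class and the `load_from_store` callback are hypothetical names, a plain dictionary stands in for memcached, and the callback stands in for a SimpleDB query.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Hypothetical read-through cache; a dict stands in for memcached."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]):
        # load_from_store is an assumed callback that queries the slower
        # backing store (SimpleDB in the article) for a single key.
        self._load = load_from_store
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when possible so the backing store is spared.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # On a miss, fall through to the backing store and remember the result.
        self.misses += 1
        value = self._load(key)
        if value is not None:
            self._cache[key] = value
        return value
```

In this sketch, repeated reads of the same key reach the backing store only once, which is how a cache tier can let a store sized for gigabytes serve a much larger read load.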
Cockcroft also said the performance of EC2's Elastic Block Store storage system for running applications was "slow and too inconsistent" for Netflix's purposes, so the company substituted its own system built on the NoSQL system Cassandra.
The move to the cloud prompted the abandonment of many former standard practices in the data center. Netflix, for example, no longer uses a change management database because such databases work best with relatively static systems that change infrequently. The fluid, rapidly changing alignment of virtual machines in the cloud means Netflix had to impose a new method of managing changes.
Monitoring tools need to work differently in the cloud than in the data center, and don't transfer easily from one setting to the other. "Monitoring tools don't like hundreds of instances to appear in a few minutes," he noted, although the cloud tends to work that way.
In the multi-tenant cloud, where servers are shared by different customers, monitoring tools also miss the fact that a customer's workload that keeps the server extra busy can delay a time-shared hand-off to another workload by a tick or two of the system clock. As this happens repeatedly, one customer is penalized by the activity of another, resulting in a 1%-2% loss of paid-for time.
When many fellow tenants on a server are doing this to a Netflix application, Netflix can detect the "stolen time." If the stolen time reaches 30% of the paid-for time, Netflix kills its own job and restarts it on a fresh server, a practice that a later speaker described as Netflix "throwing the server back to Amazon" rather than tolerating second-rate performance.
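Cockcroft did not describe how Netflix measures stolen time. One common approach on a Linux guest is to compare the "steal" counter on the aggregate `cpu` line of `/proc/stat` across two samples. The sketch below assumes that approach; the function names and the `STEAL_THRESHOLD` constant are hypothetical, with only the 30% figure taken from the talk.

```python
STEAL_THRESHOLD = 0.30  # the 30% figure cited in the talk


def parse_cpu_line(line: str) -> tuple:
    """Return (total_ticks, steal_ticks) from the aggregate 'cpu' line.

    Assumes the Linux /proc/stat field order:
    user nice system idle iowait irq softirq steal [guest guest_nice]
    """
    fields = line.split()
    if len(fields) < 9 or fields[0] != "cpu":
        raise ValueError("expected an aggregate 'cpu' line with a steal field")
    # Sum only the first eight counters; guest time is already part of user/nice.
    values = [int(v) for v in fields[1:9]]
    return sum(values), values[7]


def steal_fraction(before: str, after: str) -> float:
    """Share of CPU ticks stolen between two samples of the 'cpu' line."""
    total_before, steal_before = parse_cpu_line(before)
    total_after, steal_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return (steal_after - steal_before) / elapsed


def should_relocate(before: str, after: str) -> bool:
    """True when stolen time reaches the threshold, per the policy described."""
    return steal_fraction(before, after) >= STEAL_THRESHOLD
```

A monitoring loop could sample the first line of `/proc/stat` at a fixed interval and, when `should_relocate` returns true, terminate the job and relaunch it on a new instance.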
At the same time, capacity planning is easier in the cloud. If you are off on your projections, it only takes a short while to spin up more virtual servers in EC2. On the other hand, a customer like Netflix can complicate matters for the cloud supplier. Cockcroft drew another round of laughs when he noted that, "Amazon sometimes calls to ask us what we think we're going to need."
Netflix is an unusual cloud customer. Its need to store petabytes of video products and stream large data sets to customers quickly, on demand, gives it a set of requirements that few are ever going to match. The fact that it has found Amazon EC2 a suitable replacement for its own data centers indicates the cloud will be able to take on some of the most strenuous enterprise tasks in the future.
Netflix likes concentrating on its core business and expanded delivery methods instead of building more data centers. It hopes growth continues to accelerate the way it has in recent months, and that Amazon EC2 will enable it to keep up. "We don't see why anyone would want to do it any other way now," Cockcroft concluded.