Commentary
3/25/2013
02:00 PM

How Netflix Is Ruining Cloud Computing

A laser focus on Amazon Web Services and seeming disregard for next-gen best practices could spell lock-in, and derail real IaaS competition.

On March 13, Netflix announced $100,000 in prize money for the developers who do the most to improve its open source tools for controlling and managing application deployments in the cloud. Before spearheading this contest, Netflix's cloud architect, Adrian Cockcroft, released many internal Netflix tools as open source. Currently, eight cloud-architecture-specific tools are available from Netflix, and Cockcroft has been very open in sharing his and Netflix's knowledge in public forums.

In theory, all of this should be wonderful. In reality, however, it's likely to leave cloud computing with an enormous hangover of subpar practices and architectures for years to come. Netflix is the poster child for "Cloud Computing v1.0" and demonstrates both its enormous benefits and its troubling problems. Cloud Computing v1.0 is strictly an Amazon Web Services affair -- AWS was first, and no other provider had the core features necessary to build comparable applications (think multiple availability zones and EBS with snapshots and quick restores). So it makes sense that Netflix embraced AWS; it saw huge benefits in being able to deploy and scale its service using the interfaces and architectures that were possible when AWS launched.

But Netflix has also suffered repeatedly at the hands of Cloud Computing v1.0 with four outages in 2012 alone, which certainly points to the possibility for some improvement in the high availability of its service. Of note, the Christmas Eve outage is perhaps most troubling from a "v1.0" perspective, as it was solely the result of Netflix's reliance on a less-necessary AWS service for load balancing, which could have been handled in any number of other ways to increase server availability.

The reason the Netflix contest is likely to leave organizations worse off is that it thoroughly embraces this "Cloud Computing v1.0" mindset, both from an "AWS-is-the-only-vendor" standpoint and from an architectural standpoint. While it's arguable that there still isn't (quite yet) another infrastructure-as-a-service (IaaS) vendor that has a thoroughly tested core feature set, unless you just walked out of the tattoo parlor with "#AWS" on your shoulder, you know it won't be long. And all companies running on AWS should be looking forward to the rise of additional IaaS vendors, like those in our IaaS buyer's guide, for two reasons: higher availability and price competition.

Every cloud architect should know that it's only a matter of time before organizations have applications deployed across the world on many different IaaS providers in many different data centers, based on request volume and location in combination with a market for computing resources that changes price constantly. Locking yourself down to AWS today, for greenfield cloud architectures, would be the equivalent of deciding to develop an iPhone-only application when you know you'll have to support iPads, Android and others in the future.
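
The placement decision described above can be sketched as a small selection function. This is only an illustration under stated assumptions, not any vendor's API: the `Offer` type, the provider names, the regions, and the prices are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One hypothetical price quote for an hour of compute."""
    provider: str
    region: str
    price_per_hour: float  # illustrative figure, not a real price


def cheapest_offer(offers, region):
    """Return the lowest-priced offer in the region where requests originate."""
    candidates = [o for o in offers if o.region == region]
    if not candidates:
        raise ValueError(f"no provider serves region {region!r}")
    return min(candidates, key=lambda o: o.price_per_hour)


# Made-up quotes; in a real market these would change constantly.
offers = [
    Offer("provider-a", "us-east", 0.12),
    Offer("provider-b", "us-east", 0.10),
    Offer("provider-a", "eu-west", 0.14),
]

best = cheapest_offer(offers, "us-east")
print(best.provider, best.price_per_hour)
```

A real scheduler would also have to weigh request volume, data-transfer costs, and availability, which this sketch leaves out. The point is only that the choice of provider becomes an input to the deployment rather than a fixed assumption.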

In addition to the annoying AWS-centrism of the Netflix contest, there's a deeper problem: Some of Netflix's tools embrace a cloud architecture that was fine in the days of Cloud Computing v1.0 but that will look increasingly suspect as time goes on. I know that it's hard to throw out code and systems that are working fine, especially when they still look pretty good -- and often, squeaking out a bit more time is the right internal decision for an individual company. But instead of just wringing out the last bits of value, Netflix is throwing significant money at the rest of the world, asking everyone to embrace and extend their tools and code that are not particularly good practices for future cloud architectures.

Perhaps the best example of a bad-practice Netflix tool is Aminator. Aminator helps you build Amazon Machine Images (AMIs) easily, based on a "base" AMI and a package of code. "I must have produced about 25,000 Ubuntu AMIs," raved one excited early user. There's just one problem: It's hard to understand when this would ever be a good idea. Several years ago, spawning tons of images would have been a somewhat acceptable way to roll out a revised version of an application (due to application code, operating system, and/or server software). But today we have widespread adoption of configuration management tools like Chef and Puppet that make massive AMI creation a subpar practice at best. Amazon Web Services itself recently rolled out a service called OpsWorks, which would be a significantly better way to handle deploying applications -- it uses Chef.

There are other less-bad tools, but many bear the mark of having to architect around a number of issues that have since been more or less resolved; it's a bit like an open source project that relies heavily on SOAP instead of being RESTful. For example, Edda, which figures out what cloud resources you're using at AWS, just seems like something that had to be built because no one properly set up how resources should be requested and deployed. And Asgard, a very cool tool from 2010 for managing a variety of different applications across AWS, would be a hard sell as a best-of-breed tool today compared with other open source options, notably Scalr and Chef.

This is not to say that all of Netflix's open source cloud tools fit into this mold. Denominator is a great DNS manager (because it's multi-cloud), and Simian Army is a fabulous, ground-breaking idea for testing cloud architectures (it is, unfortunately, AWS-only).

There's a possibility that the Netflix contest will help lead the world toward Cloud Computing v2.0 and beyond by embracing multi-cloud architectures that use orchestration and configuration management in optimal ways. However, I am skeptical on both fronts. Cockcroft's public comments suggest little interest in using other cloud vendors. A good chunk of the prize money is in AWS credits, and Amazon's CTO is a judge; all this points to a very AWS-centric mindset. Moreover, the fact that Netflix just released Aminator last week indicates to me that Netflix is happy to roll out whatever tools they've built, regardless of whether they fit in with a best-practices modern cloud architecture.

But please, Netflix, prove me wrong. Embrace a less proprietary, more highly available, more standardized cloud -- and put Google's Urs Hölzle on the panel while you're at it. #UrsForNetflixJudge

Comments
middleageman,
User Rank: Apprentice
3/28/2013 | 6:17:33 PM
re: How Netflix Is Ruining Cloud Computing
"I'm pointing out that this roots you in 2008, and ignores everything that AWS has developed in response to the needs of their customers in the following five years." I read this wonderful dialog and thank you for it. I agree that portability and cross platform tolerance is tantamount to success for us, the users. for you, the producers, kudos for even considering our needs above our $'s. (which follow anyway) now, about self driving cars and regional outages, which regions? i need to know.
jemison288,
User Rank: Ninja
3/28/2013 | 2:23:15 AM
re: How Netflix Is Ruining Cloud Computing
Your attitude toward Python here is quite different from how you appear to approach it in other spaces (the Netflix techblog article, Pycon, etc); and it's very interesting to me in light of the rigid adherence to AWS over all other IaaS options, seemingly forever. Where is the same level of skepticism and evaluation on AWS that you appear to have toward programming languages? Again, no one is addressing Adrian's comments that he is uninterested in any other IaaS vendor period/no exceptions. If that's not the case, then it would be wonderful to hear. Otherwise, it just stands out as a very different attitude and decision than what you're doing elsewhere in your technology choices.
jemison288,
User Rank: Ninja
3/28/2013 | 2:06:38 AM
re: How Netflix Is Ruining Cloud Computing
Well, talk about emotive.

No one has addressed the points raised by my citing Adrian's quotes. I take from that that I am right, and Netflix simply refuses to consider any other vendor because of the same religious adherence to AWS that you're accusing me of having.

Ultimately, my piece raises two issues: (1) Single-vendor adherence, and (2) what's the best reference architecture for cloud deployments. As I said above, no one has disputed the adherence to AWS that I read from Adrian's quotes. And you and Adrian (and others) have said that Netflix's cloud architecture is probably not a good reference architecture for the cloud. So from a fundamental standpoint, we do not disagree. (You seem to want to argue amongst the trees; I'll concede the trees to you--my concern is with the forest).

One last point: you ask me to defend my claim that baking AMIs adds complexity--and immediately thereafter, you explain why it adds complexity (additional step, reaping, etc). Systems with fewer steps and less complexity are generally less fragile than those with more steps. The bake v. no bake decision (as I think Adrian addresses elsewhere on this page) is something that should be made on a per-use-case basis. The idea that baking is *always* superior to not baking is not true. (Again, this appears to be undisputed by Adrian).
bmoyles,
User Rank: Apprentice
3/27/2013 | 9:30:36 PM
re: How Netflix Is Ruining Cloud Computing
You keep insisting that pre-baking AMIs somehow adds complexity but have yet to articulate how. Please explain how pre-staging content and configuration on a fixed image prior to launch is any more complex than maintaining push/pull infrastructure and servers to dynamically turn running instances into services. I have no extra infrastructure to maintain to support configuration management other than version control. I have a process that reaps old, unused images. I have, at the end of a bake, an entire OS image that represents a point-in-time configuration of my application that I can refer to for triage. The absolute minimum bake script would be maybe 50 lines of bash? This is more complex....how?

Think of the rate at which the average enterprise is releasing their applications. How many AMIs is a smaller, slower-moving organization really going to produce in reality? Given the cost of S3 storage, is that really a significant cost? If it is...well... *baking is not for them.* Period. Full stop. End of story.

Do people managing 20-node clusters not have the need to roll out identical images too? Do smaller organizations have no need to dynamically manage capacity? What is it about baking that makes you think it's specifically intended for large launches? Every developer at Netflix uses the process, whether their autoscaling group is a single node or 1000 nodes.

Also please explain why you believe that the Netflix platform *depends* on baking in any way, shape, or form. It does not. We choose to bake and feel it is the best thing for us, but it's not required by any component in our infrastructure.

There is nothing whatsoever stopping folks from using Asgard to launch N instances that then have configurations applied to them post launch by Puppet, Chef, Saltstack, cfengine, etc. They just have to wait longer for the instances to become ready to take on traffic than they would if they baked ahead of time. That's the *major* difference.

The Netflix way of "adopting the cloud" is not sub-par in any way *for Netflix.* It's not for everyone, but it works and it works very well. I still fail to see how Netflix or any other organization providing open source software is at all responsible for any enterprise that blindly adopted tooling, process, policy, or anything without evaluating whether or not it made sense in the context of their business. The assertion that Netflix, by virtue of releasing its software and encouraging folks to help improve, extend, and *port it to other clouds*, is somehow beholden to organizations who have no ability to decide whether a solution is the right one is patently absurd.

Op ed and blogging does not disavow one from journalistic integrity. Each and every one of your points could have been addressed by engaging anyone at Netflix. Little things, like ensuring that someone tweeting about 25,000 cowsay AMIs isn't on the team that produced the tooling, or confirming that pre-staged AMIs are mandatory for playing in the ecosystem... Failing to follow through on that before posting a manifesto of FUD will be far more damaging to cloud computing than any contest could ever hope to be.

At this stage, though, we might as well be talking about guns or abortion or right- versus left-wing politics, as your insistence on regurgitating falsehoods underscores the fact that there is no shaking your preconceived notions and biases. I'm going to get back to the herd, 25,000 cows is a lot of cows.
jemison288,
User Rank: Ninja
3/27/2013 | 8:58:40 PM
re: How Netflix Is Ruining Cloud Computing
Right--because you have the right to use the AWS API.

Do you really not see a difference between the NetflixOSS approach to interacting with the cloud compared to the Scalr approach?
jemison288,
User Rank: Ninja
3/27/2013 | 8:57:02 PM
re: How Netflix Is Ruining Cloud Computing
Agreed. This would be an excellent foreword to give to people looking at using the NetflixOSS tools.
jemison288,
User Rank: Ninja
3/27/2013 | 8:55:43 PM
re: How Netflix Is Ruining Cloud Computing
There is no IaaS provider live with an AWS API layer that supports it with an SLA because of the legal issues. I'm not just making this up; it's a significant issue that most cloud consultants who work for large organizations are worried about (e.g., DoD).

When I was in college, every college student I knew used Napster to pull down music. That didn't make that activity legal either.
jemison288,
User Rank: Ninja
3/27/2013 | 8:40:41 PM
re: How Netflix Is Ruining Cloud Computing
Sorry--replace DynamoDB with SimpleDB. Same point: proprietary to AWS.
jemison288,
User Rank: Ninja
3/27/2013 | 8:39:29 PM
re: How Netflix Is Ruining Cloud Computing
Well, in calling a contest "Fix the Cloud", it's a bit like saying "One True Way" (a better name might be, "Improve Our Proprietary Toolset"). From a future-looking standpoint, the Netflix cloud tools are less flexible than tools (like Scalr) that adopt abstraction layers for interacting with IaaS and that anticipate the future needs of working with multiple IaaS vendors (which, again, is just a matter of *when*, not *if*).
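
To make "abstraction layer" concrete, here is a minimal sketch of the idea. It is not how Scalr or any other tool is actually implemented: the `IaaSProvider` interface, the two fake adapters, and the `deploy` helper are all hypothetical names invented for illustration.

```python
from typing import Protocol


class IaaSProvider(Protocol):
    """Hypothetical minimal interface that each vendor adapter would implement."""

    def launch(self, image: str, count: int) -> list[str]:
        ...


class FakeProviderA:
    """Stand-in adapter; a real one would call the vendor's API here."""

    def launch(self, image: str, count: int) -> list[str]:
        return [f"a-{image}-{i}" for i in range(count)]


class FakeProviderB:
    """Second stand-in adapter with the same method signature."""

    def launch(self, image: str, count: int) -> list[str]:
        return [f"b-{image}-{i}" for i in range(count)]


def deploy(providers: list[IaaSProvider], image: str, per_provider: int) -> list[str]:
    """Launch the same workload on every provider through the shared interface."""
    instance_ids: list[str] = []
    for provider in providers:
        instance_ids.extend(provider.launch(image, per_provider))
    return instance_ids


ids = deploy([FakeProviderA(), FakeProviderB()], "web", 2)
print(ids)
```

Because `deploy` depends only on the shared interface, adding another vendor means writing one more adapter rather than changing the deployment logic.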

At the end of the day, what I'm most concerned about is promoting best-practices cloud implementations, because bad initial implementations will hurt cloud adoption significantly. Netflix's architecture and plans may be ideal for Netflix, but they're not an ideal reference architecture because so much focus in the Netflix architecture is on supporting Netflix's specific use case. For example, the focus on baking AMIs: it adds complexity (an additional step, additional storage costs, additional management requirements, additional testing, additional shipping of data cross region/cloud) because, for Netflix, launching 500 identical VMs at a time is common. But that's not common for the average cloud architecture. The additional complexity of baking AMIs--which is essentially required by the Netflix architecture--is unnecessary when your primary focus is high availability across multiple clouds/regions using a small number of each VM. This is similar to the use of something like an in-memory database: there are use cases where it's essential, but requiring an in-memory database in a general-purpose reference architecture doesn't make much sense.

I have never argued that Netflix wasn't providing a useful service--I am arguing that with great power comes great responsibility, and Netflix (and its defenders) seems uninterested in thinking about novices using its tools to adopt the cloud in ways that will be sub-par.
bmoyles,
User Rank: Apprentice
3/27/2013 | 6:12:17 PM
re: How Netflix Is Ruining Cloud Computing
So if someone wholesale adopted the entire Netflix stack without doing any sort of evaluation or due diligence to see whether the workflow it promotes makes sense within the context of their business and the applications it requires, Netflix should be responsible for that situation? Why is Netflix any more responsible than, say, the Apache Foundation for someone adopting Cassandra when MySQL was a better fit, or VMware selling vSphere to an organization whose applications don't lend themselves to virtualization, or PHP for existing at all? (Given that the 25,000 instances of cowsay was lost on you, I will point out: that last comment is sarcasm, and PHP, despite my personal feelings towards it, is used to solve many problems and is the right tool for many folks)

These tools are out there to promote what we think is *one* good way of doing things, and for our needs, they have worked phenomenally well. There are some organizations who may benefit from the whole stack, others may just need a piece, and others still might not find anything that fits.

Furthermore, this assertion that our process and tooling is only beneficial at large scale is a bit absurd. Does your infrastructure have the need to dynamically expand and contract capacity based on demand? Our tools *might* be a fit. Do you need the ability to build graceful degradation into your services? Our tools *might* be a fit. Do you need the ability to launch multiple instances of a given application and manage those instances as a unit? Our tools *might* be a fit.

Are they the One True Way™? Of course not, that would be silly to assert. Are they representative of a possible way that might work for your organization? Absolutely. Is there *something* in the suite of released tools that might benefit a variety of enterprises? I think so, but that call is ultimately up to the enterprise.