Comments
How Netflix Is Ruining Cloud Computing
adrianco
User Rank: Apprentice
3/27/2013 | 5:02:38 PM
re: How Netflix Is Ruining Cloud Computing
Both ways of creating images are valid and tooling should be used to automate every step of the build process. There should be no hand crafted images. Every image should be traceable to the bits it was built from. That's the best practice of cloud.

The Chef-at-runtime approach works fine at small scale but breaks horribly at large scale. When you have lots of developers changing things at once, you want to build using the latest bits and freeze that build for test and deployment. For availability you would need multiple distributed Chef servers, but then you have to guarantee that they are always in sync, which is one of the hard problems of distributed computing. Avoiding that problem has value.

Baking AMIs is wasting a cheap resource, and we have tooling to clean up the leftovers. These are implementation choices that should be made appropriately to the situation. The NetflixOSS PaaS is interesting to many enterprises who do have large scale problems, and who find that other PaaS solutions are currently optimized for startups and small scale applications.

If you only have 10s of instances, NetflixOSS is likely to be overkill. If you have 100s it becomes useful, with 1000s it's essential, and with 10,000s it's probably the only game in town at the moment. With 100,000s you are Facebook or Google anyway...
gregdek
User Rank: Apprentice
3/27/2013 | 3:33:23 PM
re: How Netflix Is Ruining Cloud Computing
"Unfortunately, the only company with authorized rights to the AWS API (other than AWS) is Eucalyptus, so what you make sound so easy in your response is so fraught from a legal perspective that you're not going to find any other provider doing it."

Except, of course, that other providers *are* doing it right now, and have been doing it for years. Cloudstack has Cloudbridge. Red Hat has Deltacloud/Aeolus. And it's all open source. Sure, we're moving faster down this path at Eucalyptus -- but it's the exact same path.
jemison288
User Rank: Moderator
3/27/2013 | 10:23:33 AM
re: How Netflix Is Ruining Cloud Computing
Thanks for the response. Unfortunately, the only company with authorized rights to the AWS API (other than AWS) is Eucalyptus, so what you make sound so easy in your response is so fraught from a legal perspective that you're not going to find any other provider doing it. Perhaps if AWS were less proprietary and more willing to contribute to the overall community, they would allow providers to implement it as their own APIs. Perhaps you could help put that pressure on them? Because once the AWS API can be used as an open standard, then Netflix's tools will instantly have a much bigger audience.

On AMIs: Your assertion that using Chef/Puppet for every launch "is not a good idea" assumes that you've got a lot of VM launches that will be identical (machine, cloud). Again, don't look at this from "what does Netflix do internally"; look at it from "general enterprise cloud adoption". By using an AMI-centric model of the world, you're (a) adding overhead to each release, (b) creating a management/cleanup/storage situation that you would not have otherwise, and (c) requiring yourself to treat launches in different regions/clouds differently--including verifying that you have the right images in the right places properly baked and ready to go. In contrast, using Chef/Puppet on every launch avoids every single one of those problems, and thus gives you much more flexibility. The cost of Chef/Puppet on each launch is that it adds (d) overhead (time, bandwidth) and (e) some additional level of fragility (how much depending upon where you're pulling files).

Your assertion that using Chef/Puppet on each launch is "not a good idea" shows how Netflix-centric your world is. For many, many people, Chef/Puppet on every launch is a much better business and technological decision than rolling AMIs for each release because the pain of (a), (b), and (c) is greater than the pain of (d) and (e). In fact, the fragility of (a) + (b) + (c) can be significantly greater than the fragility of (e).

Ultimately, this is not a referendum on "how Netflix should run its cloud architecture". This is a referendum on whether Netflix should have a responsibility to the cloud computing world to help novices understand best practices in running clouds versus running a contest that is more likely to promote sub-par use of the cloud.
jemison288
User Rank: Moderator
3/27/2013 | 10:06:50 AM
re: How Netflix Is Ruining Cloud Computing
Again, the issue is not about whether Netflix's current business decisions work for Netflix; the issue is about whether Netflix's tools and contest are beneficial for the enterprise that is considering how to move to the cloud. Running "at scale" for Netflix is thoroughly unlike running "at scale" for the vast majority of enterprise cloud need.

I'm not advocating a "lowest common denominator" as much as I'm advocating a fundamental set of best practices that one should master before getting into questions like, "how to launch 5,000 VMs in six continents within 10 minutes." If someone came to you wanting to know good software development practices, wouldn't you want to start with the basics of using code repositories, code review, style guides, and a discussion of waterfall v. agile? Before you started talking about how to manage a team of 500 developers? Yet on the cloud side, you act like it's unimportant whether Netflix does the former. As someone who would like to see much more enterprise cloud adoption, I see it as very important.
Joe Sondow
User Rank: Apprentice
3/27/2013 | 5:43:23 AM
re: How Netflix Is Ruining Cloud Computing
Auto Scaling Groups.

I'll put aside the inflammatory, hyperbolic headline of the editorial for a moment, and talk about Auto Scaling Groups. Let's see how many times I can mention Auto Scaling Groups. Somebody count for me please.

At the core of Asgard's functionality is the Auto Scaling Group.

When Eucalyptus asked what they need to do in order to run Asgard against a Eucalyptus server, I told them they need to implement Auto Scaling Groups, and stub out a few other unimportant Amazon services Asgard currently expects to call. A few months later, they came back and said they were done. I asked if they implemented scaling policies. Yep. CloudWatch metrics? Yessir. Scheduled actions? You bet. Great! Let's finish making this thing flexible enough to use a Eucalyptus server. Someone still needs to add configurability to Asgard for regions, endpoints, instance types, application provider, and cloud API authentication. Cloud prize, anyone?

When OpenStack support consultants ask me how they can run Asgard against OpenStack, I tell them that first OpenStack needs to support the concepts that make Asgard useful, specifically Auto Scaling Groups. If you want to use Asgard without Amazon and without a cloud that has Auto Scaling Groups, then I really have to ask why. That's like using a food processor to open an envelope; you might get it to work, but to what end? There's maybe one screen in Asgard that might be useful for launching an instance without an Auto Scaling Group, but we don't use that screen much. Instead, I recommend choosing some implementation of Auto Scaling Groups, either through Scalr, Amazon, Eucalyptus or RightScale. The Auto Scaling Group serves to name and version a cluster, while associating it with an owner, and guaranteeing that the instances are homogeneous. The important part is the named group of instances of a single immutable image. The dynamic scaling part is gravy, although it does save you a lot of money.
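Joe's point that "the important part is the named group of instances of a single immutable image" can be illustrated with a small Python model. This is not Asgard's code and not the AWS Auto Scaling API. The class names, fields, and the `app-v001` naming scheme are hypothetical. They were chosen only to show the properties he lists: a name, a version, an owner, and instances that are guaranteed to be homogeneous.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


@dataclass
class AutoScalingGroup:
    """Hypothetical model of an ASG: a named, versioned, owned cluster of one image."""
    app: str
    version: int
    owner: str
    image_id: str  # the single immutable image every member must run
    instances: List[Instance] = field(default_factory=list)

    @property
    def name(self) -> str:
        # Assumed convention: application name plus a zero-padded push number
        return f"{self.app}-v{self.version:03d}"

    def launch(self, instance_id: str) -> Instance:
        # Instances can only be created from the group's image, so they cannot drift
        instance = Instance(instance_id, self.image_id)
        self.instances.append(instance)
        return instance

    def is_homogeneous(self) -> bool:
        return all(i.image_id == self.image_id for i in self.instances)


def next_push(current: AutoScalingGroup, new_image_id: str) -> AutoScalingGroup:
    # A release creates a new group with a new image instead of mutating the old one
    return AutoScalingGroup(current.app, current.version + 1, current.owner, new_image_id)


if __name__ == "__main__":
    v1 = AutoScalingGroup("api", 1, "api-team", "ami-1111")
    v1.launch("i-a")
    v1.launch("i-b")
    v2 = next_push(v1, "ami-2222")
    print(v1.name, v1.is_homogeneous(), v2.name)  # api-v001 True api-v002
```

In this model a new release never changes an existing group. Each push gets its own name and image, and the old group remains available for rollback. Dynamic scaling would be layered on top of this.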

As a partial substitute for the AWS Console, Asgard serves seven purposes for corporate Amazon customers, listed on the Netflix tech blog post where I first announced Asgard. (Google asgard tech blog). The purposes are: (1) Hide the Amazon keys, (2) Auto Scaling Groups, (3) Enforce conventions, (4) Logging, (5) Integrate systems, (6) Automate workflow, (7) Simplify REST API. When and if Amazon adequately addresses all seven of those issues in their own console, then I will gleefully recommend that Netflix deprecate Asgard and start using the AWS console instead. Then I'll go write some movie-related software instead. However, I'm not holding my breath. Amazon has a lot of other things to consider beyond supporting the cloud model Netflix has chosen. My prediction is that Asgard will remain a reasonable option for customers of cloud providers that have Auto Scaling Groups, starting with Amazon.

Is the publicity of Asgard putting pressure on cloud providers to implement both Auto Scaling Groups and usable graphic interfaces for configuring those Auto Scaling Groups? I hope so. That's one of the reasons I wanted to open source Asgard. If nobody can figure out how to use Auto Scaling Groups, then no one will use them. Then Amazon is less likely to add them to their console and less likely to augment them to be more useful, and Google is less likely to implement them. Auto Scaling Groups are great. Let's use them. Let's tell more cloud providers to provide them.

Will another company do as Eucalyptus did, and clone enough parts of the Amazon API to get free benefit from our tools? That would be good. Remember, Eucalyptus did most of that work before Amazon even talked to them. If cross-cloud-provider portability is your focus, my advice would be to add to Eucalyptus' open source implementation and make it plug into a dozen other cloud vendors the way it plugs into any data center. Personally I'm more interested in using so many isolated AWS regions that I don't need to worry about any one AWS system having a problem.

Now, let's talk a little more about AMIs.

Relying on a Chef/Puppet configurator for every production instance launch is not a good idea. It's a really bad idea. I don't know why anyone would regard deploy-time configuration as something new and good, while regarding pre-baked image launching as something old and bad. It's the other way around. You might be used to the idea of deploy-time configuration, but it's still a bad idea. It's an unnecessary risk. The point of Aminator is to give people a robust way to stop thinking in that old school way. I want people to start using Chef at build time, not deploy time. Use Chef with Aminator to create a complete image of the latest version of your application. Then know with certainty that every instance of that AMI will be identical in the development, test, staging, and production environments, in multiple redundant regions across four continents, even if the network fails during instance startup, even if the Chef server is getting upgraded or is falling over one day, even if a second deployment of the image happens months later. All the instances will be homogeneous within an Auto Scaling Group, all the time, even at large scale.
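The contrast between configuring at build time and configuring at deploy time can be shown with a toy Python sketch. Nothing here calls Chef, Aminator, or AWS. `bake_image`, `launch_from_image`, `launch_with_runtime_config`, and the hash-based image id are hypothetical stand-ins, and the sketch assumes that an image is just a frozen package manifest.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Config = Dict[str, str]


def bake_image(packages: Config) -> Dict[str, Any]:
    """Build time: resolve the full configuration once and freeze it into an image."""
    manifest = json.dumps(packages, sort_keys=True)
    # Hypothetical image id derived from the exact build inputs, so every image
    # is traceable to the bits it was built from
    image_id = "img-" + hashlib.sha256(manifest.encode("utf-8")).hexdigest()[:12]
    return {"image_id": image_id, "packages": dict(packages)}


def launch_from_image(image: Dict[str, Any]) -> Config:
    # Launch with a baked image: no network dependency, just copy what was baked
    return dict(image["packages"])


def launch_with_runtime_config(fetch_config: Callable[[], Config]) -> Config:
    # Launch with deploy-time configuration: every launch depends on the
    # configuration server being reachable and answering the same way as before
    return fetch_config()


if __name__ == "__main__":
    image = bake_image({"myapp": "1.2.3", "java": "7u17"})

    def config_server_down() -> Config:
        raise ConnectionError("configuration server unreachable")

    print(launch_from_image(image))  # same contents on every launch, in every region
    try:
        launch_with_runtime_config(config_server_down)
    except ConnectionError as err:
        print("runtime-configured launch failed:", err)
```

The baked path moves every external dependency to build time, where a failure stops one build rather than a production launch. Identical inputs also always yield the same image id.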

For the past 9 months, Aminator was the missing piece in the story of Asgard's ease of use. Now that there is a convenient way to produce a new AMI for each software build, it should be easier for people to use Asgard and Auto Scaling Groups for deployments without needing to rely on a highly available production deploy-time Chef server. If these resiliency concepts can be offered by more cloud providers, so much the better. I don't think that's ruining the cloud. I think that's promoting good patterns for tomorrow's cloud.
bmoyles
User Rank: Apprentice
3/27/2013 | 3:27:20 AM
re: How Netflix Is Ruining Cloud Computing
This is bananas...

"And unfortunately, your first happy user of AMInator (on Twitter, at least) made over 25,000 Ubuntu AMIs with it--can you tell me why that would ever be a good architectural decision? AMInator strikes me as a tool like PHP or a GOTO statement--there are places where you should probably use them, but it's hard to argue that they should be part of any kind of "best practices" decision."

No, that was me. Not an aminator user, but one of the aminator *authors*. Feel free to verify that both my Twitter and Github accounts align and feel free to observe aminator's commit history. Heck, look at my Twitter profile, where it is very clear that I...work for Netflix. If you took *anyone's* offhand comment about creating *twenty five THOUSAND* AMIs (let alone one of the people working on the tooling to do so), all with the application 'cowsay' as their primary component as being anything other than a joke, I don't know what any of us can do for you to help you understand motivations or intentions (unless you are aware of any large-scale talking ASCII cow clusters, in which case I stand corrected).

"One of the reasons why Netflix is now choosing Python is because the generalized Python developer writes consistent and good code. (We chose Python for the same reasons you did). But to someone who has no idea what a good cloud deployment looks like, the way AMInator sits out there--you're going to see a lot more people like the guy super-psyched to have built 25,000 AMIs over Twitter."

We do not choose technologies based on what prospective developers *might* do with them, like write "consistent and good" code (and the assertion that Python, by some magical virtue, makes good programmers is hogwash. Good developers write consistent and good code regardless of the language, and many people write bad Python with ease.) We choose technologies that fit the job and the situation. Given that a) aminator was intended to be run ad-hoc or from some other automation, b) there is a fair amount of Python experience amongst Netflix employees, and c) languages such as Python (and Ruby and Perl and ...) lend themselves to rapid iterative development, we felt it was the right tool for the job. While I personally enjoy using Python, there is nothing about aminator that couldn't have been done with Ruby, Perl, TCL, PHP, Java, Groovy, Scala, or heck, bash (which is what aminator's ancestor was developed with).

Aminator was built to be modular, and any of its 5 major components (at this point) can be replaced to work with any system you can conceive of. There's no reason a set of plugins couldn't be developed that produced images for Windows on Azure, or local disk images for use with VirtualBox. What we provided was a framework and *our implementation* which *naturally* services our needs. Folks are free to use it as-is, or they can take what we have and replace parts with what works for them. I really hope they do, too. I'd love to see it produce Windows images, FreeBSD images, and so on.
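The plug-in design described here is a framework whose major components can each be replaced. It can be sketched as a pipeline of interchangeable stages. The stage names and the `Stage` protocol below are hypothetical and are not aminator's actual plugin interface. The sketch only assumes that each component takes the image-in-progress and returns it.

```python
from typing import Any, Dict, List, Protocol

Image = Dict[str, Any]


class Stage(Protocol):
    """One replaceable step in a hypothetical image-build pipeline."""

    def run(self, image: Image) -> Image: ...


class AttachScratchVolume:
    def run(self, image: Image) -> Image:
        image["volume"] = "scratch"  # stand-in for allocating and mounting a block device
        return image


class InstallPackage:
    def __init__(self, package: str) -> None:
        self.package = package

    def run(self, image: Image) -> Image:
        image.setdefault("packages", []).append(self.package)
        return image


class RegisterImage:
    def __init__(self, target: str) -> None:
        self.target = target  # e.g. "aws", or a local VirtualBox disk

    def run(self, image: Image) -> Image:
        image["registered_on"] = self.target
        return image


def build(stages: List[Stage]) -> Image:
    # The framework only sequences stages; each one can be swapped independently
    image: Image = {}
    for stage in stages:
        image = stage.run(image)
    return image


if __name__ == "__main__":
    aws = build([AttachScratchVolume(), InstallPackage("cowsay"), RegisterImage("aws")])
    local = build([AttachScratchVolume(), InstallPackage("cowsay"), RegisterImage("virtualbox")])
    print(aws["registered_on"], local["registered_on"])  # aws virtualbox
```

Targeting a different platform means replacing one stage, such as the registration step, while the rest of the pipeline stays unchanged.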

More documentation on how to use and extend aminator is on its way, and Netflix staff is in #netflixoss on irc.freenode.net fielding questions as they come in. You too are welcome to join and ask questions before posting articles, in case that wasn't clear :)
cbabcock
User Rank: Strategist
3/27/2013 | 1:26:58 AM
re: How Netflix Is Ruining Cloud Computing
I agree with Joe Emison when he upholds cross-cloud mobility and multi-cloud tools as the ultimate goal. I agree with Adrian Cockcroft when he pursues innovation and fresh ideas for the AWS context... in which Netflix currently and for the foreseeable future operates. Each has a different purpose behind his argument, and so their points are to some extent sailing past each other without registering and certainly without scoring. I don't like to see too much judgmentalness applied to other people's architecture. The judgment should be applied to our own efforts, and we should let the other fellow pursue his initiative to the max, even if it initially seems to fall short of one or more of our ultimate, far-sighted standards. The cloud is young, and it's impossible to say how some seemingly small or narrow-minded effort might grow legs and lead to long-term gains for everybody. That said, both sides have presented their cases well, and I've learned from this debate. Charlie Babcock, senior writer, InformationWeek
adrianco
User Rank: Apprentice
3/27/2013 | 12:41:59 AM
re: How Netflix Is Ruining Cloud Computing
Correction: The only DynamoDB support in NetflixOSS is contributed code by someone who was not working at Netflix at the time (a former employee). Netflix mostly uses Cassandra.
gregdek
User Rank: Apprentice
3/27/2013 | 12:25:32 AM
re: How Netflix Is Ruining Cloud Computing
"The fact that only one out of ten prizes involves portability, and the fact that you take such an expansive view of portability to include adding language support to an existing tool (which has NOTHING to do with cloud portability!), shows that you really think that cloud portability unimportant to Netflix."

The fact that Adrian encourages us at Eucalyptus and our friends at Cloudstack to tackle the portability problem head-on shows otherwise.
adrianco
User Rank: Apprentice
3/27/2013 | 12:11:57 AM
re: How Netflix Is Ruining Cloud Computing
You're using some emotive language here: "as long as you continue to force Netflix to use new and expanded Amazon-provided services over other options".

What actually happens is that Netflix has various problems to solve, and we do the usual make/buy evaluations, and sometimes we make it ourselves (e.g. Asgard over the AWS Console or other options) and sometimes we get vendors to build them. Part of that process is to work with AWS to build things we would like to use, but are of general use, so we don't want to build them ourselves. Part of the reason for releasing NetflixOSS is to make explicit to other cloud vendors the feature set and options that we have found useful, to see if they are also interested in responding.

You said "Perhaps AWS will release their API to the world and allow all businesses to use it openly, but they haven't yet, and so it's a very risky move to bet an architecture on AWS and any vendors (e.g., Eucalyptus) that AWS will bless."

For a start there are very many vendors who have implemented parts of the AWS API, and Eucalyptus have licensed the API so that they have access to the test suites. Nothing stops other vendors from doing the same.

It would only be a risky move to bet on an architecture that might go away. Betting on the industry leader with the dominant ecosystem looks like the lowest risk option to me.

Still, everyone is free to ignore NetflixOSS, there are plenty of other cloud architectures. But Netflix is innovating and scaling faster than the competition because we are also leveraging the innovation and scale of AWS.