re: How Netflix Is Ruining Cloud Computing
Auto Scaling Groups.
I'll put aside the inflammatory, hyperbolic headline of the editorial for a moment, and talk about Auto Scaling Groups. Let's see how many times I can mention Auto Scaling Groups. Somebody count for me please.
At the core of Asgard's functionality is the Auto Scaling Group.
When Eucalyptus asked what they needed to do in order to run Asgard against a Eucalyptus server, I told them they needed to implement Auto Scaling Groups, and stub out a few other unimportant Amazon services Asgard currently expects to call. A few months later, they came back and said they were done. I asked if they implemented scaling policies. Yep. CloudWatch metrics? Yessir. Scheduled actions? You bet. Great! Let's finish making this thing flexible enough to use a Eucalyptus server. Someone still needs to add configurability to Asgard for regions, endpoints, instance types, application provider, and cloud API authentication. Cloud prize, anyone?
When OpenStack support consultants ask me how they can run Asgard against OpenStack, I tell them that first OpenStack needs to support the concepts that make Asgard useful, specifically Auto Scaling Groups. If you want to use Asgard without Amazon and without a cloud that has Auto Scaling Groups, then I really have to ask why. That's like using a food processor to open an envelope; you might get it to work, but to what end? There's maybe one screen in Asgard that might be useful for launching an instance without an Auto Scaling Group, but we don't use that screen much. Instead, I recommend choosing some implementation of Auto Scaling Groups, either through Scalr, Amazon, Eucalyptus or RightScale. The Auto Scaling Group serves to name and version a cluster, while associating it with an owner, and guaranteeing that the instances are homogeneous. The important part is the named group of instances of a single immutable image. The dynamic scaling part is gravy, although it does save you a lot of money.
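To make "a named group of instances of a single immutable image" concrete, here is a minimal sketch in Python. It is not Asgard's or Amazon's code; the class names, the fields, and the "cluster-v002" naming scheme are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


@dataclass
class AutoScalingGroup:
    """Toy model of a named, versioned, owned group (hypothetical, not a real API)."""
    cluster: str      # cluster name, e.g. "api"
    version: int      # sequence number of this group within the cluster
    owner: str        # team or person responsible for the group
    image_id: str     # the single immutable image every instance boots from
    instances: List[Instance] = field(default_factory=list)
    _launched: int = 0  # running counter so instance ids are never reused

    @property
    def name(self) -> str:
        # Assumed naming scheme: cluster name plus zero-padded version.
        return f"{self.cluster}-v{self.version:03d}"

    def launch(self) -> Instance:
        # Every launch uses the group's image; callers cannot pick another one.
        self._launched += 1
        instance = Instance(f"i-{self._launched:04d}", self.image_id)
        self.instances.append(instance)
        return instance

    def resize(self, desired: int) -> None:
        # The "dynamic scaling" part: grow or shrink to the desired size.
        desired = max(desired, 0)
        while len(self.instances) < desired:
            self.launch()
        del self.instances[desired:]

    def is_homogeneous(self) -> bool:
        # True when every instance runs the group's single image.
        return all(i.image_id == self.image_id for i in self.instances)
```

Because launch() always uses the group's own image_id, homogeneity falls out of the structure rather than out of discipline. A new build would get a new group with the next version number instead of mutating this one.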
As a partial substitute for the AWS console, Asgard serves seven purposes for corporate Amazon customers, listed in the Netflix tech blog post where I first announced Asgard (Google "asgard tech blog"). The purposes are: (1) Hide the Amazon keys, (2) Auto Scaling Groups, (3) Enforce conventions, (4) Logging, (5) Integrate systems, (6) Automate workflow, (7) Simplify REST API. When and if Amazon adequately addresses all seven of those issues in their own console, then I will gleefully recommend that Netflix deprecate Asgard and start using the AWS console instead. Then I'll go write some movie-related software instead. However, I'm not holding my breath. Amazon has a lot of other things to consider beyond supporting the cloud model Netflix has chosen. My prediction is that Asgard will remain a reasonable option for customers of cloud providers that have Auto Scaling Groups, starting with Amazon.
Is the publicity of Asgard putting pressure on cloud providers to implement both Auto Scaling Groups and usable graphical interfaces for configuring those Auto Scaling Groups? I hope so. That's one of the reasons I wanted to open source Asgard. If nobody can figure out how to use Auto Scaling Groups, then no one will use them. Then Amazon is less likely to add them to their console and less likely to augment them to be more useful, and Google is less likely to implement them. Auto Scaling Groups are great. Let's use them. Let's tell more cloud providers to provide them.
Will another company do as Eucalyptus did, and clone enough parts of the Amazon API to get free benefit from our tools? That would be good. Remember, Eucalyptus did most of that work before Amazon even talked to them. If cross-cloud-provider portability is your focus, my advice would be to add to Eucalyptus' open source implementation and make it plug into a dozen other cloud vendors the way it plugs into any data center. Personally I'm more interested in using so many isolated AWS regions that I don't need to worry about any one AWS system having a problem.
Now, let's talk a little more about AMIs.
Relying on a Chef/Puppet configurator for every production instance launch is not a good idea. It's a really bad idea. I don't know why anyone would regard deploy-time configuration as something new and good, while regarding pre-baked image launching as something old and bad. It's the other way around. You might be used to the idea of deploy-time configuration, but it's still a bad idea. It's an unnecessary risk. The point of Aminator is to give people a robust way to stop thinking in that old-school way. I want people to start using Chef at build time, not deploy time. Use Chef with Aminator to create a complete image of the latest version of your application. Then know with certainty that every instance of that AMI will be identical in the development, test, staging, and production environments, in multiple redundant regions across four continents, even if the network fails during instance startup, even if the Chef server is getting upgraded or is falling over one day, even if a second deployment of the image happens months later. All the instances will be homogeneous within an Auto Scaling Group, all the time, even at large scale.
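The difference between build-time and deploy-time configuration can be shown with a toy simulation. This is only a sketch under stated assumptions: ConfigServer, bake_image, and the two launch functions are hypothetical stand-ins, not the Chef, Puppet, or Aminator APIs.

```python
class ConfigServer:
    """Stand-in for a deploy-time configuration server (hypothetical)."""

    def __init__(self, package_version):
        self.package_version = package_version
        self.available = True

    def fetch(self):
        # Fails when the server is down or the network is broken.
        if not self.available:
            raise ConnectionError("config server unreachable")
        return self.package_version


def bake_image(server):
    """Build time: contact the server once and freeze the result into an image."""
    return {"package_version": server.fetch()}


def launch_from_image(image):
    """Deploy time with a baked image: copy the frozen state, no network needed."""
    return dict(image)


def launch_with_deploy_time_config(server):
    """Deploy time without a baked image: every launch depends on the server."""
    return {"package_version": server.fetch()}
```

If the server's package_version changes between two calls to launch_with_deploy_time_config, the two instances differ, and if the server is unavailable the launch fails outright. launch_from_image returns the same contents every time, whether the server is up, down, or upgraded in the meantime.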
For the past 9 months, Aminator was the missing piece in the story of Asgard's ease of use. Now that there is a convenient way to produce a new AMI for each software build, it should be easier for people to use Asgard and Auto Scaling Groups for deployments without needing to rely on a highly available production deploy-time Chef server. If these resiliency concepts can be offered by more cloud providers, so much the better. I don't think that's ruining the cloud. I think that's promoting good patterns for tomorrow's cloud.