InformationWeek: At Cornell University, your research centered on large-scale, distributed systems. What's that like in the real world of Amazon?
Vogels: In my eyes, Amazon is probably the world's largest distributed system. That might seem strange compared to a number of other large Internet sites that are out there, but Amazon is unique in that many different software patterns are active; it's not just request/reply, or indexing, or massive caching; there are large workflow pieces to it, dissemination pieces, content delivery. So the many different pieces in the Amazon architecture make it a very large -- I won't say complex, I don't know if complex is the right word -- very diverse platform. For us, two principles were important, especially in the context of reliability. Every second of downtime has financial impact. Reaching availability that is as close to 100% as possible -- Jeff Bezos recently had a great term for it, "indistinguishable from perfect" -- is essential for us.
So there are two main principles that we use internally. One is isolation, driven by the service-oriented architecture. There's no direct database access except by the pieces of software that make up the service, and the service has a hardened API. That is the only way that services or software pieces can interact with each other.
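The isolation principle can be sketched in a few lines of Python. This is a hypothetical illustration, not Amazon code: the names `OrderService`, `place_order`, and `get_order` are invented, but the shape is the one described, the data store is private to the service, and callers can only go through the hardened API.

```python
class OrderService:
    """Hypothetical internal service: only its own code touches the data store."""

    def __init__(self):
        self._orders = {}  # private store; no other service reads this directly

    # The hardened API: the sole way other software pieces interact with us.
    def place_order(self, order_id, item):
        if not order_id or not item:
            raise ValueError("order_id and item are required")
        self._orders[order_id] = item
        return order_id

    def get_order(self, order_id):
        return self._orders.get(order_id)

svc = OrderService()
svc.place_order("o-1", "book")
print(svc.get_order("o-1"))  # prints "book"
```

Because the store is reachable only through the API, the service can change its storage layer or validation rules without breaking any caller.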
The second piece, and it's a more fundamental architecture piece, is that these software components are loosely coupled: there is no tight connection or dependency between different pieces, which means that if a failure happens or overload occurs, it's easy for software components to switch to other components that aren't faulty or that provide better availability. We do that at a micro level and at a number of higher abstraction levels, even to the point that our systems are designed to withstand complete data center failures. We have a rule internally in the e-commerce space that we should be able to lose a complete data center without violating the SLA to the customer. So isolation and loose coupling are the two building principles that we use to construct the overall architecture.
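The loose-coupling idea, switching away from a faulty component so that even a full data-center loss is survivable, can be sketched as a failover loop. This is a minimal, hypothetical illustration; the `Replica` class, replica names, and health flag are invented stand-ins for real health checking and routing, not Amazon's implementation.

```python
class ServiceUnavailable(Exception):
    """Raised when a replica cannot serve the request."""

class Replica:
    """Hypothetical service replica; 'healthy' stands in for real health checks."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ServiceUnavailable(self.name)
        return f"{self.name} served {request}"

def call_with_failover(replicas, request):
    """Try replicas in turn; a fault in one never propagates to the caller."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ServiceUnavailable:
            continue  # loosely coupled: switch to a component that isn't faulty
    raise ServiceUnavailable("all replicas failed")

replicas = [
    Replica("dc1-orders", healthy=False),  # simulate losing a complete data center
    Replica("dc2-orders"),
]
print(call_with_failover(replicas, "GET /cart"))  # prints "dc2-orders served GET /cart"
```

Because callers depend on the operation rather than on any one replica, the first replica's failure is absorbed inside the loop and the SLA to the caller is preserved.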
InformationWeek: Opening the architecture to the rest of the world seems to add a level of complexity. You backed away from the term complexity, but it seems highly complex to do this for thousands of customers, all of whom have their own idea of what they're going to do there.
Vogels: We have quite a bit of experience with opening up Amazon to the outside world. We allowed merchants onto the platform, and we opened up all of the data that lived inside the Amazon e-commerce platform for outside developers to use. The enterprise services platform is another one, dealing with all of these large platforms such as Target and Marks & Spencer. Having such a multitenant platform has taught us a lot about how to guarantee performance and reliability while serving multiple masters. There are many other pieces of Amazon that we've opened up to the outside world, even the fulfillment center. You can go to an Amazon fulfillment center and sell your goods and use a web service call to say, 'Mail that package to that customer.'
As for whether there is a big difference between the external customers and the hundreds of internal services we have -- which can be seen as small businesses within the overall Amazon frame -- I don't think there is that much of a difference. Given the huge diversity within Amazon's software architecture, each of these services has different requirements, whether it is storage, compute infrastructure, or queuing. We've been surprised by how customers have started to use S3, for example, for everything from a backup system -- the write-once, read-never approach -- to a content delivery network or software distribution, and every style of computing in between. Has that made it more difficult? I haven't seen anything in terms of software architecture or in the way that we do operations that has demonstrated to me that we were not well prepared.
InformationWeek: Describe your job as CTO.
Vogels: My role has changed a bit. I once wrote a blog post about the different CTO roles, and I described four different patterns. One is that of an infrastructure manager, and another is that of a tech visionary and operations manager. I'm not in that camp; those are more the CTOs who work together with a CIO and manage operations, a data center, and things like that. There are two other roles. One is what I call the big thinker, who thinks about the bigger patterns: What is the technology that you need to develop? And, at Amazon, what are the kinds of things that we need to develop for our customers, and how do we drive customer-oriented development and customer-oriented architectures into the way that we do things? When I started at Amazon, I was asked to be this big thinker, thinking about what the principles are, what the strategy is. But now that we deliver more and more technology services to the external world, my role has shifted to becoming a more external-facing technologist, someone who talks to our customers, understands what their needs are, then takes that back and drives our internal road map based on the needs of our customers. The other side of it is helping our customers become successful on our platform: What are the kinds of things that Amazon can do to make sure that the transition into the cloud is as seamless as possible? I still help teams internally with their architectures, but mainly based on feedback from our customers.
InformationWeek: What are some of the possibilities for cloud computing beyond some of the obvious Web applications?
Vogels: When customers move over to the cloud, they start thinking about how they can automate their environments even more than they do now. Customers that achieve higher degrees of automation turn out to be the ones who gain the most benefit from moving to the cloud. There's a range of things that customers do to achieve that. For example, the Indy 500 guys moved onto Amazon EC2, and their feedback was that on people costs alone they saved about 50%, because they could go from a hand-managed environment to EC2, where they could automate the scale-up and scale-down. That's a story that I hear frequently, and I like to believe that it's one of the key advantages.
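The scale-up and scale-down decision being automated here can be sketched as a simple capacity rule. This is a hypothetical illustration, not how EC2 or the Indy 500 team actually implemented it; the function name, the load metric, and the per-instance capacity figure are all invented for the example.

```python
import math

def desired_capacity(current_load, capacity_per_instance,
                     min_instances=1, max_instances=20):
    """Compute how many instances to run for the current load.

    Scale up when load rises, scale back down when it falls, bounded by
    a floor (so the service is never empty) and a ceiling (cost control).
    """
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(needed, max_instances))

# Traffic spike: scale up. Quiet period: scale back down to the floor.
print(desired_capacity(current_load=950, capacity_per_instance=100))  # prints 10
print(desired_capacity(current_load=40, capacity_per_instance=100))   # prints 1
```

Running a rule like this on a schedule against live metrics is what replaces the hand-managed environment: no person has to notice the spike or remember to shed the extra machines afterward.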
I also see quite a few of our customers, given the times that we're in, asking how they should prepare for when things get better again. Rapidly scaling may not be their primary concern at the moment, but they do see that there will be a time in the future when that will need to happen again, and they're taking a look at whether their architectures are horizontally scalable and can exploit the advantages that cloud computing can give them. For the next year or two, those will be the focus for a lot of our customers: how they can automate their operations further so that they have completely hands-off operation, and how they can make sure that their systems are horizontally scalable so that they really benefit from the cost effectiveness that the cloud has to offer.