With cloud systems, architecture matters most when things go wrong. Cloud vendors achieve architecture through a series of small decisions on which resources to offer and which trade-offs to make. Sometimes those trade-offs can lead to customer nightmares.
Customers of the cloud storage service provided by Nirvanix, which said last month that it's going out of business, are discovering how hard it can be to pull data out of cloud storage after that data has trickled in over a long period of time. Rumors of Nirvanix's demise leaked out only four weeks before the company's closure date, and a company statement a week later set off a scramble for the exits.
The size of the pipes providers use to move data into and out of their cloud is an architectural decision, and Nirvanix had provisioned generously for data flowing in while providing far less capacity for data flowing out. Nirvanix's architecture reflected the priorities of its business. In the end, the third-party vendors hosting Nirvanix bailed customers out by making 10-Gbps cross-switch lines available or otherwise using their expertise to relocate customers to another provider.
Most cloud vendor decisions won't end as dramatically as in the Nirvanix case. But it's still important for customers to understand the architecture of their infrastructure-as-a-service providers, as it can help IT teams build applications and design workloads that exploit the architecture efficiently. Such efficiency will be all the more important as companies spend more on IaaS in the coming years. "You have to understand how a public cloud system works. People have a tendency not to think about it," says David Linthicum, senior VP of Cloud Technology Partners, a Boston consulting firm.
To understand the architectural differences among IaaS providers, we'll examine the subtle differences between Amazon Web Services and Google Compute Engine services, contrast those services with Microsoft Azure, and then look at growing alternatives to those three services.
IaaS's Core Architecture
IaaS vendors don't talk about their architecture in great detail, viewing it as a competitive advantage. But at the most fundamental level, infrastructure clouds have scale-out architectures, rather than the scale-up architectures favored by most enterprises. Cloud vendors add servers, network capacity and disks in regular increments, and their cloud management software automatically discovers and integrates the fresh physical resources. Those cloud resources are connected by straightforward Layer 3 networking, usually commodity 10-Gbps Ethernet switches.
Amazon Web Services and Google Compute Engine are "95% semantically and architecturally equivalent," says Randy Bias, CEO of private cloud software vendor Cloudscaling. "From a pure architecture perspective, they are extremely similar," says Bias, who architected Korea Telecom's cloud and several other OpenStack- and CloudStack-based clouds.
There are important differences between the two, but first let's look at what's similar, as that offers a point of comparison for others.
Both Amazon and Google try to limit "failure domains" in their architectures, Bias says. If a piece of hardware fails, their cloud management software provisions new virtual servers and shifts the workloads. If an application or operating system stalls, other applications, databases and Web server systems keep running.
Both vendors use "sophisticated virtual machine scheduling capabilities" to evenly pack VMs onto hosts that are designed to run specific types of virtual servers well, he says. They both shun conventional elements of enterprise architecture, such as using a matching pair of high-performance servers to achieve high availability, or implementing storage area networks outside the server rack, since doing so would introduce latency. Amazon and Google don't use commercial software packages for change management databases, configuration, provisioning virtual servers or managing the rest of a workload's life cycle, relying instead on homegrown and open source code, both of which scale at little cost, Bias says.
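The VM scheduling Bias describes is, at its core, a bin-packing problem: fit virtual servers onto physical hosts without wasting capacity. A minimal sketch of one common strategy, first-fit decreasing — the host/VM size model here is our own simplification, not either vendor's actual scheduler:

```python
# Illustrative first-fit-decreasing VM placement. VMs are packed in
# descending size order so large VMs claim space before fragmentation
# sets in. A simplified model, not any cloud vendor's real scheduler.

def place_vms(vm_sizes, host_capacity):
    """Return a list of hosts, each a list of the VM sizes placed on it."""
    hosts = []
    for vm in sorted(vm_sizes, reverse=True):
        for host in hosts:
            if sum(host) + vm <= host_capacity:
                host.append(vm)
                break
        else:
            hosts.append([vm])  # no existing host fits: provision a fresh one
    return hosts

if __name__ == "__main__":
    # Eight VMs of varying sizes packed onto 16-unit hosts.
    print(place_vms([8, 4, 4, 2, 2, 2, 1, 1], host_capacity=16))
```

In this toy run the eight VMs fit on two hosts; a naive in-order placement can need more. Real schedulers also weigh host type, network locality and failure domains, but the packing objective is the same.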
Amazon and Google organize their geographically dispersed data centers into independent "regions," with subregions inside each data center. (Amazon calls the subregions inside data centers "availability zones.") Each zone is designed to limit the spread of failures, by having its own power supply and communications links, for instance.
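Customers exploit that isolation by spreading a workload across zones, so the loss of one zone doesn't take down the whole fleet. A hedged sketch of the idea — the zone names and helper function here are hypothetical, not any provider's real API:

```python
from itertools import cycle

# Hypothetical zone identifiers, styled after a region containing
# three availability zones. Not real provider zone names.
ZONES = ["region-1a", "region-1b", "region-1c"]

def spread_across_zones(instance_names, zones=ZONES):
    """Round-robin instances across zones so a single zone failure
    leaves most of the fleet running. Returns {zone: [instances]}."""
    assignment = {zone: [] for zone in zones}
    for name, zone in zip(instance_names, cycle(zones)):
        assignment[zone].append(name)
    return assignment

if __name__ == "__main__":
    fleet = [f"web-{i}" for i in range(6)]
    print(spread_across_zones(fleet))
```

With six instances and three zones, losing any one zone still leaves four of six instances serving traffic — the failure-domain thinking the regions-and-zones design is meant to enable.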