Google Compute Engine Leverages Third Party Support

RightScale, MapR, and Puppet Labs bring key features, a larger ecosystem to Google's new infrastructure-as-a-service.

Google's newly launched Compute Engine infrastructure-as-a-service (IaaS) has some limitations, but it's already attracting support from third parties who can address its areas of weakness. This speedy early development of a supportive ecosystem is a good sign for Google IaaS's prospects.

For example, Google's IaaS is designed to run KVM workloads, which works fine for the many startups and independent developers who have built their applications on Google App Engine and are not running them in any virtual machine. They can submit their job to Compute Engine, select a type of server, and let it provision a KVM virtual machine for them.

But many established businesses are already running virtualized applications and are ready to move a discrete workload, configured the way they want, to public IaaS. Amazon Web Services' EC2, for example, prefers tasks submitted in its Amazon Machine Image format, but accepts VMware virtual machines and converts them. Microsoft's Windows Azure is designed to run Hyper-V, but can accept both VMware and Citrix XenServer jobs.

For Google Compute Engine to do the same, it will need the help of third party RightScale, which provides a job configuration front end that can translate between the different virtual machine formats. RightScale has produced over 40,000 Linux and Windows server templates that a customer can browse. Once the customer selects one, RightScale sets the operating system and application combination that can be submitted to a particular cloud service, including Amazon EC2 and, now, Google's Compute Engine.

Michael Crandell, CEO of RightScale, said his firm looked over the features of Google's pending IaaS and was impressed with its speed of booting up servers and its ability to establish encrypted, private-line communications between virtual machines. The virtual machines may be in different geographic locations, but the connections between them "appear as a local area network, from the system administrator's point of view," he said in an interview.

In addition, data written to storage by Compute Engine is automatically encrypted, a security feature that Crandell thinks will be attractive to future cloud users. So RightScale signed up to support Compute Engine's KVM infrastructure, and customers who might otherwise be turned off by the use of KVM may go through RightScale--for a fee--to have their workloads targeted to Compute Engine.

Likewise, performing analytics on big data is one of the cloud's attractions. Google, as the inventor of Bigtable and MapReduce, should be able to attract big data users in the long run. But it helps that MapR, an implementer of analytics on open source Hadoop, has a system ready for use on Compute Engine.

MapR is a commercial implementation of the Apache Software Foundation's Hadoop. At Google I/O, a 1-TB sort, or TeraSort, job was completed in 80 seconds on a Compute Engine cluster of 1,256 nodes and 1,256 disks, at a cost of $16.
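The TeraSort figures above imply some simple rates. The following is a rough sketch of that arithmetic, not a benchmark; it assumes a decimal terabyte (10^12 bytes) and an even spread of work and cost across nodes, neither of which the article states. The function name `terasort_summary` is made up for illustration.

```python
# Assumption: "1 TB" means a decimal terabyte (10**12 bytes).
TERABYTE_BYTES = 10**12


def terasort_summary(data_bytes: int, seconds: float, nodes: int, cost_usd: float) -> dict:
    """Derive aggregate and per-node rates from the reported run.

    Assumes work and cost are spread evenly across nodes (an idealization).
    """
    aggregate_bytes_per_s = data_bytes / seconds
    return {
        "aggregate_gb_per_s": aggregate_bytes_per_s / 1e9,
        "per_node_mb_per_s": aggregate_bytes_per_s / nodes / 1e6,
        "cost_per_node_usd": cost_usd / nodes,
    }


# Figures reported for the Google I/O run: 1 TB, 80 seconds, 1,256 nodes, $16.
summary = terasort_summary(TERABYTE_BYTES, 80, 1256, 16.0)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under those assumptions, the cluster sorted about 12.5 GB per second in aggregate, or a little under 10 MB per second per node.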

Although he didn't tie the project to MapR or Hadoop, Google senior VP Urs Holzle, one of the primary architects of Google's search data centers, appeared at Google I/O June 28 to illustrate how Compute Engine can be used for Hadoop-style parallel processing of a big data problem.

In this case, the big data problem was finding characteristics and attributes in what is known about cancer patients and associating those findings with specific genes or gene mutations that are known in the human genome. An algorithm used by the Institute for Systems Biology in Seattle sifts through the human genome, looking for associations between what is known about specific genes and the attributes of cancer patients.

The search for associations with cancer is extremely complex. On the institute's own 1,000-node cluster, it was able to find one association for every 10 minutes of processing. When the problem was shifted onto a 10,000-core Compute Engine cluster (1,250 servers, eight cores per server), the institute was able to discover one every few seconds. "This port required little effort because Google Compute Engine offers an environment that is similar to the institute's own cluster," said an institute report titled "Behind the Compute Engine Demo at Google I/O."
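The cluster size and the speedup described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the article says "every few seconds" without giving a number, so the values tried below (2, 5, and 10 seconds) are assumptions, and the function names are hypothetical.

```python
def total_cores(servers: int, cores_per_server: int) -> int:
    """Core count of a cluster with identical servers."""
    return servers * cores_per_server


def discovery_speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster associations are found than on the baseline."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


# 1,250 servers with eight cores each, as described in the article.
print("cores:", total_cores(1250, 8))

# Baseline: one association per 10 minutes on the institute's own cluster.
baseline_seconds = 10 * 60

# Assumption: "every few seconds" is somewhere between 2 and 10 seconds.
for assumed_seconds in (2, 5, 10):
    factor = discovery_speedup(baseline_seconds, assumed_seconds)
    print(f"one every {assumed_seconds}s -> about {factor:.0f}x faster")
```

Whatever the exact interval was, the move from one association per 10 minutes to one every few seconds corresponds to a speedup on the order of 60x to 300x under these assumed values.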

Holzle showed a visualization of the research in his June 28 Google I/O keynote. The human genome was represented as a circle, with patient attributes located in sub-circles inside. Whenever a connection was discovered between an attribute and a gene, or between one gene and another, a line was drawn between the two. When the rate of discovery on the 10,000-core cluster was illustrated, one line followed another every few seconds.

Holzle said that problems such as this could be run on much larger clusters because they do not require intensive I/O. He then illustrated the pace of discovered associations when the problem was run on a 600,000-core cluster that spanned several Google data centers, possibly the three that now make up Compute Engine. The data visualization showed lines being drawn across the circle at such a rate that it was quickly being filled in.

Such a use of cloud compute power to solve big data problems may help find a cure for cancer, Holzle told the 6,000 developers in attendance, as they applauded the demonstration. "You benefit from our decade of experience in building and running" data centers in the cloud, he said.

The third independent software vendor with support for Compute Engine on the day of its announcement was Puppet Labs, supplier of Puppet open source code and the Puppet Enterprise product based on it. Puppet is a configuration and deployment engine that builds a neutral stack of an operating system, an application, and the application's dependencies, then formats them for a particular cloud target. It has 300 configuration models with which users may configure cloud workloads, said Teyo Tyree, co-founder of Puppet Labs, in a June 28 blog post.

Puppet can support on-premises clouds based on VMware virtualization. It can also support deployments to Amazon's EC2 and Google Compute Engine. With Puppet Labs, as with RightScale, users of VMware virtualization on premises can get automated assistance in configuring workloads for the Google infrastructure.

Google benefits by having experienced third parties ready to announce their support for Compute Engine on the day it was announced. Their support tends to broaden the range of customers who can consider using it, and it provides a simplified means of getting workloads to the Google cloud. It would have taken Google months or even more than a year to build the tools and polished interfaces to do the job on its own.

At the same time, these three third parties--and no doubt others in an emerging Google ecosystem--were eager to appear by Google's side, even if it was late in getting to the IaaS party. "Google has invaluable experience and insight about what it takes to operate infrastructure services," said Puppet Labs in its announcement of Compute Engine support. Google may benefit from Puppet's value add, but Puppet and other third parties are clearly happy to be standing at Google's side.

Copyright © 2019 UBM Electronics, A UBM company. All rights reserved.