One of the biggest lessons so far is that it's hard to know precisely what your systems are doing in a public cloud environment. Yes, Amazon.com's CloudWatch and services like it will tell whether your workloads are operating, but they don't tell how well apps are performing, such as if they're choking on I/O overload.
Even when there's an outright failure, it can be hard to get the information you need. When part of an Amazon data center in northern Virginia suffered primary and backup power failures on Dec. 9, it took 34 minutes before the news was posted on Amazon's Service Health Dashboard. Amazon acknowledged the outage and offered updates, but it was up to customers to assess the impact. To know whether their workloads were down, companies had to subscribe to CloudWatch--not everybody does--or a service such as VMware's CloudStatus or Alternate Networks' network monitoring, or check directly if a failover activated backup servers, which Amazon encourages you to set up for each workload.
Cloud computing service providers, like their customers, are in learning mode during this break-in period. They include infrastructure veterans Amazon, Google, IBM, Microsoft, and Terremark; outsourcers such as CSC; telecom giants Verizon and AT&T; and newcomers Cloud.com, Engine Yard, Heroku, and many others. If they're candid, they acknowledge they're venturing into unexplored terrain.
Microsoft, which started charging for its Azure cloud services in February, admits it still has work to do, especially in the area of cloud monitoring tools. Enterprise early adopters are in a position to shape how vendors build out these services. A private cloud appliance recently launched by Microsoft is being co-developed with eBay, which will initially use it for the relatively low-volume Garden by eBay service, where it tests partner's ideas and new applications, and eventually for basic auction services, says VP of technology James Barrese.
Even if companies are only testing cloud services, they should explore the inevitable problems that go with an emerging technology, as well as the potential competitive advantages. InformationWeek sought out early adopters to gauge how they're doing in both respects.
Know What Customers See
Jason Spitkoski, CTO of startup Schedule Bin, has a lot riding on cloud computing, a commitment that was seriously tested last month. Schedule Bin will use public cloud infrastructure to offer online applications that let businesses schedule employee work shifts or track teams as they execute tasks. Spitkoski looked at Amazon's EC2 but opted for Google's App Engine cloud service, which he says is better suited for making changes or adding new apps. Schedule Bin's applications, due to go live this summer, are built in Python, making them a good match for App Engine, which runs Python and Java apps.
Google App Engine has proved a solid platform, though its underlying Datastore system, which provides storage for Web apps, went through a very rocky stretch. In May, Datastore suffered three service interruptions, one lasting 45 minutes. In early June, Google stopped charging for Datastore, acknowledging that, since April, latency in retrieving data had become 2.5 times greater than in January to March.
By mid-June, Spitkoski was worried--Schedule Bin's beta customers had noticed a slowdown. "We have demos with potentially large customers soon, and I'm concerned that the apparent slowness will be brought up," he said in an e-mail interview.
Spitkoski considered how much to talk about the cloud infrastructure with customers; he wanted to put the focus on Schedule Bin's features, not on the cloud service that enabled them, or any problems Google is having. "We want to keep the demo straightforward and to the point, which means we don't want to get into fuzzy discussions about clouds, or what we think Google is thinking," he said.
Given Datastore's performance problems, Spitkoski was on the bubble, contemplating switching cloud providers at a time when most of Schedule Bin's development work was complete. "The timing couldn't be worse if we were forced to suspend our customer efforts and focus instead on switching cloud providers," he said.
Fortunately, the crisis passed. By the end of June, a relieved Spitkoski found that Google had "vastly improved" Datastore's performance. "I'm now completely satisfied with Google's App Engine," he now says.
Satisfied, but wiser about overly relying on a single cloud vendor. "I will always have an eye open for alternatives," Spitkoski says.
Enterprise IT teams could face a similar dilemma--getting deep into cloud development before realizing they bet on the wrong vendor. The better approach is to keep researching alternatives.
On the other hand, it can also make sense to work closely with a cloud services provider. With enterprise customers in particular, there's a chance to mold a cloud offering in a way that suits your company's needs if you have a good relationship with the vendor and the resources to collaborate.
Cloud Is Under Construction
Kelley Blue Book is one of the high-profile customers of Microsoft's Azure cloud service, having tested it as a way of handling traffic spikes on the Blue Book Web site, which offers used-car pricing information. If Azure made it possible for Kelley to abandon its second data center, the company could potentially save at least $100,000 a year, says Andy Lapin, Kelley's director of enterprise architecture.
But Lapin doesn't think he can see clearly enough into how well his applications are running in Azure. Microsoft released monitoring and diagnostic APIs late last year, but workload monitoring systems from Microsoft and other vendors are still lacking, he says. Lapin halted the planned off-loading of Web traffic spikes until better tools come along.
Lapin also warns that some concepts that sound a lot like those used in enterprise IT operations may be different when applied to the cloud. For example, the description of Azure's Table Storage service sounds familiar. The service "offers structured storage in the form of tables," according to Microsoft Developer Network documentation.
When Lapin sees that, he thinks in terms of relational database tables, but that's not how Table Storage works. It "isn't really like a big, flat table," he says. "You really only get indexing on a single column so, while you can query any column, performance optimization is very different from using SQL Server."
Google App Engine's Datastore tables have similar limitations. In general, cloud systems don't do joins across tables or deep indexing, the kind of ad hoc information sorting that relational databases specialize in.
If that drawback sounds like a minor thing, listen to Oliver Jones, an experienced Microsoft .Net developer who writes about coding on his Deeper Design blog. In March, about a month after Azure came out of beta, Jones shared his initial experience with Azure Table Storage: "It looks fairly full featured. However, it is not. At almost every turn I have ended up bashing my head against a Table Storage limitation. Debugging these problems has been a bit of a nightmare." Investigate such warnings before turning your development team loose on it.
One selling point for Azure is the presumed degree to which it will work with Microsoft products already in place. Microsoft says all SQL Server queries will translate to run on SQL Azure, its cloud database. In general, however, be careful about assuming compatibility between existing enterprise systems and those in the cloud.
Beware Assumptions
Innovest Systems is a supplier of software as a service for trust accounting and wealth management firms. It provides online decision-support and accounting for companies with a total of $250 billion in investments, including Mitsubishi UFJ, so availability and reliability of its services are critical.
To deliver its SaaS apps, Innovest previously managed its own hardware in a co-located facility run by an outsourcer, Savvis. Between 2004 and 2008, Innovest migrated its production environments to Savvis-hosted virtual servers. More recently, this platform has evolved into what Savvis calls its Dedicated Symphony cloud service, a form of private cloud computing where servers in an otherwise multitenant cloud are reserved for one customer.
A dedicated cloud made a lot of sense to Ray Umerley, chief security officer at Innovest. "We had always struggled with co-located services. We had to maintain hot backups on standby," he says. "Whenever anything stopped, somebody had to go over there and change a tape or a drive." With the Symphony service, Innovest designed the facilities that it wanted down to the specific policies in the firewall protecting the servers, and Savvis installed them and ensured that they ran.
Over the course of their six-year relationship, Innovest had built a close partnership with Savvis and concluded it could trust private-cloud-style operations to its outsourcing partner. A big step was the move from co-location services, in which Umerley and other technical people had to periodically adjust equipment at the Savvis facility, to letting Savvis technicians take over that function. Teams from the two companies covered myriad operational details so that Innovest could guarantee to its customers that their data was being handled in a way that met strict regulations.
Despite all the preparation, just weeks before the switch-over the teams realized they had overlooked a fundamental detail: Innovest ran Windows applications and Savvis-hosted Windows servers, but the version supported by Savvis was Windows Server 2003, while Innovest apps were still on Windows 2000. With 15 days till deployment, Innovest's IT team swung into high gear and migrated the key applications to Windows 2003.
Both parties knew each other's operations well and thought they were practicing the utmost due diligence as they approached the transfer date. The version of Windows Server involved was something so obvious that everyone assumed it would be the first issue considered, not the last. Since Innovest's launch of Symphony services, everything has run fine, and Umerley gives Savvis high marks for offering visibility into its architecture, engineering, and security.
When making a move into the cloud, "know your provider well," Umerley cautions. That means scrutinizing its security practices, and knowing how the provider keeps its data handling in compliance with regulations that govern your business. Umerley recommends being open and putting the pressure on the vendor to spot potential problems. "Be sure to state: 'Here's what we have. Tell us what we will have to change,'" he says.
Watch Where The Data Goes
Don't necessarily write off cloud computing just because sensitive data is involved. But watch that data closely.
Manpower CIO Denis Edwards is eager for his IT teams to experiment with cloud development, to speed up development and cut costs. But he also has a clear policy about data governance: Developers don't have blanket approval to move data into the cloud.
A project with a certain data set may get the OK through Manpower's data governance process. However, if that project expands to more data, it requires a new approval. Don't let sensitive data creep into the cloud as a project's scope expands.
On the other hand, don't assume that the cloud's a nonstarter just because there's sensitive data involved. Some of the most interesting applications will be those where sensitive data stays on-premises yet gets shared or used in some way through cloud services.
That's happening at Lipix, a nonprofit formed two years ago for the purpose of easing the exchange of information among healthcare providers on Long Island in New York. In one year, CTO Mark Greaker has used CSC's CloudLab to establish a central index of patient records being used at 22 of the 25 competing hospitals in the region. The index now covers about 1 million patient records, but the records stay within the hospitals.
How does that work? Greaker has an edge server in each hospital linked to Lipix's index, which resides on servers in the CSC cloud. Greaker adds a hospital every two to three months to the master index, and when he does, he goes to a CSC portal and commissions a virtual server with the CPU, memory, and storage that the hospital needs. The index tells a doctor where a patient's record is, and the doctor can see a read-only version over a messaging system. About 1,000 of Long Island's physicians are using the system, and Greaker has a $9 million grant to reach another roughly 2,000 within three years. With employees focused on establishing the system, not racking hardware, "I've been surprised how quickly we've been able to design and build it," he says.
Platforms Matter
Russell Taga is a VP of engineering at Howcast, a startup trying to capitalize on the explosion of Internet video by specializing in the "how-to" niche. Howcast keeps its catalog of videos in a cloud service run by Engine Yard; 90% of the content is made by contributors not employed by Howcast. It also links to videos elsewhere on the Web, including YouTube.
Howcast builds Ruby-based apps, which proved to be an important factor in choosing its cloud vendor. The startup's Web applications let people search for, create, and edit videos. For it to succeed, however, Taga believes his firm must make it easier for people to produce and air high-quality how-to videos, so he's focused on developing better Ruby-based online applications to aid amateur video makers.
Engine Yard employs leading Ruby developers such as Yehuda Katz, and Howcast is able to tap into that expertise. Taga's company started out as a Java shop, but found it took too long to build and revise code in Java. Many of his developers were familiar with Ruby on Rails as a framework supported language that allows frequent apps changes. Engine Yard meets the table stakes requirement of a cloud provider: "They're stable and keep us up and running," Taga says. At the same time, "they're in touch with the latest software," serving as a trusted adviser as it pushes Ruby in new directions.
Shape The Cloud's Future
Don't like what you see in the cloud? Change it.
Amazon, Google, Microsoft, and others show a keen desire to address unmet needs. The environment's changing fast. Amazon says new services such as Elastic Block Storage and new types of servers such as Cluster Compute Instance came from feedback from developer customers. In the weeks that we researched this story, App Engine's Datastore problems got ironed out enough that Schedule Bin's Spitkoski went from doubtful about his future with Google to being an enthusiastic endorser of App Engine--though one who now stays open to alternatives.
EBay has just emerged as a strong partner of Microsoft's in shaping an internal cloud appliance suitable for building private clouds. The online auction site is looking several years into the future, toward making its IT infrastructure--which handled about $60 billion of auctions last year--easier to manage and more scalable. EBay wants more cloud-like characteristics in its data centers, so resources can be managed as a unit of pooled, virtualized servers and storage.
Yet eBay doesn't see a public cloud infrastructure as viable for its computing needs in the near term. "A lot of today's [public] cloud isn't capable of operating in the mission-critical space," such as transaction processing, says technology VP Barrese.
With Microsoft as its cloud partner, eBay gets someone else to build that environment, while keeping open the option of hybrid environments--and not necessarily only from Microsoft Azure. "There's a lot of potential for Microsoft to set a cloud standard," says Barrese.
Early cloud implementers, even on a much smaller scale, should heed Barrese's assumption that any cloud supplier is a close business partner, looking for and able to accept direction from the customer. There are a lot of mistakes being made and lessons being learned. Vendors are as new at delivering cloud services as customers are at using them, and may prove surprisingly malleable to committed customers. "This is a journey," says Barrese. "We're still in an early day of cloud computing."
Charles Babcock is editor at large for InformationWeek and author of the book "Management Strategies For The Cloud Revolution."