11:26 AM
Dave Methvin

Go Cloud Or Go Home

Cloud computing can be diluted into uselessness when mixed with immature technologies and poor practices.

At first glance, this past week was a disaster for cloud computing PR. A problem--make that a meltdown--in an East Coast Amazon Web Services data center caused hundreds of websites to be down for at least a full day, and sometimes more. It wasn't exactly a "Yay, cloud!" moment.

Although we won't know all the details until Amazon Web Services gets out of crisis mode and has an opportunity to publish a post-mortem, it seems that the problem started with a service called Elastic Block Store (EBS). Amazon describes it this way: "Amazon Elastic Block Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 [Elastic Compute Cloud] instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage." In essence, EBS lets you attach a "portable" hard disk to your virtual server without needing to have it physically attached to that server.

Initially, it might appear that this could be a classic Single Point of Failure (SPoF) where EBS was the culprit. One of the problems with cloud computing today is that mere mortals have a hard time knowing all the places where a SPoF can occur in the cloud. From the outside it may appear that you've covered all the bases as far as redundancy is concerned, but it often isn't that easy. The more virtual and indirect the environment, the worse the problem gets. Let me give you an example.

Years ago when I did software development in the telecommunications business, a customer came to our company looking for a backup data connection for their options trading firm. We were glad to provide one, and things went well for several months as they rarely used the capacity for anything more than testing. Then one day the customer's primary connection on AT&T went down when a backhoe ripped through the fiber-optic cable, so they switched over to us. But our connection was down too. It turns out that we had bought capacity from AT&T -- their supposedly redundant line was going through the very same fiber as their main connection! But that wasn't visible to the customer.
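The backhoe story can be sketched as a dependency check: walk everything each "redundant" path relies on and see what the two have in common. This is only a minimal sketch. The component names and the dependency map below are hypothetical, made up to mirror the anecdote, and in practice the hard part is getting a provider to reveal that map at all.

```python
# Hypothetical dependency map: each component lists what it directly relies on.
# The names are illustrative only and mirror the anecdote above.
DEPENDS_ON = {
    "primary_link": ["att_fiber"],
    "backup_link": ["reseller_circuit"],
    "reseller_circuit": ["att_fiber"],
    "att_fiber": [],
}


def transitive_deps(name, graph):
    """Return every component that `name` relies on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(name, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def shared_points_of_failure(a, b, graph):
    """Components that both supposedly independent paths depend on."""
    return transitive_deps(a, graph) & transitive_deps(b, graph)


print(shared_points_of_failure("primary_link", "backup_link", DEPENDS_ON))
# prints {'att_fiber'}
```

An empty result would suggest the two paths are independent as far as the map knows; anything else is a candidate single point of failure hiding behind the redundancy.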

Although the Amazon problem indeed seems to have started with a failure of just the EBS service in one data center, early information suggests that this resulted in a cascading, widespread failure in Amazon's data centers, caused by congestive collapse. As Amazon customers noticed that their servers were failing, they were in the dark about exactly why the failures were occurring. So they tried starting new instances, moving their data to other zones in Amazon's network, and generating all kinds of activity that only added to the congestion in the network. So now the problem was not just EBS, but the traffic jam caused by people trying to get around EBS failures.
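To see why the workarounds made things worse, consider a toy feedback loop. This is not a model of Amazon's network, whose details aren't public yet. The function, its parameters, and the numbers are all assumptions chosen for illustration: each failed request is assumed to trigger a fixed number of extra attempts in the next round.

```python
def simulate_load(base_requests, capacity, retries_per_failure, rounds):
    """Toy model of retry amplification (illustrative assumptions only).

    Every request that exceeds `capacity` in one round fails and is retried
    `retries_per_failure` times in the next round, on top of the base load.
    Returns the offered load observed at the start of each round.
    """
    load = base_requests
    history = []
    for _ in range(rounds):
        history.append(load)
        failed = max(0, load - capacity)
        load = base_requests + failed * retries_per_failure
    return history


# Healthy system: capacity exceeds demand, so load stays flat.
print(simulate_load(100, 120, 2, 5))  # prints [100, 100, 100, 100, 100]

# Capacity drops below demand and each failure spawns two retries.
print(simulate_load(100, 80, 2, 5))   # prints [100, 140, 220, 380, 700]
```

In the second run the underlying demand never changes, yet the offered load grows every round, which is the traffic-jam effect described above.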

Despite this turbulent April shower in the cloud last week, the industry can't give up on cloud computing. As the largest provider of cloud services, Amazon was the most likely to fall victim to a problem like this. Perhaps it's an architectural problem with EBS; if so, I'd expect Amazon to determine that in the post-mortem and come up with changes or procedures to make sure the problem doesn't happen again. It doesn't make sense for most companies to be in the business of running data centers and managing PCs full of precious data that must be backed up to prevent catastrophe. Companies should be able to focus on their own lines of business and manage information, not computers. Cloud computing can help companies do that.
