Cloud // Platform as a Service
News
5/3/2011
08:16 PM
Connect Directly
Twitter
RSS
E-Mail
50%
50%

VMware Cloud Foundry Suffers Service Outage

As the beta development platform was recovering from a minor power supply problem, human error worsened the setback.

11 Epic Technology Disasters
(click image for larger view)
Slideshow: 11 Epic Technology Disasters
Assembling a distributed computing architecture in the cloud isn't easy to do. The more resources you try to bring together, the more that can go wrong. No, you're not hearing about Amazon's recent EC2 outage again. This time it's VMware.

VMware recently launched a development platform as a set of services in its CloudFoundry.org, a new developer's hosting service. On April 25, the Cloud Foundry experienced service disruption. In trying to recover later that day, it suffered an outage that continued into April 26.

Not only is VMware finding it more difficult than anticipated to keep a cloud up and running, it's sharing another experience with its much bigger and better established fellow cloud supplier: It's refusing to talk about the mishap other than what's presented in an official and carefully presented blog.

"The Cloud Foundry blog is the best resource for you, which details the account of the outage. VMware is continuing to keep that updated regularly to maintain transparency with the community," was the response from VMware to a request for more information Tuesday.

The blog cited is Dekel Tankel's, one of the primary builders and managers of CloudFoundry.org. In a blog posted April 29, he said the trouble started at 6:11 a.m. April 25, when a power supply in a storage cabinet experienced an outage. That deprived users of access to a single logical unit number (LUN), or identifier of a disk or set of disks, in Cloud Foundry. The power supply malfunction wasn't an unexpected event. Clouds are designed to detect and survive lost power supplies, either by invoking a redundant source or by routing around them using a backup copy.

"While not a 'normal event,' it is something that can and will happen from time to time," he wrote, and VMware thought it was prepared for it.

But, Tankel continued in the blog, "In this case, our software, our monitoring systems, and our operational practices were not in synch," he noted. The loss of a LUN was an event "that we did not properly handle and the net result is that the Cloud Controller declared a loss of connectivity to a piece of storage that it needs in order to process many control operations."

Previous
1 of 2
Next
Comment  | 
Print  | 
More Insights
Google in the Enterprise Survey
Google in the Enterprise Survey
There's no doubt Google has made headway into businesses: Just 28 percent discourage or ban use of its productivity ­products, and 69 percent cite Google Apps' good or excellent ­mobility. But progress could still stall: 59 percent of nonusers ­distrust the security of Google's cloud. Its data privacy is an open question, and 37 percent worry about integration.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest - July 22, 2014
Sophisticated attacks demand real-time risk management and continuous monitoring. Here's how federal agencies are meeting that challenge.
Flash Poll
Video
Slideshows
Twitter Feed
InformationWeek Radio
Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.