Who Do You Blame For Cloud Computing Failures?

October 12, 2009

Think of the one million T-Mobile Sidekick customers who may have lost important data last week. Think of the dozens of CIOs who anxiously waited for Workday to restore its SaaS service on Sept. 24. Cloud computing has created a new era of accountability, and we must demand that tech vendors work harder than ever to prove their trustworthiness.

In both of these instances, customers were completely dependent on their vendors to manage their data. And in both instances, technical failures are to blame. The growth of cloud computing is not going to let up (we're not going to suddenly start moving away from the Internet and speedy networks and store more data on our home PCs and company servers), so it's time that everyone, from consumers up to CIOs at the world's biggest companies, starts asking questions and demanding accountability from their vendors.

Let's first look at the Sidekick issue. Microsoft Corp. subsidiary Danger, which provides data services for Sidekick, had a server failure in its data center. Over the weekend, T-Mobile wrote to customers that "personal information stored on your [mobile] device--such as contacts, calendar entries, to-do lists or photos--that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."

Sidekick customers' data is stored on those Microsoft/Danger servers, so could this possibly mean what it sounds like…that there was no copy of customers' data, anywhere? Mirroring servers is a very common practice in data centers these days, made easier and cheaper through server virtualization. Had Microsoft/Danger not made that investment? As of Monday morning, the companies had not released details of the technical failure.

And who is to blame here? It's interesting that T-Mobile mentions Microsoft/Danger five times in its post, as if to subtly point out it's not the only responsible party. My guess is most customers not only didn't know their data was being managed by a Microsoft subsidiary, but probably never even thought about where their data was managed at all. You turn on your smartphone and make a phone call, or check your calendar. (Or not.) But as consumers get more comfortable with things like Google Apps on their home PCs, hosted email on their smartphones, and having mobile service providers manage their most important data, they need to start thinking a bit like CIOs.

Specifically, how has this vendor proved to me that it has made the technical investments to recover any of my lost data in the case of data center failures?

Meanwhile, the SaaS startup Workday, which has about 100 customers using its cloud-based human resources, payroll, and financial applications, had a 15-hour outage on Sept. 24. In this case, the backup system in place worked, in that it detected a corrupted storage node, but then it took itself offline. "It is ironic that the redundant backup to a system with built-in redundancy caused the failure. This type of error should not have caused the array to go offline, but it did," noted Workday co-CEO Aneel Bhusri in a blog. By some accounts, Workday handled the situation very well. But a blog post I wrote on Oct. 9 about the outage drew some interesting reader comments about who is responsible.

I pointed out in the blog that internal IT failures happen, too. Here's how one reader weighed in on that thought:

"If [a service failure] happens for a package directly supported by the company's IT staff, chances are they would be hung out to dry by the CEO and CFO. If it is the vendor, how much flack the CIO is going to take probably depends on who pushed for the choice of going with the SaaS in the first place."

Another reader said it gets down to the details of the service-level agreement between customer and vendor:

"If the contract was to guarantee a certain uptime per year - even with this outage they are still above 99% uptime. If the uptime is per month, this outage represents 98% uptime. Certainly an outage during 'normal' US hours would be more noticeable…. For a business-core application such as payroll, I am questioning why Workday does not have a hot failover...or if that also failed. I do rather agree that 15 hours is not really acceptable...especially if it disrupts the pay cycle."
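That reader's arithmetic is easy to check. Here is a minimal sketch, assuming a 365-day year and a 30-day month (neither is specified in the comment, and a real SLA would define its own measurement window). The function name `uptime_percent` is made up for illustration.

```python
def uptime_percent(outage_hours: float, period_hours: float) -> float:
    """Return the percentage of the period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must be between 0 and period_hours")
    return 100.0 * (1 - outage_hours / period_hours)


OUTAGE_HOURS = 15             # length of the Workday outage, per the article
HOURS_PER_YEAR = 365 * 24     # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24     # assumption: 30-day month

yearly = uptime_percent(OUTAGE_HOURS, HOURS_PER_YEAR)
monthly = uptime_percent(OUTAGE_HOURS, HOURS_PER_MONTH)

print(f"Yearly uptime:  {yearly:.2f}%")   # 99.83%
print(f"Monthly uptime: {monthly:.2f}%")  # 97.92%
```

Under those assumptions, the same 15 hours leaves roughly 99.8% uptime over a year but only about 97.9% over a month, which is why the measurement window written into the SLA matters so much.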

And another says fingers should be pointed at more than one party:

"…Cloud providers like Workday need to be held to the same strict standards that a CEO would hold his/her internal CIO org to via the SLR/SLA. But…that CIO org still needs to be held accountable for the chaos and confusion of a cloud outage, because that org will now and forever be responsible for facilitating information-rich back office processes with technology. Backup strategy is STILL the responsibility of that CIO org."

Interesting observations. So who's responsible for T-Mobile's failure to recover your data? T-Mobile? Microsoft? The Danger subsidiary? The data center staff? The people who chose Sidekick without ensuring a data-protection guarantee?

It's a whole new frontier with cloud computing. Everyone needs to be asking the right questions.

About the Author(s)

Mary Hayes Weier

Contributor
