Is Workday's 15-Hour SaaS Outage Acceptable? - InformationWeek


Commentary
10/9/2009
12:47 PM

On Sept. 24, Workday's SaaS service for human resources, financial applications and payroll was down for 15 hours. That's right: not 15 minutes, not 1.5 hours, but 15 hours. When Google's Gmail goes down for 90 minutes, it's as if the world has come to an end. So it raises the question: Is 15 hours of downtime for core applications such as accounting and HR acceptable?

The day after the outage, Workday Co-CEO Aneel Bhusri published a blog post explaining to customers what happened, but it wasn't until this week that the ERP-focused blogosphere and Twitter users began discussing the incident. In a blog post on Thursday, software consultant Michael Krigsman took pains to point out that Workday did a nice job of damage control, going so far as to say that the outage was actually about "a success and not a failure."

Workday gave Krigsman the phone number of marquee customer Manjit Singh, CIO of Chiquita Brands, who told Krigsman:

"Outages are never good, but they do happen. Workday's communication was fantastic: they kept us informed of the problem, steps they were taking to resolve it, and expected time to solution."

Well, glad to hear it. I've gotten to know the folks at Workday, and they're all smart, nice, committed and hardworking people. The co-CEOs and founders, Aneel Bhusri and PeopleSoft founder and billionaire Dave Duffield, are earnest in their vision to bring positive change to the world of enterprise software. But let's get down to brass tacks and answer this question:

Is 15 hours of downtime acceptable? For Chiquita's Singh, it was tolerable.

"First, we lost the ability to process HR transactions during the normal course of that day's business. Second, and more significantly, we were preparing to go live with our Costa Rica implementation, so this outage had the potential to delay our schedule. However, we worked around it and went live as planned."

But Singh isn't using Workday's financial applications. It seems to me that a 15-hour outage could affect payments going out, payments coming in, payroll and other important financial processes. Workday describes its cash management SaaS offering this way:

Cash Management automates the coordination and control of cash-flow activity, automates administrative and control activities such as bank statement reconciliation, and provides business intelligence.

Is it acceptable to lose that cash management for 15 hours?

This is Bhusri's explanation of the outage on his blog:

Yesterday, the network attached storage (NAS) device that stores operating system files for our production servers detected a corrupted node within a backup RAID array. Rather than simply log the error, which is what it is supposed to do, the NAS took itself off-line. It is ironic that the redundant backup to a system with built-in redundancy caused the failure.

This type of error should not have caused the array to go offline, but it did. The most important result is that our failover plans worked as expected. Within hours, all customers were live in our secondary datacenter with all their data intact.

We've tested our failover plans many times, but this is the first time we did it for real. We've learned quite a bit in the process - some of it technical, some of it regarding communications with customers. That knowledge will be used to further refine our datacenter practices, our hardware choices, and our failover plans so that we can do even better in the future.

We all know that companies that run their own HR and financial apps can also have service outages. But this 15-hour outage raises some interesting questions about how CIOs will feel when the problem is the vendor's and not their own. Some may pull their hair out over their lack of control and start looking for an exit strategy. Others may feel some relief that they don't have to deal with it.

In fact, this is what Dave Duffield told blogger Vinnie Mirchandani:

"Unbelievably, I got emails from couple of our customers basically saying, 'Better you than me.' They are so glad they are not being woken up middle of the night. That's our job now."

Interestingly, this is Mirchandani's take on it: "If on-premise vendors were not concerned about SaaS and cloud vendors, this episode should be a loud wake-up call." Once again, a software consultant is giving props to Workday for handling the situation well. And it's easy to like Workday; it's the little guy trying to make a difference. But what if this had happened at Oracle, SAP, Salesforce.com or, heaven forbid, Google? The press and blogosphere would be on it like white on rice.

So no matter who the vendor is, the question still lingers in my mind: Is 15 hours of downtime acceptable for your ERP system, especially when someone else, and not your internal IT team, is handling it?

Would love to hear others' thoughts on this matter.
