Cloud Outages: Who's at Fault? - InformationWeek


Cloud Outages: Who’s at Fault?

Cloud outages do happen. So, how can you and your IT group minimize the impact on the company?

Imagine a scenario where the unthinkable happens. Your company’s cloud provider suffers a major outage that grinds business to a halt. While the IT department and CIO are busy blaming the provider for the outage, the rest of the organization is likely to blame the internal IT staff for the disruption in service. That raises the question: Where does the responsibility for outages lie, and what can be done to mitigate outage risk?

An “unthinkable” scenario of a catastrophic cloud outage isn’t nearly as far-fetched as some might think. Want some recent proof? Consider Amazon’s massive 5-hour AWS outage in February, which affected services such as Quora, GitHub and Docker. Another recent example is Rackspace’s 3-hour worldwide cloud outage, which affected popular SaaS products including Cisco Spark.

Image: Pixabay/metsi

Clearly, cloud outages remain a fairly common occurrence. IT leadership must recognize this and understand that they aren’t off the hook when it comes to outages that could impact a company’s bottom line. When you outsource infrastructure management to a third-party cloud provider, you’re trusting that the provider will adhere to the level of availability outlined in its service level agreement (SLA). But it’s important to note that transferring trust for the support of underlying network components is not a blanket transfer of responsibility when outages occur. For IT departments, the SLA is the first line of defense. If an SLA does not meet your requirements, it’s up to you to seek out providers that offer more robust solutions with higher penalties if the agreed service level is breached.
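To make an SLA's availability figure concrete, it helps to convert the percentage into permitted downtime. The following is a minimal Python sketch of that arithmetic. The function name and the availability targets are illustrative assumptions, not figures from any provider's actual agreement, and the calculation assumes availability is measured over a full 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


# Illustrative targets only; check the actual SLA for the real number.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability permits about {hours:.2f} hours of downtime per year")
```

Under these assumptions, a 99.9% annual target permits roughly 8.76 hours of downtime, so a single 5-hour outage would not by itself breach it. Many agreements measure availability per billing month instead, in which case `period_hours` should be adjusted to match.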

Beyond the SLA, there are plenty of other ways that cloud customers can limit the impact of a major service provider outage. One way is to leverage a hybrid cloud approach where you load balance between on-premises and public cloud resources. That way, an outage in one segment of the infrastructure will not completely knock your applications offline. Multi-cloud strategies are also becoming popular. This is especially true now that administrators have a wide array of multi-cloud management platforms that significantly reduce the effort required when working inside differently architected cloud environments.
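The hybrid idea can be illustrated with a small Python sketch. It shows priority-based failover, a simplification of full load balancing: traffic goes to the first environment that passes a health check. The endpoint URLs and the `simulated_health` helper are hypothetical and exist only for illustration. A real deployment would rely on a load balancer or DNS-based health checks rather than application code like this.

```python
from typing import Callable, Optional, Sequence, Set

# Hypothetical endpoints, listed in order of preference.
BACKENDS = ["https://cloud.example.com", "https://onprem.example.com"]


def pick_backend(backends: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first backend that passes its health check, or None."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


def simulated_health(down: Set[str]) -> Callable[[str], bool]:
    """Build a fake health check that reports the given backends as down."""
    return lambda backend: backend not in down


# Simulate a public cloud outage: requests fall back to on-premises.
cloud_outage = simulated_health({"https://cloud.example.com"})
print(pick_backend(BACKENDS, cloud_outage))
```

Passing the health check in as a function keeps the selection logic separate from how health is actually probed, so the same sketch applies whether the second environment is on-premises or another public cloud.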

For those of you who already have robust plans and network designs in place that sufficiently reduce the impact of a potential public cloud outage, I have one more question for you: What is your strategy surrounding uptime of shadow IT applications?

Even though an employee or department skirted standard operating procedures for application usage on the corporate network, it remains the duty of the IT department to track down these apps and do whatever is possible to manage accessibility risk. This is where a shadow IT outreach program could be used to identify and wrap protection around unauthorized applications that remain critical to the business.

The one caveat to cloud outage risk mitigation is that it’s not going to be free. Cloud service providers that offer more robust infrastructures and improved SLAs are going to demand a premium price. Implementing hybrid, multi-cloud and other cloud resiliency protocols and procedures will also cost time and money.

If the business makes an across-the-board decision not to pay for this type of risk mitigation, that’s one thing. But IT departments and IT leadership must, at minimum, perform their due diligence and provide a cost/benefit analysis based on the probability that a cloud outage will economically impact business operations. In some cases, that economic impact will be so low that the added time and money spent to bolster cloud resiliency is not worth the investment. But for most, at least some form of added protection will be money well spent. It’s simply up to the IT department to determine exactly where that level of protection should be.
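One simple way to frame such a cost/benefit analysis is to compare the expected annual loss from outages, with and without mitigation, against what the mitigation costs. The Python sketch below assumes a deliberately basic model (outage frequency times duration times hourly cost). Every figure in it is invented for illustration and should be replaced with your own estimates.

```python
def expected_annual_loss(outages_per_year: float,
                         hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly loss = outage frequency x duration x hourly cost."""
    return outages_per_year * hours_per_outage * cost_per_hour


def mitigation_worthwhile(loss_before: float,
                          loss_after: float,
                          annual_mitigation_cost: float) -> bool:
    """True when the loss avoided exceeds what the mitigation costs."""
    return (loss_before - loss_after) > annual_mitigation_cost


# Hypothetical inputs: one 5-hour outage per year at $20,000 per hour,
# reduced to 30 minutes of impact by a hybrid design costing $60,000 a year.
before = expected_annual_loss(1, 5, 20_000)
after = expected_annual_loss(1, 0.5, 20_000)
print(f"Expected loss without mitigation: ${before:,.0f}")
print(f"Expected loss with mitigation:    ${after:,.0f}")
print("Mitigation worthwhile:", mitigation_worthwhile(before, after, 60_000))
```

With a lower hourly cost or a rarer outage, the same comparison can come out the other way, which is the case where added resiliency is not worth the investment.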

Andrew has well over a decade of enterprise networking under his belt through his consulting practice, which specializes in enterprise network architectures and datacenter build-outs, and prior experience at organizations such as State Farm Insurance, United Airlines and the ...