Cloud Outages: Who's at Fault? - InformationWeek

Cloud
Commentary
7/19/2017
08:00 AM

Cloud Outages: Who’s at Fault?

Cloud outages do happen. So, how can you and your IT group minimize the impact on the company?

Imagine a scenario where the unthinkable happens. Your company’s cloud provider suffers a major outage that grinds business to a halt. While the IT department and CIO are busy blaming the provider for the outage, the rest of the organization is likely to blame the internal IT staff for the disruption in service. That raises the question: where does the responsibility for outages lie, and what can be done to mitigate outage risk?

An “unthinkable” scenario of a catastrophic cloud outage isn’t nearly as far-fetched as some might think. Want some recent proof? How about Amazon’s massive 5-hour AWS outage that occurred in February? It affected services such as Quora, GitHub and Docker. Another recent example is Rackspace’s 3-hour worldwide cloud outage, which affected popular SaaS products including Cisco Spark.

Image: Pixabay/metsi

Clearly, cloud outages remain a fairly common occurrence. IT leaders must recognize this and understand that they aren’t off the hook when an outage affects the company’s bottom line. When you outsource infrastructure management to a third-party cloud provider, you’re trusting that the provider will deliver the level of availability outlined in its service level agreement (SLA). But transferring trust in the support of underlying network components is not a blanket transfer of responsibility when outages occur. For IT departments, the SLA is the first line of defense. If an SLA does not meet your requirements, it’s up to you to seek out providers that offer more robust solutions with higher penalties if the agreed service level is breached.
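To judge whether an SLA meets your requirements, it can help to translate its availability percentage into the downtime it actually permits. The following is a minimal sketch, not any provider's formula: the function name, the 30-day billing period and the sample targets are all assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime an availability target permits over a period.

    Assumes availability is measured over the whole period; real SLAs
    often exclude scheduled maintenance or measure per service.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return (1 - availability_pct / 100) * period_minutes


if __name__ == "__main__":
    # Illustrative targets only, not taken from any specific provider's SLA.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.1f} minutes per 30 days")
```

Under these assumptions, a 99.9% target still leaves room for roughly 43 minutes of downtime in a 30-day month, so a multi-hour outage is far outside what such an SLA promises.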

Beyond the SLA, there are plenty of other ways that cloud customers can limit the impact of a major service provider outage. One way is to use a hybrid cloud approach in which you load balance between on-premises and public cloud resources. That way, an outage in one segment of the infrastructure will not completely knock your applications offline. Multi-cloud strategies are also becoming popular. This is especially true now that administrators have a wide array of multi-cloud management platforms that significantly reduce the effort required to work inside differently architected cloud environments.
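The idea of spreading an application across on-premises and cloud resources can be sketched in a few lines. This example shows priority-ordered failover rather than true load balancing, and everything in it is hypothetical: the endpoint URLs, the `pick_backend` helper and the stand-in health check are assumptions, not part of any real product.

```python
from typing import Callable, Optional, Sequence, Set

# Hypothetical endpoints, listed in order of preference.
BACKENDS = [
    "https://app.onprem.example.com",
    "https://app.cloud-a.example.com",
    "https://app.cloud-b.example.com",
]


def pick_backend(
    backends: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first backend that passes its health check, or None."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


def simulated_health_check(down: Set[str]) -> Callable[[str], bool]:
    """Stand-in for a real probe: a backend is healthy unless listed as down."""
    return lambda backend: backend not in down


if __name__ == "__main__":
    # Simulate an outage of the first cloud provider and the on-premises site.
    check = simulated_health_check({BACKENDS[0], BACKENDS[1]})
    print(pick_backend(BACKENDS, check))
```

In practice the health check would be a real probe and the routing would live in a load balancer or DNS layer, but the principle is the same: no single segment's outage should leave the application with nowhere to go.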

For those of you who already have robust plans and network designs in place that sufficiently reduce the impact of a potential public cloud outage, I have one more question: What is your strategy for the uptime of shadow IT applications?

Even though an employee or department skirted standard operating procedures for application usage on the corporate network, it remains the duty of the IT department to track down these apps and do whatever is possible to manage accessibility risk. This is where a shadow IT outreach program could be used to identify and wrap protection around unauthorized applications that remain critical to the business.

The one caveat to cloud outage risk mitigation is that it’s not going to be free. Cloud service providers that offer more robust infrastructures and improved SLAs are going to demand a premium price. The same goes for the time and money spent implementing hybrid, multi-cloud and other cloud resiliency protocols and procedures.

If the business makes the universal decision not to pay for this type of risk mitigation, that’s one thing. But IT departments and IT leadership must, at minimum, perform their due diligence and provide a cost/benefit analysis based on the probability that a cloud outage will have an economic impact on business operations. In some cases, that economic impact will be so low that the added time and money spent to bolster cloud resiliency is not worth the investment. But for most, at least some form of added protection will be money well spent. It’s simply up to the IT department to determine exactly where that level of protection should be.
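A cost/benefit analysis of this kind can start as simple arithmetic: estimate the expected yearly cost of outages with and without mitigation, then compare the difference to what the mitigation costs. The sketch below is one possible framing, and every figure and function name in it is a made-up assumption for illustration, not data from any real organization.

```python
def expected_annual_outage_cost(
    outages_per_year: float, hours_per_outage: float, cost_per_hour: float
) -> float:
    """Expected yearly loss: outage frequency x duration x hourly business cost."""
    return outages_per_year * hours_per_outage * cost_per_hour


def net_benefit(
    baseline_cost: float, mitigated_cost: float, mitigation_cost_per_year: float
) -> float:
    """Loss avoided by mitigation minus what the mitigation costs each year."""
    return (baseline_cost - mitigated_cost) - mitigation_cost_per_year


if __name__ == "__main__":
    # Hypothetical inputs: two 4-hour outages a year at $10,000 per hour,
    # reduced to 30 minutes each by a resiliency plan costing $40,000 a year.
    baseline = expected_annual_outage_cost(2, 4, 10_000)
    mitigated = expected_annual_outage_cost(2, 0.5, 10_000)
    print(f"Net annual benefit: ${net_benefit(baseline, mitigated, 40_000):,.0f}")
```

A positive result suggests the added protection pays for itself under the stated assumptions; a negative one is the case where the extra resiliency is not worth the investment.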

Andrew has well over a decade of enterprise networking under his belt through his consulting practice, which specializes in enterprise network architectures and datacenter build-outs and prior experience at organizations such as State Farm Insurance, United Airlines and the ...