How to Architect for Resiliency in a Cloud Outages Reality

It's no secret that the cloud can be a fickle beast. Outages are all too common, and when they happen, they can cause massive disruptions for businesses. So how can you ensure that your business is safe in the cloud?

Cyril Plisko, Co-Founder & CTO, Statehub

March 11, 2022


Ever since a typo took down AWS’s S3 storage service and brought much of the internet down with it, we have all been aware of just how delicate the cloud and the internet are. This outage was so bad that Amazon couldn't even update its own status dashboard to warn the world about what had happened.

While this major event shook the world, smaller-scale outages occur constantly. This year is no different, with a slew of outages affecting cloud vendors from Amazon Web Services to Google Cloud and Microsoft Azure.

For many IT teams, these events highlighted that something as small as a typo written by a programmer on the other side of the world has the power to severely affect their entire business. And depending on a business’s deployment choices and architecture, the results could be devastating.

This is leading companies to take cloud outages into account when creating their business continuity plans, but given the wide range of applications that are generally provisioned on public clouds, finding a way to reduce the risk of failure is proving difficult.

Cloud Outages Are Unfortunate, But Inevitable

No system is foolproof, and mistakes or random black swan events can derail even the most well-thought-out strategies.

Outages are an unfortunate but inevitable aspect of cloud computing. Every cloud vendor has had them, and they will keep happening. It is a part of life.

And while many companies have incorporated cloud outages into their disaster recovery plans, others are still struggling to wrap their heads around the new risks that outages pose for their business operations.

Shifting Workloads With Cloud-Agnostic Architectures

One way businesses can protect themselves from cloud outages is by making their applications cloud-agnostic. This means that they are not dependent on any single cloud vendor and can shift workloads seamlessly between cloud vendors and regions in the event of an outage. Cloud-agnostic applications give businesses the freedom to choose the best vendor for their needs, and they help keep data safe and available even if one cloud provider gets knocked offline.
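As a rough illustration of shifting a workload between vendors and regions, the sketch below picks the first healthy deployment from a priority-ordered list. It is a minimal sketch, not a production failover mechanism: the `Deployment` class, the `pick_active` function, and the vendor and region labels are all hypothetical names introduced for this example, and the health probes are simple stand-ins for whatever checks a real system would run.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Deployment:
    """One copy of the application at a given vendor and region (hypothetical)."""
    vendor: str                     # illustrative label, e.g. "vendor-a"
    region: str                     # illustrative label, e.g. "region-1"
    is_healthy: Callable[[], bool]  # health probe supplied by the caller


def pick_active(deployments: Sequence[Deployment]) -> Optional[Deployment]:
    """Return the first deployment, in priority order, whose probe passes."""
    for deployment in deployments:
        try:
            if deployment.is_healthy():
                return deployment
        except Exception:
            # A probe that raises is treated the same as an unhealthy deployment.
            continue
    return None


# Simulate an outage at the preferred vendor; traffic should move to the second.
deployments = [
    Deployment("vendor-a", "region-1", lambda: False),
    Deployment("vendor-b", "region-1", lambda: True),
]
active = pick_active(deployments)
print(active.vendor if active else "no healthy deployment")
```

The selection logic is the easy part. The sketch only works if the application and its data can actually run at every deployment in the list.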

However, making applications cloud-agnostic can be a complex and expensive process.

Common sense says not to put all your eggs in one basket, so it’s logical to assume that by keeping data on multiple clouds, a business would be safer from a single outage.

This is the reason data resiliency in multi-cloud and distributed systems has become a hot topic recently. When key business solutions are architected to run across multiple cloud vendors and on-premises infrastructure, business leaders can rest assured with the knowledge that their data is safe and that their company will be able to continue running 24/7.
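To make the idea of data living across several vendors more concrete, here is a minimal sketch of a store that writes each object to every reachable backend and reads from the first one that answers. All names (`BlobStore`, `InMemoryStore`, `ReplicatedStore`) are assumptions made for this example, and the in-memory backend stands in for real vendor adapters, which would wrap each provider's own storage API.

```python
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """Minimal interface each vendor-specific adapter would implement."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a real vendor adapter; `available` simulates an outage."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._blobs[key]


class ReplicatedStore:
    """Writes to every reachable backend, reads from the first that answers."""
    def __init__(self, stores: List[BlobStore]) -> None:
        if not stores:
            raise ValueError("at least one backend is required")
        self.stores = stores

    def put(self, key: str, data: bytes) -> int:
        """Return how many backends accepted the write."""
        written = 0
        for store in self.stores:
            try:
                store.put(key, data)
                written += 1
            except ConnectionError:
                continue  # skip the backend that is down
        if written == 0:
            raise ConnectionError("no backend accepted the write")
        return written

    def get(self, key: str) -> bytes:
        for store in self.stores:
            try:
                return store.get(key)
            except (ConnectionError, KeyError):
                continue  # backend is down or missed this write; try the next
        raise KeyError(key)
```

A backend that misses writes during an outage stays out of date until something copies the missing objects back to it. This sketch does not attempt that resynchronization, which is a large part of what makes real multi-cloud data resiliency hard.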

These outages are causing businesses to reevaluate how they deploy and architect their applications. The awareness that outages are inevitable is creating a healthy tension in the market as it forces people to think about how they build their software as well as act more responsibly and consider resiliency as a first-class concern.

For some companies, this means refactoring their applications to run across multiple public cloud vendors, an important part of surviving cloud outages.

One way companies are making their data more resilient is by creating cloud-agnostic applications, which can shift workloads seamlessly between cloud regions and vendors in the event of a disaster or outage.

Choosing cloud-agnostic architectures can provide companies with the peace of mind that their data is safe, no matter what problems happen to any of the vendors they are working with.

The Complexity Is Prohibitive for Many Organizations

While the idea of adopting a cloud-agnostic architecture sounds great in theory, the solution is neither cheap nor easy to implement. It requires a lot of time, along with highly skilled IT professionals, to do it right.

It's difficult for a company to take a complex application that's been around for years and retrofit it to run across multiple clouds. The complexity and costs can be prohibitive for many organizations, and the expertise needed to do it is hard to find. However, there are ways to make it easier for the IT teams putting these new architectures in place.

Rather than building their own tooling, IT teams can look for ways to consume multi-cloud infrastructure as a service. Companies need to be able to increase their resiliency and adopt cloud-agnostic architectures without taking on all of that complexity themselves. The next major challenge is therefore making multi-cloud so simple that people don't have to think about it.

Public cloud outages are inevitable, and there is nothing we can do about them. What businesses can do, however, is ensure their applications are cloud-agnostic and don't depend on a single vendor.

About the Author(s)

Cyril Plisko

Co-Founder & CTO, Statehub

Cyril Plisko, the Co-Founder & CTO at Statehub, is a known Linux and storage expert with a vast system administration and kernel development background. With over two decades of designing and creating complex enterprise storage systems, Cyril is now solving similar challenges in the modern cloud environment. Among his specialties are enterprise storage and network solutions, UNIX kernel development, and performance analysis and tuning.
