9 Spectacular Cloud Computing Fails - InformationWeek

9 Spectacular Cloud Computing Fails


(Image: Geralt via Pixabay)


Most of us have heard about at least one spectacular cloud failure, and some of us have been directly affected by one. While cloud technologies and security mechanisms continue to mature, they still suffer the same types of issues as in-house infrastructures. The primary difference, however, is that cloud failures impact many more users than an in-house problem would and therefore have greater visibility when problems occur.

Failures that plague cloud service providers tend to fall into one of three main categories:

  • "Beginner mistakes" on the part of service providers. This is when the provider starts out or grows at a rate faster than can be properly managed by its data center staff. Cloud giants, including Amazon Web Services and Google Compute, were often plagued with outages early on as each company grew at incredible rates. Even Microsoft, which entered the cloud game later than the others, ran into outage problems early and often.
  • Security flaws that hackers eventually expose. Because clouds are massive in scale compared with private data centers, they're much bigger targets for hackers. Cloud companies learned this lesson in a series of mistakes that exposed customer data on several occasions. Security breaches are a major concern among cloud computing customers, and many enterprise organizations are wary of handing over data protection responsibilities to a third-party service provider.
  • Poor processes within the cloud. Inadequate security audits, poor backup procedures, and administrators with inappropriate access to servers are all procedural problems that could be avoided. Unfortunately, these non-technical problems are commonly overlooked until it's too late.

For some of you, the cloud failures listed here may simply highlight areas where cloud service providers need to grow or adapt in order to better service their customers. For others, the examples may be more personal, as your data or applications may have been affected. Either way, we'd love to hear which cloud failures you found to be the most spectacular and why. Tell us all about it in the comments section below.

Andrew has well over a decade of enterprise networking under his belt through his consulting practice, which specializes in enterprise network architectures and datacenter build-outs and prior experience at organizations such as State Farm Insurance, United Airlines and the ...

User Rank: Moderator
4/22/2017 | 4:19:22 AM
I agree with you, you wrote very right words
User Rank: Apprentice
7/28/2015 | 2:01:08 PM
"While many of the outages listed in this slideshow could have been avoided, they're really par for the course."

Hindsight is 20/20, right? Of course, looking back on these issues, you can point out what could have been done to avoid such a large-scale outage. This is why proper planning and continuous monitoring of network environments are so important for a continuous, smooth IT operation.

User Rank: Ninja
7/28/2015 | 1:31:00 PM
Bungles are bungles not just on the cloud
These mostly seem to be computer errors, not cloud errors. Most are due to people not learning from previous mistakes, from Microsoft and the cloud to Joyent's reboot, where I can't quite figure out how they managed that level of bungle. The problem, unfortunately, is usually the people involved, who make poor decisions or write faulty code. How spectacular it is depends not on the bungle itself but on how many people are affected and by how much.
User Rank: Author
7/21/2015 | 1:59:20 PM
Re: 9 Spectacular Cloud Computing Fails
Good point, @jagibbons. It's true, having more scrutiny magnifies the impact of its failures even more than its successes.
User Rank: Apprentice
7/17/2015 | 11:41:03 PM
Process Trumps eGRC Software
Looking back at each of the cloud "fails" described here, there's not much argument that they all share a common "root cause": a misalignment between Process-People-Technology-Culture. Cloud providers can overcome the challenges of scaling up to meet demand; the engineering expertise to grow massive cloud infrastructure is no longer a pioneering, frontier undertaking. But inherently, "scaling up" processes to a similar degree lags well behind engineering. The large eGRC vendors have all done a superb job of using the 'shiny new toy' trick on cloud service providers, lulling them into the false and dangerous misapprehension that 'data management' is the same thing as 'process maturity'. The rude awakening that AWS, Microsoft, IBM, and Verizon have coming is that you can't 'scale up' process the way you scale up hardware and software infrastructure.

For example, take a core cloud service: Identity and Access Management. This is by far one of the most critical cloud services that providers MUST get right. Yet eGRC vendors, without exception, and however loudly they trumpet their dashboards, management consoles, incident triggers, policy enforcement, and event monitoring solutions, fail to mention that nothing in their offerings can 'scale' this service in an engineering way, because it depends exclusively on how a given organization's organic processes for identity and access work. Sidestepping the issue by shoehorning an organization's people into a NIST SP800-53 control baseline works only as a 'one size fits all' solution.
User Rank: Strategist
7/17/2015 | 4:21:03 PM
You left off Microsoft's irresponsible destruction of the Danger cloud for the T-Mobile Sidekick.
They claimed they compensated everyone and that they recovered all the data. In fact, many of us permanently lost data; I lost over 1/3 of my address book along with other data. I also never received any compensation for my loss.

The inside scoop was that a Microsoft VP told them not to back up the SAN before doing a SAN microcode upgrade.

The foolish move caused a cascade of failures which disabled our phones for quite a while. The recovery meant that you had to manually download the subset of your lost data that they had found and re-import it to your phone. In fact, 1/3 of my data was never found and I was days without my address book; eventually I found a backup from when I had used a Palm V and recovered all of the older entries. To add insult to injury, they claimed we were compensated and that all the data was returned. Those I knew who used the early Sidekick phones were never compensated and never got back all our data. The level of disregard for maintaining the integrity of customer data was so appalling that I wrote Microsoft off as a cloud supplier. When they brought up their own cloud, the experience showed no evidence that they had learned or cared.

Some have tried to argue these are just growing pains, but running a cloud requires a production operations mindset and extreme attention to redundancy, failover, and recovery. Those who think they can stand up a bunch of servers and get by without well-planned fault tolerance and redundancy, or with casual security or operational mechanisms, will fail and take down their customers with them. Most cloud providers are little more than ISPs selling VMs and hoping their remote admin team is good enough. Those are not really cloud providers; companies that do not have the right mindset and operational experience must be avoided. Players that fail the same way more than once should be written off forever, as the problem is normally a cultural or leadership failure that is unlikely to ever be fixed.
User Rank: Ninja
7/17/2015 | 3:17:22 PM
Re: 9 Spectacular Cloud Computing Fails
I tend to agree that the reason these ended up so spectacularly big is because cloud was/is the buzzword of the day. For the most part, these are the kinds of problems that any network or system can have, be it large or small. My network of 150 servers can be crippled by an admin playing around with a script pointed in the wrong place.

More importantly, while these were big and newsworthy, there are thousands of successes for each of these mammoth failures. There are hundreds or thousands of large, medium and small companies leveraging the cloud in a way that is critical for their success. And, they are succeeding, making money and driving economic growth. What these failures should do, which I guess didn't help in the case cited about Microsoft Azure, is help the rest of us find the possible failure points and address them before they reach critical mass.
User Rank: Apprentice
7/17/2015 | 4:25:23 AM
Huge failures
Cloud has had, and always will have, some security failures; that's the rule. It is the job of companies and cloud services to try to minimize them and, as has been said, LastPass was a good example of that. In my opinion, cloud services are a really great thing and have been a great step forward over the past few years, but it's always tough to manage these security issues when it comes to huge companies and huge amounts of services and resources.
User Rank: Ninja
7/16/2015 | 1:49:09 PM
Re: 9 Spectacular Cloud Computing Fails
There are definitely some embarrassing bungles on here, but looking at the list in aggregate, I don't know how many of them can really be chalked up to a cloud-specific problem. Anybody could patch something at the wrong time or leave a security flaw unfixed for too long; these sound like routine IT problems. Your non-cloud service provider could have an outage, or you could have an outage. As long as your cloud provider discloses what was compromised in a timely manner (which is becoming more commonplace and legally mandated), is it really all that different? People may sleep better at night knowing that they're the ones responsible for their own breaches, but that sounds a lot like a placebo effect.

LastPass, for example, sounds like it did everything by the book and then some to minimize the damage. It's inevitable that it would be a target for hackers. Netflix outages always make headlines, but Netflix's continued dominance in spite of that fact may serve as evidence of cloud's long-term reliability, not a knock against it. As for healthcare.gov? Well, truth be told, I didn't even know Verizon offered a cloud hosting service that was comparable to Azure or AWS... I suppose there's a lesson in there about going with the lowest bidder, and the government's practices therein. That's a big takeaway here for me: whatever the cloud does, what it can't do is mitigate or increase human stupidity (or maliciousness).
User Rank: Ninja
7/15/2015 | 8:08:19 AM
Failing big
In most of these you can look at the failure and say that the company was on the leading edge, if not the bleeding edge, and made some small mistakes that hurt them. A couple, though, are really ambitious failures not by the company but by individuals; the Joyent example is just mind-boggling. Was someone playing with a script that they thought was pointed at a development environment? The Cloudflare example was another one: automated updates that someone set when warning bells should have been going off. The others I can kind of understand, even Dropbox's security issue; someone lower down the line probably slipped up with a chunk of code, but at least they didn't remove all security...