Microsoft Azure Storage Service Outage: Postmortem - InformationWeek


Comments
Microsoft Azure Storage Service Outage: Postmortem
nasimson,
User Rank: Ninja
11/20/2014 | 12:50:54 PM
To err is human, to tolerate is machine.
> Amazon's most serious outage occurred on Easter weekend in 2011 when
> a storage network line was inadvertently choked off by human error.

In this day and age, it seems surprising that systems are still not fault tolerant enough to avoid being choked off by "human errors". Quite surprising. Machines by now should have become intelligent enough not to be disrupted by human errors. To err is human, to tolerate is machine.
D. Henschen,
User Rank: Author
11/20/2014 | 3:54:25 PM
Computer overlords?
Computers are still dumb in many ways, unable to tell humans when they're about to make a mistake. It's our job to test and test again before going live. Sounds like the "Flighting" project wasn't adequately tested. That's quite a data gap to overcome. Let that be a reminder -- one of many cloud customers have had -- that cloud infrastructure is as fallible as their own data center infrastructure.
Charlie Babcock,
User Rank: Author
11/20/2014 | 5:17:48 PM
Source of cloud failures
In both the Amazon Easter outage and Microsoft Leap Year outage, an initial small human error led to automated systems drawing the wrong conclusions and setting off a chain reaction that for all practical purposes froze up parts of the infrastructure. Clouds experience failures, the same as enterprise data centers, yes, but the failures are different and I think the operators are learning from them. Still not a foolproof proposition.
Thomas Claburn,
User Rank: Author
11/20/2014 | 7:06:05 PM
Re: Source of cloud failures
Given the fallibility of people, I wonder whether the headlines for these types of stories should be more along the lines of Software & People Still Prone To Error.
Charlie Babcock,
User Rank: Author
11/20/2014 | 10:33:58 PM
Automated management adds unforeseen complications
There's much about this outage that hasn't been fully explained. To be fair, Microsoft hasn't pretended to offer up its full post mortem yet. But I'm looking at data from third parties that says the trouble started at all Azure data centers simultaneously but didn't affect them all the same way. I'd like to learn more.

