Comments
Microsoft Explains Exchange Outage
rradina
User Rank: Ninja
6/30/2014 | 11:52:59 AM
Short Network Loss Overloads Networking Elements
If a minor network blip can create a traffic storm for which they aren't prepared, what happens if connectivity to an entire data center is lost for several hours?

What happens if there's nothing wrong with MS data centers, but a major fiber cut leaves a large geographic swath of customers unable to connect and, once it is repaired, the "traffic storm" takes out cloud e-mail for everyone?

Are they running their network capacity that close to 100%? I'd think they'd have Hoover Dam spillway-sized pipes and be paying for the potential to burst even higher if traffic warrants it.

The services likely run on thousands of virtual servers. From a network perspective, it sounds like they should segment the traffic better so they can shape it and contain the storm.
Laurianne
User Rank: Author
6/30/2014 | 12:50:37 PM
Re: Short Network Loss Overloads Networking Elements
rradina, I understand your surprise. If anyone could burst up to extra capacity when traffic spikes, you'd think it would be the likes of Microsoft. It could be an Azure success story in that case.
cafzali
User Rank: Moderator
6/30/2014 | 1:43:05 PM
E-mail over cloud
I wonder how many of these situations have to occur before people stop relying on the "all in one" solution providers for productivity applications? While it's true that any e-mail server can fail, companies selling all-in-one solutions seem to be particularly prone to failures.

People once thought of BlackBerry e-mail as "rock solid," until it had a few outages lasting multiple hours at a time. As with Microsoft, its main selling point was reliability.
Number 6
User Rank: Moderator
6/30/2014 | 1:54:31 PM
If a Network Doesn't Fail in a Forest...
Here's a thought. What if Microsoft really can, and does, handle most network spikes without any noticeable delays or outages? We know when something fails, but how do we know when a Plan B does work? I'm not saying that's the case, but we wouldn't know if it was, would we?
vnewman2
User Rank: Ninja
6/30/2014 | 2:26:01 PM
Re: If a Network Doesn't Fail in a Forest...
The communication on this issue from MSFT was poor, which heightened the frustration among the masses, I think. Although I wasn't in the office that day (whew), here's the email we received from our spam-filtering company, Mimecast: "Mimecast has identified that Office 365 servers may be issuing intermittent '4.3.2' deferrals for inbound messages. Mimecast services are working correctly and emails sent to these servers will continue to queue. Office 365 customers should contact Microsoft directly to report and investigate the issue." At least someone is looking out for us.
pcharles09
User Rank: Ninja
6/30/2014 | 6:14:48 PM
Re: If a Network Doesn't Fail in a Forest...
I feel bad for all the in-house IT guys/gals who had to deal with that. Where I work, there are a lot of remote users. I can imagine the headaches the internal folks dealt with.
rradina
User Rank: Ninja
6/30/2014 | 8:07:58 PM
Re: If a Network Doesn't Fail in a Forest...
But their explanation doesn't jibe with customer experience. The "external" network issue only lasted a few minutes and yet cascaded into an all-day outage. That sounds like a freeway with so much traffic that a 15-minute flat tire on the shoulder creates a parking lot that takes all day to dissipate. If you were in the shipping business, would you ever route deliveries on such a freeway?

I don't see any way to spin this positively unless we're still missing information, such as a DDoS attack or some kind of rabid spam event.

One would also expect the world's largest software company whose goal is to be the world's largest cloud resource to have a plan C and probably even a plan D.  I also don't think it's unreasonable to expect that when plan A fails, a task force convenes and starts working on plan E and plan F -- possibly skipping plan C and D because they've come up with a specific response that solves the issue.

Imagine what might happen to a retailer that relied on Microsoft for credit card payments. Why would a service provider of this claimed caliber treat e-mail as such a casual service?
Charlie Babcock
User Rank: Author
6/30/2014 | 8:25:03 PM
Cloud transparency, round and round it goes
In two cases, Microsoft's Leap Day outage in 2012 and a later outage, it was pretty forthright about the occurrence and the cause. In this case, some of that transparency is going down the drain.
PaulS681
User Rank: Ninja
6/30/2014 | 8:40:52 PM
Re: Short Network Loss Overloads Networking Elements

It's kind of mind-boggling that MS isn't prepared for spikes. We use Office 365, and all I noticed that day was Lync bouncing up and down. For whatever reason, email wasn't affected much.

Li Tan
User Rank: Ninja
7/1/2014 | 9:31:18 AM
Re: Cloud transparency, round and round it goes
This is one critical thing about cloud-based stuff. The cloud must be up and running 24x7. No outage is really tolerable, which is quite different from the old enterprise software days, when we could at least allow some maintenance windows. This is a challenge for both development and operations personnel.

