Windows Azure Outage Avoids Xbox One Catastrophe - InformationWeek
02:06 PM

Microsoft's cloud services, including Xbox Live, were disrupted Thursday due to a DNS error.

Services connected to Microsoft's Windows Azure cloud suffered a disruption Thursday, the second interruption in less than a month. Online reports indicate impacted services included,, Office 365, and Xbox Live. Microsoft had resolved most of the problems by Thursday night, avoiding the potentially cataclysmic possibility that Xbox Live would be down when Xbox One units went on sale just after midnight Friday morning.

The disruptions began at 2:22 p.m. PT and stretched across multiple regions. Microsoft corporate vice president Scott Guthrie confirmed via Twitter that the problem did not involve Azure itself. Rather, "The problem is a DNS name server issue outside of azure." Microsoft said Thursday evening that Azure was running normally.

As of Friday morning, the Azure service dashboard showed most services were functioning as intended, though partial interruptions were plaguing compute functions in Asia, Europe, and the US. Despite the outage, Windows Azure has generally proved as reliable as its competitors, many of which have also endured widespread disruptions. Amazon, for example, suffered a major failure over Easter weekend in 2011.

Glitches that knock multiple regions offline are especially rare because Microsoft, Amazon, and other major cloud providers typically organize datacenters into clusters (or "stamps," in Microsoft parlance) of 1,000 servers each.


These stamps include independent power, networking, and storage infrastructure. In theory, this design keeps a problem in one stamp from spreading to others, so services such as Azure stay available even when problems inevitably arise.

As Guthrie's tweet implies, if a DNS failure was the culprit, Microsoft's stamps weren't part of the problem. Rather, Azure was operating as it should; customers just couldn't reach it.

Though Azure outages are rare, Microsoft has typically been transparent when they've occurred. The company published a technical report following its most notorious disruption, the Leap Day interruption on Feb. 29, 2012. In that case, a leap-year date bug caused security certificate problems that made healthy servers appear to be failing, which triggered the cloud's governing software to move virtual machines unnecessarily. Because the relocated VMs ran into the same certificate problem, the failures cascaded. Microsoft deployed a fix within 10 hours.

Azure outage affected Xbox Live hours before Xbox One went on sale.

Another significant outage occurred at the end of October. In that case, Azure GM Mike Neil told InformationWeek's Charles Babcock this week, the disruption stemmed from a bug in the API for staging systems. Neil said Microsoft will release its full analysis of the October incident before the end of this year. When a problem occurs, Microsoft focuses first on restoring operations as quickly as possible to minimize the effect on customers; deeper forensic work, such as determining the root cause, comes later.

Some businesses remain hesitant to embrace the cloud due to concerns over security and reliability. Service disruptions such as the one that happened Thursday do little to persuade these skeptics. Nonetheless, Azure and the products it supports are among Microsoft's most promising assets.

Neil told Babcock that Microsoft's cloud is gaining 1,000 customers per day. The company reported in September that its Azure-backed Office 365 products were on pace to post $1.5 billion in annual revenue. Microsoft also said this year that more than 300,000 Azure servers would support enhanced Xbox One experiences.

User Rank: Ninja
11/27/2013 | 1:56:14 AM
Re : Windows Azure Outage Avoids Xbox One Catastrophe
When there is a DNS failure, it doesn't really matter whether Azure is operating as it should, because it becomes inaccessible anyway. Businesses' concerns about the security and reliability of the cloud are not entirely unfounded. The cloud has not yet matured to the level where it can be relied upon to carry everything.
User Rank: Ninja
11/27/2013 | 1:56:11 AM
Re : Windows Azure Outage Avoids Xbox One Catastrophe
Disruption problems aside, it is really heartening to read that Microsoft has been transparent about such disruptions, which is a rarity among its competitors. We have seen most organizations put a blanket over such issues; Adobe's security breach is a recent example. What these organizations don't understand is that transparency about such issues earns an organization a good name rather than a stigma.
User Rank: Ninja
11/25/2013 | 7:54:15 AM
The silver lining
Aside from exposing the flaws in DNS and how bad it is for their image when services go down, I think this actually has a slight upside. This happened before the holiday season officially kicked off. This week a large number of the heaviest Xbox Live users will be at home playing games, and next month a whole bunch of new players will be jumping online. Sorting this out early means less bad press than if it had happened on, say, Christmas Day, as Apple saw with iTunes crashing.
Lorna Garey
User Rank: Author
11/22/2013 | 2:49:35 PM
DNS on the edge
The fragility of the DNS system is becoming increasingly evident. Any word on what exactly the problem was? Packet bombing by someone looking to disrupt the Xbox launch?