11/22/2013
02:06 PM

Windows Azure Outage Avoids Xbox One Catastrophe

Microsoft's cloud services, including Xbox Live, were disrupted Thursday due to a DNS error.

Services connected to Microsoft's Windows Azure cloud suffered a disruption Thursday -- the second interruption in less than a month. Online reports indicate that affected services included Microsoft.com, Outlook.com, Office 365, and Xbox Live. Microsoft had resolved most of the problems by Thursday night, avoiding the potentially cataclysmic possibility that Xbox Live would be down when Xbox One units went on sale just after midnight Friday morning.

The disruptions began at 2:22 p.m. PT and stretched across multiple regions. Microsoft corporate vice president Scott Guthrie confirmed via Twitter that the problem did not involve Azure itself. Rather, "The problem is a DNS name server issue outside of azure." Microsoft said Thursday evening that Azure was running normally.

As of Friday morning, the Azure service dashboard showed most services were functioning as intended, though partial interruptions were plaguing compute functions in Asia, Europe, and the US. Despite the outage, Windows Azure has generally proved as reliable as its competitors, many of which have also endured widespread disruptions. Amazon, for example, suffered a major failure over Easter weekend in 2011.

Glitches that knock multiple regions offline are especially rare because Microsoft, Amazon, and other major cloud providers typically organize datacenters into clumps -- or "stamps," in Microsoft parlance -- of 1,000 servers each.

These stamps include independent power, networking, and storage infrastructure. Theoretically, this tactic stops a problem in one place from spreading to others, thus keeping things like Azure available even when problems inevitably arise.

As Guthrie's tweet implies, if a DNS failure was the culprit, Microsoft's stamps weren't part of the problem. Rather, Azure was operating as it should; customers just couldn't reach it.
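The distinction between a name-resolution failure and a failure of the service itself can be illustrated with a minimal Python sketch. It is not a description of Microsoft's tooling; the function name `classify_reachability` and its return labels are hypothetical, chosen only for this illustration. It uses the standard library's `socket.getaddrinfo` and `socket.create_connection`, and accepts substitutes for both so the logic can be exercised without network access.

```python
import socket


def classify_reachability(host, port=443, resolve=socket.getaddrinfo,
                          connect=socket.create_connection, timeout=3.0):
    """Classify why a host is (or is not) reachable.

    Returns one of three hypothetical labels:
      'dns_failure'         - the name could not be resolved at all
      'service_unreachable' - the name resolved, but no address accepted a connection
      'ok'                  - at least one resolved address accepted a connection
    """
    try:
        # Step 1: name resolution. A broken name server fails here,
        # no matter how healthy the servers behind the name are.
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    if not infos:
        return "dns_failure"

    # Step 2: try each resolved address until one accepts a TCP connection.
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        try:
            conn = connect(sockaddr[:2], timeout=timeout)
        except OSError:
            continue
        conn.close()
        return "ok"
    return "service_unreachable"


if __name__ == "__main__":
    # The .invalid top-level domain is reserved and should never resolve.
    print(classify_reachability("example.invalid"))
```

In the scenario Guthrie described, a check like this would be expected to stop at the first step: the service behind the name may be healthy, but clients never obtain an address to connect to.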

Though Azure outages are rare, Microsoft has typically been transparent when they've occurred. The company published a technical report following its most notorious disruption, the Leap Day interruption on Feb. 29, 2012. In that case, faulty security certificates incorrectly indicated that servers were failing, which triggered the cloud's governing software to transfer virtual machines inappropriately. The fact that the new VMs carried incorrect certificates themselves exacerbated the issue. Microsoft deployed a fix within 10 hours.
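The article does not spell out how the certificates were faulty. Microsoft's published analysis attributed the fault to a certificate expiry date computed by adding one to the year, which yields a nonexistent date when the starting date is Feb. 29. The following Python sketch illustrates that class of bug only; it is not Microsoft's code, and the function names are hypothetical.

```python
from datetime import date


def naive_one_year_later(d):
    """Add one to the year and keep month and day unchanged.

    Raises ValueError for Feb. 29, because the following year
    has no such date.
    """
    return d.replace(year=d.year + 1)


def safe_one_year_later(d):
    """Add one year, falling back to Feb. 28 when the naive result is invalid."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only Feb. 29 can fail here; clamp to the last valid day of February.
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    print(safe_one_year_later(leap_day))
    try:
        naive_one_year_later(leap_day)
    except ValueError as exc:
        print("naive calculation failed:", exc)
```

Clamping to Feb. 28 is one reasonable policy among several; the point is that the date arithmetic has to handle the leap day explicitly rather than assume every date recurs each year.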

Azure outage affected Xbox Live hours before Xbox One went on sale.

Another significant outage occurred at the end of October. In that case, Azure GM Mike Neil told InformationWeek's Charles Babcock this week, the disruption stemmed from a bug in the API for staging systems. Neil said Microsoft will release its full analysis of the October incident this year. When a problem occurs, Microsoft focuses on restoring operations as quickly as possible to minimize the effect on customers. More in-depth forensic determinations, such as the root cause of the problem, are saved until later.

Some businesses remain hesitant to embrace the cloud due to concerns over security and reliability. Service disruptions such as the one that happened Thursday do little to persuade these skeptics. Nonetheless, Azure and the products it supports are among Microsoft's most promising assets.

Neil told Babcock that Microsoft's cloud is gaining 1,000 customers per day. The company reported in September that its Azure-backed Office 365 products were on pace to post $1.5 billion in annual revenue. Microsoft also said this year that more than 300,000 Azure servers would support enhanced Xbox One experiences.

Comments
Lorna Garey,
User Rank: Author
11/22/2013 | 2:49:35 PM
DNS on the edge
The fragility of the DNS system is becoming increasingly evident -- any word on what exactly the problem was? Packet bombing by someone looking to disrupt the Xbox launch?
SaneIT,
User Rank: Ninja
11/25/2013 | 7:54:15 AM
The silver lining
Aside from pointing out the flaws in DNS and how bad it is for a company's image when services go down, I think this actually has a slight upside. This happened before the holiday season officially kicked off. This week a large number of the heaviest Xbox Live users will be at home and playing games. Next month there will be a whole bunch of new players jumping online. Sorting this out early means less bad press than if this had happened on, say, Christmas Day, as Apple saw with iTunes crashing.
SachinEE,
User Rank: Apprentice
11/27/2013 | 1:56:11 AM
Re : Windows Azure Outage Avoids Xbox One Catastrophe
Disruption problems aside, it is really heartening to read that Microsoft has been transparent about such disruptions, which is a rarity among its competitors. We have seen most organizations throw a blanket over such issues. Adobe's security breach is a recent example. What these organizations don't understand is that transparency about such issues brings the organization a good name instead of stigma.
SachinEE,
User Rank: Apprentice
11/27/2013 | 1:56:14 AM
Re : Windows Azure Outage Avoids Xbox One Catastrophe
When there is a DNS failure, it doesn't really matter if Azure is operating as it should, because it becomes inaccessible anyway. Businesses' concerns about the security and reliability of the cloud are not ill-founded. The cloud has not matured to the level where everything can be shifted onto it.