Microsoft Azure Leap Day Outage Questions Persist - InformationWeek

Microsoft Azure Leap Day Outage Questions Persist

Is a defective security certificate from a third party to blame for the Azure outage? Microsoft won't say yet.

Microsoft continues to do root cause analysis on why its Windows Azure cloud identity management service failed to operate properly in three regions over eight hours on Feb. 29. As of this week, users remain in the dark regarding full outage details.

Microsoft's cryptic explanation so far is that the cause of downtime for the Access Control Service "has been traced back to a cert issue triggered on 2/29/2012 GMT." In other words, a certificate-processing software glitch occurred an hour and forty-five minutes after the start of Leap Day, Microsoft said in its first explanation after the outage.

A spokesman said on Monday that conclusions from the follow-up analysis "will come soon" and will be made public.

Microsoft spokesmen declined to confirm that the cause was a faulty security certificate, or to say who might have supplied the certificate. Security certificates can be issued by either a cloud service provider or by independent third parties, such as Symantec's VeriSign unit, GoDaddy, or Comodo. The bulk of security certificates are issued by third parties.

[Want more background on Azure's leap day service outage? See Microsoft Azure Outage Explanation Doesn't Soothe]

That leaves open the prospect that a trusted third party supplied a faulty certificate whose date and time stamp couldn't be accepted in Azure operations as correct. That fault might occur if the certificate issuer had failed to account for Feb. 29 in a leap year.
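To illustrate the kind of date fault described above, here is a minimal Python sketch. It is purely hypothetical: Microsoft has not said what code or which issuer was involved, and the function names `naive_expiry` and `safe_expiry` are invented for this example. It assumes a certificate's expiry is computed as "same month and day, one year later," which fails when the issue date is Feb. 29.

```python
from datetime import date

def naive_expiry(issued: date, years: int = 1) -> date:
    # Hypothetical naive approach: bump the year, keep month and day.
    # For an issue date of Feb. 29, the target year usually has no Feb. 29,
    # so date() raises ValueError.
    return date(issued.year + years, issued.month, issued.day)

def safe_expiry(issued: date, years: int = 1) -> date:
    # One possible fix (an assumption, not a confirmed remedy):
    # fall back to Feb. 28 when the target year lacks a Feb. 29.
    try:
        return date(issued.year + years, issued.month, issued.day)
    except ValueError:
        return date(issued.year + years, 2, 28)

if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_expiry(leap_day)
    except ValueError as err:
        print("naive calculation failed:", err)
    print("safe calculation:", safe_expiry(leap_day))
```

On any other day of the year the two functions return the same result, which is why a fault of this type could go unnoticed until a leap day arrives.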

Asked whether an in-house or third-party issuer supplied the problem certificate, Microsoft spokesmen said only that the company was continuing its analysis.

Whatever the origin, the faulty certificate interfered with the operation of Windows Azure's Access Control Service. The service is used by Web application builders, who build it into their applications to provide a combination of single sign-on and authentication.

While the service was unavailable, applications that depended on it could not obtain identity confirmations or authorize visitors to reach parts of the application that they would normally be able to use.

No figures have been issued on the number of Web application owners affected, or the visitor traffic that may have been lost to their applications.

At one point as the outage unfolded, the Microsoft Service Dashboard indicated that "less than 3.8% of hosted services" were affected.

Behind Access Control Service are several other services that frequently depend on it, including SQL Azure Data Sync, SQL Azure Database, and SQL Azure Reporting. Operators at CloudSleuth, a cloud service monitoring system provided by Compuware, confirmed that CloudSleuth test servers in Azure continued to respond to test pings during the service outage. That provides outside verification of Microsoft's claim that running servers were not taken down, at least not in most cases.

Bill Laing, Microsoft's corporate VP for server and cloud, wrote in a blog post on Feb. 29, on the heels of the outage, that Microsoft had given priority to keeping active systems up and running. Requests for new services that involved identifying users were declined until the certificate problem was rectified, he wrote.

Spokesmen for another cloud monitoring service, the French company Cedexis, sponsor of the Cedexis Radar, said feedback from end users around the world had documented the Access Control Service outage and plotted its duration.
