Microsoft Azure cloud services dashboard shows access problem in three regions; most users back online within eight hours.
Microsoft's Windows Azure cloud services suffered an outage early Wednesday morning at its data centers in Northern Europe and North Central and South Central U.S. The problem may have been related to a leap year timing glitch buried in automated software procedures.
Microsoft is known to have facilities in Ireland, Chicago, and San Antonio, Texas, and users at all three locations lost access sometime before 4:55 a.m. Central time, which was 10:55 a.m. Greenwich Mean Time in Ireland.
"Incoming traffic may not go through for a subset of hosted services," warned the service dashboard at the time. After about eight hours, Microsoft spokesmen said most service had been restored but the company was still troubleshooting the problem.
At a San Antonio cloud center, more network bandwidth is assigned at that hour of the morning as traffic begins to pick up. As workers rise, check websites, and log into information services at the start of another busy workday, their activity typically calls up new resources, which in turn requires security certificate checks and other date- and time-sensitive operations.
That explanation, however, wouldn't seem to apply to a Windows Azure facility in Ireland, where 10:55 a.m. GMT is the local time.
"We are experiencing an issue with Access Control 2.0 in the North Europe subregions. Users are not able to access their ACS namespaces at this time. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers," said a posting to the Windows Azure Service Dashboard. An identical message for the South Central U.S. subregions was posted at the same time, 11:15 a.m. Greenwich Mean Time. Later, Microsoft spokesmen said users in the North Central U.S., or Chicago area, were affected as well.
Other regions served by Azure, such as West Europe or Asia, appeared unaffected by the outage. The service dashboard did not say whether some or all cloud services couldn't be accessed.
Microsoft later said in a statement that the service management problems were caused by "a cert issue triggered on 2/29/2012," that is, a security certificate issue tied to leap day, a date that occurs once every four years.
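Microsoft's statement does not spell out the mechanism, but one common way a leap-day certificate bug can arise is date arithmetic that bumps the year and keeps the month and day. The Python sketch below illustrates that general failure mode only; it is an assumption, not Microsoft's code, and the helper names `naive_one_year_later` and `safe_one_year_later` are hypothetical.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    """Hypothetical buggy helper: keep month and day, add one to the year."""
    # Raises ValueError when start is Feb. 29, because the next year
    # has no Feb. 29.
    return start.replace(year=start.year + 1)


def safe_one_year_later(start: date) -> date:
    """Hypothetical fix: fall back to Feb. 28 when the target date is invalid."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # Only Feb. 29 can fail here, so clamp to the last valid day of February.
        return start.replace(year=start.year + 1, day=28)


# Illustrative issue date: the leap day named in Microsoft's statement.
issued = date(2012, 2, 29)

try:
    naive_one_year_later(issued)
    naive_failed = False
except ValueError:
    naive_failed = True

safe_expiry = safe_one_year_later(issued)
print("naive calculation failed:", naive_failed)
print("safe expiry date:", safe_expiry.isoformat())
```

Clamping to Feb. 28 is one design choice among several; rolling forward to March 1 would be equally defensible, as long as the rule is applied consistently.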
It said access to services and management functions was "restored for the majority of customers" by 1:30 p.m. GMT in Northern Europe, or 7:30 a.m. Central time in the U.S.
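The GMT and U.S. times quoted in this story can be cross-checked with a short Python sketch. It assumes U.S. Central Standard Time at a fixed six hours behind GMT, which holds in late February before daylight saving time begins; the helper name `gmt_to_central` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
# Assumption: fixed UTC-6 offset for Central Standard Time (no DST in February).
CST = timezone(timedelta(hours=-6), "CST")


def gmt_to_central(hour: int, minute: int) -> datetime:
    """Convert a GMT clock time on Feb. 29, 2012 to U.S. Central time."""
    stamp = datetime(2012, 2, 29, hour, minute, tzinfo=GMT)
    return stamp.astimezone(CST)


outage_noted = gmt_to_central(10, 55)  # 10:55 a.m. GMT, as reported for Ireland
restored = gmt_to_central(13, 30)      # 1:30 p.m. GMT, per Microsoft's statement

print(outage_noted.strftime("%H:%M %Z"))
print(restored.strftime("%H:%M %Z"))
```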
The outage was a reminder of the Amazon Web Services outage that started April 21 and extended over the Easter weekend last year. Both Amazon and Microsoft were knocked offline by a power outage in Dublin the following August.