Hotmail Crash Blamed On System Update Error - InformationWeek


9/12/2011
03:11 PM

Hotmail Crash Blamed On System Update Error

Microsoft explains why its email service disappeared from the Internet's Domain Name System.

Microsoft was in the process of updating a network traffic load balancing tool when its Hotmail service became inaccessible to thousands of users worldwide Sept. 8, according to the company's explanation of the outage.

The faulty update effort led to corruption of Hotmail's addressing in the Internet's Domain Name System. Microsoft said what started out as "a service degradation" became a "service disruption" that took about 3.5 hours to analyze and fix.

"On Thursday, September 8th at approximately 8 p.m. PDT, Microsoft became aware of a Domain Name Service (DNS) problem causing service degradation for multiple cloud-based services," a Microsoft spokesman said in an email response to InformationWeek.

"A tool that helps balance network traffic was being updated, and for a currently unknown reason, the update did not work correctly. As a result, the configuration was corrupted, which caused service disruption. Service restoration began at approximately 10:30 p.m. PDT, with full service restoration completed at approximately 11:30 p.m. PDT. We are continuing to review the incident," the spokesman added.

For end users, full service restoration was still to come. In an update posted at 11:29 p.m. PDT on Sept. 8, Chris Jones, author of the Inside Windows Live blog, said the Domain Name System correction had just been made and would take at least 30 minutes to propagate through the system. He said the DNS corruption may have caused problems accessing other Microsoft cloud services, including its cloud storage system, SkyDrive, and the Microsoft Office applications offered online.

The service disruption and the length of Microsoft's response prompted numerous comments from around the world, as Hotmail users noted on Inside Windows Live and at DownRightNow.com that they were all experiencing the loss of service at the same time.

Changes to software by IT staff are a major cause of outages in enterprise data centers. Cloud services, including Microsoft's Windows Azure, SkyDrive, and Hotmail, aren't immune either, even though cloud providers strive to automate as many operations as possible in ways that have been tested and proven free of human error.

Even the leader in cloud services, Amazon Web Services, has suffered from a common human error. In the early morning hours of April 21, an AWS network administrator switched a communications network's traffic onto a backup network unequal to the task of carrying it all. Many running systems in EC2 found they couldn't access their data, triggering what Amazon termed on April 29 a "remirroring storm" as data systems tried to create new, accessible copies. That choked the systems and froze the ability of Elastic Block Store and Relational Database Service to access data and keep EC2 instances supplied with fresh data.

Cloud providers of every kind will need to build in more safeguards against faulty software upgrades and human error as they continue trying to convince enterprise data center operators to make greater use of their services.

In January, Hotmail accidentally deleted 17,335 user accounts, then restored them through backup procedures.

