Hotmail Crash Blamed On System Update Error
Microsoft explains why its email service disappeared from the Internet's Domain Name System.
The faulty update led to corruption of Hotmail's addressing in the Internet's Domain Name System. Microsoft said what started out as "a service degradation" became a "service disruption" that took about 3.5 hours to analyze and fix.
"On Thursday, September 8th at approximately 8 p.m. PDT, Microsoft became aware of a Domain Name Service (DNS) problem causing service degradation for multiple cloud-based services," a Microsoft spokesman said in an email response to InformationWeek.
"A tool that helps balance network traffic was being updated, and for a currently unknown reason, the update did not work correctly. As a result, the configuration was corrupted, which caused service disruption. Service restoration began at approximately 10:30 p.m. PDT, with full service restoration completed at approximately 11:30 p.m. PDT. We are continuing to review the incident," the spokesman said.
For end users, full service restoration was still to come. An update posted at 11:29 p.m. PDT on Sept. 8 by Chris Jones, author of the Inside Windows Live blog, said the Domain Name System correction had just been made and that it would take at least 30 minutes for the change to propagate through the system. He said the DNS corruption may have caused problems accessing other Microsoft cloud services, including its cloud storage system, SkyDrive, and the Microsoft Office applications offered online.
The disruption and the length of Microsoft's response prompted numerous comments from around the world, as Hotmail users reported on Inside Windows Live and at DownRightNow.com that they were all losing service at the same time.
Software changes made by IT staff are a major cause of outages in enterprise data centers. Cloud services, including Microsoft's Windows Azure, SkyDrive, and Hotmail, aren't immune either, even though cloud providers strive to automate as many operations as possible in ways that have been tested and proven free of human error.
Even the leader in cloud services, Amazon Web Services, suffered from a common human error when, in the early morning hours of April 21, an AWS network administrator switched traffic onto a backup network that could not carry all of it. Many running systems in EC2 found they couldn't access their data, triggering what Amazon on April 29 termed a "remirroring storm" as data systems tried to create new, accessible copies. The storm choked the systems and left Elastic Block Store and Relational Database Service unable to access data or keep EC2 instances supplied with fresh data.
Cloud providers of every kind will need to build in more safeguards against faulty software updates and human error as they continue trying to convince enterprise data center operators to make greater use of their services.
In January, Hotmail accidentally deleted 17,335 user accounts, then restored them through backup procedures.