Commentary
Thank Goodness For The Microsoft Azure Crash
Over the past weekend, Microsoft Azure was unavailable for nearly a day. Microsoft's cloud OS offering is still in beta, so the company isn't making any promises about availability or reliability at this point. However, events like this are just what a company needs to improve the product before it ships -- and before it's too late.
Problems like this are critical to the experience a company must have in order to deliver a reliable service. Let me give you an example. At one Web company where I worked earlier this decade, the Web hosting providers had the typical uptime guarantees, which amounted to a promise of less than one hour of continuous downtime per month. Although there were several annoying issues and problems, none ever rose to the level where we could claim a credit under those guarantees.
Then came Jan. 23, 2003, and the SQL Slammer worm. The entire data center for this hosting company was wiped out for nearly 24 hours as a result of this worm. They were responsible for maintaining the firewalls, installing OS patches, and isolating customer loads to prevent interference, and they failed in all these categories on this day. Even by the contract terms of the hosting-friendly service-level agreement, we were issued a refund of more than half of that month's hosting fees.
After that day, the hosting company changed completely. They enacted new procedures and practices that have made them both reliable and responsive when problems occur. More important, problems are extremely rare. It took a reputational and financial disaster like SQL Slammer to bring the company to its senses and do what it needed to do. Companies like Amazon have had similar experiences with their S3 service and come out better for them.
This scenario seems to be playing out in the Microsoft Azure group. It took them a long time to figure out what was going on and fix this problem, and even now it doesn't sound like the Azure folks completely understand how the disaster played out. That is what makes this event such a great opportunity. Management should talk to the group and point out how catastrophic this would have been -- both financially and reputationally -- if Azure were a shipping product. Then they should dissect the problem and figure out how to avoid such failures where possible and respond more quickly when they do happen.