Then came Jan. 25, 2003, and the SQL Slammer worm. The hosting company's entire data center was knocked out for nearly 24 hours. They were responsible for maintaining the firewalls, installing OS patches, and isolating customer loads to prevent interference, and on that day they failed in all three categories. Even under the terms of their hosting-friendly service-level agreement, we were issued a refund of more than half of that month's hosting fees.
After that day, the hosting company changed completely. They enacted new procedures and practices that have made them both reliable and responsive when problems do occur. More important, problems are now extremely rare. It took a reputational and financial disaster like SQL Slammer to bring the company to its senses and force it to do what it needed to do. Companies like Amazon have had similar experiences with their S3 service and have come out better for them.
This scenario seems to be playing out in the Microsoft Azure group. It took them a long time to figure out what was going on and fix the problem, and even now it doesn't sound like the Azure folks completely understand how the disaster unfolded. That is what makes this event such a great opportunity. Management should sit down with the group and point out how catastrophic this would have been -- both financially and reputationally -- if Azure were a shipping product. Then they should dissect the problem and figure out how to avoid such failures where possible and how to respond more quickly when they do happen.