re: Microsoft Azure Outage Explanation Doesn't Soothe
There is a very simple lesson to be learned from the Azure outage (and last year's Amazon outage): you must perform detailed Business Continuity/Disaster Recovery (BC/DR) planning. Identify all potential single points of failure, and recognize that an entire vendor can be one of them. Amazon advertised "availability zones" to protect its clients from outages. Oooops, Amazon suffered a multi-availability-zone outage. This Azure outage spanned multiple data centers. Murphy is alive and thriving in the cloud community, just as he (or she) is in corporate data centers. So, plan for it.
If you are using cloud services, you must have contingency plans in place for a complete vendor failure, whether that means bringing critical apps back in-house or switching to another provider. The cloud providers are victims of their own hype, in that their growth is too rapid for them to cover all their bases. We all know that change is public enemy #1 of reliability, and that growth demands constant change as providers expand their environments, especially when expansion pushes the limits of their architectures.
There is another aspect of these outages that I find disconcerting, namely, how the providers handle the problems, especially as it relates to client communications. This was not a major issue with traditional outsourcing, because clients had dedicated account management teams. With the commodity pricing of cloud, we no longer have the luxury of customer relationship management with the providers. The cloud community must address this, either by providing far better online communications channels, or by biting the bullet and assigning account managers who can serve as communications conduits in both directions.
Cloud is here to stay, and I am sure we will witness an evolutionary process. Given what we are witnessing and experiencing, it is time for the cloud providers to acknowledge their lack of enterprise-class maturity and develop plans to bridge the gaps.