Outages Aftermath: What's Next for Microsoft and Its Customers?
Microsoft and its customers have plenty to think about: a recent outage caused by a DDoS attack arrived while they were still processing the CrowdStrike incident.
A DDoS attack disrupted several of Microsoft’s Azure cloud services less than two weeks after the CrowdStrike update that sparked global outages of Windows services on July 19.
“It's definitely not ideal for Microsoft that this happened so close to July 19. That'll be a date I think many of us will remember for our entire careers because it was such a large outage,” Kim Anstett, CIO at cybersecurity company Trellix, tells InformationWeek.
Two outages so close together raise questions about what happens when a company with massive market share, such as Microsoft, cannot provide its services, and about how customers should think about the future.
The Outages
On July 30, “ … a subset of customers experienced intermittent connection errors, timeouts, or latency spikes while connecting to Microsoft services that leverage Azure Front Door (AFD) and Azure Content Delivery Network (CDN),” according to Microsoft. The issue persisted for nearly eight hours.
The outage caused by the DDoS attack is unrelated to the CrowdStrike incident, but its timing, so soon afterward, further underscores how widely felt disruptions to major service providers can be.
A software update released by cybersecurity company CrowdStrike triggered a mid-July outage that impacted millions of Windows devices around the world. In the wake of the outage, critical services struggled or ceased entirely to function. Blame continues to fly fast and furious, with accusations potentially playing out in legal battles. CrowdStrike, Microsoft, and Delta -- one of the airlines that had to cancel thousands of flights due to the outage -- are trading legal letters, Axios reports.
“There's a theme around technology and large-scale firms that are deeply penetrated across enterprise and people questioning the sense of responsibility and what it means to have that kind of influence over how so many businesses run,” says Anstett.
Isolated Incidents or Not?
A defect in a CrowdStrike update led to the widespread Windows outage on July 19, causing devices to display the blue screen of death. The July 30 outage was related to a DDoS attack, but Microsoft also revealed that its defenses against this frequent type of attack were not properly implemented.
While these two incidents are separate from one another, they do suggest a need for a closer look at the review process.
“It's really important that companies not only put these systems in place to implement security and protection. They [also] need to take the proactive steps to really test their own defenses all the time and do it in a way that reflect[s] real-world realities,” David Allen, CTO and CISO of Prevalent, a third-party vendor risk management solutions company, tells InformationWeek.
The leaders who rely on services such as Microsoft’s to operate their businesses have tough questions to answer about how they can take proactive steps forward knowing that these types of IT outages will almost certainly happen again, whether via a different root cause or with another major IT provider.
“It's a wakeup call for people to think about how can we build resilience into the security and the IT infrastructure,” says Aimei Wei, founder and CTO of cybersecurity company Stellar Cyber.
Allen urges enterprises to have disaster recovery plans in place and test them. “What would happen if this vendor went offline for 10 hours? What would happen if they went offline for a week, two weeks, a month? What are our alternatives? What can we set up in place to make sure our business isn't impacted?” he asks.
As enterprise leaders run through those scenarios, it may become clear that a more heterogeneous environment is prudent.
“What do you have in terms of business continuity and resiliency?” Anstett asks. “That, for me, starts with more of a heterogeneous footprint rather than single source.”
For example, enterprises may explore a multi-cloud strategy that leverages multiple service providers. “That will increase costs. But what is the cost of being knocked out of business for a certain amount of time? These two things need to be balanced,” says Allen.
The Reliance Dilemma
Many organizations rely on Microsoft to operate, and the consequences of not being able to access its services were quite clear during the global outage. The IT giant’s major market share is catching the eye of regulators, The Washington Post reports.
On July 19, Federal Trade Commission (FTC) Chair Lina Khan posted on X: “Another area where we may lack resiliency is cloud computing. In response to @FTC's inquiry, market participants shared concerns about widespread reliance on a handful of cloud providers, noting that consolidation can create single points of failure.”
Mike Morper, senior vice president of product marketing at data encryption company Virtru, argues that Microsoft has created a “monoculture.” “And not only government entities but commercial ones as well have become overdependent on this monoculture,” he says.
Anstett also pointed to Microsoft’s strategy as one that does not encourage a heterogeneous environment. “The way that the products are bundled and the way that they sell has made it very difficult for individuals to have flexibility, companies to have flexibility to evolve the landscape and to bring in additional solutions that have that heterogeneous environment,” she says. “They really need to spend some time soul searching and thinking about how they show up as a technology entity that is here for the greater good versus their own interest.”
Government regulation could attempt to change this kind of culture.
“When government bodies start putting claws into this, when lives are put at stake above and beyond just data, I do think there will be tectonic shifts,” says Morper.