UniSuper’s Cloud Outage and Google's 'One-of-a-Kind' Misconfig
A misconfiguration led to the accidental deletion of UniSuper’s cloud account and a week-long outage.
Earlier in May, members of UniSuper, an Australian superannuation fund (pension program), were unable to access their accounts during a week-long outage. The culprit? A Google Cloud misconfiguration that resulted in the deletion of UniSuper’s Private Cloud subscription. On May 2, UniSuper released its first statement on the service disruption. Members had to wait until May 9 to log in to their accounts. On May 15, UniSuper confirmed that all member-facing services were fully restored.
In a world where enterprises rely on data and its availability in the cloud, the UniSuper outage offers IT leaders valuable lessons on risk and outage response.
Backups and Redundancies Are Vital
The 3-2-1 rule is a common mantra in the world of data management and protection. Keep one primary copy of your data and two backups, for a total of three copies. Those backups should use two different storage media, and one backup should be stored offsite.
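For teams that want to turn the rule into something checkable, the following is a minimal sketch in Python, assuming a hand-maintained inventory of data copies. The `DataCopy` class, the `check_3_2_1` function, and the medium labels are hypothetical names chosen for illustration; they are not part of any vendor's tooling, and the checks follow the rule as worded above rather than any formal standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset (hypothetical inventory record)."""
    name: str
    medium: str          # illustrative label, e.g. "cloud-disk", "tape"
    offsite: bool        # stored away from the primary site/provider
    is_primary: bool = False


def check_3_2_1(copies: List[DataCopy]) -> List[str]:
    """Return a list of rule violations; an empty list means compliant."""
    problems = []
    backups = [c for c in copies if not c.is_primary]

    # 3: one primary copy plus two backups, three copies in total.
    if len(copies) < 3:
        problems.append("need at least three copies in total")

    # 2: the backups should sit on two different storage media.
    if len({c.medium for c in backups}) < 2:
        problems.append("backups should use two different storage media")

    # 1: at least one backup should be stored offsite.
    if not any(c.offsite for c in backups):
        problems.append("at least one backup should be offsite")

    return problems


# Hypothetical inventory: a primary, a same-site disk backup,
# and a tape backup held elsewhere.
compliant = [
    DataCopy("primary", "cloud-disk", offsite=False, is_primary=True),
    DataCopy("nightly-disk", "cloud-disk", offsite=False),
    DataCopy("weekly-tape", "tape", offsite=True),
]

# Hypothetical inventory: a primary and a single same-medium replica.
single_replica = [
    DataCopy("primary", "cloud-disk", offsite=False, is_primary=True),
    DataCopy("replica", "cloud-disk", offsite=False),
]

print(check_3_2_1(compliant))
print(check_3_2_1(single_replica))
```

A check like this only validates what the inventory says; it does not confirm that the backups can actually be restored, which is a separate exercise.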
Cloud providers, even the big ones, aren’t perfect. “Relying solely on a single cloud provider for backup, even one as reputable as Google Cloud, can pose significant risk,” Kim Larsen, CISO at Keepit, a cloud data protection platform, says in an email interview. “UniSuper’s experience is a stark reminder of the potential data protection gaps when relying on one single cloud service for SaaS backup.”
UniSuper did, in fact, have backups in place, but the misconfiguration had a cascading impact. “UniSuper had duplication in two geographies as a protection against outages and loss. However, when the deletion of UniSuper’s Private Cloud subscription occurred, it caused deletion across both of these geographies,” according to a joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian.
The superannuation fund did have backups with another service provider, which helped to minimize data loss, according to the statement.
Despite those backups, UniSuper still had to contend with the fallout of a week-long cloud outage. “This incident raises questions about both geographical redundancy and retention periods for data stored in Google Cloud,” Todd Thorsen, CISO at CrashPlan, a cloud backup solutions company, tells InformationWeek in an email interview. “The deletion of UniSuper’s private cloud subscription ... led to deletion of all of their data. It seems to me that customer data should still be available for a reasonable period of time post-subscription and should not be immediately deleted, unless the customer directs it.”
What does this mean for enterprise leaders as they consider their organizational approach to the 3-2-1 rule?
“CIOs should ensure they maintain strong third-party backup capabilities in line with service providers’ terms and conditions and that backup frequency is in line with risk tolerance for their organizations,” says Thorsen.
Kevin Miller, CTO at enterprise software solution company IFS, recommends enterprise leaders also think about the shared responsibility model. What is the enterprise responsible for, and what is the cloud provider responsible for? “Outline those different responsibilities and more importantly accountability,” he recommends.
Understanding responsibility and accountability can help organizations during the recovery process following an outage, whether caused by a misconfiguration or a cyberattack.
Misconfigurations Are an Ongoing Risk
The misconfiguration that caused the UniSuper cloud outage is referred to as a “one-of-a-kind occurrence” in the joint statement. Google conducted an internal review and took steps to prevent a recurrence of this particular incident.
“Openness and transparency about unfortunate incidents like this is important because it enables the rest of the IT community to learn from what happened and strengthen data protection measures,” says Larsen.
While this specific misconfiguration is unlikely to happen again, others like it could. “I think due to the complexity of things like full cloud, hybrid cloud, some shared responsibility of where data … is stored, it's inevitable that it will happen again,” Miller cautions.
The exact nature and fallout of future cloud misconfigurations are difficult to predict, but their inevitability is a reminder for enterprise leaders to include them in their risk assessment and planning processes.
“One thing that can assist you in an uncertain world is proper testing of your business continuity plan and disaster recovery plan, so you can ensure the organization's ability to recover after a fallout or a cyberattack,” says Larsen.
Practice Disaster Recovery Plans
What does testing a disaster recovery plan look like?
“Sometimes when we think of disaster recovery, we think of natural disasters: a tornado or a hurricane hits the data center or there's some kind of weather event,” says Miller. “But the truth is things like malware, malicious attacks, a cloud provider having a hiccup, someone cutting through lines, those all need to be topics that are reviewed as part of that disaster recovery process.”
Developing and testing that disaster recovery plan is an ongoing process for enterprises. Various scenarios -- like a cloud misconfiguration that causes an outage -- need to be run through, and everyone on the team needs to know their role in the recovery process.
“That whole disaster recovery and backup process needs to be reevaluated and should be revisited multiple times a year,” says Miller.
AI, inevitably popping up in any IT conversation, has a potential role to play in strengthening these plans. “Humans can't look at everything all at once at the same time 24/7, but certain machine learning models … can,” Miller points out. AI could potentially help enterprise teams spot gaps in their disaster recovery plans.
The UniSuper incident may be anomalous, but the ongoing risk of cloud outages and data loss, stemming from any number of causes, is very real.
“It should serve as a wakeup call for CIOs to assess their organizations’ data resilience postures related to not only IaaS environments but across all essential and critical data,” says Thorsen.