Southwest Airlines' Latest Tech Woes Point to Firewall Failure

Issues with a firewall led to the temporary grounding of the airline’s entire fleet, reviving concerns about the airline’s tech management.

Joao-Pierre S. Ruth, Senior Editor

April 24, 2023


Last week, Southwest Airlines called for the brief grounding of its fleet to give the company the chance to resolve data connection issues triggered by a firewall failure. With approval from the Federal Aviation Administration, Southwest kept its planes on the ground and went to work on what it stated was an unexpected loss of some operational data.

This latest grounding of Southwest’s fleet brought back memories of this past winter, when the airline canceled more than 16,700 flights from December 21 through December 31, 2022. That disruption was related to the effects of Winter Storm Elliott, when Southwest’s software system had trouble accommodating the reassignment of flight crews and planes after the storm lifted.

That episode led to discussions about the airline’s tech strategy, and the hiccup in April revived questions about the modernization and oversight of its technology, though this time a vendor-supplied firewall was to blame.

Gunter Ollmann, CTO for Devo, developer of a cloud-native security analytics platform, spoke with InformationWeek about the expectations placed on firewalls for resilience and who typically has oversight of their functionality.

What if there’s an issue with the firewall? What can potentially break?

You can sort of imagine a firewall as the front door, and the only door to your house, with a cat flap.

And the beauty of using that analogy is that the firewall’s there to block all the stuff that you don’t want, and the cat flap lets in the stuff that you do want. It is not there to block absolutely everything.

With that cat flap being there, you’re saying: these are the things that I want to let in, this is what I want to communicate, this is the channel that I’m going to allow. You can also misconfigure a firewall such that the flap no longer works. Then you’re blocking all the traffic in and out, even the traffic that you want to come in.

For most organizations with a firewall, if that cat flap is not working, it means that all the other services on either side of that door can no longer communicate. So, all those services now shut down. What that generally means is that someone can make an incorrect configuration, in which case that door is now closed for everyone -- which sounds very similar to what’s been hinted at here. All traffic was blocked, or at least all the traffic for key applications was blocked, and that broke those applications. Most modern applications have to connect to other distributed applications. If the firewall is broken, it cuts off communication between applications, and the whole ensemble of applications then stops functioning.
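To make the analogy concrete, here is a minimal sketch in Python (hypothetical rule names and ports; not Southwest’s actual configuration): a default-deny rule set in which a single allow rule plays the role of the cat flap. Drop that one rule, and even the traffic you want is blocked.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    action: str                    # "allow" or "deny"
    dest_port: int | None = None   # None matches any port

def evaluate(rules: list[Rule], dest_port: int) -> str:
    """Return the action of the first rule that matches, top to bottom."""
    for rule in rules:
        if rule.dest_port is None or rule.dest_port == dest_port:
            return rule.action
    return "deny"  # implicit default-deny if nothing matches

# The "cat flap": allow HTTPS in, deny everything else.
working = [Rule("allow", 443), Rule("deny")]
print(evaluate(working, 443))  # allow -- the application can communicate
print(evaluate(working, 23))   # deny  -- unwanted traffic stays out

# One bad change (the allow rule is dropped) and the door is shut for everyone.
misconfigured = [Rule("deny")]
print(evaluate(misconfigured, 443))  # deny -- even wanted traffic is blocked
```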

Most high-end firewalls and firewall technologies are still appliance-based. They’re designed to handle very high data flow rates and the physical connections between power systems, cloud infrastructure, and other networking technologies.

And it’s a rare case where the appliance itself fails. Generally, the device is designed to fail closed in the circuit sense -- a bypass closes to complete the connection -- which means that if the firewall has some physical interruption or physical degradation of service, communication will still be allowed to continue between systems.
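A tiny sketch of that behavior, under the assumption of an inline appliance with a hardware bypass (hypothetical names, in Python): when the device faults, the bypass keeps the link up instead of severing it.

```python
class Appliance:
    def __init__(self, bypass_on_failure: bool):
        self.bypass_on_failure = bypass_on_failure  # hardware "fail to wire" bypass
        self.operational = True

    def passes_traffic(self) -> bool:
        if self.operational:
            return True                 # normal path: traffic is inspected and forwarded
        return self.bypass_on_failure   # hardware fault: does the link stay up?

inline_fw = Appliance(bypass_on_failure=True)
inline_fw.operational = False           # simulate a physical interruption
print(inline_fw.passes_traffic())       # True -- communication continues despite the fault
```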

With the actual maintenance or overall oversight of a firewall, who tends to be responsible for that? Day-to-day, hands-on oversight -- is that something the vendor is typically responsible for? Is there some responsibility with the customer who’s using it? Is there a bit of both, with some shared responsibility?

It really depends on how the IT infrastructure is set up. From my experience with air carriers, they tend to outsource their infrastructure. There are a number of large systems integrators and communication and network service providers that cater specifically to the aviation industry. What that translates into is that the air carrier will have its own IT teams and its own internal security teams, but they’re relatively small. They’re focused more on future vision and the day-to-day internal operations of these systems.

The outsourced component typically is a data center that is managed and hosted by someone else. All of the physical infrastructure in that data center -- the physical firewalls, the hard drives, the backup storage, and key servers -- is hosted in that outsourced infrastructure, and as part of that package, the outsourcing company provides the resiliency in the configuration of the firewalls and other physical services.

It also depends on where the aviation provider is in their digital transformation. They may be leveraging public cloud services such as AWS or Azure, in which case all of that physical infrastructure is ephemeral and virtualized and comes as part of the services from those cloud providers. In that case, there is more onus on the IT team within the aviation company to configure and manage the software running on those cloud services; they are responsible for the firewall configuration policies, the routing of traffic, and things like that.

Meanwhile, the cloud service provider is responsible for the resilience of the physical devices and physical media.
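As a rough illustration of that split (plain Python, not any cloud provider’s real API): the customer’s team owns the firewall policy as version-controlled data, and can lint it for the kind of misconfiguration described earlier, while the provider owns the hardware that enforces it.

```python
# Firewall policy as plain data: the airline's IT team owns and version-controls
# these rules; the cloud provider owns the hardware that enforces them.
policy = [
    {"action": "allow", "protocol": "tcp", "port": 443,  "source": "0.0.0.0/0"},
    {"action": "allow", "protocol": "tcp", "port": 8080, "source": "10.0.0.0/8"},
    {"action": "deny",  "protocol": "any", "port": None, "source": "0.0.0.0/0"},
]

def lint(rules):
    """Flag a catch-all deny that shadows allow rules placed after it."""
    problems = []
    for i, rule in enumerate(rules):
        if rule["action"] == "deny" and rule["port"] is None:
            shadowed = [r for r in rules[i + 1:] if r["action"] == "allow"]
            if shadowed:
                problems.append(
                    f"rule {i} denies everything; {len(shadowed)} allow "
                    "rule(s) after it will never match"
                )
    return problems

print(lint(policy) or "policy looks sane")  # deny is last, so all is well
bad_order = [policy[2]] + policy[:2]        # deny-all accidentally moved first
print(lint(bad_order))                      # the cat flap is now sealed shut
```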

Think of having two front doors, each with its own cat flap, but only one of them in use at any particular time. If that door is blocked or no longer usable, I can automatically flick over to the second door -- the resilient backup of my firewall and its configuration.

Sometimes you can just mess up. Instead of saying that front door A is the one I want to be using and B is my backup -- not open right now but ready -- I may configure it incorrectly so that both doors think they’re in backup mode, and neither of them operates.
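Here is a hypothetical sketch of that two-door setup, again in Python: traffic only flows through a node that believes it is active, so a configuration in which both nodes think they are the backup produces exactly the outage described.

```python
class FirewallNode:
    def __init__(self, name: str, role: str, healthy: bool = True):
        self.name = name
        self.role = role        # "active" or "standby"
        self.healthy = healthy

def route_traffic(nodes):
    """Traffic only flows through a node that believes it is active."""
    for node in nodes:
        if node.role == "active" and node.healthy:
            return f"traffic flows through {node.name}"
    return "outage: no node believes it should carry traffic"

# Correct configuration: door A is active, door B stands ready as the backup.
pair = [FirewallNode("door-A", "active"), FirewallNode("door-B", "standby")]
print(route_traffic(pair))  # traffic flows through door-A

# Door A fails; the failover process promotes the backup to active.
pair[0].healthy = False
pair[1].role = "active"
print(route_traffic(pair))  # traffic flows through door-B

# The misconfiguration described above: both doors think they're the backup.
bad = [FirewallNode("door-A", "standby"), FirewallNode("door-B", "standby")]
print(route_traffic(bad))   # outage: no node believes it should carry traffic
```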

How often does firewall technology need to be refreshed, updated, modernized? Is it something that requires constant updates?

I will answer this in two pieces. One is the physical device and the other is the software that runs on that physical device. The physical devices are generally very resilient and have been operating well in the enterprise space for well over 30 years, so there’s a solid history of evolution and it’s a very mature technology.

Those physical devices can fail, but it’s increasingly rare. Many of those physical firewall devices and technologies have an average lifetime of 10-plus years, which in IT terms is very long.

There is a lot of history and a lot of knowledge in how to configure and physically rack, mount, and architect around that physical device so that there is additional physical resiliency.

Physical failure is rare, but most common architectures plan for it: even if one of those devices physically fails, there are spares, and traffic automatically flicks over to the backups.

On the software side, again, it’s a very mature technology. For a firewall appliance, there are generally maybe one or two firmware updates per year.

It’s very different from 20 years ago, when lots of new vulnerabilities were being found and there was almost a monthly cycle of new patches and updates. Nowadays, firewall devices and appliances are very robust as a class of technology from a security perspective.

Are there ways to create redundancies that would potentially help mitigate issues if there’s a firewall failure, so an organization would not lose access to data?

There are decades of experience in the architectures that ensure your firewall capability is robust and has redundancy built in.

Even before traffic comes to your firewall, there will often be a load balancer between the source of that traffic and your firewall, designed to route that traffic to the appropriate firewall. That load-balancing architecture can also be used to ensure that the customer or end-user experience is timely. For example, load-balancer architecture that sits in the cloud may be referred to as a CDN, or content distribution network. What that basically means is that if I am sitting in London and want to access a website that was traditionally held only in Texas, my traffic would have to travel across a satellite or fiber link across the ocean, and it would take longer for a response to come back.

With CDNs and load balancers, instead of routing that request to the Texas infrastructure, I may make a decision to route it to Ireland because it’s physically closer, so there is a faster response. Behind the scenes, what that means is that I have a duplicate of my application and everything around the application, which includes the firewalls.

So, for example, if a provider stopped working in Texas, I would still want other lines going to data centers in other states, such that customers and my own operations are not affected globally just because one provider is having trouble at the time.
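A simple sketch of that geographic routing, with invented latency numbers: the load balancer picks the closest healthy replica, and if an entire site goes down, requests fall back to the next-closest one rather than failing globally.

```python
# Hypothetical round-trip latencies (ms) from a London user to each replica.
# Each site duplicates the application and the firewalls that surround it.
sites = {"texas": 110, "ireland": 18, "singapore": 190}

def pick_site(latencies, down=frozenset()):
    """Route the request to the closest replica that is still healthy."""
    healthy = {site: ms for site, ms in latencies.items() if site not in down}
    if not healthy:
        raise RuntimeError("no healthy site available anywhere")
    return min(healthy, key=healthy.get)

print(pick_site(sites))                    # ireland -- closest to the user
print(pick_site(sites, down={"ireland"}))  # texas -- automatic fallback
```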


About the Author

Joao-Pierre S. Ruth

Senior Editor

Joao-Pierre S. Ruth covers tech policy, including ethics, privacy, legislation, and risk; fintech; code strategy; and cloud & edge computing for InformationWeek. He has been a journalist for more than 25 years, reporting on business and technology first in New Jersey, then covering the New York tech startup community, and later as a freelancer for such outlets as TheStreet, Investopedia, and Street Fight.

