
A Word of Warning for Remote IT Infrastructure Workforces

To continue to support a remote IT workforce, organizations must perform due diligence both to reduce the number of hardware components that must be managed and to define the steps to be taken when outages occur.

As we approach the two-year anniversary of the COVID-19 pandemic's beginning, I think back to the number of times I’ve been asked by clients and colleagues about whether remote IT workforces will be a temporary or permanent fixture. While I initially thought that a certain level of team cohesiveness would be lost across the board due to the physical separation of IT team members, I’ve since warmed up to the idea that remote IT workforces may be the way forward given these uncertain times.

However, some recent warning signs show that organizations must plan more carefully for IT staff who are responsible for managing physical equipment such as private data center servers, network infrastructure hardware, and autonomous IoT devices.

A perfect example of what I’m referring to is the Facebook outage earlier this month. A faulty configuration change, which ultimately made Facebook's DNS servers unreachable, caused an outage that lasted over five hours and affected users across the globe. More interesting still, as the New York Times reported, resolving the outage required a team of Facebook engineers to travel to a specific data center and gain physical access to it in order to remediate the problem.

Considering that Facebook is allowing nearly all employees to work remotely due to the pandemic, one has to wonder whether the outage lasted far longer because the right people with the right skills could not get to where they needed to be.

Unlike other IT roles that revolve around software and/or programming, IT infrastructure roles include a physical element. When physical systems malfunction to the point where they must be manually replaced or physically reset, time is certainly of the essence. These types of outages also occur far more frequently than one might expect. I recall several times throughout my career when an errant remote configuration change to a network router or switch required me to drive into the office or data center to access and/or reset the device locally so that it would revert to its previous configuration settings.

To lessen the chances of these types of incidents for typical enterprise IT organizations, I recommend that IT leadership consider a two-pronged approach.

The first step of this approach is to devise a robust remote hands strategy for situations in which physical tasks can be performed by third-party or operations staff who work close to critical infrastructure locations. While many colocation data centers offer remote hands services, little thought or preparation goes into training on-site staff to identify specific infrastructure devices and to perform the tasks they are likely to be asked to carry out when an outage occurs. These processes should be documented and reinforced with regular training so that skills remain fresh in everyone’s minds.
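To make "documented" concrete, the following is a minimal sketch of how a team might record remote hands procedures so that on-site staff can match a device to its physical location and follow ordered steps. The schema (`RunbookEntry`, its field names, and the `checklist` helper) is a hypothetical illustration rather than a standard format. The sample device, site, rack, asset tag, and steps are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookEntry:
    """One documented remote hands procedure for a single piece of hardware.

    Hypothetical schema: field names are illustrative, not a standard.
    """
    device: str          # hostname the remote team uses for the device
    site: str            # colocation facility or branch location
    rack: str            # rack/cabinet identifier as labeled on site
    rack_unit: int       # lowest rack unit (U) the device occupies
    asset_tag: str       # physical label on-site staff can match against
    symptom: str         # what the remote team observes during the outage
    steps: list[str] = field(default_factory=list)  # ordered physical tasks


def checklist(entries: list[RunbookEntry], device: str, symptom: str) -> str:
    """Return a numbered checklist for the first entry matching device and symptom."""
    for entry in entries:
        if entry.device == device and entry.symptom == symptom:
            header = (f"{entry.site}, rack {entry.rack}, U{entry.rack_unit}: "
                      f"locate asset tag {entry.asset_tag}")
            numbered = [f"{i}. {step}" for i, step in enumerate(entry.steps, start=1)]
            return "\n".join([header, *numbered])
    # No documented procedure: escalate rather than improvise on site.
    raise KeyError(f"no documented procedure for {device!r} / {symptom!r}")


# Hypothetical example entry; every value here is a placeholder.
runbook = [
    RunbookEntry(
        device="core-sw-01",
        site="Colo A",
        rack="R12",
        rack_unit=40,
        asset_tag="IT-00417",
        symptom="unreachable after config change",
        steps=[
            "Confirm the asset tag on the device matches",
            "Connect the supplied console cable to the console port",
            "Wait for the on-call network engineer before touching power",
        ],
    )
]

if __name__ == "__main__":
    print(checklist(runbook, "core-sw-01", "unreachable after config change"))
```

Keeping procedures as structured records rather than free-form notes makes it easier to review them during regular training and to spot devices that have no documented procedure at all.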

The second step is to further offload the management of underlying infrastructure hardware and software to third-party cloud and edge service providers. This puts the onus on the service provider, rather than your in-house staff, to remedy physical infrastructure issues. While incidents can still occur on managed services platforms, uptime inside hosted data centers typically remains far higher than with on-premises alternatives.

When it comes down to it, most IT work, even infrastructure-related work, can indeed be performed faithfully from anywhere. However, when working with physical equipment, there will always be a need for direct access to the hardware. Thus, companies that wish to adhere to remote workforce policies must perform due diligence both to reduce the number of hardware components to be managed and to formulate precise steps for outages that require qualified people to have fast physical access to downed equipment.
