When you're watching your hometown football team play on Sunday or Skyping with family over your Internet connection, it's really annoying to have that connection go down. If it's restored in five minutes, you may not call customer service, but you'll probably still be annoyed. If that kind of interruption happens a few times, you might start thinking about switching to a new service provider.
Matt Tegerdine wants to prevent that. As director of network performance analytics at Verizon, he leads a team of 45 data scientists, data translators, data engineers, and data stewards that has been putting analytics and predictive analytics in place to detect problems with Verizon's network before those problems actually occur. It's a challenge Tegerdine has been working on for about five years, since before he earned the director title just under two years ago.
Back then, Verizon Wireless had a system built from vendor tools and other technologies designed to help diagnose network problems before they happened. It was a system the residential and commercial business needed to replicate. Tegerdine's challenge was to do something similar for Verizon's FIOS and Public IP group, but with different technology, less time, and less money.
Tegerdine assembled a system built on several open source big data technologies, including Hadoop (the internal version created by Yahoo, which Verizon acquired in 2017), Jupyter notebooks, Apache Flink, and Apache Storm. The team deploys its work as PySpark apps.
Verizon has decoupled these technologies from the visualization tools its users choose.
"I don't want to dictate how you consume the data," he said. "I just want to make sure it's available." Users can consume it in Excel or Tableau, for instance. Tegerdine said that Verizon is also a big Splunk shop.
The team's first efforts focused on analyzing data from the customer service ticketing system to listen to customers and understand their pain points. From there the team moved into the network itself, collecting data from devices with the goal of detecting issues and responding to them before customers even realized there was an issue. This is where Verizon FIOS and Public IP networking first used predictive models.
For instance, one of those first predictive models gathered data from post amplifiers -- devices that Verizon places along the "last mile" of fiber optic cable running to the customer site. Sometimes those lengths of cable are too long and require an additional amplifier for the light signal to make it all the way. If one of those amplifiers fails, Tegerdine said, it can cause an instant catastrophic event, defined as 5,000 or more customers impacted.
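The article doesn't describe how Verizon's amplifier model works internally, but one simple way to predict failures from device telemetry is to flag any amplifier whose latest reading deviates sharply from its own recent baseline. The sketch below is purely illustrative -- the field names (amplifier IDs, optical power in dBm) and the z-score approach are assumptions, not Verizon's actual method:

```python
# Illustrative sketch only: flags amplifiers whose newest optical-power
# reading deviates sharply from their own recent history (z-score test).
# Field names and thresholds are hypothetical, not Verizon's.
from statistics import mean, stdev

def flag_degrading_amplifiers(readings, z_threshold=3.0):
    """readings: {amp_id: [power_dbm, ...]} with the newest sample last."""
    alerts = []
    for amp_id, series in readings.items():
        history, latest = series[:-1], series[-1]
        if len(history) < 2:
            continue  # not enough history to establish a baseline
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(latest - mu) / sigma > z_threshold:
            alerts.append(amp_id)
    return alerts

readings = {
    "amp-001": [-18.1, -18.0, -18.2, -18.1, -18.0, -18.1],  # stable
    "amp-002": [-18.0, -18.1, -18.0, -18.2, -18.1, -24.5],  # sudden drop
}
print(flag_degrading_amplifiers(readings))  # ['amp-002']
```

In practice a model like this would run continuously over streaming telemetry (the kind of job Flink or Storm handles) so a drifting amplifier is caught before it fails outright.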
Once it saw the benefit of collecting amplifier data, the team started collecting data from other devices, too. For instance, they created models built on system logs for edge routers. They've done modeling on application performance in the network, and they've been modeling all the interfaces, including those used by customers and those used by internal users at Verizon.
In the first year, the group's work prevented 17 catastrophic events that Tegerdine said would have impacted more than 75,000 customers. To date, the system has detected more than 520,000 customer events. These could be as simple as detecting a router failure, or as tricky as spotting a deteriorating router card that is causing packet loss but hasn't failed yet.
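A deteriorating card is harder to catch than an outright failure because no single sample looks alarming; the signal is the trend. One common way to surface that kind of problem -- a sketch under stated assumptions, not a description of Verizon's pipeline -- is to fit a least-squares slope to an interface's recent packet-loss rates and alert when loss is creeping upward:

```python
# Illustrative sketch, not Verizon's actual system: detect a sustained
# upward trend in an interface's packet-loss rate. Sample data and the
# slope threshold are hypothetical.

def loss_trend_slope(loss_rates):
    """Least-squares slope of packet-loss rate over equally spaced samples."""
    n = len(loss_rates)
    x_mean = (n - 1) / 2
    y_mean = sum(loss_rates) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(loss_rates))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def is_deteriorating(loss_rates, slope_threshold=0.001):
    # A failing card shows loss creeping upward, not a flat noisy line.
    return loss_trend_slope(loss_rates) > slope_threshold

healthy = [0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002]
creeping = [0.0002, 0.0010, 0.0030, 0.0060, 0.0110, 0.0180]
print(is_deteriorating(healthy), is_deteriorating(creeping))  # False True
```

The same idea scales to thousands of interfaces when run as a distributed job over interface counters, which is where a PySpark deployment like the team's would come in.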
But there's more work to do. Right now, some of Verizon's critical network parts are covered by the network performance analytics project, but not all. Adding more compute will help expand the network performance analytics effort.
Other changes are also afoot. Tegerdine's big data project is about to enter its next phase with a project known internally as the Network Pod -- a sort of data refinery.
"We had our tools in place, and we were sort of MacGyvering it," Tegerdine said. "It got the job done but it wasn't scalable. We quickly got buried by our own success."
To compete on sophisticated data analytics, this Verizon group needed an end-to-end framework that could scale.
"If you treat data science as an organizational practice, you are going to restrain yourself. If you do it in silos, you are not leveraging your scale," he said.
Tegerdine was looking to create more of a data refinery. Data is the raw material pumped through the refinery of tools. The goal is to let the entire company use this data lake to run and build the network.
"What drives me is the democratization of data," Tegerdine said. "When you bring these people data, the benefits and insight explode out of that data."