Working on InteropNet takes root-cause troubleshooting to a new level because no one on the NOC team wants to hear "Hey, what happened to the network?" once Interop begins.
During the Hot Stage phase of InteropNet, our goal was to make sure the network was always operational and available. And, instead of troubleshooting existing network issues, we did something I like to call pre-troubleshooting; that is, actively preventing problems from existing once Interop opens.
To do this, we used PathSolutions TotalView, our root-cause troubleshooting product. During the network build out, it took about twelve minutes to fully deploy and get all of the equipment added to the configuration.
Keep in mind that the equipment represents a multi-vendor network that has to be assembled, configured, and deployed in Las Vegas in less than four days—not the norm for most company networks, but something the NOC team prides itself on accomplishing every year.
One of the first things I did was look at the Issues tab, which showed all immediately discovered problems in the network.
This allowed the pre-troubleshooting activities to start so we could work with the other NOC team members to remove the bugs before they became weird mysteries that required serious digging through network device configurations.
Switches and routers have thousands of counters that track all sorts of performance, error, and configuration information. TotalView discovers and analyzes all this information. We listen to what the equipment is trying to tell us through its error counters, and produce plain-English answers to the problem.
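To make the idea of turning counters into plain-English answers concrete, here is a minimal Python sketch. It is not how TotalView works internally; it only illustrates the general technique of comparing two counter samples and reporting a rate. The `CounterSample` class, the `describe_interface` function, and the 1% threshold are all assumptions made for illustration, and counter wrap is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of an interface's inbound counters (hypothetical structure)."""
    in_packets: int   # packets delivered, e.g. IF-MIB ifInUcastPkts
    in_discards: int  # packets dropped, e.g. IF-MIB ifInDiscards
    in_errors: int    # packets with errors, e.g. IF-MIB ifInErrors


def describe_interface(name, before, after, threshold=0.01):
    """Compare two samples and return a plain-English summary.

    The threshold (1% by default) is an illustrative assumption.
    Counter wrap between samples is not handled in this sketch.
    """
    packets = after.in_packets - before.in_packets
    discards = after.in_discards - before.in_discards
    errors = after.in_errors - before.in_errors
    total = packets + discards + errors
    if total <= 0:
        return f"{name}: no inbound traffic during the interval"

    findings = []
    if discards / total > threshold:
        findings.append(f"discarded {discards / total:.1%} of inbound packets")
    if errors / total > threshold:
        findings.append(f"saw errors on {errors / total:.1%} of inbound packets")
    if not findings:
        return f"{name}: healthy"
    return f"{name}: " + " and ".join(findings)


# Hypothetical readings taken a polling interval apart.
before = CounterSample(in_packets=1000, in_discards=0, in_errors=0)
after = CounterSample(in_packets=1900, in_discards=100, in_errors=0)
print(describe_interface("uplink-1", before, after))
```

Working from the difference between two samples matters because the raw counters only ever grow; a large absolute number says little about whether the problem is happening now.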
For example, during Hot Stage a high-throughput test wasn't moving as much data from one point to another as it should have.
TotalView's Path mapper identified the layer-1, layer-2, and layer-3 path through the network the load test was using, and then spotted one interface that was discarding a ton of packets due to jumbo frame misconfiguration.
With that interface reconfigured to permit the jumbo frames, the test produced expected results.
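The jumbo frame check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TotalView's implementation: the `Hop` structure, the `find_mtu_bottlenecks` function, and the device names and MTU values are all hypothetical.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    """One interface along a path (hypothetical structure)."""
    device: str
    interface: str
    mtu: int  # configured MTU in bytes


def find_mtu_bottlenecks(path, packet_size):
    """Return every hop whose MTU is too small for the given packet size."""
    return [hop for hop in path if hop.mtu < packet_size]


# Hypothetical path: two hops allow jumbo frames, one was left at 1500.
path = [
    Hop("core-1", "te1/1", 9216),
    Hop("dist-2", "te0/3", 1500),
    Hop("edge-4", "ge0/12", 9216),
]

for hop in find_mtu_bottlenecks(path, packet_size=9000):
    print(f"{hop.device} {hop.interface}: MTU {hop.mtu} is too small for 9000-byte packets")
```

The check is only useful once the full path is known, which is why mapping the path comes first: a single undersized interface anywhere along it is enough to hurt the test.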
Now that we're done at Hot Stage and on our way to Las Vegas, our goal is to have a rock-solid network capable of handling anything Interop attendees and exhibitors throw at it.
"As rapidly as the InteropNet is deployed, you can't just plug things in and hope and pray that they work," said Glenn Evans, Lead Network Engineer of InteropNet.
"You have to be on top of what the equipment is complaining about and rapidly address the problems or you’ll be stuck in constant firefighting mode."
For InteropNet, pre-troubleshooting is key to staying on top of any possible issues, just as root-cause troubleshooting is key to rapidly resolving issues on company networks. To see TotalView in action, drop by the NOC or the PathSolutions booth #1651.
Interop Las Vegas is April 27 through May 1. Register now to learn the latest technology developments, accelerate your career, and network with your peers in the IT community.
Tim Titus, CTO of PathSolutions, is a network geek and proud of it. Before starting his own network troubleshooting company, he spent more than 25 years working for small and large companies in a variety of roles: from network engineer, to network manager, to network ...