Category Archives: Networking

Don’t Forget Visual Inspection When Network Troubleshooting

My small engines shop teacher said it in high school. Countless Air Force electronics instructors said the words when I went through Electronic Warfare school. I myself even harped on it when I became an Air Force instructor, and again years after when I taught basic electronics classes at a local vo-tech center.

Always first do a visual inspection when you’re troubleshooting. Always.

It’s easy to say, and just as easy to blow right past. Like I did yesterday when troubleshooting a wireless bridge link, which cost an extra hour of troubleshooting time.

In this scenario, a farm campus is tied together by three Ubiquiti bridges. It’s an environment that I took over and cleaned up a few years ago. I had my hands full eliminating all the oddball consumer routers that were in way too many places and moving the entire environment to a manageable topology that both I and the owner could understand. I inherited two NanoStation M5 bridge links that were actually pretty well done- or so it seemed. Later, I would add a 900 MHz bridge link to get past a large stand of tall pines for a new connection, but this tale of my own shortcomings focuses on one of the M5 links.

The trouble call was for the single PC in the Robot Barn- a facility used for automatic feeding of dairy cow calves. The PC has two network connections; one goes to the modem that uplinks the robot feeders on proprietary low-voltage protocols, and the other connects to one of the M5s and ultimately back to the Meraki MX that head-ends the network. Basically, nothing was working.

A quick stop at the barn, and I found that the PC was in the kind of shape that comes when someone doesn’t know what they are doing but is trying to fix it anyway. Both adapters had all kinds of oddball, nonsensical settings. I quickly got the dairy application side up so the important robot data was at least being buffered, and it could upload to offsite servers when I got the network link figured out. It was pretty clear that the PC was not talking back into the network, and neither would my own laptop. But… from the remote end I could get to the far-side bridge admin interface, and see that it showed link down. On the way out of the building, I took a quick look and saw this:
[Photo: M5.JPG]

Then, I drove to the other end of the farm to where the root bridge is. As I walked into the building to make sure the root had link-light and such, I got distracted by one of the owners. He told me he had re-arranged some of the power cords and the monitor for the CCTV system, which are co-located with the network equipment, at about the same time the problem started. Ah-hah! I’m highly skeptical of coincidences, and bit right into the probability that THIS MUST BE THE PROBLEM. I sat down, got into the root bridge UI, and started thinking desperate thoughts. Like… even though I can get into the UI on both bridges, maybe one died on the radio side. Or maybe one of the cheap power supplies wasn’t getting it done (despite both bridges eagerly presenting their UIs to me).

For the next hour, I let myself go down goofy rabbit holes. I replaced both bridge power injectors. I dorked with settings on each bridge. I falsely concluded that one bridge or the other was at least corrupted, if not bad. My next step was to take them both down and see if I could reset them and start over getting them to talk. I walked outside with one of the owners to show her where I needed to get access to take down the root bridge- and then felt profoundly stupid.

The root bridge was not where it was supposed to be. It was lying on the metal roof, looking sadder than a country song on a Sunday morning. Remember, I inherited this bridge, along with the others. The “mast mount” was an anemic two sheet metal screws into the thin metal peak of the roof, and it’s amazing it held up as long as it did. Up I scurried, and cobbed it back into place with wire as it was getting dark, with proper mounting to follow. And- the link came back up.

LESSONS:

  • When I took over responsibility for this network, I should have looked closer at the shoddy way this bridge was mounted and dealt with it then.
  • Whoever hosed up the computer shouldn’t have. The owners will work with the staff to ensure that doesn’t happen again.
  • I SHOULD HAVE gotten out of my vehicle and walked immediately to where I could see the root bridge installed, after having verified all at the non-root site was seemingly fine.
  • I SHOULD NOT HAVE gotten starry-eyed and jumped to the conclusion that the problem came from things being touched near the network equipment.

Having skipped the important visual inspection step at the root end pushed me into a trap of bad judgement that we all land in occasionally. When I realized what had happened, my mind was immediately flooded with voices from the past (including my own) saying yet again, “Always do a visual inspection first!”

Whether you’re looking for a wireless bridge lying on a roof, a burnt-out resistor on a circuit board, a corroded Ethernet jack, or a damaged fiber cable, a quick once-over with the eyes is sound practice before you start digging in on configurations.

Had I followed my own guidance, I would have had my client back in service a lot quicker.

(And yes… I did make sure all of the other bridges were mounted right before I left!)

Document That Small Business Network Environment- Whether You Are the Customer or Provider

Small networks can still be complicated. But too often a slew of information that should be recorded for the benefit of the customer and the technology providers gets overlooked, because… well, it’s small.  That is, until the environment needs to be troubleshot or serviced in some way, and big questions can arise from sloppy or lacking initial documentation.

See the article I wrote on the topic at IT Toolbox, or skip right past it and check out a simple version of a checklist you might use to make sure the important documentation basics are covered when buying or providing a small business network.

This isn’t meant to be comprehensive or all-inclusive, but it is the kind of information I make available to my own small-site customers. It gives us a common frame of reference, and empowers the customer to better understand what they just purchased (a point on which I find they have almost always been frustrated by “the last guy”).

Code Bugs Do Have Real World Consequences

I’m not sure if my expectations are just too high for today’s world. When I buy a new vehicle, I don’t want to see surface rust forming two weeks after it leaves the lot. I don’t like the current presidential election and the horrible choice that voters have to make. And I actually expect that network vendors will put out decent code, or at least be very up front and open when significant faults are found. 

You see, those significant faults have real-world consequences. They bring operations to a screeching halt, and diminish organizational credibility. And ill-conceived “workarounds” and cavalier vendor attitudes to the customer’s bug-induced plight just make matters worse.

Here’s a real-world example.

I had a carefully worked-out maintenance window to upgrade both ends of a site-to-site VPN topology that spans Syracuse to London, using my favorite cloud-managed vendor’s gear. I’ve done this procedure at least a half dozen times, and have installed at least 30 of this particular security appliance. My Syracuse work was coordinated with a gent on the other end, and we’d do one end at a time. But… we never got past my end.

I configured the new appliance with what few settings it needed: IP address, gateway, subnet mask, and DNS servers. I saved them, then I waited for the indications that the box had made contact with the cloud and pulled down its updates. But those indications never came.
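Before blaming the box, it is worth ruling out a typo in those four settings. Here is a minimal sketch of that kind of sanity check in Python, using only the standard `ipaddress` module. The function name `check_static_settings` and the sample addresses (taken from the reserved documentation ranges) are my own placeholders for illustration, not anything from the vendor or from the actual site.

```python
import ipaddress


def check_static_settings(ip, mask, gateway, dns_servers):
    """Return a list of problems found in a static IPv4 configuration.

    Hypothetical helper for illustration only. It checks internal
    consistency of the values; it does not talk to any device.
    """
    problems = []

    # Combine address and mask; raises ValueError if either is malformed.
    iface = ipaddress.ip_interface(f"{ip}/{mask}")
    network = iface.network
    gw = ipaddress.ip_address(gateway)

    # The gateway has to be reachable on the local subnet.
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if gw == iface.ip:
        problems.append("gateway is the same as the appliance address")

    # On ordinary subnets, the first and last addresses are not usable hosts.
    if network.prefixlen < 31 and iface.ip in (
        network.network_address,
        network.broadcast_address,
    ):
        problems.append(f"{iface.ip} is the network or broadcast address")

    # DNS servers only need to be well-formed addresses.
    for dns in dns_servers:
        try:
            ipaddress.ip_address(dns)
        except ValueError:
            problems.append(f"DNS server {dns!r} is not a valid IP address")

    return problems


if __name__ == "__main__":
    # Placeholder values from documentation ranges, not real site data.
    print(check_static_settings(
        "192.0.2.10", "255.255.255.0", "192.0.2.1", ["198.51.100.53"]
    ))
```

An empty list means the values are at least self-consistent. It says nothing about whether the appliance actually stored them, which turned out to be the real question here.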

Like many a networker would do, I went to verify that the settings that I entered were correct. Curiously, there were NO settings saved. OK- maybe I forgot to save… The second try yielded the exact same result as the first. It was time to open a support case- as my maintenance window ticked away and my partner in London waited patiently.

I opened the case, then immediately called the support line (for the sake of expedience). I was told that this particular appliance has a firmware bug straight from the factory and that I’d need to find a DHCP-served network to use because it won’t actually save anything you enter with out-of-box firmware. When I asked if this was documented anywhere, I was told very matter-of-factly “we don’t share that information with customers” and that it shouldn’t be a big deal to just use DHCP.

Grrrrr.

Most places I’ve installed these appliances don’t have DHCP services readily available, because ultimately the appliances use a static IP and eventually ARE the DHCP servers for inside clients. And, I don’t tend to lug around an extra SOHO router just on the off-chance I’ll have to jam something in that can act like a DHCP server to get around a code bug that my vendor doesn’t feel customers need to know about before they actually try to use the product.

Let’s skip to the end:

  • I got to use some of my best “military” language after I realized the gravity of the situation
  • The maintenance window was busted, and the scheduled change didn’t happen
  • I probably lost credibility with my London partner as I was the Guy in Charge for this
  • My vendor has absolutely lost my confidence given the bug, and the “you should just be okay with this” attitude. I’m just not sure I can trust them at this point
  • This vendor had my respect and trust for years, and those have pretty much been undone with this one incident

So… I dragged the appliance off to where I could hook it up to a DHCP server and it could get a firmware upgrade. We’ll have to do the same on the London end, and then reschedule the outage and maintenance.

Sadly, the examples don’t end here. Same vendor- different hardware set. I’m also dealing with a long-running problem with a feature set that absolutely adds to the appliance’s stratospheric price tag. The workaround? Don’t use the feature. The feature that I bought- to use. It’s insanity, and it’s way too frequent.

And I can just deal with that, because code bugs are pretty much a way of life anymore with certain vendors.