Sequel: A Week in the Life- Cleaning Up Afterwards- When WLAN Pieces Don’t Live Up to Their Responsibilities

Captain’s log, stardate 170619. I have just piloted the SS Enterprise WLAN out of the Codesuck Nebula after hostilities with both the Switchites and the WAPs. It was a trying 48 hours of lost man-hours cleaning up after a breakdown in WLC update procedures, but I’m glad to be heading home. Regrettably, we did suffer casualties. Two valiant 802.11ac access points were cut down in their prime (hee hee, Prime). Ah well, time for an adult beverage and some cheese.

– Captain Beef Wellington, Intergalactic Wi-Fi Warrior

I feel for Captain Wellington. In fact, it’s impossible to tell his story without revealing a bit of my own. Do you remember this missive about network bits and pieces not living up to their responsibilities? Of course you do. And now that the cleanup work is done from that misadventure, let’s talk about the indirect costs of a code upgrade gone a bit wrong on a large wireless network.

On this particular code upgrade, I did three failover-pairs of WLC. The first one hosts 144 APs. The second, 908 APs. The third currently has 3,212 access points. All WLC are the same model, had the same starting and ending code, and all APs are uplinked to switches of two different models (but all running the same OS version).

The first WLC pair went swimmingly. The WLC pair and 144 access points upgraded in a textbook maintenance maneuver that yielded no surprises.

The second upgraded pair was generally OK, but three APs were orphaned. They seemingly lost their configurations and names, and kept hitting the upgraded controller and falling away. Over, and over, and over, and over, and over, and over. This went on until their switchports were identified, and the interface PoE was cycled. Then TWO came back fully configured, properly named, and code-upgraded, while the remaining AP did upgrade, but lost its shit and had to be fully reconfigured.

  • INDIRECT COSTS: 
    • The loss of use of each AP during their little visit to the Muffin Man
    • Around a man-hour and a half or so to locate the APs’ MAC addresses in switching, deal with the PoE, verify, and configure the lone problem child.

The last and largest environment didn’t go so well for the upgrade. That’s despite the facts that this environment has not changed much since the last upgrade, and that I have done this procedure many a time in the past. Here, around 80ish access points did not take the upgrade. For the math-minded, that’s around 2 1/2 per cent of the APs in this big environment. Many completely dumped their configs and went stupid, some only seemed stupid until PoE resets, about half needed multiple PoE resets (after waiting a goodly period to see if the AP would snap out of it each time), and two completely failed and had to be replaced.

  • INDIRECT COSTS:
    • The loss of use of each AP during their outage- that’s a lot of capacity denied to end users
    • Because the APs that failed the upgrade were scattered far and wide on several dozen switches, and because many needed to be power cycled more than once, it took at least the equivalent in hours of five full working days at the engineer level to tame the chaos and reconfigure those APs that needed it.
  • DIRECT COSTS:
    • Two current-model APs were irrecoverably lost in this process
    • One man-hour per AP to get each replaced
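For the math-minded, here is a quick sketch of the arithmetic behind those numbers. The AP counts come from above, and 80 stands in for "around 80ish". The 8-hour working day is my assumption for turning "five full working days" into hours; nothing above pins that down.

```python
# AP counts and failure counts as described in the post.
# 80 is the post's "around 80ish", so treat the third rate as approximate.
pairs = [
    ("first pair", 144, 0),
    ("second pair", 908, 3),
    ("third pair", 3212, 80),
]


def failure_rate_percent(total_aps, failed_aps):
    """Share of APs that did not take the upgrade cleanly, in percent."""
    return 100.0 * failed_aps / total_aps


for name, total, failed in pairs:
    rate = failure_rate_percent(total, failed)
    print(f"{name}: {failed}/{total} APs = {rate:.2f}%")

# Assumption: an 8-hour working day (not stated in the post).
HOURS_PER_DAY = 8
# Five working days of cleanup, plus one man-hour for each of the two replaced APs.
cleanup_hours = 5 * HOURS_PER_DAY + 2 * 1
print(f"third pair cleanup: roughly {cleanup_hours} engineer-hours")
```

The percentages stay small even in the worst case, but the hours do not, which is the whole point about indirect costs at scale.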

Items of note throughout this:

  • We did the code upgrade in the name of stability and bug fixes on the WLAN side (yes, irony- shut up)
  • We recently learned of a PoE bug or two on the switching side, which may or may not have been in play
  • Top-end gear is not without problems
  • Even “routine” changes can go off the rails, at least in this product set
  • System complexity and scale lead to more indirect costs in the form of support overhead; that’s just a fact of life on certain product sets
  • There is no moving away from bugs, only trading bugs for other bugs- at least in my own reality

And there you have it.

 

 

To That Guy From 1990 That I Gave $11 To- I Need a Little Help Here

Let me get two points out there straightaway: this post has nothing to do with technology (remember, this is my mostly wireless blog), and I will be making a pitch to help someone in need. Feel free to bail now, or read on about a decent young man trying to raise some funds.

Reaching Waaaay Back Into the Time-Karma Continuum

In 1990, I was a newly-married young airman in the US Air Force, living in base housing that was detached from Keesler Air Force Base in Biloxi. Down the road from my neighborhood on the same street was a low-income neighborhood that we called “The Projects”. One day while my wife was at work, I was fixing my bicycle in the driveway when a resident of the projects walked by, and struck up a conversation that quickly moved to him saying “… and that’s why I need some money. I’ll pay you back when I can.” I don’t remember what his story was, but I do remember he was roughly my age, and I really didn’t believe a word he said.

But I also remember thinking that I had a regular payday coming soon, and this guy probably didn’t. I had no idea where the eleven bucks I would give him would get spent, and didn’t really care. I don’t think of myself as an overly “Christian” person, but every now and then when I see someone needing help, I do what I can. This was one of those times. Lots of people have been kind to me through the years, so I try to give a bit back when I have it to give.

Fast Forward to Today, and Introducing Adrian.

I never saw that money again, but I hope it somehow helped that stranger. On the long-shot chance that he’s reading this (OK, it’s a ridiculous long-shot), or anyone else who believes in Paying it Forward, I want to introduce Adrian. If you’re out there, 1990 Eleven Dollar Guy- please consider giving the eleven bucks I gave you back then to Adrian now.

Adrian is half-way through his BS degree and pilot training at Embry-Riddle, and circumstances have conspired against him to put his professional future in serious jeopardy. He’s an awesome young man, and he’s trying to do everything right despite some sudden financial challenges.

That eleven bucks would help. So would anything that anyone feels compelled to donate to one of the sweetest young men I’ve ever met.

Adrian’s Go Fund Me page is here. If I didn’t know him personally, I wouldn’t be sharing this.

And thank you to those who read through this for letting me take a time-out from technology to spread the word.

 

A Day in the Life- When WLAN Pieces Don’t Live Up to Their Responsibilities

I stared into the darkness, and softly spoke
“What the shit is this? Why didn’t it reboot?”
The early morning mocked me
The clouds and the birds and the rising sun
Even my first cup of coffee
All sang and screamed and laughed
“Your stupid WLC didn’t reboot! It didn’t reboot!”
And so I laughed, like an idiot, as not to cry.
Beef Wellington, from The Controller Chronicles

Sigh… Sometimes things don’t do what they’re supposed to do. Like in the case of a simple Cisco 8540 controller upgrade. It matters not that I’ve done this procedure about a hundred times through the years. THIS TIME, the controller had its own idea about how this code upgrade would go down.

And Time, a maniac scattering dust,
And Life, a Fury slinging flame.
Tennyson, from In Memoriam

no reboot

The maintenance window was claimed. Change control was done. Code was downloaded to the 8540. And… the required reboot was scheduled May 30, at 0400.

Yet… 05:16 rolled around on May 30, and the reboot was still configured for 0400.

Have you seen the bruise on that man’s head?
   -Professor, on Gilligan’s Island: Waiting for Watubi episode

The bruise is mine. From beating my head against the wall. But whatever… forcibly reboot the 8540 (slightly outside of the maintenance window- but don’t tell anyone). Now all is good- except for dozens of APs that lose their config in the process.

APs default

EXCEPT not ALL of the APs that went to defaults REALLY went to defaults. Only about 20% did. The rest come back proper, with full config, if you remove and restore their power. It makes no difference that they are correctly showing in CDP, drawing good inline power, etc. You’ll reboot if you want them back. That other 20%? They really are defaulted. Build their configs from scratch, and shut up about it.

I need you, I need you, I need you right now
Yeah, I need you right now
So don’t let me, don’t let me, don’t let me down
I think I’m losing my mind now
It’s in my head, darling I hope
That you’ll be here, when I need you the most
So don’t let me, don’t let me, don’t let me down
D-Don’t let me down
Don’t let me down
Chainsmokers, Don’t Let Me Down

Sorry, Chainsmokers. Letting people down is kind of a way of life in/for these parts.

Why You Should Care About MetaGeek’s MetaCare

To the WLAN support community, there are just a few tools that are truly revered. Among these are the various offerings by MetaGeek. I still have my original Wi-Spy USB-based Wi-Fi spectrum analyzer dongle that I used a million years ago when 2.4 GHz was the only band in town, but have also added almost every other tool that MetaGeek offers. Go to any WLAN conference or watch the typical wireless professional at work, and you’ll see lots of MetaGeek products in play. So… is this blog a MetaGeek commercial? I guess you could say so to a certain degree. I decided to write it after my latest renewal of MetaCare to help other MetaGeek customers (and potential customers) understand what MetaCare is all about.

I queried MetaGeek technical trainer Joel Crane to make sure I had my story straight, as MetaCare is one of those things you refresh periodically so it’s easy to lose sight of the value proposition. Straight from Crane:

MetaCare is our way of funding the continued development and support of our products. It’s also a great pun (in my opinion), but people outside of the United States don’t get it. When you buy a new product, you basically get a “free” year of MetaCare. When MetaCare runs out, you can keep on using the software, you just can’t download versions that were released after your MetaCare expired.

On this point, I have let my own MetaCare lapse in the past, then lamented greatly when an update to Chanalyzer or Eye P.A. came available. You have to stay active with your MetaCare to get those updates! Which brings me to Crane’s next point.

When you renew MetaCare, it begins on the date that MetaCare expired (not the current date). Basically, this keeps users from gaming the system by letting it lapse for a year, and then picking up another year and getting a year’s worth of updates (although I try to not point fingers like that, generally our customers are cool and don’t try to do that stuff). MetaCare keys are one-time use. They just tack more MetaCare onto your “base” key, which is always used to activate new machines.
Like any other decent WLAN support tool, you gotta pay to play when it comes to upgrades. At the same time, I do know of fellow WLAN support folks who have opted to not keep up their MetaCare, and therefore have opted out of updates. Maybe their budget dollars ran out, or perhaps they don’t feel that MetaGeek updates their tool code frequently enough to warrant the expenditure on MetaCare. As with other tools with similar support paradigms, whether you choose to pay for ongoing support is up to you. But I give MetaGeek a lot of credit for not rendering their tools “expired” if you forego MetaCare.
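To make the renewal rule concrete, here is a minimal sketch of how I read Crane's description. It is not MetaGeek's actual licensing code. The function names are made up for illustration, the one-year term per renewal key is an assumption, and so is treating a release dated exactly on the expiry day as still covered.

```python
from datetime import date


def renewed_expiry(current_expiry: date, years: int = 1) -> date:
    """Hypothetical helper: the new term starts at the old expiry, not at today's date."""
    try:
        return current_expiry.replace(year=current_expiry.year + years)
    except ValueError:
        # Old expiry was Feb 29 and the target year is not a leap year.
        return current_expiry.replace(year=current_expiry.year + years, day=28)


def can_download(release_date: date, expiry: date) -> bool:
    """Hypothetical helper: a release is available only if it shipped before coverage ran out."""
    return release_date <= expiry


# MetaCare lapsed on 1 June 2016; the renewal is applied months later.
old_expiry = date(2016, 6, 1)
new_expiry = renewed_expiry(old_expiry)
print(new_expiry)                                    # 2017-06-01
print(can_download(date(2017, 7, 15), new_expiry))   # False
```

In other words, the date you punch in the renewal key never enters the math, which is why letting MetaCare lapse doesn't buy you anything.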
Crane also pointed out one more aspect of the MetaGeek licensing model that is actually quite generous (other WLAN toolmakers could learn something here!):
 Speaking of base keys, they can be activated on up to 5 machines that belong to one user. Each user will need their own key, but if you have a desktop, laptop, survey laptop, a couple of VM’s… go nuts and activate your base key all over the place. 

And now you know. As for me, my MetaCare costs are a business expense that I don’t mind paying- and I’m really looking forward to new developments from MetaGeek.


But wait- there’s more! Thanks to Blake Krone for the reminder. MetaGeek has a nice license portal for viewing and managing your own license keys, so you don’t have to wonder where you stand for available device counts, license expiration, etc.


The Unfunny Knock Knock Joke

Knock Knock.

Who’s there?

Contemporary WLAN-related code.

Nooooo. Just go away.

Come on now. Let’s spend a few weeks together making me work right again.

In the name of all things good and decent, fix yourself and get back to work. I paid a shitload of money for you.

You know better by now… Escalation build! Patch! Super secret patch! Database diddling! Nonsensical workaround! I want it. I want it all!

You realize that you are the systems that are supposed to keep up the system? Not the system that I’m supposed to dedicate my entire freakin life to keeping up because someone built you wrong? You know that, right? 

Pfft. Talk to my developers. You’re lucky to have the privilege of wallowing in my suck. This is market-leading suck.

I don’t have the hundreds of hours per year you need. We may not be the best match. You’re kinda high-maintenance, no offense.

Software company! Software company!

I have other work to do- like real work. I may have to let you sit here, not delivering the value I’m supposed to be getting out of you.

Maybe you need to buy more licenses! I got lotsa license types, so there’s more room for bugs!

Yeah… see, I’m just gonna go now. I guess I really don’t need to see the clients attached to your flagship, cutting edge APs (that we also spent a boatload of money for) on floorplans. And I suppose I can do without that other highly-touted feature we bought- because it actually breaks the network. Just make sure my users don’t get screwed over for basic access, OK? That is actually still a thing, you know.

Escalation build! Patch! Super secret patch! Database diddling! Nonsensical workaround! Upgrade to non-recommended code version! Knock knock.

Ah geeze. We’re looping. Maybe another reboot… 

 

The Idiot’s Guide to Ubiquiti UniFi

BTW- I’m the idiot, in this case. Something about Ubiquiti’s “UniFi” approach to networking can make me feel confused and inexperienced at times. But I’m determined to make peace with it, and to also maybe help save someone else the confusion. Ubiquiti’s product lines are interesting, feature rich, innovative, flexible, and cost-effective. And… also occasionally bewildering if you have yet to Ubiquitize your mind. To this point, let me (hopefully) make the indoctrination to UniFi a little easier.

UniFi is a Management Methodology AND Networked Components

Part of what confused me early on was the name- “UniFi” must surely just be a bunch of bridges and access points… As in, things that do Wi-Fi. If you’re thinking that, you’re wrong. UniFi is more like UniFied in that a wide range of switches, access points, security gateways, video components, and more are branded with the UniFi moniker and managed as an ecosystem. First major point: UniFi isn’t just wireless.

As for how the UniFi ecosystem is managed, that’s one of the main areas of getting to know Ubiquiti’s latest stuff that made me feel like a child (and not a very smart child, at that). I have set up and managed my share of other non-UniFi Ubiquiti bridges, where you get to the individual component’s UI and configure to your heart’s delight. But if it’s a UniFi AP, switch or gateway, life gets a little more involved. Forget the individual per-component UI, for UniFi you need to adopt each component into a “controller” and then manage a “site” worth of stuff (or multiple sites) via the controller. Second major point: you don’t generally manage individual UniFi parts/pieces, you adopt each into a “controller” and then manage them all from the controller interface. I’m not a fan of the term “controller” here, but it is what it is. Think OpenMesh or Meraki dashboards and you’re on the right track.

Maybe Too Flexible?

This is where experienced UniFi users might tell me to go eat rocks, and I’m OK with that. But I have been utterly confounded trying to wrap my head around the various incarnations of the UniFi Controller. One way or another, you need to get to this point:
UniFi Controller

This inventory view of the Controller shows what devices I have, then from there it’s pretty robust in both configuration and monitoring capabilities.
UniFi Controller1

UniFi Controller2

Once you get your devices into the controller instance, life gets pretty pleasant. I give Ubiquiti a lot of credit for the completeness of the management interface and for putting together a framework that makes perfect sense- once you get there. Getting there, however, can be tricky. To me, Ubiquiti isn’t doing so hot on their messaging that the UniFi controller can take multiple forms and that you have to really know which form you want to use before you bring an environment to life. I’ve spent a lot of time poring over Ubiquiti’s web pages, and there seems to be more of an emphasis on dazzling potential customers with grand claims of cloud this and that and SDN blah blah blah than a realization that newcomers to Ubiquiti may need some basic buzzword-free guidance on this controller thing. The UniFi controller can exist in different forms, and you can only use one at a time with a given set of end devices:

  • On a laptop. You need to use the controller to manage devices, but the devices don’t NEED the controller to operate, so you might only invoke the controller when you have changes to make. But… here you don’t get the monitoring and statistics that you would with a more persistent controller method.
  • On a CloudKey.  Now this is cool. I wrote about my first use of CloudKey here, and you need to know that it’s just another way of managing the UniFi devices.
  • On your own virtual host. Load up a controller in AWS, manage a bunch of sites in your own private cloud- but know that you have to provision the devices to get them to your cloud-hosted controller with effort not required in pure cloud-managed systems like Meraki and OpenMesh.
  • Let Ubiquiti host it. Recently added to the UniFi offerings is the Elite Controller option. Here, you end up with something that’s kind of like Meraki but not nearly as expensive. You pay a modest fee per device, and in exchange Ubiquiti provides cloud hosting of the controller for your devices, and phone and chat support. Unlike Meraki or Open Mesh, this is not plug and play. Your devices do not magically tunnel out to the cloud controller just because you’d like them to! You need to provision the devices, as Justin Paul writes about in his blog. If you don’t do the provision thing right, you’ll beat your head against the wall in frustration.

Third major point: there are several versions of “UniFi Controller”. You have to grasp the differences to decide how you’ll manage a given network.
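As a rough illustration of what "provisioning" can look like for a remote controller, one common approach is to tell each device where its controller lives by running `set-inform` with the controller's inform URL from the device's SSH shell. The sketch below only builds that command string. The hostname is hypothetical, port 8080 is the usual default inform port but a hosted controller may use something else, and this is not necessarily the method described in the blog linked above.

```python
def inform_url(controller_host: str, port: int = 8080) -> str:
    """Build the inform URL a UniFi device uses to report in to its controller."""
    return f"http://{controller_host}:{port}/inform"


def set_inform_command(controller_host: str, port: int = 8080) -> str:
    """Build the command you would type in the device's SSH shell."""
    return f"set-inform {inform_url(controller_host, port)}"


# Hypothetical controller hostname, for illustration only.
print(set_inform_command("unifi.example.com"))
# set-inform http://unifi.example.com:8080/inform
```

Whatever method you end up using, the idea is the same: the device has to be pointed at the controller, because it won't find a remote one on its own.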

I’m currently kicking tires on UniFi hardware and the Elite Cloud option. I will have much to say on both as my evaluation continues, but I do hope that this quick primer can help anyone who is new to Ubiquiti’s UniFi environment.

Newsflash: All 5 GHz Clients Don’t Work on All 5 GHz Channels

OK- this really shouldn’t be a newsflash. But, if you’ve never had to deal with what I’m about to summarize, then it may well be a headline story. But first, a word from today’s musical guest- Genesis, fronted by the great Phil Collins:

Talk to me, you never talk to me.
Ooh, it seems that I can speak.
I can hear my voice shouting out.
But there’s no reply at all.

Look at me, you never look at me,
Ooh, I’ve been sitting, staring, seems so long.
But you’re looking through me
Like I wasn’t here at all.
No reply, there’s no reply at all.

Phil and the boys know well what happens when you assume that any 5 GHz client will work on any 5 GHz access point. Rumor has it that Genesis was troubleshooting a wireless installation at a mall in Duluth when they were inspired to write the super-hit “No Reply at All”, but that’s a story for another time.

I’m here to tell you of- and show you- an example of a 5 GHz client that just can’t (and therefore WON’T) talk to anything but a few 5 GHz channels. If it’s not obvious, there is high potential for the “the network sucks!”  factor here. If you don’t know what you’re doing, you can foolishly add more APs, tweak every setting there is to tweak, RMA one client device after another, and end up with an over-radiating nonfunctional heap of squadoosh, baby.

Trouble in Po Po Land

Once upon a time, there was an awesome dual-band Wi-Fi network that few could match. The APs were pretty, the signals were clean, and the installation crew was a bunch of snappy gents. Thousands upon thousands of client devices used this high-performing WLAN daily- every kind of laptop under the sun, all sorts of common mobile devices, and smartphones aplenty.

Then the police cars came.

The Long Arm of the Law wanted in on that Wi-Fi goodness. The idea was simple: police cars would pull into their very wireless well-covered parking area at the end of shift, and dashcam video would automatically download to network servers via that sweet, sweet Fi. A vendor was hired to equip the cars, the police technical staff got the lowdown from the network folks on how to configure the client devices, and everything seemed good.

Except it didn’t work.

About That Police Car Wireless Client Device

The cruisers in question are equipped with the Ubiquiti Bullet M5 radio. These have a handy form factor, and can be had for less than $100 (then obscenely marked up and resold as something special).  And look- they are 802.11a and 11n-capable!

M5-2

Should be no issues on that robust dual-band network, as long as signal is coming out of the 5 GHz radios in the APs and the 5 GHz radio in the M5- yes? I can stand next to the police car with my iPhone and connect on 5 GHz, so the car should work too! But… the cars weren’t working at first, despite their 5 GHz output being verified with a number of tools.

Curse you, fickle Fi! What dark magic is afoot?

5 GHz is a Big Range of Channels. You Gotta Understand Those Channels.

So, this big world-class WLAN uses a lot of 5 GHz channels (36, 40, 44, 48, 52, 56, 60, 64, 149, 153, 157, and 161). But take a look at that graphic again. The M5 operates in the range of 5170 to 5825 MHz, whatever that means. And did you catch the footnote?

DID YOU CATCH THE FOOTNOTE? (* Only 5725 – 5850 MHz is supported in the USA)

If you didn’t know any better, you might expect that the entire range of 802.11a and .11n is 5725-5850 MHz, and that all of the channels on the WLAN would fit in that range. This is American Wi-Fi, and that’s an American client device!

It just isn’t that simple. Looky here (5 GHz channels, Wikipedia):
5 chans

It turns out that the M5 only works in one small slice of the entire 5 GHz range that 802.11a/n/ac Wi-Fi can function in. So… those police cars were hitting lower frequency channels from the WLAN that they don’t support. A quick channel change for the parking lot APs to the few that the M5 does support, and the video was soon flowing from the cars as desired.
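If you want to see the mismatch in numbers, here is a minimal sketch. A 5 GHz channel's center frequency is 5000 + 5 × channel number (in MHz), so you can check each WLAN channel against the 5725 to 5850 MHz slice from the M5 footnote. The 20 MHz channel width is my assumption for the example; wider channels would need a wider check.

```python
def center_mhz(channel: int) -> int:
    """Center frequency of a 5 GHz Wi-Fi channel, in MHz."""
    return 5000 + 5 * channel


# Channels in use on the WLAN, and the M5's USA range from its datasheet footnote.
WLAN_CHANNELS = [36, 40, 44, 48, 52, 56, 60, 64, 149, 153, 157, 161]
M5_USA_RANGE_MHZ = (5725, 5850)
CHANNEL_WIDTH_MHZ = 20  # assumption: 20 MHz-wide channels


def fits(channel: int, band=M5_USA_RANGE_MHZ, width=CHANNEL_WIDTH_MHZ) -> bool:
    """True if the whole channel, edge to edge, sits inside the client's supported range."""
    low, high = band
    center = center_mhz(channel)
    return low <= center - width / 2 and center + width / 2 <= high


usable = [ch for ch in WLAN_CHANNELS if fits(ch)]
unusable = [ch for ch in WLAN_CHANNELS if not fits(ch)]
print("M5 can use:", usable)      # [149, 153, 157, 161]
print("M5 can't use:", unusable)  # [36, 40, 44, 48, 52, 56, 60, 64]
```

Eight of the twelve channels on the WLAN are simply invisible to the M5, so any parking lot AP sitting on one of those might as well not exist as far as the police cars are concerned.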

This Happens Often on Utility Devices- Be Aware!

I’ve seen this same scenario play out on ticket scanners in stadiums, retail scanners in warehouses, and wireless cameras that all operate in only a slice of 5 GHz. You absolutely MUST understand what radio capabilities are in play when it comes to non-mainstream devices.

These are the cases that often separate WLAN pros from those who don’t understand the important nuances that unfortunately pervade modern Wi-Fi. And that lack of understanding can lead to a lot of wasted time and money trying to fix a problem that is nothing more than poor configuration born of ignorance.

Just how complicated is the question of which individual devices can operate on what specific 5 GHz channels? Let’s ask a good guy named Mike Albano.