The Dangers of Experts Writing Documentation: A Real Life Example

There are some real, tangible dangers to having experts write documentation. Experts have the perfect tools, skip steps, know where things are based on experience, use jargon, have spare parts so mistakes aren’t a big deal, and as a result make terrible time & work estimates. This leads to confused, and subsequently angry, people, which is probably not what you wanted.

I was thinking about all this as I entered my fourth hour of installing a trailer wiring harness on my Mazda CX-9 today. It’s a unit from Curt Manufacturing, kit #56016. When my CX-9 was in the shop for an alignment a few weeks back I had them put a hitch on it. They got squirrelly & weird when I mentioned installing the wiring harness, though, and I decided that I could just do it myself some afternoon.

The documentation in the harness kit is horrible, the written equivalent of “just wire it up, duh.” Luckily they offer a YouTube video. Since I don’t do a lot of trailer wiring it seemed prudent to take five minutes and watch it.

One hour? That’s cool. I’m also wondering what “proper safety equipment and precautions” are. I guessed that their lawyers made them keep it vague so the onus was on me to figure it out, and absolve them of any liability. Whatever, I’ll figure it out.

I notice their tools have air hose couplers on them, and aren’t quite what people would have in their garage. These also aren’t the tools that were listed in the installation documentation. I pause the video and spend 10 minutes digging out my cordless Dremel, my right-angle drill (yeah, I actually have one) & my bit index, and 5 more minutes trying to find my 10mm socket which is the social butterfly of my socket set. Total time elapsed so far: about 20 minutes.

I also notice that the car in the video is on a lift, but they omitted a lift from this list of tools. This becomes significant later.

Begin by opening the back hatch. So far, so good.

Remove the floor coverings, storage trays, and rear scuff panel. The woman in the video pops all that stuff out in 5 seconds. Things she didn’t cover include the mystery of the fasteners keeping the storage trays attached to the chassis, five additional panels she didn’t remove but are in the way for me, and the subwoofer back there with its 400 bolts. Back to the web, where luckily someone else has a 10 minute video on this same topic that doesn’t skip these things. If you ever write documentation these are the sort of details that really matter. Total elapsed time: 50 minutes, and that’s with the help of my Milwaukee impact driver, which also wasn’t on the tool list.

Disconnect the negative battery terminal. Done. 55 minutes.

Remove the rear cargo loops & Phillips screws. Luckily the other video I found had advice on this, because the woman in the Curt video took 10 seconds to pull these things out of their CX-9. I’m up to 65 minutes.

Disconnect the taillights and insert the wiring harness connectors. This seems straightforward but in the video the rear vehicle trim stays popped out so the woman can work on it. I conclude she must have a prehensile tail.  Us normal humans use a scrap 2×4 cut to hold the panel open. Elapsed time: 85 minutes. Note that “miter saw” and “2×4” are not on the list of tools.

Route the green wire over to the other side and hook it up. The other video had good advice on this, too, as there were nuances completely skipped by the Curt video, and by the time you’d realize that you’d have everything all closed up already. 95 minutes.

Grind off some paint, drill a pilot hole, and use the included screw to connect the ground wire to the chassis. “Be mindful of what you drill into and what is behind it.” No kidding. I love my Dremel, by the way. If you get a Dremel get a cordless one with a lithium-ion battery (so it holds a charge over time), the engraving handle accessory (I never take it off), a combo pack of bits, and extra cut-off wheels. You will be unstoppable. 100 minutes.

Find a suitable mounting location for the converter box, and use the tape to attach it to the chassis. Unfortunately for me my CX-9 has a whole gob of mechanical stuff right where she stuck her converter in the video, so I have to figure that one out. In the process I also notice that the green wire is looped outside something where it shouldn’t be, so I have to go back to the other side, unwire it, and rerun the wire. 120 minutes.

Strip the power wire and use the included butt splice connectors to crimp the fuse holder on. No sweat. 125 minutes.

Remove the positive accessory nut on the battery cable. Connect the fuse holder’s eyelet to it & refasten the nut. Done. 130 minutes.

Route the black power wire down past the engine block and towards the back of the car, keeping away from moving parts and excessive heat sources. Um, excuse me?

I mentioned this before, but perhaps you notice something peculiar about the woman in this image:

Yeah, she’s standing underneath the damn car. As I am neither 6 inches tall nor in possession of a car lift I go and get my set of vehicle ramps and get the Mazda up on it. I don’t consider myself an idiot but I can only guess what gets hot or not under there. She ties her power wire to some HVAC connections, but I surmise that one of them probably gets hot at times.

The other video runs the cable from the trunk side first, and in looking at things that seems like a better plan, so I take it all apart, cut the splice connectors off, and run it through a rubber grommet in the trunk. There was some black silicone sealant in the kit which is never mentioned anywhere, so I use that to glue the grommet back down so the cable doesn’t move & rub & short out, and to coat the ground wire connection I made earlier so the chassis doesn’t rust there.

I end up taking that black tray you see behind her hands off the car and running the cord through there. The kit includes the world’s worst zip ties, especially when you’re upside down under a car, and only about half as many as I ended up using, because I absolutely don’t want this wire snagging on something, nor being an obstacle in the future when some mechanic is hacking away at the car. At this point I’m wondering why I don’t just plug this into the accessory outlet in the back, but trailer wiring is often sketchy, especially for boat trailers, and I’d rather not blow my accessory fuses.

In the video it takes her 11 seconds to do this. ELEVEN. SECONDS. Total elapsed time for me, out here in this hell I call reality: 210 minutes. That’s 3.5 hours, and they said it’d take me 1.

Locate the rubber grommet that gains access to the trunk. Punch a hole in the grommet and run the wire through. Yeah, I did this already.

Strip the power wire and use a splice connector to connect it to the converter box. You have got to be kidding me — I don’t have any more butt splice connectors. I ponder riding my bicycle to the hardware store but I carefully secure everything, reconnect the battery, and drive over there. I look like I’ve rolled around under a car all day, which amuses the staff. Local hardware stores rule, by the way. Try asking a Home Depot employee where the butt splice connectors are and you’ll get a blank look and a “this isn’t my department” comment (to which I always retort “so why are you standing here, then?”). I was in & out of my local True Value in 2 minutes. Anyhow, 230 minutes, and I now own 47 more connectors than I need.

Route the “four flat” under the trim. Replace any previously removed vehicle parts. Put the fuse in. Test it using an electrical tester or a properly wired trailer. I love how they use jargon here, “four flat.” A few more seconds and their demo CX-9 is all back together. It takes me longer than that. I mess around and find a better way to route the connector through the spare tire area so that it can be stored out of the way. They also don’t mention that getting the interior of the vehicle reinstalled means 15 minutes of fighting to get the trim underneath the trunk door gasket again, which is hard because the gasket is squirrelly. I end up using some levers designed for changing bicycle tires.

Total elapsed time: 270 minutes. 4.5 hours.

It’s one thing to let experts write documentation, but it absolutely needs to be tested by novices. What would have helped here?

  • A video that isn’t abridged. Show the whole process, even if it’s long. Do not skip any steps.
  • Better paper documentation and a complete list of required tools.
  • Acknowledgement that at some point you are going to need to be under the vehicle.
  • Documentation for all the parts in the kit. Nowhere did anything mention the silicone sealant, so at the end you’d be left wondering if you screwed up.
  • Spare parts in the kit. A couple more splice connectors and more higher-quality zip ties would have helped immensely.
  • Better time estimates. Frankly it would have been better to omit the estimate altogether than seriously understate the time commitment like they did.

Better product design would have removed the need for a lot of this, too. As it turns out the CX-9 comes pre-wired for trailer wiring, and a product that plugs directly into the harness in the back would have saved immense amounts of time under the car. In the future I know that’s the route I’ll go, choosing the more expensive OEM part that is way easier & faster to install. Opportunity cost is a real thing.

Intel’s Memory Drive Implementation for Optane Guarantees its Doom

A few weeks ago Intel started releasing their Optane product, a commercialization of the 3D XPoint (Crosspoint) technology they’ve been talking about for a few years. Predictably, there has been a lot of commentary in all directions. Did you know it’s game changing, or that it’s a solution looking for a problem? It’s storage. It isn’t storage. It’s RAM. It isn’t RAM. It’s too slow to be RAM. It’s too small for storage. It’s useful now. Nobody will use it for years.

Yup. Confusion. It’s because Optane is a bunch of different things. It’s consumer and enterprise, and it’s both storage and memory.

There are plenty of articles out there on the technology itself. There’s a small M.2 version for desktops that acts as a cache, which is thoroughly uninteresting to me. I’d rather have a real SSD in one of my precious M.2 slots than a cache that I overrun with three photos from my Nikon SLR. Not to mention I need a 7th generation Intel Core CPU (Kaby Lake) to do this at all.

The real action is with the data center version, the P4800X. The first version is a 375 GB PCIe NVMe card. 375 GB isn’t very much space, but Intel says they’ll have 750 GB and 1.5 TB models out this year. The technology is a lot faster than the NAND flash typically found in SSDs, and the endurance is a lot higher, too (writes to SSDs use voltages that stress, and eventually destroy, the cells in the SSD). Intel says this thing can do 500,000 write IOPS, which makes it a hell of a write cache for something like VMware vSAN, even if it is a bit small. As a storage device, though, Optane is interesting but really just an evolution of NVMe flash technology.

Memory Drive Technology

What’s really interesting to me is the “Memory Drive” component, which seems intent on blurring the lines between memory and storage. You can use the P4800X to create a pool of something that looks to an OS like memory. It is an order of magnitude slower than regular DRAM, but several orders of magnitude faster than an SSD. Given that you could theoretically put 24 TB of Optane in a two socket server — for a lot less money than 24 TB of DRAM – there are some pretty interesting implications. Think about being able to hold a whole enterprise database in memory. The best I/O is one you don’t do, and having all that data close by means a lot less read traffic on your storage, not to mention it being a lot faster.

There aren’t a lot of details about Memory Drive, though. The product brief says it’s Linux only, and that it’s a software layer of some sort. Recently, though, I found a piece over at AnandTech which actually had details around this (link below, kudos to the author, Mr. Tallis, for digging into this). That post indicates it’s a paid add-on, and something like a hypervisor that boots from a USB device, or an IDE controller before the OS loads.

Amateur Hour at Intel

USB or IDE? An extra hypervisor? Paid? What is this, amateur hour? Intel wants me to pay extra for the privilege of booting my servers from a $5 USB drive, which can’t be mirrored or otherwise protected, so that I can load a software layer that basically makes my OS completely unsupported and more complicated? Oooh, sign me up.

Here’s my prediction: no self-respecting enterprise will use this because it is an operational disaster (lack of boot device redundancy, lack of IDE devices, lack of support for popular operating systems, lack of visibility into the Memory Drive layer, even just the nightmare of hardware licensing). As such, nobody will buy the add-on software. A company like Intel charges for features like this to gauge interest, and Intel will eventually incorrectly conclude that the lack of sales is an indicator that nobody is interested. They will then discontinue the product, and because Intel is effectively a monopoly that’ll be the end of this technology. Long live the status quo! Death to the unholy union of DRAM and storage!

On a parallel track, because the poor implementation means little interest from enterprise users, OS vendors won’t be pressured by users, application vendors, or Intel to develop anything for this new layer of addressable storage. That’s a damn shame because there’s real promise here. If Optane support were simply built into the server CPUs and chipsets moving forward, as a native part of what we get for paying the Intel price premiums, people would use it en masse. It should be as easy as plugging an Optane card in and flipping a switch in the BIOS to make it SSD or memory, non-volatile or volatile.

If that happened we’d start seeing real support for it in OSes, applications adapting to use it, and real, interesting, and positive change happening in our data centers. As it stands, though, I fear that Memory Drive is destined to die a slow death for the wrong reasons, at the hands of the ignorant-of-their-customers Ferengis running Intel.


Install the vCenter Server Appliance (VCSA) Without Ephemeral Port Groups

Trying to install VMware vCenter in appliance/VCSA form straight to a new ESXi host? Having a problem where it isn’t listing any networks, and it’s telling you that “Non-ephemeral distributed virtual port groups are not supported” in the little informational bubble next to it?

Thinking this is Chicken & Egg 101, because you can’t have an ephemeral port group without a Distributed vSwitch, and you can’t have a dvSwitch without a vCenter, so how do you install vCenter when you need something that only vCenter can create?

Yeah, me too. Here’s the secret, though: don’t remove the default “VM Network” port group, or if you did, put it back, and restart the installer (or just back up to select the host again).

Ah, that’s better. I’d removed it in favor of adding another port group with the right VLAN and such. I should have just customized it in place.

In other news, it’s apparently been a while since I’ve done a completely bare-metal install! As much as I hate to admit it, in my frustration I actually broke down and called VMware Support about this. My reputation is safe, though, since they had absolutely no idea what I was talking about, and I figured it out while they were trying to apply their ponderous & regimented support process to me. Just makes me long for Business Critical Support again. Cost/benefit was wrong for us when we renewed the ELA it was part of but you could ask those folks ANYTHING and have an instant & dead accurate answer, and usually an offer for a WebEx to fix it.

vCenter 6.5b Resets Root Password Expiration Settings

I’m starting to update all my 6.x vCenters and vROPS, pending patches being released. You should be doing this, too, since they’re vulnerable to the Apache Struts 2 critical security holes. One thing I noted in my testing is that after patching the 6.5 appliances, their root password expiration settings go back to the defaults. In this case I’d set them to not expire, but it’s clearly not that way anymore:

Depending on your security requirements this might not be what you want. It’s bad form on VMware’s part, changing something that had been explicitly set. I also didn’t test to see if it resets the actual password age, or just the expiry. You might have far less than 365 days before it expires.
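If you want to check where an appliance stands, the standard Linux chage -l root command prints the last password change date and the maximum password age (assuming the appliance shell exposes chage the way other Linux systems do). The arithmetic from there is simple. A minimal sketch, using a hypothetical helper named days_until_expiry, GNU date’s -d option, and made-up dates:

```shell
# days_until_expiry is a hypothetical helper for illustration only.
# Args: last password change (YYYY-MM-DD), maximum age in days, and
# optionally "today" (YYYY-MM-DD) so the result can be checked by hand.
# Requires GNU date (for the -d option).
days_until_expiry() {
  last_change=$1
  max_days=$2
  today=${3:-$(date -u +%Y-%m-%d)}

  # Convert both dates to seconds since the epoch, in UTC to avoid DST skew.
  last_s=$(date -u -d "$last_change" +%s)
  today_s=$(date -u -d "$today" +%s)

  age_days=$(( (today_s - last_s) / 86400 ))
  echo $(( max_days - age_days ))
}

# Password last changed 1 Jan 2017, 365 day maximum, checked on 1 Mar 2017.
days_until_expiry 2017-01-01 365 2017-03-01   # prints 306
```

If the patch reset only the maximum age and left the last-change date alone, a password set months ago is already that much closer to expiring.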

While it’s a good idea to rotate passwords, I also hate being locked out of my infrastructure, especially since I usually discover it in the middle of another problem… But to each their own. Good luck!

How Not To Quit Your Job

I’ve thought a lot lately about Michael Thomas, a moron who caused criminal amounts of damage to his former employer in the process of quitting. From The Register[0]:

As well as deleting ClickMotive’s backups and notification systems for network problems, he cut off people’s VPN access and “tinkered” with the Texas company’s email servers. He deleted internal wiki pages, and removed contact details for the organization’s outside tech support, leaving the automotive software developer scrambling.

The real-life BOFH then left his keys, laptop, and entry badge behind with a letter of resignation and an offer to stay on as a consultant.

More than a decade ago I did some consulting for a company that had this happen. They fired their sysadmin and he basically ransomed them, logging in through dozens of back doors to disrupt their business. My first call was to the local police department. This was before these types of crimes were very prevalent; we were lucky that the larger Californian city these crimes were in had a detective with an idea of what to do. Let me tell you: hiring the guy back was never on the list (though pretending to, and meeting up with the guy to grab him, was what the FBI wanted to do). If you do this to someone and they invite you back in to talk or rehire you, and you go, you deserve everything you get because you’re dumb.

Whistleblowing aside, if you’re playing Michael Thomas in a story like this there is absolutely nothing you can say to law enforcement to keep them from throwing you in jail. Think about it. On one side you have a business with a demonstrable material loss because of your actions. On the other side, you’re saying “BUT THEY WERE MEAN TO ME.” And unlike my story above, set in the early ‘oughts, there are actually laws and law enforcement professionals now that will bust your ass and make the charges stick. The process will be years long, too. Mr. Thomas pulled his stunt in 2011, and they finally got around to convicting him. Do you really want to waste that much of your life, with something like that hanging over your head that’ll ultimately destroy your life and career, because of something that felt good for a few minutes?[1]

Beyond all of that, what bugs me the most is how many ways this guy could have screwed with them and gotten away with it. I’m bothered for two reasons:

1. It speaks to how much trust we place in system administrators, and how system administrators need impeccable ethics as well as good judgement. We can implement all the security in the world and, usually, it still comes down to needing to trust a person. Hiring the right people is SO important.

2. It also bothers me because the guy was JUST. SO. DUMB. In a couple minutes over lunch some colleagues and I had ten different, solid ideas for ways to screw with someone’s systems, mostly based in real-life experience with well-meaning dumbasses. Some highlights were: changing the netmasks in their DHCP pools to non-standard ones (so it’s pretty random what works and what doesn’t), any manner of trickery with scheduled tasks/at/cron, off-hours system shutdowns that look like scripting errors, and redefining localhost (we just had this happen in our Active Directory with someone trying to join an Ubuntu host… OMFG). Extra points if it all just looks like errors, or makes them think you’re an idiot if & when they find the problem. Though in smaller communities that may backfire; people do talk to one another.

Interestingly enough, though, nothing any of us suggested was inherently destructive, just annoying. And when it comes down to it, none of us would actually do any of it, choosing instead to drink a beer and move on with our lives. That, perhaps, is the biggest lesson in the Michael Thomas story. As cathartic as it may be to stick it to the man, if you don’t like your job it’s always a better choice to just simply find a different one and politely move on.


[0] “I was authorized to trash my employer’s network, sysadmin tells court” – The Register, 23 Feb 2017

[1] Get your mind out of the gutter, kids are great.

Standards, to and with Resolve

“You can have any color as long as it’s black” – Henry Ford (Image (C) Michael LoCascio, via Wikimedia Commons)

As the holiday season has progressed I’ve spent a bunch of time in the car, traveling three hours at a crack to see friends and family in various parts of Midwestern USA. Much of that travel has been alone, my family having decided to ensconce themselves with my in-laws for the full duration of the week. That has left me ample time to sing aloud in the car, take unplanned detours to collect growlers of beer from esteemed breweries, and to think.

I don’t do New Year’s resolutions. I’m not against them, per se, but I just think they’re too conveniently abandoned. I like the noun form of “resolve” better — a firm determination to do something. I aspire to have resolve, whether I am deciding firmly on a course of action, or settling or finding a solution to a problem, dispute, or contentious matter.

So to what issue should I bring my resolve to bear? What is it that I want to work on in 2017?

As I thought about this, I always crept back to the idea that IT just isn’t the game I signed up for a few decades ago. It seems a lot less technical, at least at the infrastructure level. A lot of the new infrastructure, whether it’s on site or in the cloud, is just simpler. Storage is getting simpler because SSDs are now cheaper than rotational media. Hyperconverged infrastructure has removed a number of pain points as well, including things like discrete SANs. Compute is getting ridiculously dense. What was possible in a 4U server is now possible in essentially a half rack unit (something like a Dell FX2).

With all that, a lot of the crap we’ve dealt with over the years just evaporates.

So what do I work on? What’s the biggest, most fundamental problem around, lying at the core of everything?


That’s it. Standards. Without standards you cannot automate, and cannot remove many of the remaining problems at the infrastructure level. Without standards there are bad assumptions, and the inevitable human error and downtime that follow. The foundation of a modern IT operation is standards.

As it turns out, standards aren’t a technical problem, either. The way I see it, they’re usually a financial problem, insofar as someone didn’t budget enough money to do something the way everybody else does, and now it needs to work. Or perhaps it’s a difference of opinion, or a technical requirement that is incompatible with things. Maybe a time constraint. Or a workflow problem, where the workflow should have included IT but didn’t until it was too late. Regardless, though, I see standards as the foundation of IT moving forward, transcending clouds, containers, applications, networking, everything.

So that’s what I’m going to work on: finding a way to enable deep automation and staff time savings with standardization, without unduly limiting projects or adding financial burdens. I urge you to do the same with the copious free time you now have because of flash disk and hyperconvergence.


esxupdate Error Code 99

So I’ve got a VMware ESXi 6.0 host that’s been causing me pain lately. It had some storage issues, and now it won’t let VMware Update Manager scan it, throwing the error:

The host returns esxupdate error code:99. An unhandled exception was encountered. Check the Update Manager log files and esxupdate log files for more details.

A little Google action later and it’s clear there isn’t a lot of documentation, recent or otherwise, about this out there. People suggest rebuilding Update Manager, or copying files from other hosts to repair them. The VMware KB has documentation of the particular error but only in context of the Cisco Nexus 1000V, and only for ESXi 5.0 and 5.1. Here’s another thought, if you’re in my same situation.

1. First, do what it says: check esxupdate.log. Log into the console of the ESXi host (SSH or otherwise) and “tail -f /var/log/esxupdate.log”

2. Scan the host with Update Manager so that the log has fresh data in it. You should see it pop up. In my case it showed:

2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: An unexpected exception was caught:
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: Traceback (most recent call last):
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/usr/sbin/esxupdate", line 238, in main
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: cmd.Run()
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/", line 113, in Run
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/", line 244, in Scan
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/", line 106, in _generateOperationData
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/", line 89, in _getInstallProfile
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esximage/", line 627, in ScanVibs
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esximage/", line 62, in __add__
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esximage/", line 79, in AddVib
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esximage/", line 627, in MergeVib
 2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: ValueError: Cannot merge VIBs Dell_bootbank_OpenManage_8.3.0.ESXi600-0000, Dell_bootbank_OpenManage_8.3.0.ESXi600-0000 with unequal payloads attributes: ([OpenManage: 7807.439 KB], [OpenManage: 7809.081 KB])
 2016-05-27T15:54:52Z esxupdate: esxupdate: DEBUG: <<<

Ctrl-C will end the “tail” command.

3. It looks like something about the OpenManage VIB became corrupt during the storage issues, and now the host thinks there are two copies with different payload sizes. You know what? I can just remove this VIB and reinstall it (rather than having to rebuild the host or do some other complicated fix). I issue an “esxcli software vib list | grep -i dell” command to find the name of the VIB:

[root@GOAT:/var/log] esxcli software vib list | grep -i dell
OpenManage 8.3.0.ESXi600-0000 Dell PartnerSupported 2016-05-04 
iSM        2.3.0.ESXi600-0000 Dell PartnerSupported 2016-05-04

4. Then we need a simple “esxcli software vib remove --vibname=OpenManage”

[root@GOAT:/var/log] esxcli software vib remove --vibname=OpenManage
Removal Result
 Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
 Reboot Required: true
 VIBs Installed: 
 VIBs Removed: Dell_bootbank_OpenManage_8.3.0.ESXi600-0000
 VIBs Skipped:

5. Do what it says and reboot, then scan to see if it works. In my case it did, then I reinstalled the missing extension, and patched to the latest version like normal.
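When the traceback is long, the line that matters is the final ValueError. As a rough sketch, a filter like the one below pulls the offending VIB ID out of log text. conflicting_vib is a hypothetical helper name, and the sample line is copied from the log excerpt above:

```shell
# Sample line copied from the esxupdate.log excerpt above.
sample='2016-05-27T15:54:52Z esxupdate: esxupdate: ERROR: ValueError: Cannot merge VIBs Dell_bootbank_OpenManage_8.3.0.ESXi600-0000, Dell_bootbank_OpenManage_8.3.0.ESXi600-0000 with unequal payloads attributes: ([OpenManage: 7807.439 KB], [OpenManage: 7809.081 KB])'

# conflicting_vib is a hypothetical helper: it reads log text on stdin and
# prints the first VIB ID named in each "Cannot merge VIBs" error, with
# duplicates collapsed.
conflicting_vib() {
  sed -n 's/.*Cannot merge VIBs \([^,]*\),.*/\1/p' | sort -u
}

printf '%s\n' "$sample" | conflicting_vib
# prints Dell_bootbank_OpenManage_8.3.0.ESXi600-0000
```

Against a real log this would be run as conflicting_vib < /var/log/esxupdate.log, assuming sed and sort are available wherever you run it. The ID it prints is the long form; the short name that “esxcli software vib remove” wants is the one shown by “esxcli software vib list”.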

Use Microsoft Excel For Your Text Manipulation Needs

I’m just going to lay it out there: sysadmins should use Microsoft Excel more.

I probably will be labeled a traitor and a heathen for this post. It’s okay, I have years of practice having blasphemous opinions on various IT religious beliefs. Do I know how to use the UNIX text tools like sed, awk, xargs, find, cut, and so on? Yes. Do I know how to use regular expressions? Yes. Do I know how to use Perl and Python to manipulate text, and do poor-man’s extract-transform-load sorts of things? Absolutely.

It’s just that I rarely need such complicated tools in my daily work. I often just have a short list of something that I need to turn into a bunch of one-off commands. And many times I’m sharing it with others of varying proficiency, so readability is key. As it turns out, Excel has some very worthwhile text manipulation. Couple that with the ability to import CSV and autofill, and it’s a pretty decent solution. Let me give you some examples.

First, we need some text to manipulate. In cells A1 through D1 we have Goats, Sheep, Clowns, and Fire. Some people have Alice & Bob, I have goats & sheep.

First, we can concatenate strings very easily in Excel, as well as insert new strings. This is very handy for building commands you can then paste into a CLI, especially for doing one-off sorts of things. We do this with the ampersand, ‘&’.

=C1&" eat "&B1&" that are on "&D1

="puppet cert sign "&A1&""

Oh, you’re doing something that needs the text in all upper- or lower-case? No problem. We have UPPER() and LOWER() functions. Suck it, /usr/bin/tr.

=UPPER(C1)&" eat "&LOWER(B1)&" that are on "&UPPER(D1)

Maybe we have a list and we need the first or last few characters from each. There’s LEFT() and RIGHT(), which will return a certain number of characters from those sides of the string. With Goats in A1, for example, =LEFT(A1,3) returns Goa and =RIGHT(A1,2) returns ts.

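Perhaps you have a list of domain names, and want to grab the first part. We can use FIND() with LEFT() and RIGHT(). We can add or subtract 1 to get what we want. As an illustration, with a hypothetical www.example.com in A10, =LEFT(A10,FIND(".",A10)-1) returns www, and =RIGHT(A10,LEN(A10)-FIND(".",A10)) returns example.com.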

Maybe we need to do some autofilling, perhaps for a quick way to take some snapshots through VMware’s PowerCLI. I had the list on the left, then incorporated it into a larger command, dragging down to autofill all the names. Copy & paste that into a PowerCLI window and you’re set. Ad-hoc PowerCLI commands on small lists is actually my #1 use case.

="New-Snapshot -Name Pre-Patch -VM "&A30&" -Confirm:$false"

Autofill automatically adjusts cell references, too, so if you specified A1 and dragged down it’ll use A2, A3, A4, and so on. If that’s not what you want you can preface parts of the reference with a dollar sign, ‘$’, to make it a static reference. I made it completely static with $A$1, but you can do $A1 or A$1, too.


Excel knows how to autofill just about anything ending in a number or a letter sequence. If it doesn’t catch on with one, try selecting two cells, then filling down. And if it really doesn’t catch on just insert a new column, autofill there, then concatenate that column with your others. In a pinch I’ve built BIND DNS zone files in Excel this way.

I think you get the idea. There’s a good reference in the Excel help, too – hit F1 and then search for “text functions.” The “Text Functions (reference)” result will show more commands, like LEN() for string length, MID() for getting substrings from the middle of a cell, SUBSTITUTE() for replacing text, and so on.

Next time you are tempted to assemble a list of commands by hand save yourself time, keystrokes, and potential errors by doing it in Excel instead!

Here’s my sample workbook, too, if you want to look at these examples yourself. Have fun!

Big Trouble in Little Changes

I was making a few changes today when I ran across this snippet of code. It bothers me.

/bin/mkdir /var/lib/docker
/bin/mount /dev/Volume00/docker_lv /var/lib/docker
echo "/dev/Volume00/docker_lv /var/lib/docker ext4 defaults 1 2" >> /etc/fstab

“Why does it bother you, Bob?” you might ask. “They’re just mounting a filesystem.”

My problem is that any change that affects booting is high risk, because fixing startup problems is a real pain. And until the system reboots the person who executes this won’t know that it works. If it doesn’t work it’ll stop during the boot, sitting there waiting for someone with a root password to come fix it. So you’ll have to get a console on the machine and dig up the root password. Then you need to type it in. If it’s anything like my root passwords it’s 20+ characters long and horrible to type, especially on crappy cloud console applets that tend to repeat characters because they’re written in Java by a high schooler on a reliable, near-zero latency network, twelve versions of Chrome ago.

Once you’re in you need to figure out what the problem is, and that’s an even bigger rub. It might be months or, God help you, years between when these commands run and when they get tested in a reboot. So there’s no correlation, and you’ll have no idea what the problem is aside from a filesystem issue. And all the while it’s burning up your maintenance window and your chance to do the maintenance you actually intended & scheduled, making you look bad.

But what if we just change it a little?

/bin/mkdir /var/lib/docker
echo "/dev/Volume00/docker_lv /var/lib/docker ext4 defaults 1 2" >> /etc/fstab
/bin/mount -a

Now, when it runs it’ll actually test the entry in /etc/fstab, and you’ll know right away if it’s wrong.

Slick, eh?
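If you want to take it a step further, here is a rough sketch of the same idea wrapped in two small functions. The function names are made up for this example, and it assumes the mountpoint utility from util-linux is installed. It skips the append if the exact line is already in the file, and it confirms the directory really is a mount point after mount -a runs.

```shell
#!/bin/sh
# Sketch only: the function names are invented for this example.

# Append an entry to an fstab-style file unless that exact line is already there.
# The second argument defaults to /etc/fstab; pass another path to try it safely.
add_fstab_entry() {
    entry=$1
    fstab=${2:-/etc/fstab}
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF -e "$entry" "$fstab"; then
        printf '%s\n' "$entry" >> "$fstab"
    fi
}

# Create the mount point, let mount process fstab, then confirm the result.
# Returns non-zero if any step fails, so a bad entry shows up right away.
mount_from_fstab() {
    dir=$1
    /bin/mkdir -p "$dir" &&
        /bin/mount -a &&
        mountpoint -q "$dir"
}

# Example use, as root:
#   add_fstab_entry "/dev/Volume00/docker_lv /var/lib/docker ext4 defaults 1 2" &&
#       mount_from_fstab /var/lib/docker
```

The check on the exact line means running it twice will not leave you with duplicate entries, which is its own kind of boot-time surprise.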

Are you properly assessing the risk of your changes? Anything that affects booting is high risk, in my opinion. Rebooting properly is the foundation of good patching practices, disaster recovery, automated deployments, and so on.

How do you know the change you’re making actually works? Not just because it worked on a test system, either. How do you know, without a doubt, that it works on each machine you changed?

Configuration management tools help immensely, too, but there’s no substitute for thinking critically about the change you’re making, big or seemingly small.

Interesting Dell iDRAC Tricks

Deploying a bunch of machines all at once? Know your way around for loops in shell scripts, or Excel enough to do some basic text functions & autofill? You, too, can set up a few hundred servers in one shot. Here are some interesting things I’ve done in the recent past using the Dell iDRAC out-of-band hardware management controllers.

You need to install the racadm utility on your Windows or Linux host. I’ll leave this up to you, but you probably want to look in the Dell Downloads for your server, under “Systems Management.” I recently found it as “Dell OpenManage DRAC Tools, includes Racadm” in 32- and 64-bit flavors.

Basic Command

The basic racadm command I’ll represent with $racadm from now on is:

racadm -r iDRAChost -u iDRACuser -p password

Set a New Root Password

I don’t know how many times I see people with iDRACs on a network and the root password is still ‘calvin.’ If you do nothing else change that crap right away:

$racadm set iDRAC.Users.2.Password newpassword

The number ‘2’ indicates the user ID on the iDRAC. The root user is 2 by default.

If you have special characters in your password, and you should, you may need to escape them or put them in single quotes. You will want to test this on an iDRAC that has another admin user on it, or where you have console access or access through a blade chassis, for when you screw up the root password and lock yourself out. Not that I’ve ever done this, not even in the course of writing this post. Nope, not admitting anything.
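Here is a small illustration of why the single quotes matter. The password is invented for the example, and nothing here talks to an iDRAC; it only shows what the shell would hand to racadm in each case.

```shell
#!/bin/sh
# Made-up password, for illustration only.

# In real life $w0rd would simply be unset; it is emptied here so the
# result is the same no matter what your environment contains.
w0rd=

# Single quotes: every character is passed through untouched.
safe='p@ss$w0rd#9'

# Double quotes: the shell expands $w0rd (to nothing) before racadm sees it.
mangled="p@ss$w0rd#9"

printf 'single quotes: %s\n' "$safe"
printf 'double quotes: %s\n' "$mangled"

# So the set command from above would look like:
#   $racadm set iDRAC.Users.2.Password 'p@ss$w0rd#9'
```

The double-quoted version quietly sets a different password than the one you typed, which is exactly how you lock yourself out.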

Dump & Restore Machine Configurations

Once upon a time I embarked on a quest to configure a server solely with racadm ‘set’ commands. Want to know a secret? That was a complete waste of a few hours of my life. What I do now is take one server and run through all the BIOS, PERC, and iDRAC settings via the console and/or the web interface, then dump the configuration with a command:

$racadm get -t xml -f idrac-r730xd.xml

That’ll generate an XML file of all the settings, which you can then load back into the other servers with:

$racadm set -t xml -f idrac-r730xd.xml -b graceful -w 600

This tells it to gracefully shut the OS down, if there is one, before rebooting to reload the configurations. It also says to wait 600 seconds for the job to complete. The default is 300 seconds but with an OS shutdown, long reboot, memory check, etc. it gets tight. There are other reboot options; check out the help via:

$racadm help set

You can also edit the XML file to remove parts that you don’t want, such as when you want to preconfigure a new model of server with common iDRAC settings but do the BIOS & RAID configs on your own. That XML file will also give you clues to all the relevant configuration options, which you can then use via the normal iDRAC ‘get’ and ‘set’ methods.
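To push that file to a whole list of machines, the for loop idea from the top of this post might look something like the sketch below. The function name, the hosts file, and the IDRAC_USER, IDRAC_PASS, and RACADM variables are all my own inventions for this example, not anything racadm defines. Setting RACADM to "echo racadm" turns it into a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: run_on_all, IDRAC_USER, IDRAC_PASS and RACADM are
# hypothetical names chosen for this example.

# Run the same racadm subcommand against every host listed in a file,
# one hostname or IP per line. Blank lines are skipped.
run_on_all() {
    hosts_file=$1
    shift
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # RACADM is left unquoted on purpose so "echo racadm" splits into
        # two words. stdin comes from /dev/null so the command cannot
        # swallow the rest of the host list.
        ${RACADM:-racadm} -r "$host" -u "$IDRAC_USER" -p "$IDRAC_PASS" "$@" </dev/null
    done < "$hosts_file"
}

# Example dry run:
#   IDRAC_USER=root IDRAC_PASS='secret' RACADM="echo racadm"
#   run_on_all idrac-hosts.txt set -t xml -f idrac-r730xd.xml -b graceful -w 600
```

I would do the dry run first and read the output before letting it loose on a few hundred servers.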

Upload New SSL Certificates

I like knowing that the SSL certificates on my equipment aren’t the defaults (and I get tired of all the warnings). With access to a certificate authority you can issue some valid certs for your infrastructure. However, I don’t want to manage SSL certificates for hundreds of servers. Where I can I’ll get a wildcard certificate, or if that’s expensive or difficult I’ll abuse the Subject Alternative Name (SAN) features of SSL certificates to generate one with all my iDRAC names in it. Then I can upload new keys and certificates, and reset the iDRAC to make it effective:

$racadm sslkeyupload -t 1 -f idrac.key
$racadm sslcertupload -t 1 -f idrac.cer
$racadm racreset

Ta-dum, green valid certificates for a few years with only a bit of work. If you don’t have your own CA it’s probably worth creating one. You can load the CA certificate as a trusted root into your desktop OS and make the warnings go away, and you know that your SSL certs aren’t the vendor defaults. What’s the point of crypto when everybody has the same key as you?
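For the SAN approach, one possible way to build the request is sketched below. The function names and the hosts file are hypothetical, and the -addext option assumes OpenSSL 1.1.1 or newer. Your CA also has to carry the SAN extension over when it signs the request, and how you do that depends on the CA.

```shell
#!/bin/sh
# Sketch only: san_list and make_csr are names invented for this example.

# Turn a file of hostnames (one per line) into "DNS:a,DNS:b,..." form.
san_list() {
    sed -e '/^$/d' -e 's/^/DNS:/' "$1" | paste -sd, -
}

# Generate a new key (idrac.key) and signing request (idrac.csr).
#   $1 = common name, $2 = file of iDRAC hostnames
# Assumes OpenSSL 1.1.1+ for the -addext option.
make_csr() {
    openssl req -new -newkey rsa:2048 -nodes \
        -keyout idrac.key -out idrac.csr \
        -subj "/CN=$1" \
        -addext "subjectAltName=$(san_list "$2")"
}

# Example use:
#   make_csr idrac.example.com idrac-hosts.txt
```

Once the CA signs idrac.csr and hands back the certificate, the key and certificate go up with the sslkeyupload and sslcertupload commands shown above.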

There are lots of cool things you can do with the iDRAC, so if you’re doing something manually via the console or iDRAC web interface you might think about looking it up in the Dell iDRAC RACADM Command Line Reference first.
