Bad Day For People Who Actually Patch

Let’s just say that if you’re running VMware Virtual Infrastructure 3.5 Update 2, you probably can’t power your VMs on anymore. DOH. Unfortunately, that’s me. I updated everything on Sunday after testing for two weeks, and I can’t even imagine how I’d test for this. The whole idea of patching sucks. There are always bugs, and you always trade one set of bugs for another when you upgrade. Of course, you use testing to try to figure out whether there are more bugs or fewer, but things like this always show up. I’ve been meaning to write a longer post about patching, especially in the wake of this DNS debacle, but Michael Janke’s post “Patch Now – What Does It …

Bandwidth of the USPS

Matt’s post over at Standalone Sysadmin about flash drives as archival media made me remember conversations I used to have with coworkers about the bandwidth of the U.S. Postal Service, a colleague’s pickup truck loaded with tapes, etc. Sometimes the fastest way to get data to a location is to mail it, even now. The late Jim Gray had a fantastic interview in ACM Queue back in 2003 where he talked about disk access times vs. capacity vs. Moore’s Law, and especially how he was mailing computers and disks to people. His price comparisons are a little dated now, but the rest is a good use of ten minutes, if you ask me. (And you didn’t, I know.) 🙂
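
The claim that mailing data can beat the network is easy to check with back-of-the-envelope arithmetic. Here is a minimal sketch, assuming a hypothetical 1 TB disk that takes two days door to door. Neither number comes from the post or from Jim Gray’s interview; plug in your own.

```python
def effective_mbps(payload_bytes: float, transit_seconds: float) -> float:
    """Average throughput, in megabits per second, of moving a payload in a given time."""
    return payload_bytes * 8 / transit_seconds / 1_000_000


# Hypothetical shipment: one 1 TB disk (10**12 bytes), two days in transit.
DISK_BYTES = 10**12
TRANSIT_SECONDS = 2 * 24 * 60 * 60

mailed = effective_mbps(DISK_BYTES, TRANSIT_SECONDS)
print(f"Mailed disk: {mailed:.1f} Mbit/s sustained")
```

Every extra disk in the box multiplies that figure while the transit time stays the same, so the comparison tilts further toward the mail truck as the data set grows.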

Dude, That's 134 Years

I have a hunch that the hardware clock is off on this host… …but by 134 years into the future? I didn’t actually get to see what it was set to, as the machine fixed itself via NTP shortly afterwards. I suspect the last mount time is some sort of unsigned integer that overflowed. Of course, on the subsequent reboot it needed to check the filesystems yet again, putting the last checked time back to normal.
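
For what it’s worth, the overflow hunch is at least numerically plausible. A rough sketch, assuming a 32-bit count of seconds and a purely hypothetical two-year negative time difference; this is an illustration of wraparound, not a diagnosis of what actually happened on this host.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def as_uint32(value: int) -> int:
    """Reinterpret an integer as an unsigned 32-bit value (wraps modulo 2**32)."""
    return value % 2**32


def seconds_to_years(seconds: float) -> float:
    return seconds / SECONDS_PER_YEAR


# Full range of an unsigned 32-bit seconds counter, in years.
span_years = seconds_to_years(2**32)

# Hypothetical: a time difference of minus two years, stored as unsigned.
skew_seconds = -int(2 * SECONDS_PER_YEAR)
wrapped_years = seconds_to_years(as_uint32(skew_seconds))

print(f"32-bit span: {span_years:.1f} years")
print(f"-2 years wrapped: {wrapped_years:.1f} years")
```

A 32-bit counter spans roughly 136 years, so a small negative difference that wraps around lands just under that, in the same neighborhood as 134.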

Should vs. Going To

“Going to” means you know. “Should” means you know nothing. “Those servers should come up cleanly after a reboot.” “That storage array upgrade should not cause an outage.” “The customer should be fine with this.” Right. If you can’t say “going to” then you need to do more work. Update: if you think I’m wrong, don’t take it personally; join the comments, where the beatdown is already happening. Please, no nails in the 2x4s, though. 🙂

Perceived Productivity

“What, you just sit around all day browsing Wikipedia?”
“Excuse me?”
“What are you looking at in Wikipedia?”
“The article on X-Men.”
“Tough day at work, I suppose.”
“Um, I’m trying to figure out a naming scheme for the 10 new servers I’m bringing in. That okay with you?”
“Oh, sorry.”

Just because you think I’m not doing work doesn’t mean you’re right. (Also, great site for naming schemes: namingschemes.com)

Cloud Computing

My friend Terry’s slightly unorthodox take on cloud computing: To hell with cloud computing. Clouds are puffy crap that float lazily by. Is that what you want out of your service provider? Just floating by without a care in the world? It is time for tornado computing. Or hurricane computing. Real wrath of God type stuff. I want an architecture that knocks me off my feet, whips my apps around and hurls them half way through a tree. I don’t want my data intact for some script kiddie to steal. I want it like a frog in a blender; unrecognizably processed with a taste only I care for. So to that end I am setting half of my air handlers …

Your Sysadmin Should Know Why Backups Are Good

You know, if you’re a system administrator there are a few things you should know (and probably do). One of those things is why you should have backups. If you can’t figure out why, perhaps you should find a different profession. Seriously. I’m fine if you don’t keep backups because you’ve thought about it and you are taking a calculated risk. However, having to explain why backups are valuable to someone whom, until this moment, I considered a peer is ridiculous. It’s like having to explain what DNS does to someone who calls themselves a network administrator. I’ve done that, too.

What's a Good Workflow/Request Tool?

Dear readers, You folks are full of good ideas, so here’s my latest question. I’m rethinking workflow for my group of 20+ admins, so the customers we interact with have a nice single point of contact and the admins have a good idea of what’s in the queue for work. I’m looking for tools to help us. How we’ve lived this long without something to help us is a real wonder. The tool needs to be able to accept email and web-based requests. It would be nice if it could have some logic in it so that the customer could help direct who gets the request by choosing the OS and (perceived) priority. It should be fairly lightweight overall. I …

Accountability and Signatures

One of my favorite tricks lately to make people understand how serious I am about things is to get them to sign a form. You want to run your server without backups? I don’t recommend it at all, but I’ll do whatever you say. Just sign this form acknowledging that you know the risks, you know you could lose all your data at any time for any reason (including things I might do), and regardless of cause you don’t hold me accountable for anything. You want to let your employee take a machine out of the building without following our procedures for wiping the drives? We have a policy against that and it’s a terrible idea, but no big deal. …

Building NRPE on Solaris 10 with SSL Support

Solaris 10 ships OpenSSL as part of the OS distribution, in /usr/sfw. It appears that they have removed some of the ciphers in order to be compliant with export restrictions. Unfortunately, that throws a wrench in things when you want your Solaris Nagios server to use the Nagios Remote Plugin Executor (NRPE) to securely talk to other hosts. In my case, my Nagios server is a Sun T2000 and I’m referring to NRPE version 2.12. Newer versions may fix these issues. First, I built NRPE 2.12 with: ./configure --with-ssl-lib=/usr/sfw/lib --with-ssl-inc=/usr/sfw/include --with-ssl=/usr/sfw --prefix=/opt/whatever Once that was done, the error I was getting on the target Linux host (in /var/log/messages) was the ultra-informative: Error: Could not complete SSL handshake. 5 I checked …
