links for 2006-03-31
By Bob Plankers on Mar 31, 2006 in del.icio.us | 0 Comments
im in ur data centrz patchin ur serverz
By Bob Plankers on Mar 31, 2006 in del.icio.us | 0 Comments
By Bob Plankers on Mar 28, 2006 in Outright Rant, Site Administration | 0 Comments
Okay, you comment spamming clowns. Because of you I’ve set it to moderate any comment with more than zero URLs in it.
Grrrr.
By Bob Plankers on Mar 28, 2006 in General Rambling, Outright Rant | 1 Comment
“Hey Bob, can you talk to a customer about upgrading his backup client?”
We run Tivoli Storage Manager for a backup system, and we resell the service to customers. We are in the midst of upgrading to TSM 5.3, and we just informed all the customers that they need to be running a recent client. Recent, to us, means versions 5.2 or 5.3, which represent the last five years of TSM development. IBM doesn’t support older versions of the clients, so to ensure that we can resolve any problems that crop up we ask that people upgrade from time to time. Most of our customers just have us upgrade their clients for them as part of our contracts with them.
“Sure, I’ll call him.”
I call this guy back. He’s irate. I love inheriting angry customers.
“Why in hell do I have to upgrade my backup client?!?”
“Sir, what client version are you running?”
“I’m running version 5.1.”
“Oh, the upgrade is simple, just uninstall the old client and install the new version.”
“That isn’t the point. You people are making me upgrade constantly! I’m sick of it!”
“Upgrade constantly? When was the last time you needed to upgrade?”
“Six months ago you made me upgrade because you wouldn’t support my client version then.”
“What version was it?”
“3.7.”
“Um, dude, that was really old, like from 1997. And you upgraded to 5.1? Why didn’t you go to 5.2 or 5.3 and spare yourself the trouble?”
“Your upgrade instructions only covered version 5.1.”
“Sir, I wrote those instructions, and six months ago I know they covered 5.2.”
“I’m taking my business elsewhere! I don’t have to use your service! My time isn’t free, you know.”
(like mine is, you ass)
“How many machines do you have to upgrade?”
“One.”
“One? Oh, sorry, I thought we were talking about a couple hundred or something. It takes me less than an hour to upgrade the client on the 150 machines I support. Why don’t we do it right now and I’ll talk you through it? It’ll take about five minutes. Much less time than even patching your OS, and you’ve got the guy who wrote the documentation on the line.”
(I doubt this guy patches his OS, either. How dare they ship him buggy software!)
“I don’t think so! I am cancelling my service and just going to back up using tar to a friend’s workstation.”
Okay, pal. All that for 20 GB of data (I looked it up). YAWN. Besides, if you figure he didn’t upgrade his client for six years he’s averaging three years between upgrades. I’m glad he’s not anywhere near my data or systems.
By Bob Plankers on Mar 28, 2006 in del.icio.us | 0 Comments
By Bob Plankers on Mar 27, 2006 in System Administration | 3 Comments
There is definitely an art to writing scripts that don’t suck. It isn’t enough to just get the job done. Your script also has to do the right thing with messages, with errors, and over time (like years after it was written).
1. Always use absolute paths for everything.
You cannot assume what your environment will be. You can’t. If your script executes via cron it’s likely it won’t have any of the environment variables you depend on. Including PATH. Or HOME. The working directory will probably not be what you expect, either, so don’t write to files in the directory you’re in without thinking about it. Especially if you’re running as root. You’ll actually have permission to write files as root, and all you’ll do is clutter up some filesystem and make me go looking for the problem later.
Usually when a developer complains to me that their script isn’t writing output a simple search (find or locate) will show their output in some completely unexpected location. Like the root filesystem. Oops.
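Here is a minimal sketch of what that can look like at the top of a POSIX shell script. The PATH value, the OUTPUT_DIR default, and the require_absolute helper are all made up for illustration, so adjust them for your own system.

```shell
#!/bin/sh
# Sketch only: OUTPUT_DIR and require_absolute are hypothetical names.

# cron usually hands you a minimal environment, so set PATH yourself.
PATH=/usr/bin:/bin
export PATH

# Where output goes, as an absolute path (assumed location).
OUTPUT_DIR="${OUTPUT_DIR:-/tmp}"

# Succeed only if the argument is an absolute path.
require_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Move to a known directory instead of trusting the inherited one.
if require_absolute "$OUTPUT_DIR"; then
    cd "$OUTPUT_DIR" || exit 1
else
    exit 1
fi
```

With this in place, nothing the script writes depends on where cron happened to start it.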
2. Fix the environment in the script, not the default environment.
Think of OS defaults as an electronic Switzerland. There are so many competing interests that an OS has to satisfy that the defaults are generic and neutral. If your script needs environment variables, like database settings (ORACLE_HOME), put them in your script. Don’t ask your sysadmin to change the system defaults.
The reason I hate changing defaults is mainly because it’s inevitable that another script, running on the same machine, will want different defaults. Besides, if you want the defaults changed you’re probably making assumptions in the script, and I don’t like that.
3. If you don’t want to hard-code, use a configuration file.
You have four scripts that need to have the same environment? The same variables? Use a configuration file and read it in at the beginning of each.
This also gives you a chance to do cool things, like detecting the host you’re running on (/bin/hostname) and setting the variables properly for development, test, or production. That’ll make moving between environments easier.
While you’re at it, name the configuration files, and the variables inside, intelligently. Names like “config” and “config.pl” work nicely. Names like “appwebprd” are slightly more confusing.
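A rough sketch of tips 2 and 3 together, assuming a POSIX shell and a config file written as plain VAR=value lines. The host name patterns, the file paths, and both function names are invented for the example.

```shell
#!/bin/sh
# Sketch only: host name patterns, paths and function names are hypothetical.

# Map a host name to an environment; fail on hosts we don't recognize.
environment_for_host() {
    case "$1" in
        dev*)  echo development ;;
        test*) echo test ;;
        prod*) echo production ;;
        *)     return 1 ;;
    esac
}

# Read VAR=value settings (plain shell syntax) from an absolute path.
load_config() {
    [ -r "$1" ] || return 1
    . "$1"
}

# Typical use at the top of each script (paths are assumptions):
#   APP_ENV=$(environment_for_host "$(/bin/hostname)") || exit 1
#   load_config "/usr/local/etc/myapp/config.$APP_ENV" || exit 1
#   export ORACLE_HOME
```

Failing on an unknown host is a deliberate choice here: guessing "production" on a machine nobody planned for seems worse than stopping.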
4. Write errors to syslog.
The UNIX gods gave us a system logging service. It has its problems but it is well understood, and the logs it writes are usually rotated and handled properly by default. Many monitoring systems also watch the system’s log files, like /var/log/messages, and so things you send there will get handled.
Check out the ‘logger’ command if you want to do this from a shell script.
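For example, a small wrapper around logger might look like the sketch below. The script name, the LOGGER variable, and the log_error function are hypothetical, and /usr/bin/logger is only an assumption about where the command lives on your system.

```shell
#!/bin/sh
# Sketch only: SCRIPT_NAME, LOGGER and log_error are hypothetical names.

SCRIPT_NAME="nightly-report"

# Path to logger(1); /usr/bin/logger is an assumption, check your system.
LOGGER="${LOGGER:-/usr/bin/logger}"

# Send one error message to syslog.
#   -t tags each line with the script name so it can be found with grep
#   -p sets facility.level; user.err is a common choice for script errors
log_error() {
    "$LOGGER" -t "$SCRIPT_NAME" -p user.err "$*"
}

# Example call:
#   log_error "cannot write to /var/spool/report: filesystem full"
```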
5. Write nothing to stdout or stderr unless you are debugging.
I don’t want to see the output of your script on the console of my server. Why? Because when I’m trying to work at the console of the server (fixing a problem) your script will write all over my terminal session. Icky! If I am not there then nobody sees the output. What good is that?
This also goes for programs you call in your script. Don’t let them write crap to stdout or stderr, either.
6. Don’t do error handling in loops.
Don’t send error email from inside a loop. Don’t write to syslog from inside a loop. Set a flag for the error and handle it once, at the end. If you want to do things differently while you’re developing, fine, but in production your error loops will fill mail spools and logs when something goes wrong. When that happens I suddenly have three or four problems, not just one.
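One way to sketch the "set a flag, handle it once" idea in a POSIX shell is below. All three function names are hypothetical, and process_item and report_error are stand-ins: in a real script the first does the work and the second would write to syslog or send a single email.

```shell
#!/bin/sh
# Sketch only: process_item, report_error and process_all are hypothetical.

REPORT_COUNT=0
LAST_REPORT=""

# Stand-in for real work: here, "processing" just means the file is readable.
process_item() {
    [ -r "$1" ]
}

# Stand-in for the single place that logs or mails; it only records the call.
report_error() {
    REPORT_COUNT=$((REPORT_COUNT + 1))
    LAST_REPORT="$1"
}

# Collect failures inside the loop, report once after it.
process_all() {
    failed=""
    for item in "$@"; do
        if ! process_item "$item"; then
            failed="$failed $item"
        fi
    done
    if [ -n "$failed" ]; then
        report_error "could not process:$failed"
        return 1
    fi
    return 0
}
```

If a thousand items fail, this still produces one report that lists them, not a thousand reports.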
7. Have a debug mode that is not the standard operating mode of your script.
I’m not talking about for your development environment, either. In production, if your script is malfunctioning, I’d like to be able to run it in debug mode and get useful output to see where the problem lies.
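Tips 5 and 7 fit together nicely. The sketch below assumes a POSIX shell and a made-up DEBUG switch read from the environment; the debug and run_quietly helpers are hypothetical names, not anything standard.

```shell
#!/bin/sh
# Sketch only: DEBUG, debug and run_quietly are hypothetical names.

# Off by default; run the script with DEBUG=1 to turn it on.
DEBUG="${DEBUG:-0}"

# Print a diagnostic line only in debug mode.
debug() {
    if [ "$DEBUG" = "1" ]; then
        echo "debug: $*" >&2
    fi
    return 0
}

# Run a command, discarding its output unless debugging.
# The command's exit status is passed through either way.
run_quietly() {
    if [ "$DEBUG" = "1" ]; then
        "$@"
    else
        "$@" >/dev/null 2>&1
    fi
}

# Example:
#   debug "copying report to /var/spool/report"
#   run_quietly /bin/cp /tmp/report.txt /var/spool/report/
```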
8. Throw useful errors when you choose to.
Filesystem full? Permission problem? Tell me where you were writing so I can fix it. Can’t connect to something? Tell me what it is you’re connecting to, hostname or IP and port.
Installation scripts are notorious for useless filesystem errors. I also noticed that Red Hat’s up2date script gives you information like “Requires 200 MB additional space” without telling you where it needs it. It’s only experience that tells me that it’s complaining about /var/spool, and a less experienced admin isn’t going to know that.
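As a small illustration, here is a sketch of a check that names the exact directory it is unhappy about. It assumes a POSIX shell, and check_output_dir is a hypothetical helper. It prints the message on stdout so the caller can capture it and pass it on to syslog.

```shell
#!/bin/sh
# Sketch only: check_output_dir is a hypothetical helper.

# Print a message naming the exact directory and the problem, and return
# non-zero, if the script will not be able to write there.
check_output_dir() {
    if [ ! -d "$1" ]; then
        echo "output directory does not exist: $1"
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "no write permission on $1 (running as $(id -un))"
        return 1
    fi
    return 0
}

# Typical use, capturing the message and sending it to syslog:
#   problem=$(check_output_dir /var/spool/report) \
#       || /usr/bin/logger -p user.err "$problem"
```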
9. Become a daemon properly.
If your script is meant to run in the background, the “production” mode of the script should just put itself fully in the background, also known as “daemon” mode. Scripts that need to be explicitly backgrounded need more care and feeding, and if you do the right thing and add the few lines of code to become a daemon you score points with your admins.
This is also an opportunity for doing interesting things for debugging. Add a “foreground” mode that also turns on debugging, and you’ve dealt with two problems at once.
10. Write a PID file.
It’s so much nicer to be able to “kill `cat /var/run/yourprogram.pid`” than to cobble some killall or pkill command together. This should also serve as your lock file, so that two copies of the program don’t run at the same time unless you mean them to.
This also means it should delete the lock file when the script dies, so you’ll have to add a little bit of signal handling code. There are countless examples of this out there, all findable with Google.
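A minimal sketch of a combined PID and lock file, assuming a POSIX shell. The function names are hypothetical, and writing under /var/run usually means running as root. It does not deal with a stale file left behind by a crash or a "kill -9", which a real script would need to think about.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

PIDFILE="${PIDFILE:-/var/run/yourprogram.pid}"

# Remove the PID file; safe to call more than once.
release_pidfile() {
    rm -f "$PIDFILE"
}

# Create the PID file, failing if one already exists.
# "set -C" (noclobber) makes the redirect refuse to overwrite a file,
# so only one copy of the script can win.
acquire_pidfile() {
    if ( set -C; echo "$$" > "$PIDFILE" ) 2>/dev/null; then
        trap release_pidfile EXIT      # clean up when the script exits
        trap 'exit 1' HUP INT TERM     # signals become an exit, which runs the EXIT trap
        return 0
    fi
    return 1
}

# Typical use near the top of the script:
#   acquire_pidfile || exit 1
```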
11. Create temporary files in /tmp.
Many OSes have programs that clean /tmp automatically, so if your script leaves stuff lying around in /tmp the stuff will get cleaned up. Besides, that’s what /tmp is for. You might want to make this a variable in your configuration file, just so it’s easy to change someday.
12. Make a unique temporary file every time.
What if two copies of the program start? What if the file doesn’t get deleted properly? Use a function like mktemp() or a program like /bin/mktemp.
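Tips 11 and 12 in one short sketch, assuming a POSIX shell and a mktemp that accepts a template ending in XXXXXX. MKTEMP, SCRATCH_DIR, and make_scratch_file are hypothetical names, and the scratch directory is a variable so it is easy to change someday.

```shell
#!/bin/sh
# Sketch only: MKTEMP, SCRATCH_DIR and make_scratch_file are hypothetical names.

MKTEMP="${MKTEMP:-/bin/mktemp}"
SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"

# Create a new, uniquely named empty file and print its path.
# mktemp replaces the XXXXXX with random characters and creates the file.
make_scratch_file() {
    "$MKTEMP" "$SCRATCH_DIR/yourprogram.XXXXXX"
}

# Typical use:
#   scratch=$(make_scratch_file) || exit 1
```

Two copies of the script calling this at the same moment each get their own file.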
13. Absolutely know what happens if a variable is empty.
This is especially important if you are deleting things. What happens if mktemp fails and all you have is an empty variable? I wasn’t thinking and wrote something like this recently:
TEMPDIR=`/bin/mktemp -d`
…
rm -f $TEMPDIR/*
rmdir $TEMPDIR
Yeah… mktemp failed and my rm statement became “rm -f /*”. Great. And it was all because I was paranoid of using “rm -rf” in a script.
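A sketch of a more defensive version, assuming a POSIX shell. The function names and the MKTEMP variable are hypothetical. It checks that mktemp worked, checks again before deleting, and uses the shell's ${VAR:?} expansion as a last line of defense.

```shell
#!/bin/sh
# Sketch only: MKTEMP, make_tempdir and remove_tempdir are hypothetical names.

MKTEMP="${MKTEMP:-/bin/mktemp}"
TEMPDIR=""

# Create the directory; return non-zero if mktemp failed or gave us nothing.
make_tempdir() {
    TEMPDIR=$("$MKTEMP" -d) || return 1
    [ -n "$TEMPDIR" ] && [ -d "$TEMPDIR" ]
}

# Delete the directory's files, but only if TEMPDIR really names a directory.
remove_tempdir() {
    if [ -z "$TEMPDIR" ] || [ ! -d "$TEMPDIR" ]; then
        return 1
    fi
    # ${TEMPDIR:?} is a second line of defense: the shell stops with an
    # error rather than expanding an empty variable into "rm -f /*".
    rm -f "${TEMPDIR:?}"/* && rmdir "${TEMPDIR:?}"
}

# Typical use:
#   make_tempdir || exit 1
#   ...
#   remove_tempdir
```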
14. Do the right thing at system shutdown, and don’t require special shutdown procedures.
I alluded to this in #10 with the PID file cleanup. I absolutely hate rc.shutdown scripts because inevitably the script fails and then the system won’t shut down when I need it to. I also hate adding things to the rc.d directories because it’s one more thing to deal with. Generally I put scripts that need to start at boot in rc.local, and then let them die at system shutdown.
This means that if you’re a script and I’m your admin you need to catch a TERM signal, at least, and quickly do whatever you need to do before you die.
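Roughly, catching TERM in a POSIX shell script can look like the sketch below. Every function name is hypothetical, and do_one_unit_of_work and cleanup are placeholders for whatever your script really does.

```shell
#!/bin/sh
# Sketch only: every function name here is hypothetical.

SHUTTING_DOWN=0
CLEANED_UP=0

# Signal handler: just set a flag, so the current unit of work can finish.
on_term() {
    SHUTTING_DOWN=1
}
trap on_term TERM

# Placeholder for the real work; keep each unit short so shutdown is quick.
do_one_unit_of_work() {
    sleep 1
}

# Placeholder for removing PID files, temporary files and so on.
cleanup() {
    CLEANED_UP=1
}

# Work until TERM arrives, then clean up and return.
main_loop() {
    while [ "$SHUTTING_DOWN" -eq 0 ]; do
        do_one_unit_of_work
    done
    cleanup
}

# Typical use as the last line of the script:
#   main_loop
```

The handler only sets a flag, so how fast the script dies depends on how long one unit of work takes. Keep that short and shutdown won't be waiting on you.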
What am I missing here? Anything? These are all the annoyances I can recall from the past month or so, but maybe the developers I support have a limited repertoire of shenanigans to pull. :-)
By Bob Plankers on Mar 24, 2006 in Dear Vendor | 2 Comments
Dear IBM,
At SHARE a couple years ago you were presenting the new stuff going into Tivoli Storage Manager 5.3. A number of us ganged up on your staff afterwards and told you we need transport encryption. Not total encryption of our data, but just something like SSL so that we could move data on untrusted networks.
You asked why we couldn’t just encrypt all of the data, which is a feature you offer. We didn’t like that because there are a lot of other gotchas there. The biggest gotcha is when our customers forget their encryption key. Yeah, we know, they’re dumb, but it’s a real-world problem. When they forget their TSM node passwords we can just reset them. We can’t do that for the encryption key.
We talked, you listened, and you thought transport encryption, like SSL, was a swell idea.
It isn’t in TSM 5.3.
Could you add it? I still really need it, and I’m guessing that the ten other guys in the swarm after the presentation still need it, too.
Thanks.
…Bob
By Bob Plankers on Mar 24, 2006 in del.icio.us | 0 Comments