“Words, like eyeglasses, blur everything that they do not make more clear.” – Joseph Joubert
I’ve been evaluating Nagios over the past couple of days to see if it can replace our aging Big Brother installation. I have about 130 hosts I’d like to monitor, and the other teams have another 200 hosts or so that will probably join me if things work out.
During this exercise I’ve noticed a bad habit system administrators have: using abbreviations instead of descriptive text. It is the same habit programmers strive to avoid with variable names. When you define service tests in Nagios you have to give them names. My first round of names came straight from the old system: labels like swap, disk, zombies, processes, and SSH. When I went to add a check for the TSM backup client process, dsmc, I started thinking that the service name “processes” is ambiguous. In fact, all of my names sucked. What does the SSH test do? What is the “disk” test actually testing? Processes? What about the processes? Is it testing that all the right processes are running, that there aren’t too many processes, that there are no runaway processes? Why am I subjecting my coworkers to this ambiguity? Why can’t I name the service something more descriptive?
So I did. I now have services titled “Backup Client,” “Disk Usage,” “Ping,” “Process Count,” “Swap Usage,” “System Load,” and “Zombie Processes.” Isn’t that just so much better? More words equals better understanding, or at least a shot at it.
Now that I’ve noticed this behaviour I have also started noticing all the other systems around me that use arbitrary, ambiguous words and numbers to define things. There are basically two classes of offenders: abbreviators and number lovers. Abbreviators tell only part of the story and leave the audience guessing. Number lovers obscure everything with a number. Our help desk’s support ticketing system uses numbers to define the severity of a problem. The three severities are 1, 2, and 3. That’s just like asking someone to rate something on a scale of 1 to 10. Is 1 good or bad? It would be so much easier if the designers of that system had used “SEVERE,” “NORMAL,” and “LOW.” It isn’t like the underlying code is much different if you use words instead of numbers. Heck, it’s probably more readable:
SELECT * FROM table WHERE (severity = '1')
SELECT * FROM table WHERE (severity = 'SEVERE')
Don’t give me the “we can change the definitions later” excuse, either. You aren’t ever going to change these definitions, and if you do want to, it’s just as hard to update a database to change all the 2s to 3s as it is to change all the NORMALs to WHATEVERs.
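To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The tickets table, its columns, and the replacement label ROUTINE are all made up for illustration; I have no idea what our help desk’s schema really looks like. It shows the word-based query from above and a label rename done with a single UPDATE, which is exactly the statement you would need to turn every 2 into a 3.

```python
import sqlite3

# Hypothetical schema: the table, column, and label names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, severity TEXT)")
conn.executemany(
    "INSERT INTO tickets (severity) VALUES (?)",
    [("SEVERE",), ("NORMAL",), ("NORMAL",), ("LOW",)],
)

# The query reads the same way you would say it out loud.
severe_count = conn.execute(
    "SELECT COUNT(*) FROM tickets WHERE severity = 'SEVERE'"
).fetchone()[0]

# Renaming a label later is one UPDATE, the same effort as renumbering.
renamed = conn.execute(
    "UPDATE tickets SET severity = 'ROUTINE' WHERE severity = 'NORMAL'"
).rowcount

print(severe_count, renamed)  # prints: 1 2
```

The point is only that the rename costs one statement either way, so “we might change it later” is no argument for numbers.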
As best I can figure, this is a bad habit left over from the 1950s through the 1980s, when computers didn’t have enough memory and storage to permit wanton use of letters, especially vowels. However, the year is 2006, systems are complex, our brains are full, and storage is cheap compared to mistakes made because of ambiguity. Give yourself and your coworkers a break, ditch the numbers, and use lots of words.