Which would you rather have: a document telling you how to start an application in 10 easy steps, or a script (shell, Perl, Makefile, etc.) that does it for you?
I’d pick the script:
- The script is self-documenting. You can look at it and see what it will do. If you need to troubleshoot something you can just run the commands yourself. If you need to change the documentation you just change the script.
- The script can help ensure that the environment is correct for the application. Do you need to set environment variables, like JAVA_HOME, ORACLE_HOME, etc.? Just do it at the top of the script.
- You can call the script at boot, and have the application start automatically. Patching and rebooting a whole bunch of servers? Having the applications come up on their own is a godsend. Plus, if you put it in rc.local and not in the init.d system you can easily comment it out if you don’t want the application to start at boot. (Update: there is some discussion about this in the comments)
- Anybody can run the script without knowing anything about the application. You’re on vacation? Your application died? No problem! Another admin can step in and restart the application without knowing anything about it, thanks to your script. This is especially easy for someone else if you always name your scripts the same way, like START.sh and STOP.sh, or if you call the startup scripts from rc.local. They can just look there to see how it is done (or, worst case, reboot the server). (Update: there is some discussion about this in the comments)
- You can call the scripts from cron or a monitoring system. Sometimes it is nice to automatically restart an application at a certain time, or when a problem is detected. If you already have a startup script then you don’t have to write one. 🙂
- The script avoids typos caused by poor cut & paste, line wrapping, inability to comprehend what the documentation says, etc. It also avoids dependency loops, like where the documentation is in a wiki which depends on a database, and the instructions for starting the database are in the wiki…
- Written documentation is the last thing to be tended to in most organizations, and even if it is up to date it may not be well tested (by someone other than the original author). However, if your documentation is the script you use to start the application, it is much harder to get away with letting it go stale. And you’ll know it works.
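To make the points above concrete, here is a minimal sketch of what such a START.sh might look like. Everything specific in it is an assumption for illustration: the application name `myapp`, the default paths, and the `PIDFILE` and `LOGFILE` variables are hypothetical placeholders, not anything from a real system.

```shell
#!/bin/sh
# START.sh: hypothetical start script for an application called "myapp".
# All names and paths below are placeholders; adjust them for your system.

# Set up the environment the application needs (defaults apply only if unset).
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default}"
APP_HOME="${APP_HOME:-/opt/myapp}"
PIDFILE="${PIDFILE:-/tmp/myapp.pid}"
LOGFILE="${LOGFILE:-/tmp/myapp.log}"
export JAVA_HOME APP_HOME

# Succeeds if the PID recorded in the pidfile belongs to a live process.
is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Start the given command in the background and record its PID,
# unless a previously started instance is still alive.
start_app() {
    if is_running; then
        echo "myapp already running (pid $(cat "$PIDFILE"))"
        return 0
    fi
    "$@" >>"$LOGFILE" 2>&1 &
    echo $! >"$PIDFILE"
    echo "myapp started (pid $(cat "$PIDFILE"))"
}

# Only start something when a command is passed on the command line.
if [ "$#" -gt 0 ]; then
    start_app "$@"
fi
```

In a real script you would probably hard-code the launch command instead of taking it from the command line, so that another admin can run START.sh with no arguments at all. The same file can then be called from rc.local, cron, or a monitoring system.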
Downsides? Yeah, there are a couple. First, to read this form of documentation you need to understand the scripting language it is written in. You should pick a language that everybody on your team knows, and comment things well. A little peer review at a team meeting can help, too: walk the team through what you’ve automated and how it works under the hood.
Second, management types often don’t understand that scripts of this nature can serve as your documentation, so they give you a hard time for appearing to not have any documentation at all. When faced with this problem I’ve dealt with it by mentioning that in addition to English, most of my teammates understand various programming languages, too. We’ve just chosen to write some of our documentation in those languages. Then we get the benefit of being able to run our documentation, which saves us a lot of time and hassle and, in turn, makes us more productive.
🙂
Ouch, I love your site but #3 and 4 are a baaaad case of reinventing the wheel. Your distro already has an init script system. It already has a set of prewritten functions for common tasks like managing your PATH, checking for pids, and starting and stopping things in the right order. All of which are things you’re going to find yourself eventually one-off hacking into your scripts. Even if you rarely do, you’re still creating two separate places for your coworkers to check for “how do I start/restart this service” and two different places for “how do I enable/disable this service on boot.” Don’t fight the distro.
I am a big fan of not repeating things that have been invented already, and in that vein you have good points. Most of our start/stop scripts are actually copies of other init-style start/stop scripts, so there’s a little duplication saved there.
I like having a separate script that I wrote because I know how it works, and it also means I can manage the dependencies better. Plus, the init system’s priority scheme only affects start order, not start dependencies, so if I want to ensure the database is up before I start Tomcat I have to script that myself, anyhow.
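A minimal sketch of what scripting that dependency might look like: a small retry helper that waits for a check command to succeed before moving on. The `wait_for` helper, the `db_is_up` check, and the `/opt/tomcat` path are hypothetical names for illustration; substitute whatever check your database client actually provides.

```shell
#!/bin/sh
# wait_for TRIES DELAY COMMAND...
# Run COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Succeeds as soon as COMMAND succeeds; fails if every attempt fails.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the final failed attempt.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use (hypothetical check command and install path):
#   wait_for 30 2 db_is_up && /opt/tomcat/bin/startup.sh
```

If the database never comes up, `wait_for` returns non-zero and Tomcat is simply not started, which is usually easier to diagnose than an application that started and then failed to connect.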
With a separate script it is also very easy for an admin who isn’t familiar with the machine to figure out what comprises the application on the machine, versus what is something delivered with the system. My team has taken the approach that if it comes with the distribution it starts out of the init system. If it is something we put on the machine it starts from rc.local. Yeah, another place to check, but it actually saves time over having to go out to another piece of documentation and look up what services are important on a specific machine.
I think the key, as with most things, is to be consistent.
Oh, and I’m glad that you call me out on things you think I’m wrong about. 🙂 Don’t ever stop!
Yeah, it’s a subtle enough difference that consistency is more important. What is your “rebuild to a known state” strategy if not distro-package-manager based? Full system backup? We cherry-pick key directories in /var for actual backups, and expect the rest of the system to be rebuildable from our kickstart/yum/cfengine setup.
Same sort of thing. We back the entire machine up, but in a rebuild situation we’ll reinstall from scratch using our Kickstart system and then restore the files and filesystems we need. We’ve only had to do this twice, ever. Once was a total screwup involving RAID controller software (the fellow repairing the mirrored drive set somehow chose to mirror the disk he just replaced on to the one that had the data). The other was a RAID 5 dual drive failure over a weekend. Only time I’ve ever seen that, too.
Once in a while if we are changing hardware we’ll use mkcdrec or TrueImage to image it and drop it on the new box, but usually we take the opportunity to rebuild so we de-cruft the box and make sure our documentation is up to date. We normally don’t bother with making routine images of machines, though.