By default, Linux distributions ship with a number of scheduled system maintenance tasks. On Red Hat Enterprise Linux (and CentOS, OEL, etc.) these are shell scripts in the /etc/cron.* directories, run by run-parts out of cron, with anacron picking up any runs missed while a host was down.
The problem is that they all run at a default time, typically 0400. When you have 300 RHEL virtual machines all rotating their logs at 0400, you start seeing storage and CPU performance problems as copies are made and logs are compressed. The same can be true of hosts attached to SAN or other centralized storage.
If we ignore backup windows[0], my RHEL 5 hosts had three main offenders:
/etc/cron.daily/logrotate: kicks off /usr/sbin/logrotate to trim and compress log files in /var/log.
/etc/cron.daily/mlocate.cron: part of the mlocate package, which indexes all of the files on a host so you can type “locate searchstring” and quickly find every file with “searchstring” in its path.
/etc/cron.daily/tmpwatch: kicks off /usr/sbin/tmpwatch, which keeps /tmp free and clear of cruft. Some hosts end up with quite a bit of material for tmpwatch to handle, thanks to application admin decisions (session data, temp logs, all sorts of things). That’s fine; I just need to compensate for it.
When you’re dealing with load issues you have a few options: eliminate the work entirely, make the process that does it more efficient, cap its resource consumption, or postpone it to a more opportune time.
For logrotate I did two things. First, we started shipping our logs off the VMs to a remote syslog host, which eliminated the need to keep many logs locally and spared us from compressing and moving a lot of data. Second, it turned out that nobody who relies on logrotate actually cares when it runs[1], just so long as it keeps the file sizes manageable. So I amended /etc/cron.daily/logrotate to include the following at the top:
# Delay execution by up to 32767/5 ≈ 6553 seconds (just under two hours)
/bin/sleep $(($RANDOM/5))
That delays the script by up to roughly two hours. In bash, $RANDOM gives you a number between 0 and 32767, so to determine the divisor you need, divide 32768 by the maximum number of seconds you want to delay: 32768/7200 = 4.55, which I rounded up to 5.
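If you’d rather not compute a divisor at all, the same idea can be expressed with a modulo instead (a sketch of the equivalent technique, not what ships in the stock script):
# Sleep a random number of seconds between 0 and 7199 (up to two hours).
/bin/sleep $(($RANDOM % 7200))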
For mlocate, I proposed to my team that we remove the package (you can use /usr/bin/find, after all), but it turns out some people really, really like it for finding files rapidly (find can take a while, especially on a giant file share). The compromise was to move /etc/cron.daily/mlocate.cron to /etc/cron.weekly/mlocate.cron, so it runs once a week instead of daily, and to add a 3-hour random sleep to it in the same fashion as logrotate.
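In concrete terms the change amounts to something like this (a sketch; a divisor of 3 gives a maximum delay of 32767/3 ≈ 10922 seconds, a shade over three hours):
mv /etc/cron.daily/mlocate.cron /etc/cron.weekly/mlocate.cron
# ...and near the top of the relocated script:
# Delay execution by up to 32767/3 ≈ 10922 seconds (about three hours)
/bin/sleep $(($RANDOM/3))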
Last, for tmpwatch, I just added a 3-hour random sleep to it, too. Nobody except me cares when /tmp gets cleaned out, just so long as it does.
These changes have helped spread the load out in my environment, so I no longer see spikes at 0400, just a tolerable amount of steady load over the 0400-0700 window. Now, if I can just keep my backup windows spread out I’ll be set!
————————–
[0] Backups are probably the biggest offender in this category, because they tax both storage and CPU (for compression). Getting them spread out is a big topic, depends heavily on what you’re using for backups, and is worthy of a post of its own.
[1] Even the people who care about web logs don’t care about rotation time, because they’re either using Google Analytics or their log processing scripts automatically handle the time & date stuff for them. Everybody is happy as long as the report for yesterday runs before staff come in today.
I will note that there is something to be said for a static splay time, like what cfengine[1] does, as opposed to a random splay time like the one you describe.
We used to use a purely random splay time like you describe. The issue is that (theoretically) you are doing a different amount of work each time, and you end up creating a non-deterministic system. (Our example of why we needed a splay time in the first place was rather interesting.[2])
Hashing the hostname[3] into a delay that is random across the fleet but fixed for any given host works more consistently: each host waits the same amount of time every day, so its behavior is predictable.
Ultimately we use this to a great extent and get all the benefits of a random splay time with the consistency of knowing, for instance, that consecutive runs of each daily log crunch start 24 hours apart.
[1] http://cfengine.org
[2] The “back in the day” example was when we were loading Perl over NFS via cron and flooding our 100 Mbit/s FDDI ring every 15 minutes, because every host was loading Perl at the exact same instant. Ultimately this was solved both with a splay and by copying the Perl binary to each local machine. Yes, this was before Perl shipped with every *nix variant.
[3] or IP address, or MAC address… anything host-specific really.
Jeff, I don’t disagree with you. I’ve just been looking for ways to do this without adding administrative overhead. I don’t want yet-another-thing-to-do when I add another VM. Non-deterministic works pretty well most of the time, but yes, it does sometimes lead to unreproducible problems, which is annoying.
Hadn’t thought of hashing the hostname; that’s a good idea. I’m gonna fool around with that. At least that way it’d be sort of the same all the time.
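Something like this, maybe (just a sketch; it assumes cksum and hostname are on the box and uses a two-hour window):
# Hash the hostname into a per-host constant delay between 0 and 7199 seconds,
# so each host gets a different offset but sleeps the same amount every day.
SPLAY=$(( $(hostname | cksum | cut -d' ' -f1) % 7200 ))
/bin/sleep $SPLAY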
BTW, this is why I blog about stuff — the discussion is priceless.
When dealing with locate and massive file shares, one solution is to run the database build on a single host and share it out. Most locate implementations allow a search path with multiple databases.
Really depends on whether you use locate enough to bother with building the extra infrastructure.
We toyed with the idea at a previous place of work and never got around to implementing it.
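With mlocate that setup might look roughly like this (the /net/fileserver path is made up for illustration, and the exact flags vary between locate implementations):
# On the host that owns the file share: index it and publish the database.
/usr/bin/updatedb -U /net/fileserver -o /net/fileserver/.locatedb
# On the clients: search the local database and the shared one together.
/usr/bin/locate -d /var/lib/mlocate/mlocate.db:/net/fileserver/.locatedb searchstring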
Another idea is to simply modify /usr/bin/run-parts, which is what calls each daily/weekly/etc script. That way it’s only one script you have to modify on each box.
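If you go that route, the change could be as small as one line near the top of /usr/bin/run-parts (a sketch only; note that on Red Hat the same script also drives /etc/cron.hourly, so you’d want to keep the window short or make it conditional on which directory was passed in):
# Spread the start of the cron.* jobs over a random window of up to 30 minutes.
/bin/sleep $(($RANDOM % 1800))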