By default, Linux distributions ship with a number of system maintenance tasks. On Red Hat Enterprise Linux (and CentOS, and OEL, etc.) they are scheduled via shell scripts in the /etc/cron.* directories, and executed by anacron.
The problem is that they usually all run at the same default time, such as 0400. When you have 300 RHEL virtual machines all rotating their logs at 0400, you start seeing storage and CPU performance problems as copies are made and logs are compressed. The same can be true of hosts attached to SAN or other centralized storage.
If we ignore backup windows, my RHEL 5 hosts had three main offenders:
/etc/cron.daily/logrotate: kicks off /usr/sbin/logrotate to trim and compress log files in /var/log.
/etc/cron.daily/mlocate.cron: part of the mlocate package, which indexes all of the files on a host so you can type “locate searchstring” and quickly find every file with “searchstring” in its name.
/etc/cron.daily/tmpwatch: kicks off /usr/sbin/tmpwatch, which helps keep /tmp free and clear of cruft. Some hosts end up having quite a bit of stuff handled by tmpwatch, due to application admin decisions (session data, temp logs, all sorts of things). That’s fine, but I just need to compensate for it.
When you’re dealing with load issues you have a few options. You can eliminate the work that needs to be done. You can make the process that does the work more efficient. You can put limits on the workload to cap its resource consumption. Or you can postpone the work to a more opportune time.
For logrotate I did two things. First, we started shipping our logs off the VMs to a remote syslog host, which eliminated the need to keep a lot of logs around locally. This saved space and spared us from compressing and moving a lot of data. Second, it turned out that nobody who relies on logrotate really cares when it runs, as long as it keeps the file sizes manageable. So I amended /etc/cron.daily/logrotate to include the following at the top:
# Delay execution for up to 32767/5 (~6553) seconds
sleep $(( RANDOM / 5 ))
That delays the script by up to about two hours. In bash, $RANDOM gives you a number between 0 and 32767, so to find the divisor you need, divide 32768 by the maximum number of seconds you want to delay: 32768/7200 = 4.55, which I rounded up to 5.
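The trick generalizes to any window. Here’s a minimal sketch, assuming bash; the `random_delay` helper name is mine, and using modulo instead of a divisor means you get exactly the window you ask for without doing the division by hand:

```shell
#!/bin/bash
# random_delay MAX prints a random integer between 0 and MAX, inclusive.
# bash's $RANDOM yields 0..32767; taking it modulo (MAX+1) caps the
# result at exactly the window you want, no hand-computed divisor needed.
random_delay() {
    echo $(( RANDOM % ($1 + 1) ))
}

# At the top of a cron script you would then write:
#   sleep "$(random_delay 7200)"   # stagger the start by up to two hours
random_delay 7200
```

One caveat: for windows much larger than 32767 seconds (about nine hours) $RANDOM alone can’t cover the full range, but for staggering nightly cron jobs it’s plenty.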
For mlocate, I proposed to my team that we remove the package entirely (you can use /usr/bin/find, after all), but it turns out some people really, really like it for finding files rapidly (find can take a while, especially on a giant file share). The compromise was to move /etc/cron.daily/mlocate.cron to /etc/cron.weekly/mlocate.cron, so it runs once a week instead of daily. I also amended it with a three-hour random sleep, in the same fashion as logrotate.
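The demotion itself is just a move between anacron directories, and the three-hour sleep follows the same divisor arithmetic as before (32767/3 ≈ 10922 seconds, right around three hours). A sketch, with the one-time move shown as a comment so the script stays self-contained:

```shell
#!/bin/bash
# One-time demotion from daily to weekly (RHEL 5 paths, as in the post):
#   mv /etc/cron.daily/mlocate.cron /etc/cron.weekly/mlocate.cron
#
# Then, at the top of the moved script, a divisor of 3 caps the sleep
# at 32767/3 = 10922 seconds, i.e. just over three hours.
delay=$(( RANDOM / 3 ))
echo "delaying updatedb by ${delay}s"
# sleep "$delay"    # uncommented in the real cron script
```

Strictly applying the round-up rule from the logrotate example would give a divisor of 4 and a maximum delay of about 2.3 hours; 3 overshoots the window by a couple of minutes at worst, which doesn’t matter here.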
Last, for tmpwatch, I just added a three-hour random sleep as well. Nobody but me cares when /tmp gets cleaned out, as long as it does.
These changes have helped spread the load out in my environment: instead of spikes at 0400, I now see a tolerable amount of steady load over the 0400-0700 window. Now, if I can just keep my backup windows spread out, I’ll be set!
Backups are probably the biggest offender in this category, because they tax storage and CPU (for compression). Getting them spread out is a big topic, depends on what you’re using to do backups, and is worthy of a post by itself.
Even the people who care about web logs don’t care about rotation time, because they’re either using Google Analytics or their log processing scripts handle the time and date stuff for them automatically. Everybody is happy as long as the report for yesterday runs before staff come in today.