This is post #7 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”
Over time, a thin-provisioned virtual machine will almost always expand on disk. This happens because of the way files are deleted. When files are written, the storage subsystem writes those blocks out and the thin-provisioned disk file grows a bit. But when a filesystem deletes a file, all it does is update its own internal storage map to “forget” about the file. Nothing removes the data from the disk itself (which is why undelete utilities work, and why there are stringent procedures for securely erasing a disk). As a result, a virtual disk file does not shrink on its own once it has grown through writes.
For many servers the biggest source of new data on disk is logs. They’re written to regularly, moved around automatically, compressed, and rotated. Many system administrators like to keep many weeks of logs on disk, in case they need them. The thing is, what will they need them for? If it’s a security problem, the logs that were stored on the server itself cannot be trusted, since the attacker may have been able to alter them. If it’s for performance analysis, that can be done elsewhere. Logs eat disk space directly on a server, they consume space in backup systems, they waste network bandwidth as they are replicated for DR, and they waste CPU as they are compressed and rotated. The unpredictable volume of logs also makes them a denial-of-service risk, whether the cause is intentional or not. A proverbial “Slashdotting” comes along, generates gigabytes of new log data, fills your disk, kills your service, and causes pagers to go off at 2 AM. Arggghhh. On top of all that, they might be a liability to some organizations, as a log is a document that can be subpoenaed.
Do you really need the logs you’re keeping?
It’s actually pretty enlightening when you start looking at data you’re keeping and asking “why am I keeping this?” Why are you keeping 52 weeks of /var/log/secure on each VM when that log data cannot be trusted after a compromise? Why are you keeping three years of web logs when your web folks use Google Analytics anyhow?
Just in case? Just in case of what? This isn’t just a matter of saving some logs because we can. You’re spending real money to do so, in the form of disk space, disk performance, network bandwidth, and lower consolidation ratios.
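One way to start answering “why am I keeping this?” is to measure how much space the logs take up and how old they are. The following is a minimal Python sketch, not a finished tool: the function name log_usage_by_age and the seven-day bucket size are assumptions made for illustration, and it only looks at file modification times and sizes.

```python
import os
import time


def log_usage_by_age(log_dir, now=None, bucket_days=7):
    """Sum file sizes under log_dir, grouped into age buckets.

    Returns a dict mapping bucket index to total bytes, where bucket 0
    holds files modified within the last bucket_days days, bucket 1 the
    bucket_days before that, and so on.
    """
    now = time.time() if now is None else now
    totals = {}
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # rotated away mid-scan or unreadable; skip it
            age_days = max(0.0, now - st.st_mtime) / 86400
            bucket = int(age_days // bucket_days)
            totals[bucket] = totals.get(bucket, 0) + st.st_size
    return totals
```

Calling log_usage_by_age("/var/log") and printing the result sorted by bucket gives a quick picture of how many bytes are tied up in logs from each week. Anything in the high-numbered buckets is a candidate for the “just in case of what?” conversation.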
Can we keep logs somewhere else?
A central syslog server is a best practice for IT shops, usually for security reasons. Now that the world is virtualized, it’s a best practice for operational efficiency, too. If you don’t have a central log server, you should set one up and start shipping logs to it instead of keeping them distributed on your VMs. There are also some interesting new ways to add visibility into your log data. A great open source project for log analysis and “operational intelligence” is Logstash. Commercial tools like Splunk and VMware Log Insight are highly regarded as well.
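To make the idea of shipping logs off the VM concrete, here is a minimal Python sketch using the standard library’s logging.handlers.SysLogHandler, which sends records to a syslog server over UDP by default. The function name build_remote_logger and the message format are assumptions for illustration. System-wide forwarding would normally be configured in the VM’s syslog daemon rather than in application code, so treat this as a sketch of the concept, not a recommended deployment.

```python
import logging
import logging.handlers


def build_remote_logger(host, port=514, name="app"):
    """Return a logger that forwards records to a syslog server over UDP.

    host and port identify the central log server; 514 is the
    conventional syslog UDP port.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # With an (address, port) tuple, SysLogHandler uses UDP by default.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With a hypothetical log server named loghost.example.com, usage would look like build_remote_logger("loghost.example.com").warning("disk usage high"). Because nothing is written to a local file, these messages never cause the VM’s own disk to grow. UDP delivery is not guaranteed, so a setup that cannot tolerate lost messages would need a reliable transport instead.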
So in short: don’t keep any log data you don’t have to, don’t keep very much log data on your individual VMs, and keep whatever you need long-term in Logstash or another system designed for storing and analyzing the data.
Image of a logging truck colliding with a bush taxi, which seemed oddly appropriate, © 2007 Amcaja, licensed as CC-BY-SA 3.0, provided via the Wikimedia Commons.