Preparing Linux Template VMs

by Bob Plankers on March 26, 2013 · 23 comments

in Best Practices, System Administration, Virtualization

Dan over at Bashing Linux has a good post on what he does to prep his template VMs for use with Puppet. He’s inspired me to share how I prepare my Linux VMs to become a template. He’s got a few steps I don’t have, mainly to prep for Puppet, and I have a few steps he doesn’t have. One big difference is that I don’t prepare my template images for a particular configuration management system, but instead bootstrap them once they’re deployed. Why? I use my templates for a variety of things, and sometimes the people who end up with the VMs don’t want my management systems on them. It also means I have to handle some of what he does in his prep script via the configuration management system, but that’s just fine. I’d actually rather do it that way because it helps me guarantee the state of the system. I’m not saying he’s wrong; he’s got different problems to solve than I do.

You can do this in full multiuser mode (runlevel 3) or in single-user mode by issuing an “init 1” and waiting for all the processes to stop. I wouldn’t do any of this in runlevel 5, with full X Windows running. In fact, I really don’t suggest installing X Windows at all on VMs unless you really, really need it for some reason… but that’s a whole different topic. I’d also suggest taking a snapshot of your template prior to trying any of this out. As Lenin said, “Trust, but verify.”

Step 1: Clean out yum.

/usr/bin/yum clean all

Yum keeps a cache in /var/cache/yum that can grow quite large, especially after applying patches to the template. For example, the host where my blog resides has 275 MB of stuff in yum’s cache right now, just from a few months of incremental patching. In the interest of keeping my template as small as possible I wipe this.

Step 2: Force the logs to rotate.

/usr/sbin/logrotate -f /etc/logrotate.conf
/bin/rm -f /var/log/*-???????? /var/log/*.gz

Starting fresh with the logs is nice. It means that you don’t have old, irrelevant log data on all your cloned VMs, and it also means that your template image is smaller. Change out the “rm” command for one that matches whatever your logrotate renames files as. Also, if you get really, really bored it’s fun to look at the old log data people leave on virtual appliances. Lots of leaked information there.
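Before pointing rm at /var/log, it can help to see what the globs actually match. Here’s a minimal dry-run sketch, assuming logrotate’s “dateext” naming (messages-20130301) and gzip compression, which is what the rm line above targets. rotated_logs is a hypothetical helper name; it only prints paths and never deletes anything.

```shell
# Dry run for Step 2: list the files the cleanup globs would delete from a
# log directory. "*-????????" matches dateext names such as
# messages-20130301; "*.gz" matches compressed rotations.
rotated_logs() {
    logdir=$1
    for f in "$logdir"/*-???????? "$logdir"/*.gz; do
        # An unmatched glob stays as a literal string, so skip non-files
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Run it as rotated_logs /var/log and compare the list with what you expect. If your logrotate uses numbered suffixes (messages.1) instead, add a pattern such as "$logdir"/*.[0-9] to the loop.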

Step 3: Clear the audit log & wtmp.

/bin/cat /dev/null > /var/log/audit/audit.log
/bin/cat /dev/null > /var/log/wtmp

Again, might as well clear the audit & login logs. This whole /dev/null business is also a trick that lets you clear a file without restarting the process associated with it, useful in many more situations than just template-building.
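The truncation trick is safe with running daemons because it empties the existing file rather than replacing it: the inode stays the same, so a process holding the file open still points at the file you see. A small sketch of that, using the portable “: >” form of the redirect; truncate_in_place is a hypothetical name.

```shell
# Step 3 sketch: empty files in place. Redirecting nothing into a file
# truncates it to zero length while keeping the same inode.
truncate_in_place() {
    for tf in "$@"; do
        # Only touch files that already exist; ": >" would otherwise create them
        [ -f "$tf" ] && : > "$tf"
    done
    return 0
}
```

For the template that would be truncate_in_place /var/log/audit/audit.log /var/log/wtmp.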

Step 4: Remove the udev persistent device rules.

/bin/rm -f /etc/udev/rules.d/70*

I have a whole post on this, “Why Does My Linux VM’s Virtual NIC Show Up as eth1?” This is how I’ve chosen to deal with the problem.
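If you’d rather not delete every rule whose name starts with 70, a narrower pattern removes only the files udev generates for persistent naming (on RHEL 6 these are typically 70-persistent-net.rules and 70-persistent-cd.rules). A sketch, where remove_persistent_rules and its root-prefix argument are hypothetical: the prefix lets you aim it at a mounted image, and leaving it empty means the live /.

```shell
# Step 4 sketch with a narrower glob: remove only udev's generated
# 70-persistent-*.rules files, leaving any other 70-* rules in place.
# $1 is an optional (hypothetical) root prefix; empty means /.
remove_persistent_rules() {
    rules_dir="${1:-}/etc/udev/rules.d"
    # rm -f succeeds quietly if the glob matches nothing
    rm -f "$rules_dir"/70-persistent-*.rules
}
```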

Step 5: Remove the traces of the template MAC address and UUIDs.

/bin/sed -i '/^\(HWADDR\|UUID\)=/d' /etc/sysconfig/network-scripts/ifcfg-eth0

This is a corollary to step 4, just removing unique identifiers from the template so the cloned VM gets its own. Thanks to Ed in the comments for the reminder about sed. You can also change the “-i” to “-i.bak” if you wish to keep a backup copy of the file.
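Templates with more than one virtual NIC carry a MAC address and UUID in each ifcfg file, not just ifcfg-eth0. A sketch that applies the same sed expression to every ifcfg-eth* file, assuming GNU sed (for -i and the \| alternation); strip_nic_ids and its optional root-prefix argument are hypothetical.

```shell
# Step 5 sketch for multi-NIC templates: delete HWADDR= and UUID= lines
# from every ifcfg-eth* file. $1 is an optional root prefix; empty means /.
strip_nic_ids() {
    for cfg in "${1:-}"/etc/sysconfig/network-scripts/ifcfg-eth*; do
        # Skip the literal pattern left behind when nothing matches
        [ -f "$cfg" ] || continue
        sed -i '/^\(HWADDR\|UUID\)=/d' "$cfg"
    done
    return 0
}
```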

Step 6: Clean /tmp out.

/bin/rm -rf /tmp/*
/bin/rm -rf /var/tmp/*

Under normal, non-template circumstances you really don’t ever want to run rm on /tmp like this. Use tmpwatch or any manner of safer ways to do this, since there are attacks people can use by leaving symlinks and whatnot in /tmp that rm might traverse (“whoops, I don’t have an /etc/passwd anymore!”). Plus, users and processes might actually be using /tmp, and it’s impolite to delete their files. However, this is your template image, and if there are people attacking your template you should reconsider how you’re doing business. Really.
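One more catch with rm -rf /tmp/*: the * glob doesn’t match names that start with a dot, so hidden entries such as /tmp/.ICE-unix survive. A sketch of an alternative using find, assuming GNU find (for -mindepth and -delete). find doesn’t follow symlinks by default, so a link in /tmp is removed rather than traversed, and -xdev keeps it from wandering onto other mounted filesystems. empty_dir is a hypothetical name.

```shell
# Step 6 sketch: delete everything inside a directory, dotfiles included,
# without following symlinks. The directory itself is kept (-mindepth 1);
# -delete implies depth-first order, so contents go before their parents.
empty_dir() {
    [ -d "$1" ] || return 1
    find "$1" -xdev -mindepth 1 -delete
}
```

For the template: empty_dir /tmp && empty_dir /var/tmp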

Step 7: Remove the SSH host keys.

/bin/rm -f /etc/ssh/*key*

If you don’t do this all your VMs will have all the same keys, which has negative security implications. It’s also annoying to fix later when you’ve realized you’ve deployed a couple of years of VMs and forgot to do this in your prep script. Not that I would know anything about that. Nope.
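The *key* glob works, but a tighter pattern matches only the host key pairs themselves (ssh_host_*key and their .pub files), so a file that merely has “key” in its name, say a hypothetical /etc/ssh/revoked_keys list, isn’t swept up with them. On RHEL 5 and 6 the stock sshd init script generates any missing host keys the next time sshd starts, which is what makes deleting them safe. A sketch; remove_host_keys and its root-prefix argument are hypothetical.

```shell
# Step 7 sketch: remove only the SSH host key pairs. ssh_host_*key also
# matches the old SSHv1 "ssh_host_key". $1 is an optional root prefix.
remove_host_keys() {
    rm -f "${1:-}"/etc/ssh/ssh_host_*key "${1:-}"/etc/ssh/ssh_host_*key.pub
}
```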

Step 8: Remove the root user’s shell history.

/bin/rm -f ~root/.bash_history
unset HISTFILE

This good idea is courtesy of Jonathan Barber, from the comments below. No sense in keeping this history around, it’s irrelevant to the cloned VM.

Step 9: Zero out all free space, then use storage vMotion to re-thin the VM.

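#!/bin/sh

# Determine the version of RHEL: RHEL 4 (Taroon) keeps the LVM tools in /sbin
if grep -qi Taroon /etc/redhat-release; then
        PREFIX="/sbin"
else
        PREFIX="/usr/sbin"
fi

# Mount points of all ext2/ext3/ext4 filesystems (field 3 of /etc/mtab is the type)
FileSystem=`awk '$3 ~ /^ext/ { print $2 }' /etc/mtab`

for i in $FileSystem
do
        echo $i
        # Free space in 1 MiB blocks. -P keeps each filesystem on one line,
        # so column 4 is "Available" even with long LVM device names.
        number=`df -P -B 1M "$i" | awk 'NR == 2 { print $4 }'`
        echo $number
        # Fill 98% of the free space, leaving headroom on live systems
        percent=`echo "scale=0; $number * 98 / 100" | bc`
        echo $percent
        # An explicit 1 MiB block size is much faster than dd's 512-byte default
        dd bs=1M count=$percent if=/dev/zero of="$i/zf"
        /bin/sync
        sleep 15
        rm -f "$i/zf"
done

VolumeGroup=`$PREFIX/vgdisplay | grep "VG Name" | awk '{ print $3 }'`

for j in $VolumeGroup
do
        echo $j
        # Number of free physical extents in this volume group
        free=`$PREFIX/vgdisplay $j | grep Free | awk '{ print $5 }'`
        # Skip volume groups with no free extents; lvcreate -l 0 just fails
        if [ "$free" -gt 0 ] 2>/dev/null; then
                $PREFIX/lvcreate -l $free -n zero $j
                if [ -e /dev/$j/zero ]; then
                        # Runs until the LV is full ("No space left on device")
                        cat /dev/zero > /dev/$j/zero
                        /bin/sync
                        sleep 15
                        $PREFIX/lvremove -f /dev/$j/zero
                fi
        fi
done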

This script is partly ripped off from someone on the Internet who didn’t have a copyright note in their work (and we’ve lost track of the source; if it’s yours, leave me a comment), and partly the work of my team. It basically fills 98% of each filesystem’s free space with the output of /dev/zero, and also creates a logical volume to zero out the unused space in the volume groups. Why do this? Well, if you storage vMotion the template VM to another array, or to another datastore on an array without VAAI, and you specify thin provisioning, the software datamover will suck all the zeroes back out of the image, and it’ll be as small as possible. Keep in mind you can’t do this within an array using VAAI, because under VAAI the array does the copying, and the zero-sucking magic is only in the software datamover at the ESXi level. If that’s the case, just move it to a local disk and back to your array. This is also cool if you have storage that deduplicates, like NetApp arrays.

Why only to 98%? That way you can run it on operational VMs and it lessens the chance of causing something to crash because you filled the filesystem. :) On the templates you can probably push it to 100%; just adjust the math in bc.
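The count handed to dd is just a percentage of the filesystem’s available space. A sketch of that calculation on its own, in 1 MiB blocks (pair it with dd bs=1M), assuming GNU df; zero_blocks is a hypothetical name. The -P flag matters: without it, df wraps a long device name (common with LVM) onto its own line, which shifts the columns awk sees.

```shell
# Print how many 1 MiB blocks equal PERCENT ($2) of the space currently
# available on the filesystem holding DIR ($1). Column 4 of "df -P" is
# "Available"; NR == 2 skips the header line.
zero_blocks() {
    avail=$(df -P -B 1M "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] || return 1
    echo $(( avail * $2 / 100 ))
}
```

So zero_blocks /var 98 on a live VM, or zero_blocks /var 100 on a template.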

Keep in mind that by writing zeroes to the free space you effectively un-thin the disks, so make sure you have enough space available in your datastore.

So that’s my prep routine. It relies heavily on keeping the rest of the VM clean, and only cleans up what we can’t avoid sullying. What else am I missing here? Leave me a comment!


alpacapowered March 26, 2013 at 2:18 AM

Nice post, I already do most of the stuff you suggest but totally forgot about the SSH keys. Not so good, so thanks for the hint.

One more thing I do is setting the root password age to 0 (chage -d 0 root), so the owner is forced to change the default password after deployment.
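chage -d 0 works by setting the “last password change” field of /etc/shadow (the third field) to 0, which means the password must be changed at the next login. If you want to confirm it stuck before shutting the template down, chage -l root will show it; here’s a sketch of the same check against a shadow file, handy for an offline image. must_change_password is a hypothetical name, and reading the real /etc/shadow needs root.

```shell
# Succeed (exit 0) if user $2 in shadow file $1 has last-change field 0,
# i.e. must change the password at next login; fail otherwise, including
# when the user isn't listed.
must_change_password() {
    awk -F: -v u="$2" '$1 == u { lastchg = $3; found = 1 }
        END { exit !(found && lastchg == "0") }' "$1"
}
```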

Razique March 26, 2013 at 5:03 AM

Great reminder, thank you. Same here: I should remove the SSH keys, and the MAC addresses too :)
You could add bash history and remove anything you might have downloaded under /root for instance :)

Dan Fruehauf March 26, 2013 at 5:17 AM

Wow!

This post is like 10 times better than mine!!! Definitely adopted a few (if not all) things here!!

*hides in the corner*

Bob Plankers March 26, 2013 at 8:33 AM

The point is you’re out there writing stuff down. If you hadn’t I’d never have thought to post mine. Don’t feel too bad, I’m getting schooled in the comments here, which is awesome. :) It isn’t that I don’t know what I’m doing, just that there’s guys out there that are better at some of it than I am. Glad I’m finding them!

Jonathan Barber March 26, 2013 at 7:56 AM

Just a couple of suggestions for incremental improvements.

Under bash you can truncate files without invoking the “cat /dev/null > …”, you simply do “> …” instead. Less useful in a script, but quite handy when you’re monkeying on the keyboard.

I’d suggest changing your zeroing script to use the “mktemp” program instead of always using the “zf” file, just to make sure that you don’t overwrite any existing files that happen to be called ‘zf’.

You shouldn’t need to call sync or sleep before removing the LV (although it doesn’t hurt). This is because sync flushes the file system buffers, but because you’re writing directly to a block device and not the file system, it doesn’t do anything. But you probably should add a sync after you do the DD to the file system, as it’s possible (although unlikely with a sufficiently large file) that the output to the ‘zf’ file won’t make it to the block device! Alternatively, you can use the dd “conv=fdatasync” argument to imply this.

I’d also suggest setting the DD block size explicitly – the default is 512 bytes – and this may take a long time to write if you have a large block device or file system.

Finally, I’d also recommend that the root user’s ~/.bash_history is removed – this means that the future administrators can’t see the random flailing we all make as we try to remember what the syntax for a command is :)
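Applied to the fill step for a single filesystem, Jonathan’s suggestions might look like this: a unique file from mktemp instead of a fixed zf name (keeping a zf. prefix so a monitoring rule could still match it), an explicit 1 MiB block size, and dd’s conv=fdatasync so dd doesn’t return until the zeroes are on disk. It assumes GNU coreutils dd and mktemp; zero_fill is a hypothetical name.

```shell
# Write COUNT ($2) 1 MiB blocks of zeroes into a uniquely named file in
# DIR ($1), flush them to disk, then delete the file.
zero_fill() {
    zf=$(mktemp "$1/zf.XXXXXX") || return 1
    dd if=/dev/zero of="$zf" bs=1M count="$2" conv=fdatasync 2>/dev/null
    dd_status=$?
    # Remove the file even if dd stopped early because the filesystem filled up
    rm -f "$zf"
    return $dd_status
}
```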

Bob Plankers March 26, 2013 at 8:40 AM

Interestingly enough, it used to use mktemp, but with a static file name it was easier to have our monitoring system detect the file and not alarm on the transient disk-full condition until that file had gone more than 30 minutes without being accessed. A better way might be to write the name of the file to a temp file it tests for, but it was easy enough to do it this other way.

I suspect the spurious sync is from indiscriminately copying code blocks. :)

Good call on the shell history, I’ll add that.

Pete March 26, 2013 at 8:51 AM

Flushing out and/or setting the hosts file is worth including somewhere in the steps as well.

Bob Plankers March 26, 2013 at 9:30 AM

I do this in my configuration management system, but if you aren’t running one of those this is a good time to do that.
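If you aren’t handing /etc/hosts to a configuration management system, one option is to reset it to loopback-only entries so clones don’t inherit the template’s own name and address. A sketch; the two lines mirror the stock RHEL 6 file, and reset_hosts and its root-prefix argument are hypothetical.

```shell
# Overwrite /etc/hosts (under optional root prefix $1) with loopback entries.
# The heredoc delimiter is quoted, so nothing inside is expanded.
reset_hosts() {
    cat > "${1:-}/etc/hosts" <<'EOF'
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
EOF
}
```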

Dan Young March 26, 2013 at 8:55 AM

The ever-versatile libguestfs tools include virt-sysprep, designed for performing these sorts of tasks. Even if you don’t use it, the list of things it does might be a useful reference:

http://libguestfs.org/virt-sysprep.1.html

Bob Plankers March 26, 2013 at 9:40 AM

True — the problem is that the RHEL-esque distributions I run have a libguestfs from the beginning of time, which doesn’t include that. Given that virt-sysprep is basically a comprehensive version of all of this it might be worth building on its own… Hmm…

Bob Plankers March 26, 2013 at 10:01 AM

I wish people would write system apps in plain old C or C++, and not OCaml and other languages that have a million dependencies. I don’t think it’s worth it, and I’m not 100% sure it’d be worth all the extra crap installed on my base image to run it if it was easy to install. I think maybe I’ll just rip the prep ideas off into a script that uses things already in place on the box.

Dan Fruehauf March 26, 2013 at 6:43 PM

I tend to agree with that. It also looks like virt-sysprep needs the guest to be shut down before running (correct me if I’m wrong), but in cloud hosting you usually can’t easily access these images after a machine has been shut down. And as Bob said, keep these apps simple: it’s probably worth a nice bash script, but definitely not a huge application. We can count these steps on one hand, more or less, can’t we?

Ed March 26, 2013 at 9:37 AM

I’ve gotten really friendly with sed of late, mostly because of the -i flag which makes changes in place. There is also an optional .suffix to keep a backup copy of the original.

sed -i.bak '/^\(HWADDR\|UUID\)=/d' /etc/sysconfig/network-scripts/ifcfg-eth0

Bob Plankers March 26, 2013 at 9:43 AM

Good call — updated the entry. Thanks!

Ed March 26, 2013 at 10:08 AM

The .bak wouldn’t be appropriate in this instance. I only included it as an example.

Sorry I didn’t make that clear.

Bob Plankers March 26, 2013 at 10:15 AM

Nah, I’m slow sometimes. :) Thanks Ed!

(appropriate responses include “Sometimes?”) :)

ABR March 26, 2013 at 2:24 PM

Great post, however there are a few basic errors in the disk zeroing script. For example you want $4 (Available) not $3 (Used) for determining how much to zero. PREFIX is not set. And the case where there is no free space in LVM just generates an error.

Bob Plankers March 28, 2013 at 1:46 PM

Negative on the $4 versus $3 — the way the df output happens the column says “Used” but it’s the available blocks as a number.

I did correct the PREFIX problem, my cut & paste omitted the block that figured out which RHEL version we’re running on and adapted.

Jorge Fábregas March 28, 2013 at 12:49 PM

Thanks for putting this together!

An extra step is needed in Step 8, right after the rm command. You’ll need:

unset HISTFILE

…otherwise, when you log out, the .bash_history file gets recreated with your last session’s commands.

Bob Plankers March 28, 2013 at 1:38 PM

Ooh, good call.

Jorge Fábregas March 28, 2013 at 1:31 PM

Hello again,

I’m wondering… for your vSphere environment, do you also install VMware Tools on your Linux templates? Or do you leave that as an after-deployment task? Do you install them at all?

Bob Plankers March 28, 2013 at 1:40 PM

In order to support customization during deployment you have to have the Tools on the templates. Once deployed I have a Chef recipe that brings them up to date.

I definitely recommend installing the Tools. At the very least it enables graceful shutdowns, which is useful in an emergency.

Jorge Fábregas March 31, 2013 at 11:38 AM

I agree with the VMware Tools installation. It’s just that, when doing a minimal RHEL/CentOS 6 installation, I hate to install its dependencies: perl, gcc, make, kernel-headers & kernel-devel. I find it overkill considering that the distros already have built-in support for the paravirtual adapters (PVSCSI, VMXNET3 etc).

Of course, the device drivers aren’t the only reason for installing the VMware Tools…but they were an important reason (before distros started shipping these modules).
