I am just about finished with my VMware Virtual Infrastructure 3 upgrades, and I have made a few observations about the process based on what happened to our clusters.
1. Don’t do an in-place upgrade.
Sure, it might work, but it seems unnecessarily complex. What I did was remove a server from a cluster and use it to start a whole new VI3 cluster, including its own VMFS 3 storage. Then I’d scp the disk images from the ESX 2.5.3 cluster to the ESX 3 machine. When I had freed enough capacity in the old cluster I’d remove another machine, reinstall it, and add it to the new VI3 cluster. This plan worked great. It had several benefits: I could just restart the old VM if something went wrong, I could schedule VM outages at times that were more convenient for my customers, and I had an opportunity to reassess resource allocation for each VM (why did we give this VM 2 GB of RAM when it needs less than 512 MB?).
A definite downside to this plan was the time it took to copy the disk files: I was averaging about 30 minutes per 10 GB of allocated disk. Second, the copies are hard on the service console, so you can only run a couple at a time. Last, you need a lot more disk space to build a whole separate cluster. Big organizations can handle that, but smaller ones will have problems. It looks as if 3.0.1 will have a new utility to ease some of this pain, but I still like the idea of starting fresh.
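If you want a rough idea of how long your own copies will take, here is a minimal Python sketch built on the rate I saw (about 30 minutes per 10 GB of allocated disk) and on running two copies at a time. The function names are made up for illustration, and it assumes each concurrent copy still gets the full observed rate, which may be optimistic given how hard the copies are on the service console.

```python
import heapq

# Rate observed during this migration: about 30 minutes per 10 GB allocated.
MINUTES_PER_10_GB = 30


def copy_minutes(allocated_gb, minutes_per_10gb=MINUTES_PER_10_GB):
    """Estimated minutes to scp one disk image of the given allocated size."""
    return allocated_gb * minutes_per_10gb / 10


def total_minutes(disk_sizes_gb, concurrent_copies=2):
    """Estimated wall-clock minutes to copy every disk.

    Assumption (not measured): each concurrent copy still runs at the
    full observed rate. Largest disks are started first, and each new
    copy goes to whichever slot frees up earliest.
    """
    slots = [0.0] * concurrent_copies  # finish time of each copy slot
    heapq.heapify(slots)
    for size in sorted(disk_sizes_gb, reverse=True):
        earliest_free = heapq.heappop(slots)
        heapq.heappush(slots, earliest_free + copy_minutes(size))
    return max(slots)


if __name__ == "__main__":
    disks = [40, 20, 20]  # allocated GB per VM disk, example values only
    print(f"Estimated copy window: {total_minutes(disks) / 60:.1f} hours")
```

Treat the result as a lower bound when planning outage windows, and pad it for your own hardware.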
2. Ensure that your licenses are all figured out beforehand.
We didn’t receive some of our VI3 licenses, and nobody at VMware could figure out why. As a temporary fix my sales rep got me some demo licenses to augment the ones we did have, but it took longer than the permitted 30 days to figure out what was wrong. We got to see what happens when your licenses expire (just don’t shut your VMs off!). 🙂 It took about three weeks to find someone at VMware who saw that we had three support contracts, not two, and that the one nobody else noticed was expired and had a bogus license administrator person, which meant no renewal notification. Oops.
3. Redeem your licenses in sets of two or four.
If you have 40 CPUs’ worth of licenses you can redeem them however you want, but I suggest doing it in twos or fours. If you ever want to split those licenses up later or make them host-based, it’s a lot easier to do so with ten 4-CPU licenses than with one 40-CPU license (namely, you can do it yourself rather than having to involve VMware sales support). Unfortunately it also makes it a huge pain to figure out which licenses are which in the VMware redemption center site, so get a spreadsheet.
4. Upgrade your VMware Tools for Windows (to 2.5.3) before you migrate.
If you don’t keep your VMware Tools up to date you’ll want to make sure that you get to version 2.5.3 on your Windows VMs before you migrate. Older versions had all manner of problems with uninstallation. Damian Murdoch over at ozvms.com has a good blurb about it already, so I won’t repeat him. Rather than deal with all that hassle after the move, I chose to update them before the migration.
My Windows VM migration procedure for machines with old tools is: update VMware Tools, shut down, scp to new cluster, power on, uninstall VMware Tools 2.5.3, reboot, install VMware Tools 3.0 while ignoring Windows detecting new hardware, reboot. It looks like more of a mess than it really is.
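With a lot of VMs in flight it is easy to lose track of which one is at which step, so here is a small Python sketch that keeps the procedure above as an ordered checklist. It does not talk to ESX or VirtualCenter at all; the step wording and the `next_step` helper are just illustrative names, and you would record progress by hand or from your own spreadsheet.

```python
# The Windows VM procedure from the post, in order, for VMs with old tools.
MIGRATION_STEPS = [
    "update VMware Tools to 2.5.3",
    "shut down the VM",
    "scp the disk images to the new cluster",
    "power on the VM in the new cluster",
    "uninstall VMware Tools 2.5.3",
    "reboot",
    "install VMware Tools 3.0 (ignore Windows detecting new hardware)",
    "reboot",
]


def next_step(steps_done):
    """Return the next step for a VM, or None once the checklist is complete.

    steps_done is how many steps have already been finished for that VM.
    """
    if steps_done < 0:
        raise ValueError("steps_done cannot be negative")
    if steps_done >= len(MIGRATION_STEPS):
        return None
    return MIGRATION_STEPS[steps_done]


if __name__ == "__main__":
    # Hypothetical VM names and progress counts, for illustration only.
    progress = {"web01": 3, "file02": 8, "db03": 0}
    for vm, done in progress.items():
        print(f"{vm}: {next_step(done) or 'migration complete'}")
```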
5. Have a solid schedule for migrations.
My VI3 upgrade stalled four-fifths of the way through because we ended up with VMs whose customers were unwilling to take an outage to move, even after they’d agreed to let us use their maintenance windows. The result was one ESX 2.5.3 machine from each cluster hanging around with the stragglers, while all the other ESX servers in those clusters had been migrated to VI3. With a single ESX server you can’t VMotion in case of a problem, and it just generally sucks to have a one-off machine out there. My suggestion is to make a definite schedule for what’s moving when, and don’t leave the hard stuff until the end or it’ll drag on forever.
I like VI3 so far. I am waiting for a number of odd things to be sorted out in 3.0.1, though. 🙂
Lessons I’ve learned with VMware:
1) If you own a datacenter, you must have it.
2) It’s the only true DR solution I’ve seen for x86.
3) It costs loads of money to implement before you save money. We’ve begun measuring ESX servers and parts as cars. “ESX1 is a Lexus.” “That SAN equipment is a 911 Turbo.”
4) It will be an adventure. VMware support isn’t quite at an enterprise level yet. When you have an outage, it will be a big one.