There’s been a lot of discussion and hand-wringing regarding the deprecation of VMware ESX in favor of ESXi. People are worried, the sky is falling, OMG OMG. In contrast, I just finished upgrading my three production clusters to vSphere 4.1 (from vSphere 4.0), and I converted everything to ESXi 4.1 in the process. It’s actually really easy and now I’m future-proofed. Here’s how I did it.
1. Upgrade vCenter Server to 4.1.
Frankly, the 64-bit vCenter jump is the most troublesome part of all of this for most people. The VMware vCenter Server Data Migration Tool may help if you’re using the SQL Express database, but I just did a database restore from a full backup file I made. I used this opportunity to jump to Microsoft Windows Server 2008 R2 and Microsoft SQL Server 2008. The vSphere 4.1 Upgrade Guide is your key to this; I followed the steps in chapter 5 to move my database.
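For reference, the core of that “restore from a full backup” is just a standard SQL Server backup and restore; the Upgrade Guide covers the surrounding steps (permissions, the ODBC DSN, and so on). A rough sketch, run from a PowerShell prompt on the old and new database servers; the database name VCDB, the logical file names, and the paths are all placeholders for whatever your installation uses:

    # Back up the vCenter database on the old SQL Server
    sqlcmd -S OLDSQL -E -Q "BACKUP DATABASE [VCDB] TO DISK = N'D:\Backup\VCDB.bak' WITH INIT"

    # Copy VCDB.bak to the new server, then restore it there
    sqlcmd -S NEWSQL -E -Q "RESTORE DATABASE [VCDB] FROM DISK = N'D:\Backup\VCDB.bak' WITH MOVE 'VCDB' TO 'D:\SQLData\VCDB.mdf', MOVE 'VCDB_log' TO 'D:\SQLData\VCDB_log.ldf'"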
I didn’t move my Update Manager database; I only use it for host updates, and it was just as easy to reinstall and start from scratch.
The upgrade testing took about 25 hours of my life. The actual upgrade took about 8 hours start to finish, including some problems I had when I ran out of disk space for the database and had to start over. If I did it again I’d consider following some of the KB articles on reducing your database size prior to the move.
2. Deploy the vSphere Management Assistant.
The vSphere Management Assistant, or vMA, is how you’ll use service console tools like esxtop (now resxtop), esxcli, vmkfstools, etc. once the console itself is gone.
This doesn’t take much time: perhaps an hour in total to download and deploy the vMA, and an hour or less to familiarize yourself with it.
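To give you a feel for it, here’s what pointing the vMA at a host looks like. This is only a sketch with a made-up hostname; vifp/vifptarget set up the vi-fastpass target so the vicfg-* commands don’t need credentials every time (check the vMA guide for the exact syntax in your version):

    # Register a host with the vMA and make it the current vi-fastpass target
    vifp addserver esx01.example.com
    vifptarget -s esx01.example.com

    # vicfg-* commands now run against that host, e.g. list its vSwitches
    vicfg-vswitch -l

    # resxtop replaces esxtop; you can also point it at a host explicitly
    resxtop --server esx01.example.com --username root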
3. Script or automate your host configuration.
I have vSphere Enterprise Plus licenses, so I can use Host Profiles to configure my physical machines. I took this opportunity to build a new host profile, and I did it in several iterations: I set up a test host identically to production, built a profile from it, imported that profile into my production vCenter, and then tweaked it against the first host I rebuilt.
Alternatively, if you have a script based on esxcli console commands you can replicate it on the vMA. Most commands you used to run locally now just take --server, --username, and --password options from the vMA. My suggestion is to build a test host and get the automated configuration figured out there first.
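As a trivial example, here’s the kind of thing such a script does from the vMA. The hostname, vSwitch, uplink, and port group names are all made up, and without --password the commands simply prompt you:

    # Build a second vSwitch with an uplink and a VM port group, remotely from the vMA
    vicfg-vswitch --server esx01.example.com --username root --add vSwitch1
    vicfg-vswitch --server esx01.example.com --username root --link vmnic2 vSwitch1
    vicfg-vswitch --server esx01.example.com --username root --add-pg "VM Network" vSwitch1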
My KVM-over-IP system has virtual media capabilities, and I only had about 20 hosts to upgrade, so I just used the ESXi Installable ISO via the KVM system. I also used this opportunity to verify the HBA and BIOS settings on all the hosts, and I figured that since I was doing console work anyhow, the manual install wouldn’t take much more time. If I were doing it again I’d look at either a bootable USB stick, as documented by Ivo Beerens, with a proper Kickstart file for automating the install, or one of the deployment appliances like the ESX Deployment Appliance or the Ultimate Deployment Appliance, both of which help you automate the deployment as well.
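For what it’s worth, the scripted install boils down to a small Kickstart file fed to the installer. This is only a rough sketch from memory; check the directives against the ESXi 4.1 scripted installation documentation, and the password, URL, and network values are obviously placeholders:

    vmaccepteula
    rootpw MyTempPassword
    # Install to the first local disk; --overwritevmfs is part of why I disconnect the SAN first
    autopart --firstdisk --overwritevmfs
    install url http://deploy.example.com/esxi41/
    network --bootproto=static --ip=192.168.1.51 --netmask=255.255.255.0 --gateway=192.168.1.1 --nameserver=192.168.1.10 --hostname=esx01.example.com
    reboot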
I spent about two hours building a sample ESXi host configuration by hand, and perhaps another two hours tweaking the host profile when I found errors on the first build.
4. Repeat the following sequence for each host:
Enter Maintenance Mode -> Let DRS Clear the Host -> Remove Host -> Disable SAN -> Install ESXi -> Boot once fully before re-enabling SAN -> Configure basic networking and root password from the console -> Add to vCenter in a new, temporary cluster -> Enter Maintenance Mode -> Apply Host Profile -> Reboot -> Move host to production cluster -> Exit Maintenance Mode.
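Most of that sequence is vSphere Client work (removing and re-adding hosts, moving them between clusters, applying the host profile), but the maintenance mode steps can also be driven from the vMA if you’d rather script them. A sketch, again with a made-up hostname:

    # Put the host into maintenance mode (with fully automated DRS, the cluster evacuates it)
    vicfg-hostops --server esx01.example.com --username root --operation enter

    # ...reinstall, reconfigure, move it back to the production cluster, then:
    vicfg-hostops --server esx01.example.com --username root --operation exit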
I removed the hosts, rather than reconnecting them, figuring that I didn’t care much about the historical host stats and would rather start fresh.
I always disable the SAN when I do upgrades/reinstalls, mainly out of paranoia that the installer will format a pre-existing VMFS volume with running VMs on it. There are a few ways to do this: physically pulling the cables, misconfiguring the HBA (e.g. setting it to loop only instead of point-to-point), or disabling the Fibre Channel switch ports outright. ESXi warns you that it may format some storage on boot (local filesystems, mainly, but I’m paranoid), so I didn’t reconnect the SAN until after the first full boot cycle.
I also used a temporary cluster in vCenter to re-add the host. When you add a new host it isn’t in maintenance mode, and I worried that DRS would migrate VMs onto a host that wasn’t fully configured yet but looked ready to vCenter. I also didn’t want to disable DRS in my main cluster.
I was able to crank out a host upgrade in under an hour, which was mostly just sitting there watching status bars move across the screen while I did other things. For 20 hosts that’s about 20 hours, over the course of a week. Most importantly, I did all the ESXi installs during business hours.
I estimate my total time spent was between 60 and 70 hours, over the course of about three weeks. Not bad for a major upgrade, and certainly not as bad as some people will tell you. In fact, most of my problems have stemmed from the buggy 4.0-to-4.1 VMware Tools upgrade, rather than from the ESX-to-ESXi transition or from vCenter 4.1 itself.
Are you using vDS in your environment, and if so, at what point did you upgrade the vDS to 4.1.0?
I am using vDS but I have not bumped it to 4.1.0 yet, nor have I enabled Storage I/O Control. I will likely do that soon.
Thanks for the write-up. What kind of problems did you have with the buggy Tools upgrade?