My Thoughts on Upgrading to vSphere 5

by Bob Plankers on July 19, 2011 · 8 comments

in Virtualization

I’ve been thinking a lot lately about upgrading to vSphere 5, mainly the questions of when and how I’d like to get it done.

During the launch on July 12th there was a lot of talk about how many QA hours went into vSphere 5 (2 million+). That’s good news. We had some serious problems with vSphere 4 when we deployed it: bugs all over the place, vCenter crashing every couple of days, etc. VMware support wasn’t super helpful in fixing the problems because they didn’t have much experience with it yet, and they were unwilling or unable to get Engineering involved. As a result I took a lot of crap from my coworkers about my decision to upgrade things so quickly. In my defense, some of those guys are the types that won’t upgrade unless they’re absolutely forced to (and sometimes not even then; I mean, we did finally get rid of Token Ring from the data center last year…), but regardless I really hope that vSphere 5 has truly had much better testing. We have 8 times more VMs than the last time, and the stakes are higher. Heck, my C-level executives know the name “VMware” now; last time they didn’t.

Thinking back on my vSphere 4 experience, it went something like this:

  1. Install vSphere 4 from scratch in my test environment. I spent about a month messing with vSphere 4 in my test environment. Looked great, and I was excited about the new features. I did two rounds of testing, one where I installed from scratch and one where I did a VI 3.5 to vSphere 4 upgrade. Looking back on it now, my testing strategy’s main fault was that it was mostly based on clean copies of everything, from ESX to the VMs themselves, which made for an idealized test situation. Nowadays I have a pretty decent & formal list of things to test, and I use cloned & fenced copies of real VMs to do some of it.
  2. Build a new server with Windows 2008 Standard, 64-bit, and SQL Server 2008 Standard, 64-bit. Since I needed a new server anyhow I decided we’d do the latest stuff, and vCenter 4 supported 64-bit OSes. Turned out to be a mixed blessing, as it made 4.1 an easy upgrade later. There were also some terrible bugs with 64-bit environments, and some serious kludges to make 32-bit stuff work. If I did it again I’d probably do it the same way, though, mainly because my VC 3.5 server was so decrepit and old.
  3. Install vCenter 4.0, detach the ESX 3.5 hosts from the old VirtualCenter and import them into 4.0. I was aiming for a clean start here, with a new, fresh database. This actually worked okay, at least until vCenter started locking up every couple of days. We fixed that with a scheduled job to restart it every night. Knowing what I know now I probably should have stopped here for a while, but then again each update to ESX introduced new problems, too, despite fixing a bunch. If I wanted the vSphere 4 features, and I did, I had to put up with it.
  4. Start a rolling upgrade to ESX 4.0. This part went very smoothly, and I did it over the course of two days, hammering it out on my 10 hosts or so. I didn’t upgrade, but instead I rebuilt from scratch so things were clean. At this point I was probably two months past the release.
  5. Start upgrading VMs to hardware version 7. This was a large effort that was basically about standardizing the configurations of virtual machines to specific VMXNet & SCSI drivers, removing unnecessary virtual hardware (from P2Vs and mistakes), and getting VMware Tools updated. I’m glad we did this. The only thing I’d do differently is better testing of the VMware Tools, because we ended up having some big problems with them, especially on Windows hosts and especially with the autoupgrade functions enabled. This process was long, though, and took roughly six months to get everything done. I tried to piggyback on normal patching processes, and wrote documentation that every sysadmin followed to do the upgrades.
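
For what it's worth, keeping track of who still needs the step 5 treatment is easy to script. Here's a minimal sketch, not what we actually ran: it assumes you already have the text of each VM's .vmx file in hand and reads the `virtualHW.version` line to find VMs still below a target hardware version. The function names and the sample inventory are made up for illustration.

```python
import re

# Matches a .vmx line such as: virtualHW.version = "7"
_HW_LINE = re.compile(
    r'^\s*virtualHW\.version\s*=\s*"(\d+)"\s*$',
    re.MULTILINE | re.IGNORECASE,
)


def hardware_version(vmx_text):
    """Return the virtual hardware version found in .vmx text, or None."""
    match = _HW_LINE.search(vmx_text)
    return int(match.group(1)) if match else None


def needs_upgrade(inventory, target=7):
    """List VM names whose hardware version is below target, sorted by name.

    inventory maps a VM name to the text of its .vmx file (hypothetical input).
    VMs with no version line are reported too, so someone looks at them by hand.
    """
    stale = []
    for name, vmx_text in inventory.items():
        version = hardware_version(vmx_text)
        if version is None or version < target:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    # Made-up sample data, standing in for real .vmx contents.
    sample = {
        "web01": 'displayName = "web01"\nvirtualHW.version = "4"\n',
        "db01": 'displayName = "db01"\nvirtualHW.version = "7"\n',
    }
    print(needs_upgrade(sample))
```

A list like this only tells you what is left to do; it says nothing about the driver and VMware Tools cleanup, which was where the real time went.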

So how does this inform my vSphere 5 upgrade thoughts?

  • Test like a maniac, using what I have learned from this journey. A former boss of mine, an ex-naval aviator, used to say that in the Navy lessons are written in blood. Yeah, pretty much the same. Minus the blood, of course (he also used to tell us, when we were all stressed out, that at least our bad days didn’t involve people shooting at us).
  • Chill out. I will probably upgrade vCenter to 5.0 sometime a couple months after the GA release, if it tests out okay. I have a couple of new cluster builds coming up and I’d like to run ESXi 5 on them, thus necessitating vCenter 5. But I may not actually update my main clusters for a while, at least until I know 5.0 is solid (or, rather, predictable).
  • When I do upgrade to ESXi 5.0, I might see if I can just upgrade one host in a cluster and leave it as the only 5.0 host for a couple of weeks. Note that this idea might be the dumbest thing ever, and might not be supported or a good idea. But all my vSphere 5 experience is based on clean installs, in test environments, and we all know production is different. So if the documentation, when it’s released, doesn’t completely kill this idea it might be a good way to dip a toe in the production ESXi 5 water without converting completely and irreversibly.
  • I won’t push for upgrades to hardware version 8. We don’t need the features in version 8 as much as we did in 7. I’m willing to have two standards for virtual hardware, though, and I’ll upgrade the template VMs to version 8, with all the requisite driver updates and such. I’ll also write documentation for sysadmins to do the upgrade if they want to, and arbitrarily tie inclusion in things like upcoming SRM deployments to the upgrade. Over the next year I expect half of our VMs will get upgraded, quietly and without me having to be the bad guy prodding people to get it done.
  • Stick with vCenter 5 installed on my Windows 2008 physical host for now, and once my server is end-of-life I’ll move to a Windows 2008 R2 VM. vCenter 5 as a Linux-based appliance is really a 1.0 product, it has some decent-sized stated limitations, and probably a few complications that aren’t stated. I also suspect that VMware Support won’t know a darn thing about it for a while, so if I do run into a problem I’m going to be on my own. That said, I’ll probably run it in my lab.
  • Take it easy on the new features, like autodeploy. I will likely install ESXi 5 from scratch, but still to the local disks. I think autodeploy is going to be great, and I do plan on moving to it, but as we don’t use DHCP or PXE in our data center I’ll need to have some changes made. I’ll also need to consider what happens to our DR & COOP plans with autodeploy, because there will be new dependencies involved. Likewise with Storage DRS, and policy-driven storage. Great ideas to start using in 2012, and it’ll give me time to get the storage and provisioning people warmed up to the ideas. Realistically, my initial goal will be solely to replicate vSphere 4 functionality with vSphere 5 and be stable. Perhaps I’ll use some of the time to convert datastores to VMFS 5, via migrating & reformatting.
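
To make the "one host first, then wait" idea concrete, here is a rough sketch of the calendar math only. It assumes a single canary host, a soak period (I said a couple of weeks, so 14 days is the default) and small daily batches afterwards. The host names, batch size, and start date are all hypothetical, and it does nothing to answer whether a mixed-version cluster is actually supported.

```python
from datetime import date, timedelta


def plan_rollout(hosts, start, soak_days=14, batch_size=2):
    """Return a list of (date, [hosts]) upgrade waves.

    The first host goes alone as the canary. The rest wait out the soak
    period and then follow in batches, one batch per day.
    """
    if not hosts:
        return []
    waves = [(start, [hosts[0]])]
    next_day = start + timedelta(days=soak_days)
    rest = hosts[1:]
    for i in range(0, len(rest), batch_size):
        waves.append((next_day, rest[i:i + batch_size]))
        next_day += timedelta(days=1)
    return waves


if __name__ == "__main__":
    # Hypothetical four-host cluster and start date.
    cluster = ["esx01", "esx02", "esx03", "esx04"]
    for day, wave in plan_rollout(cluster, date(2011, 11, 1)):
        print(day.isoformat(), ", ".join(wave))
```

The point of writing it down this way is that the soak period becomes an explicit number somebody has to argue with, instead of "a while."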

These are just my thoughts. I encourage you to have your own, and share them with me in the comments! Doubly so if I’ve said something really dumb or factually incorrect. :)

{ 7 comments }

David Vekemans July 20, 2011 at 12:52 AM

Well, I’d give the same kind of answer as with Microsoft products. Always wait for a Service Pack 1, or in the case of VMware an Update 1.

We are presently upgrading from 3.5 to vSphere 4.1 U1. So far, we have not had any major issues. The vCenter servers were migrated to new 64-bit machines, keeping the database and the config. The ESX hosts are being completely reinstalled as ESXi.
The VMware Tools upgrade and VM hardware upgrade will be done in the coming months.

Bob Plankers July 20, 2011 at 2:26 AM

I really, really disagree with the whole “wait for the first update” mentality. For starters, the more people that practice it the more there’ll be an incentive to release the first update quickly. Look at Red Hat Enterprise Linux 6 — Update 1 was released in record time. Why? Because of this line of thinking. Is everything fixed now, with Update 1? Um, no.

I think it depends heavily on the vendor. VMware introduces bugs with each update, but fixes others. If you want the features in an upgrade you have to do some testing and decide on your own. I think the fact that you’re just getting around to the vSphere 4 upgrade, six months after 4.1 U1 was released, and years after 4.0 U1 was released (which is what you’re saying to wait for), says that you guys have issues other than just waiting for the first update pack. I don’t recommend anybody follow your advice blindly.

John Rothlisberger July 21, 2011 at 8:04 AM

In other words, do VMware’s beta-testing for them.

You have to evaluate each product on its merits. Some Microsoft SP0 products have been really good while others have definitely deserved the reputation hit associated with a wait for SP1 or even SP2 — I would argue that Vista is worth skipping altogether, while Windows 7 SP0 is stunningly good. There’s no single solution.

Pete July 20, 2011 at 6:37 PM

When moving to 4.0, my biggest mistake, which I managed to minimize, was updating my existing machines to Virtual Hardware 7. I wrote about my experiences at: http://itforme.wordpress.com/2010/01/17/side-effects-of-upgrading-vms-to-virtual-hardware-7-in-vsphere/ and it went on to be my most popular post. That says something! Without doing too much research on 5.0 yet, I’m willing to say my next “virtual hardware update” strategy will be to leave them as is, and at most, do them slowly and carefully. Or, just wait until the VM has to be replaced by another pristine VM, and build up the new one with the latest virtual hardware.

I never would have guessed this, but I’m still on 4.0, and am okay with a build-from-scratch vCenter approach, so I may consider just skipping 4.1. Haven’t decided yet.

- Pete

Ceri Davies July 21, 2011 at 7:37 AM

I’d be slightly wary of running a single host in a cluster on a different major version; back in the 3.5 days we had an issue where VMs would go crazy on the CPU after they were vMotioned to a downrev host.

Haven’t seen it with 4.x but that’s because we’ve made sure that we’ve avoided that situation ever since – there’s nothing to stop a VM being vMotioned between clusters so long as the hosts have the same connectivity so we tend to make use of that in order to ensure that DRS never moves a VM downrev. I think we’d probably choose a couple of hosts to create a small 5.0 cluster to try with a number of carefully selected VMs picked out of the main cluster.

Patters January 16, 2012 at 6:58 AM

I’m becoming similarly cautious, though I do tend to be an early adopter. I’ve just been bitten. I have just discovered that vSphere 5.0.0 and Dell EqualLogic iSCSI storage with MEM driver using the Broadcom ToE bnx2 series NICs seems to be a massive screw up. Works fine on 4.1, but on 5.0.0 despite the hotfix roll-up to 515841 as prescribed I’m getting timeouts all over the place, and only one path to the LUN being created. Even hacking the HBA advanced settings to a 60 sec timeout it still happens. I tried installing some updated Broadcom driver hotfix and that caused the hypervisor to purple screen crash at boot time! There’s just no way I can use this.

Similarly it took me 7 months with an open case with Symantec backline to get Backup Exec 2010 R3 working reliably with Agent for VMware Virtual Infrastructures.

Yes there are savings to be made with Virtualization but I’m growing less convinced of them day by day. I work in a place that doesn’t have the cash for a test infrastructure, so while I’m starting a migration the redundancy is offline. Gulp.

Lewis July 9, 2012 at 7:57 PM

I disagree with the waiting for SP1 thing too. It just seems like a really lazy way to avoid proper testing, and it shows a lack of faith. Wait a month and read some reviews if you don’t want to be bleeding edge.
