I’ve been asked a few times about when I’m planning to upgrade to VMware vSphere 6.
Truth is, I don’t know. A Magic 8 Ball would say “reply hazy, try again.”
Some people say that you should wait until the first major update, like the first update pack or first service pack. I’ve always thought that approach is crap. Software is a rolling collection of bugs. Some are old, some are new, and while vendors try to make the number of bugs go down the truth is that isn’t the case all the time. Especially with large releases, like service packs. The real bug fixing gains are, to borrow a baseball term, in the “small ball” between the big plays. The way I see it, the most stable product is the version right before the big service pack.
Some people say that because 6.0 ends in .0 they’ll never run that code. “Dot-oh code is always horrible,” they volunteer. My best theory is that these people have some sort of PTSD from a .0, coupled with some form of cult-like shared delusion. A delusion like “nobody gets fired for buying IBM” or “we’ll be whisked away on the approaching comet.” Personally, I should get twitchy when I think about versions ending in .1. The upgrade to vSphere 5.1 was one of the most horrific I had. Actually, speaking of IBM, it seems to me that I filed a fair number of bugs against AIX 5.1, too, back in the day. Somehow I still can sleep at night.
Thing is, a version number is just a name, often chosen more for its marketing value than its basis in software development reality. It could have been vSphere 5.8, but some products were already 5.8. It could have vSphere 5.9, but that’s real close to 6.0. Six is a nice round number, easy to rebase a ton of products to and call them all Six Dot Oh. Hell, AIX never had a real 5.0, either, except internally in IBM as an Itanium prototype. To the masses they went from 4.3.3 to 5.1. Oh, and IBM’s version number was 5.1.0. OH MY GOD A DOT-ONE AND A DOT-OH. Microsoft is skipping Windows 9, not because Windows 10 is so epically awesome, but because string comparisons on “Windows 9*” will match “Windows 95,” too. And a lot of version numbers get picked just because they’re bigger than the competitors’ versions.
Given all this it seems pretty stupid to put much stock in a version number. To me, it’s there to tell us where this release fits in the sequence of time. 6.0 was before 7.0 and after 5.5.
Oh, but what about build numbers? I’ve had people suggest that. Sounds good, until you realize that the build numbers started back years ago when the codebase was forked for the new version. And, like the version number, it means almost nothing. It doesn’t tell you what bugs are fixed. It doesn’t tell you if there are regressions where 6.0 still has a bug that was fixed in 5.5, or the other way around where 5.5 still has a bug that 6.0 doesn’t because of the rework done to fix something else. Build numbers tell you where you are in time for a particular version, and roughly how many times someone (or something) recompiled the software. That’s it.
Some people say “don’t upgrade, what features does it have that you really need?” Heard this today on Twitter, and it’ll likely end up as a harsh comment on this post. Sure, maybe vSphere 6 doesn’t have any features I really need. But sometimes new versions have features that I want, like a much-improved version of that goddamned web client. Or automated certificate management — the manual process makes me think suicidal thoughts. Or cross-vCenter vMotion, oh baby where have you been all my life. Truth is, every time I hear this sort of upgrade “advice” and ask the person what version of vSphere they’re running it’s something ancient, like 4.1. I suspect their idea of job security is all the busy work it takes to run an environment like that, not to mention flaunting end-of-support deadlines. Count me out. I like meaningful work and taking advantage of improvements that make things better.
Some people say “upgrade your smallest environments first, so if you have problems it doesn’t impact very much.” Isn’t that the role of a test environment, though? Test is never like production, mostly because there’s never the same amount of load on it. And if you do manage to get the same load on it it’s not variable & weird like real user load. Just never the same. And while I agree in principle that we should choose the first upgrades wisely I always rephrase it to say “the least critical environments.” My smallest environments hold some of the most critical workloads I have. One of them is “things die & police are dispatched if there are problems” critical. I don’t think I’ll start there.
So where do I start? And how long will it take?
First, I’m doing a completely fresh install of vSphere 6.0 GA code in a test environment. I’m setting it up like I’d want it to be in production. Load-balanced Platform Service Controllers (PSCs). Fresh vCenters, the new linked mode (old linked mode was a hack, new linked mode isn’t even really quite linked mode, just a shared perception of the PSCs). A few nested ESXi hosts for now. I just want to check out the new features and test compatibility, gauge if it’s worth it.
Second, I’m going to wait for the hardware and software vendors in my ecosystem to catch up. Dell has certified the servers I’m running with ESXi 6.0. Dell, HDS, and NetApp have certified my storage arrays. But Veeam hasn’t released a version of Backup & Replication that supports 6.0 yet (soon, says Rick). Backups are important, after all, and I like Veeam because they actually do meaningful QA (I got a laugh from them once because I said I adore their radical & non-standard coding practices, like actually checking return codes). Beyond that, I’m going to need to test some of my code, scripts written to do billing that use the Perl SDK, PowerCLI scripts to manage forgotten snapshots, etc. I’m also going to need to test the redundancy. What happens when a patch comes along? What happens if we lose a PSC, or a vCenter, or something? Does HA work for vRealize Automation? Does AD authentication work? Can I restore a backup?
Third, I’m going to test actual upgrades. I’ll do this with a fresh 5.5 install, running against demo ESXi hosts with demo VMs, with the goal of having the upgraded environment look exactly like my fresh install. Load balanced PSCs, linked mode, vRealize Operations, Replication, Veeam, Converter, Perl SDK, PowerCLI, everything. I’ll write it all down so I can repeat it.
Last, I’ll test it against a clone of my 5.5 VCSA, fenced off from the production networks. I’ll use the playbook I wrote from the last step, and change it as I run into issues.
Truth is, I’ll probably get through step 1 and 2 by mid-May. But then it’ll drag out a bit. I expect upgrade problems, based on experience. I also know I’ve got some big high-priority projects coming, so my time will be limited for something like this. And it’ll be summer, so I’ll want to be in a canoe or on my motorcycle and not upgrading vSphere.
The one thing I do know, though, is that when I get to the production upgrade my path will be laid out by facts and experience, and not folk wisdom and the wives’ tales of IT.