Sometimes computers are counterintuitive. One great case continues to be why a virtual machine with two vCPUs runs more slowly than a virtual machine with one vCPU.
Think of virtualization like a movie. A movie is a series of individual frames, but played back the motion looks continuous. It’s the same way with virtual machines. A physical CPU can only run one thing at a time, which means that only one virtual machine can run at a time. So the hypervisor “shares” a CPU by cutting up the CPU time into chunks. Each virtual machine gets a certain chunk to do its thing, and if it gets chunks of CPU often enough it’s like the movie: it seems like the virtual machine has been running continuously, even when it hasn’t. Modern CPUs are fast enough that they can pull this illusion off.
When one virtual machine stops running, another virtual machine has an opportunity to run. A virtual machine with one vCPU needs a chunk of time from a single physical CPU, so whenever any physical CPU has some free time, that single-vCPU virtual machine will run. No problem.
Similarly, in order for a virtual machine with two vCPUs to run it needs to have chunks of free time on two physical CPUs. When two physical CPUs are both available that virtual machine can run.
The trouble comes when folks mix and match single- and dual-vCPU virtual machines in an environment that doesn’t have a lot of CPU resources available. A two-vCPU virtual machine has to wait for two physical processors to free up, but the hypervisor doesn’t like to have idle CPUs, so whenever just one frees up it runs a single-vCPU virtual machine instead. It ends up being a long time before two physical CPUs free up simultaneously, and the two-vCPU virtual machine seems really slow as a result. By “seems really slow” I mean it doesn’t perform very well, but none of the performance graphs show any problems at all.
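If you want to see the effect for yourself, here is a toy simulation in Python. It is only a sketch of the behavior described above, not how any real hypervisor is implemented: time is cut into ticks, every virtual machine always wants CPU, a virtual machine only starts when all of its vCPUs fit on free physical CPUs at once, and the scheduler never leaves a physical CPU idle if some waiting virtual machine fits. The function name, the VM names, and the burst lengths are all made up for illustration.

```python
from collections import deque

def simulate(pcpus, vms, ticks):
    """Toy strict co-scheduler.

    vms maps a VM name to (vcpus, burst_ticks).
    Returns a dict of VM name -> number of ticks it spent running.
    """
    ready = deque(vms)                  # FIFO of VM names waiting to run
    running = {}                        # VM name -> ticks left in its burst
    ran = {name: 0 for name in vms}
    for _ in range(ticks):
        free = pcpus - sum(vms[name][0] for name in running)
        # Work-conserving pass: start every waiting VM that fits right now.
        # A VM only starts if ALL of its vCPUs fit at the same time.
        for name in list(ready):
            vcpus, burst = vms[name]
            if vcpus <= free:
                ready.remove(name)
                running[name] = burst
                free -= vcpus
        # Let one tick pass; finished VMs go to the back of the queue.
        for name in list(running):
            ran[name] += 1
            running[name] -= 1
            if running[name] == 0:
                del running[name]
                ready.append(name)
    return ran

# Hypothetical mix on a host with two physical CPUs.
mixed = simulate(2, {"big": (2, 2), "a": (1, 2), "b": (1, 3), "c": (1, 2)}, 90)
# The same two-vCPU VM with the host to itself.
solo = simulate(2, {"big": (2, 2)}, 90)
print(mixed)  # {'big': 20, 'a': 40, 'b': 60, 'c': 40}
print(solo)   # {'big': 90}
```

In this run both physical CPUs are busy on every tick, yet the two-vCPU virtual machine gets only 20 of the 90 ticks, because the single-vCPU machines keep grabbing whichever CPU frees up first. Alone on the host it runs all 90 ticks.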
To fix this you need to set the environment up so that two physical CPUs become free more often. First, you could add CPU resources so that the probability of two CPUs being idle at the same time is higher. Unfortunately this usually means buying stuff, which isn’t quick, easy, or even possible sometimes.
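To get a rough feel for how much extra CPUs help, here is a back-of-the-envelope calculation in Python. It rests on a big simplifying assumption that is mine, not VMware’s: each physical CPU is idle at any given instant with the same probability, independently of the others. Real workloads aren’t that tidy, so treat the numbers as illustration only. The function name and the 20% idle figure are made up.

```python
def prob_at_least_two_idle(n_pcpus, p_idle):
    """Chance that two or more of n_pcpus are idle at the same instant,
    assuming each is idle independently with probability p_idle."""
    none_idle = (1 - p_idle) ** n_pcpus
    exactly_one_idle = n_pcpus * p_idle * (1 - p_idle) ** (n_pcpus - 1)
    return 1 - none_idle - exactly_one_idle

# Hypothetical hosts where each physical CPU is idle 20% of the time.
for n in (2, 4, 8):
    print(n, round(prob_at_least_two_idle(n, 0.2), 3))
```

Under that assumption a two-CPU host has both CPUs idle together only about 4% of the time, while an eight-CPU host has at least two idle together about half the time.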
Second, you could set all your virtual machines to have one vCPU. That way they’ll run whenever a single physical CPU is free. This is usually a good stopgap until you can add CPU resources.
Last, you can group all your two-vCPU machines together where those pesky single-vCPU virtual machines won’t bother them. When a two-vCPU virtual machine stops running it’ll always free up two physical CPUs. This usually means cutting up a cluster, though, so that will also have drawbacks.
Virtualization can be awesome, but it can be pretty counterintuitive sometimes, too.
Like I assume you did, I found out about this the hard way. I first saw it when I tried to do 2-CPU VMs on two single-core hyperthreaded Xeons. (Back before I knew ESX better.) Nothing doing. Even on a four-core host it’s questionable. On 8-core hosts I haven’t found it to be a big issue (as long as 2-CPU VMs are the exception) but I completely agree that if you avoid mixing 1- and 2-CPU VMs on the same host there is less potential of a problem. Good points in your article.
I remember VMware preaching this back in the ESX 2.x days–as you probably do–so in some ways it’s still surprising that more people aren’t aware of the performance implications of vSMP VMs. It’s a real challenge to get customers to understand that VMs shouldn’t be provisioned in the same way they used to provision physical servers.
Bob, thanks for bringing this issue to light (again).
Yeah, I think I’ve blogged about it before, but I ran into a customer who was having this problem and thought it’d be helpful to repeat. 🙂
We had a Xeon(R) X7350 @ 2.93 GHz and 2.GB of RAM on our earlier VM. We upgraded our VMs to dual Xeon(R) X7350 @ 2.93 GHz and 4.00 GB of RAM, but the test we are running on the new VM runs at almost half the speed of the older one, i.e. the one with a single CPU.
Any help is appreciated !
Thank you
~bisu2000
Good news! The newer ESX versions don’t need to jump on the physical CPUs in one go, so it’s a lot easier than valet-parking a schoolbus at times. You still want to have vCPUs <= pCPUs, of course, but with some snooping it can drop only one ready vCPU onto the pCPU pool without waiting. Yay for progress!!