Frequently asked question: How much capacity should I have in my VMware environment?
My stock answer to this: N+1 in each cluster.
If you have N physical hosts’ worth of work in a cluster, have N+1 physical hosts. That way you have spare capacity for maintenance operations, and you can take a whole server completely out of the cluster by VMotioning its workload to the spare machine.
Think about your servers as buckets, and your workload as water. If you have 30 gallons of water in six 5-gallon buckets, where will you put 5 gallons of water when you need to drain one of the buckets? You need an empty bucket that’s as big as the largest bucket you have. In this case you need to have 35 gallons of capacity, which means an extra 5-gallon pail.
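The bucket arithmetic can be sketched in a few lines of Python. This is only an illustration of the analogy; the function name `required_capacity` is made up for this example.

```python
def required_capacity(bucket_loads, bucket_sizes):
    """Total capacity needed: all the water, plus one empty
    bucket as big as the largest bucket in the set."""
    return sum(bucket_loads) + max(bucket_sizes)


# 30 gallons of water spread across six 5-gallon buckets.
loads = [5] * 6
sizes = [5] * 6

print(required_capacity(loads, sizes))  # 35
```

Sizing the spare to the largest bucket is what matters once the buckets are not all the same size.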
In one of my clusters I have 6 Dell PowerEdge R900s (four-socket, 96 GB RAM) and 6 Dell PowerEdge 2950s (two-socket, 32 GB RAM). I treat one R900 as extra capacity (the ‘+1’ in “N+1”) because it’s able to take all of the work from another R900, or from any of the 2950s.
In practice I let VMware’s Distributed Resource Scheduler (DRS) move workloads around freely between all of my hosts in a cluster, including the spare. However, I periodically check the load on the cluster by putting my spare machine into maintenance mode and ensuring that the load on the rest of the cluster is within limits, both for RAM and CPU.
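Here is a rough sketch of that check as plain arithmetic, not anything that talks to VMware. The RAM sizes come from the cluster described above. Everything else is an assumption for illustration: the `Host` class, the `fits_without` function, the CPU figures in GHz, the 80% limit, and the usage numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ram_gb: float
    cpu_ghz: float  # hypothetical aggregate CPU capacity


def fits_without(hosts, removed, ram_used_gb, cpu_used_ghz, limit=0.8):
    """Return True if the cluster's current load still fits, for both
    RAM and CPU, once the host named `removed` is taken out.
    `limit` is an assumed ceiling on utilization (80% here)."""
    remaining = [h for h in hosts if h.name != removed]
    ram_capacity = sum(h.ram_gb for h in remaining)
    cpu_capacity = sum(h.cpu_ghz for h in remaining)
    return (ram_used_gb <= limit * ram_capacity
            and cpu_used_ghz <= limit * cpu_capacity)


# 6 R900s (96 GB) and 6 2950s (32 GB); the CPU numbers are made up.
cluster = ([Host(f"r900-{i}", 96, 40.0) for i in range(1, 7)]
           + [Host(f"2950-{i}", 32, 20.0) for i in range(1, 7)])

# Pretend r900-6 is the spare and has gone into maintenance mode.
print(fits_without(cluster, "r900-6", ram_used_gb=500, cpu_used_ghz=200))  # True
print(fits_without(cluster, "r900-6", ram_used_gb=600, cpu_used_ghz=200))  # False
```

This only compares totals. It does not check whether each individual VM can actually be placed on a specific host, so treat it as a first-pass sanity check.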
(As an aside, I did a presentation on virtualization where I used the water analogy as a demo, with four large, clear plastic cups and some red food coloring. People get what you’re talking about, plus all the people dozing off in the audience wake up.)
Thanks for providing these details on your config. I use the VI3 Enterprise features on my clusters as well, but I haven’t yet been brave/opportunistic enough to use Enhanced VMotion Compatibility (EVC) to combine different server models in the same cluster.
This reminded me of the post at vinternals today, where I learned that at MSFT IT, they put a maximum of 23 VMs on a 3-node Hyper-V cluster, so they can tolerate a host outage, too. X * (N+1)… almost the same thing.
http://vinternals.com/2009/04/microsoft-myths-and-realities/
“However, I periodically check the load on the cluster by putting my spare machine into maintenance mode and ensuring that the load on the rest of the cluster is within limits, both for RAM and CPU.”
What are your limits and how do you measure them? At what point do you start to think “I’m starting to run out of hosts”?