Frequently asked question: How much capacity should I have in my VMware environment?
My stock answer to this: N+1 in each cluster.
If you have N physical hosts' worth of work in a cluster, have N+1 physical hosts. That way you have spare capacity for maintenance operations, and you can take a whole server out of the cluster by VMotioning its workload to the spare machine.
Think about your servers as buckets, and your workload as water. If you have 30 gallons of water in six 5-gallon buckets, where will you put 5 gallons of water when you need to drain one of the buckets? You need an empty bucket that’s as big as the largest bucket you have. In this case you need 35 gallons of capacity: the original six buckets plus an extra 5-gallon pail.
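The bucket arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `n_plus_one_capacity` is made up for this post, and the example assumes the 30 gallons sit in six equal 5-gallon buckets.

```python
def n_plus_one_capacity(bucket_sizes):
    """Capacity needed to hold every full bucket's contents plus one
    empty spare as big as the largest bucket."""
    return sum(bucket_sizes) + max(bucket_sizes)


# Six full 5-gallon buckets (assumed even split of the 30 gallons).
buckets = [5] * 6
print(n_plus_one_capacity(buckets))  # 35
```

Sizing the spare to the largest bucket, rather than the average one, is what lets you drain any single bucket, not just the small ones.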
In one of my clusters I have 6 Dell PowerEdge R900s (four-socket, 96 GB RAM) and 6 Dell PowerEdge 2950s (two-socket, 32 GB RAM). I treat one R900 as extra capacity (the ‘+1’ in “N+1”) because it’s able to take all of the work from another R900, or from any of the 2950s.
In practice I let VMware's Distributed Resource Scheduler (DRS) move workloads freely among all of the hosts in a cluster, including the spare. However, I periodically check the load on the cluster by putting my spare machine into maintenance mode and confirming that the load on the rest of the cluster stays within limits for both RAM and CPU.
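The same check can be roughed out on paper before touching maintenance mode. The sketch below is hypothetical and does not call any VMware API: the host list mirrors the cluster described above, but the CPU figures (40 GHz per R900, 20 GHz per 2950), the 80% limit, and the function name `fits_without` are all assumptions chosen for illustration.

```python
def fits_without(hosts, spare_name, ram_used_gb, cpu_used_ghz, limit=0.8):
    """Return True if the cluster's current load fits on the hosts that
    remain after `spare_name` is removed, staying under `limit` (a
    fraction of capacity) for both RAM and CPU."""
    remaining = [h for h in hosts if h["name"] != spare_name]
    ram_capacity = sum(h["ram_gb"] for h in remaining)
    cpu_capacity = sum(h["cpu_ghz"] for h in remaining)
    return (ram_used_gb <= limit * ram_capacity
            and cpu_used_ghz <= limit * cpu_capacity)


# RAM sizes come from the post; the cpu_ghz values are assumed.
hosts = (
    [{"name": f"r900-{i}", "ram_gb": 96, "cpu_ghz": 40} for i in range(1, 7)]
    + [{"name": f"2950-{i}", "ram_gb": 32, "cpu_ghz": 20} for i in range(1, 7)]
)

# With one R900 out, 672 GB of RAM and 320 GHz of CPU remain.
print(fits_without(hosts, "r900-6", ram_used_gb=500, cpu_used_ghz=200))  # True
print(fits_without(hosts, "r900-6", ram_used_gb=600, cpu_used_ghz=200))  # False
```

Checking RAM and CPU separately matters because a cluster can run out of one long before the other.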
(As an aside, I did a presentation on virtualization where I used the water analogy as a demo, with four large, clear plastic cups and some red food coloring. People get what you’re talking about, plus all the people dozing off in the audience wake up.)