VMware Fault Tolerance, Determinism, and SMP

We’re all at least roughly familiar with Fault Tolerance, a feature VMware added in vSphere 4 that maintains a mirrored VM on a secondary host. It’s kind of like RAID 1 for VMs. To do this, Fault Tolerance records the inputs to the primary VM and then replays them on the secondary VM to achieve the same results. There are two important and somewhat subtle points here that help us understand why Fault Tolerance is limited to one CPU. First, the process records the inputs, not the state of the VM after the inputs happen. If you move the mouse on the primary VM, the mouse moves on the secondary VM in exactly the same fashion. If you ping the …
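To make the record-and-replay idea concrete, here is a minimal Python sketch. It is a toy model, not VMware's implementation: the names `ToyVM`, `run_primary`, and `replay` are hypothetical, and the "VM" is just a small deterministic state machine. It shows that logging only the inputs, and never the resulting state, is enough for a second machine to arrive at the same state, provided the machine is deterministic.

```python
from dataclasses import dataclass


@dataclass
class ToyVM:
    """Hypothetical deterministic machine: same inputs, same order, same state."""
    mouse_x: int = 0
    mouse_y: int = 0
    packets_seen: int = 0

    def apply(self, event: tuple) -> None:
        # Each event is (kind, *args); state changes depend only on the event.
        kind, *args = event
        if kind == "mouse_move":
            dx, dy = args
            self.mouse_x += dx
            self.mouse_y += dy
        elif kind == "packet":
            self.packets_seen += 1
        else:
            raise ValueError(f"unknown input: {kind}")


def run_primary(vm: ToyVM, inputs: list) -> list:
    """Apply inputs to the primary and record them (the inputs, not the state)."""
    log = []
    for event in inputs:
        log.append(event)
        vm.apply(event)
    return log


def replay(vm: ToyVM, log: list) -> None:
    """Feed the recorded inputs to the secondary in the same order."""
    for event in log:
        vm.apply(event)


primary = ToyVM()
secondary = ToyVM()

input_log = run_primary(
    primary,
    [("mouse_move", 3, -1), ("packet", "ping"), ("mouse_move", 2, 5)],
)
replay(secondary, input_log)

# Dataclass equality compares every field, so this checks the full state.
states_match = primary == secondary
print(states_match)
```

The sketch only works because `apply` has no hidden source of nondeterminism; anything that made two runs of the same input log diverge would break the mirror.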
