VMware vSphere 5.5 & Dell 12G Servers: Reliable Memory Technology

A few days ago Dell released BIOS updates for their 12th generation servers. Among all the notes about preparations for the Intel E5-2600 v2 refresh, there’s one line of interest to those of us thinking about running vSphere 5.5 on our version 1 12G hardware:

New Memory Operating Mode setup option ‘Dell Fault Resilient Mode’

This is a patented new technology from Dell in which the hypervisor and system hardware work together to place the hypervisor in a more redundant section of memory. Dell servers have shipped with a variety of tricks to protect against memory faults, such as Memory Page Retire, which dynamically removes a page from usable memory space if it encounters an error. However, to get better reliability than that, you had to enable the memory mirroring options in the BIOS.

Of course, memory mirroring is just like RAID 1 on disk: you get half the usable space. And on these E5-2600s there aren’t a ton of DIMM sockets to start with (and right now 32 GB DIMMs are 4x the price of 16 GB DIMMs), so RAM capacity is at a premium even without mirroring. Reliable Memory Technology essentially mirrors just part of the memory address space and places the hypervisor and all its processes there, so that even if other RAM errors take VMs down, the hypervisor stays up. Think “controlled emergency landing,” not “crash landing.” And you don’t lose half your RAM.
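
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. The host size (16 x 16 GB DIMMs) and the 32 GB protected region are made-up figures for illustration only; the release notes don't say how much memory the new mode actually mirrors, and `usable_gb` is just a hypothetical helper, not anything from Dell or VMware.

```python
def usable_gb(installed_gb, mirrored_region_gb):
    """Usable RAM when a region of `mirrored_region_gb` is kept in two copies."""
    if not 0 <= mirrored_region_gb <= installed_gb / 2:
        raise ValueError("mirrored region must fit in half the installed RAM")
    # Every mirrored GB needs a second GB to hold its copy.
    return installed_gb - mirrored_region_gb


installed = 16 * 16  # hypothetical host: 16 DIMMs x 16 GB = 256 GB

# Full mirroring: half of the installed RAM is the mirrored region.
full_mirror = usable_gb(installed, installed // 2)

# Partial mirroring: assume (purely for illustration) a 32 GB protected region.
partial_mirror = usable_gb(installed, 32)

print(full_mirror, partial_mirror)  # 128 224
```

With those assumed numbers, full mirroring leaves 128 GB for use while mirroring only a 32 GB region leaves 224 GB, which is the whole appeal.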

Anyhow, that’s pretty cool, especially since it’s retroactive to all the 12th generation servers. Now I just need to get the update on all my hosts…