What are P-states and how do I use them in vSphere?

VMware vSphere 4 added the ability to take advantage of Intel SpeedStep and AMD PowerNow! CPU power management features. These features are commonly known as “Dynamic Voltage and Frequency Scaling,” or DVFS, and let an OS cooperate with the CPU to reduce power consumption by lowering the frequency of the CPU and the voltage at which it operates. The reductions happen in preset tiers, and these tiers are known as P-states. Intel brands the feature “SpeedStep,” and AMD brands it either “Cool’n’Quiet” or “PowerNow!”

The Wikipedia article on Intel SpeedStep points out that “power consumed by a CPU with a capacitance of C, running at voltage V, and frequency f is approximately P = CV²f.” Because the voltage term is squared, reducing the voltage to the CPU cuts power needs faster than linearly. Furthermore, many electronic components run more efficiently at lower temperatures, and since consuming less power means less heat generated, you end up seeing efficiency gains within the host as well as reduced load on data center cooling. This results in an overall reduced power bill, and potential savings in related systems like UPSes and generators.
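To make the squared voltage term concrete, here is a minimal Python sketch of the P = CV²f approximation. The two P-states in it are hypothetical values picked for illustration, not measurements from any real CPU, and the function names are my own.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """Approximate dynamic CPU power using P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency_hz


def relative_power(v_low, f_low, v_high, f_high):
    """Ratio of power in the lower P-state to power in the higher one.

    Capacitance is the same in both states, so it cancels out.
    """
    return (v_low ** 2 * f_low) / (v_high ** 2 * f_high)


# Hypothetical P-states, for illustration only (not measured values).
HIGH = {"volts": 1.20, "hz": 2.4e9}
LOW = {"volts": 1.00, "hz": 1.6e9}

ratio = relative_power(LOW["volts"], LOW["hz"], HIGH["volts"], HIGH["hz"])
print(f"Lower P-state draws about {ratio:.0%} of the higher state's dynamic power")
```

With these made-up numbers the frequency falls by a third, but the estimated dynamic power falls by more than half, because the voltage drop is squared.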

Frequency and voltage in a CPU are correlated. So are instructions per second and frequency. Basically, if you want your CPU to get more work done per second you need to increase the frequency it runs at, and to do that you need to increase the voltage. So why would you want to turn the CPU’s performance down in the first place? The thing is, CPUs are much faster than everything else in a computer system. If the CPU needs data for an operation it’ll look in cache. L1 cache operates at the CPU speed: fast but small. L2 cache operates at a fraction of the CPU speed, but still many times faster than RAM[1]. The problem is when the CPU needs data that isn’t found in cache and has to go to RAM or disk. Going to RAM means it’ll wait for hundreds of clock cycles before the data is returned, because RAM is much slower than the CPU. Going to disk or network means waiting for millions of clock cycles, which is an eternity to a CPU. So while the system may be busy, the CPU might actually be idle, and that’s a great time to stop using power and generating heat.
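To put rough numbers on those waits, here is a small Python sketch that converts a latency into CPU clock cycles. The 2.4 GHz clock and both latency figures are assumptions chosen for illustration; real values vary a lot with the hardware, so treat the results as order-of-magnitude only.

```python
def cycles_waited(latency_ns, clock_ghz):
    """CPU clock cycles that elapse during a wait.

    One GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
    """
    return round(latency_ns * clock_ghz)


CLOCK_GHZ = 2.4              # assumed CPU clock
RAM_LATENCY_NS = 100         # assumed main-memory access time
DISK_LATENCY_NS = 5_000_000  # assumed 5 ms disk access

ram_cycles = cycles_waited(RAM_LATENCY_NS, CLOCK_GHZ)
disk_cycles = cycles_waited(DISK_LATENCY_NS, CLOCK_GHZ)

print(f"RAM access:  ~{ram_cycles:,} cycles")
print(f"Disk access: ~{disk_cycles:,} cycles")
```

Whatever the exact figures on a given box, the gap between a cache hit and a trip to disk spans many orders of magnitude, and that gap is where the idle time comes from.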

When one process is doing I/O like that it’s also a good time for the hypervisor in vSphere (or scheduler in a regular OS) to run something else. That “something else” might not need the full performance of the CPU, either, and the frequency & voltage of the CPU can be decreased to save power in that case, too.
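As a toy illustration of matching CPU speed to demand, the Python sketch below picks the slowest P-state that still covers the current load plus some headroom. This is not how ESX actually decides; the P-state table, the headroom factor, and the function name are all hypothetical.

```python
# Hypothetical P-state table: (name, frequency in MHz). Illustrative values only.
P_STATES = [("P0", 2400), ("P1", 2133), ("P2", 1867), ("P3", 1600)]


def pick_p_state(demand_mhz, p_states=P_STATES, headroom=1.25):
    """Return the slowest P-state whose frequency covers demand plus headroom.

    Falls back to the fastest state when no state is fast enough.
    """
    needed = demand_mhz * headroom
    # Try the states from slowest to fastest and stop at the first that fits.
    for name, mhz in sorted(p_states, key=lambda state: state[1]):
        if mhz >= needed:
            return name
    return max(p_states, key=lambda state: state[1])[0]


for demand in (600, 1500, 2300):
    print(f"{demand} MHz of demand -> {pick_p_state(demand)}")
```

A light load lands in the slowest state, while a load near the top of the range stays at full speed, which is the basic trade a dynamic policy is making.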

Given that all this trouble has been taken to add this feature to hardware and software, how do you turn it on?

1. Make sure your CPUs have this feature. According to VMware vCenter, under Configuration->Processors, my sample Dell PowerEdge R610 has Intel E5530 CPUs. I can check that by looking at Intel’s product web site, ark.intel.com, under “Xeon” processors.

2. In vCenter, under Configuration->Processors, look at the “Power Management Technology” field. If it lists something like “Enhanced Intel SpeedStep” then you can proceed to step 3. If it says “Not Available” or something else you may need to set your BIOS to allow operating system control of the power management. On my Dell PowerEdge R610 the option is under Power Management. Set it to “OS Control”:

[Screenshot: Dell R610 BIOS 1.3.6 - Power Management]

On some older models, like the PowerEdge R900, it’s in the CPU options and called “Demand-Based Power Management.”

3. Go back into vCenter. By now the Power Management Technology field should be populated with something other than “Not Available” (if it isn’t, check with your hardware vendor). Once it is, go to Configuration->Advanced Settings, then Power, and change Power.CpuPolicy to “dynamic.”

[Screenshot: vSphere Advanced Settings - Power]

4. Click OK and you’re set.

I’ve added this to my checklist for bringing a new ESX host online, and now that I’ve got it enabled I’m watching the power consumption a lot more closely. Can I tell a difference? Hard to say right now, as I don’t have enough new data for my small clusters. It still doesn’t replace Distributed Power Management (DPM), because if you genuinely don’t need the capacity of a host, shutting it completely off makes the most sense. But in the effort to be greener, every little bit helps, and it’s easy to enable.

As always, if I’ve made a mistake or you’d like to add relevant information just make a comment below. I read all my comments!

——————–

[1] This is why larger L1 & L2 caches are better, why prefetchers exist (to try prepopulating the caches with data the CPU might need), why architectures like Intel’s Nehalem add L3 caches that are shared among the cores, and why hypervisors try to schedule the same process on the same CPUs when they can (CPU affinity increases the chance that useful data is still in the caches). It’s all a big effort to keep the CPUs from waiting.

  • Regarding HP Proliant DL380 G5 servers, ours have Xeon 5430 quad-core procs, which ark.intel.com says DO have the SpeedStep feature, and they are all already set up in their BIOS for HP Dynamic Power Savings Mode, vs. the “OS Control Mode,” which you recommend. I will try this and see what happens!!

  • OH…does this power management ability apply to any other operating systems like Windows Server 2008 or Linux??

  • Tom, it should — I am probably going to write up how to set it up in Windows and Linux next week when I have had time to do some more research.

  • Updated the post with Dell PowerEdge R900 information. On Dell 10g hosts it’s called “Demand-Based Power Management” and is in the CPU options.

  • What about the power management policy within the GuestOS? On our 2008 R2 templates we set it to maximum performance.

  • What is the equivalent setting in vSphere 4.1??

    4.1 does not have Power.CpuPolicy — it does have Power.UsePStates and Power.UseCStates.

    Thank you, Tom