Heisenberg & Monitoring

From Wikipedia:

In quantum mechanics, the Heisenberg uncertainty principle states that certain pairs of physical properties, like position and momentum, cannot both be known to arbitrary precision. That is, the more precisely one property is known, the less precisely the other can be known… The measurement of position necessarily disturbs a particle’s momentum, and vice versa.

Stated a little more simply, the sheer act of measuring a particle disturbs it, so you can only ever get approximate measurements.

This is also true of computing systems and monitoring. The act of watching a system consumes resources on that system, which in turn skews the numbers you get from the monitoring system. The more data you collect, and the more intensive the collection, the more resources it consumes. The effect is quite observable on virtual machines. I’ve got some virtual machines where customers are running their own performance monitoring tools, and those tools turn what would otherwise be an idle VM into something consuming quite a bit of CPU. Multiply that by the number of VMs involved, and even a 100 MHz increase in CPU use per VM makes a huge difference en masse, especially if those tools all report on the same schedule (every minute, every 5 minutes, etc., counted from the top of the hour). Your performance monitoring tool might actually be causing performance problems.
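To put rough numbers on that, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the 200-VM fleet, the `vm-NNN` names, and the three helper functions are hypothetical, and the hash-based offset is just one way an agent could avoid the "everyone reports at the top of the hour" pile-up, not something any particular monitoring tool is known to do.

```python
import hashlib
from collections import Counter


def aggregate_overhead_mhz(vm_count, per_vm_mhz):
    """Total CPU consumed by monitoring agents across the whole fleet."""
    return vm_count * per_vm_mhz


def collection_offset_seconds(vm_name, interval_seconds):
    """Derive a stable per-VM start offset within the collection interval.

    Hashing the (hypothetical) VM name gives each VM the same offset on
    every run, while spreading different VMs across the interval.
    """
    digest = hashlib.sha256(vm_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_seconds


def peak_simultaneous_collectors(offsets):
    """Largest number of VMs that start collecting in the same second."""
    return max(Counter(offsets).values(), default=0)


if __name__ == "__main__":
    vm_names = [f"vm-{i:03d}" for i in range(200)]  # hypothetical fleet
    interval = 300  # collect every 5 minutes

    # 200 VMs at 100 MHz each is 20,000 MHz of monitoring overhead.
    print("aggregate MHz:", aggregate_overhead_mhz(len(vm_names), 100.0))

    # Every agent firing at the top of the interval: all 200 hit at once.
    aligned = [0 for _ in vm_names]
    print("peak, aligned:", peak_simultaneous_collectors(aligned))

    # Hash-derived offsets spread the same work across the interval.
    spread = [collection_offset_seconds(name, interval) for name in vm_names]
    print("peak, spread:", peak_simultaneous_collectors(spread))
```

Spreading the start times does not reduce the total overhead, it only flattens the spike. The aggregate number stays the same either way.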

Running performance monitoring tools directly on virtual machines might be a bad idea anyhow. Not only do you waste resources by doing so, you may also get incomplete results, because the VM itself doesn’t know the whole story. This is especially true if resource limits are in effect in your virtualization environment[0]. What the VM thinks is 100% of a vCPU might only be 25% of an actual CPU because of resource contention. Out-of-band tools like esxtop or vCenter’s performance charts can tell a more accurate story[1]. Besides, if you really need the guest OS point of view, you can always log in and use Resource Monitor or top/iostat/vmstat to find out what the virtual machine thinks. Just make sure you’re not doing all that extra work to collect the wrong data. 🙂
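The 100%-versus-25% gap can be shown with a deliberately simplified model. This is a sketch under stated assumptions, not how ESX actually accounts for CPU time: the function name and the idea of a single "scheduled fraction" (the share of wall-clock time the hypervisor really ran the vCPU on a physical core) are hypothetical simplifications.

```python
def physical_cpu_percent(guest_percent, scheduled_fraction):
    """Scale guest-reported CPU use by the time the vCPU was actually running.

    guest_percent: utilization as seen inside the guest OS (0 to 100).
    scheduled_fraction: assumed share of wall-clock time the hypervisor
        scheduled this vCPU on a physical core (0.0 to 1.0).
    """
    if not 0.0 <= guest_percent <= 100.0:
        raise ValueError("guest_percent must be between 0 and 100")
    if not 0.0 <= scheduled_fraction <= 1.0:
        raise ValueError("scheduled_fraction must be between 0.0 and 1.0")
    return guest_percent * scheduled_fraction


if __name__ == "__main__":
    # The guest believes it is pegged, but it only ran a quarter of the time.
    print(physical_cpu_percent(100.0, 0.25))
```

The guest cannot see the scheduled fraction on its own, which is why the number it reports can be so far from what the host observes.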

—————————-

[0] The VMware Descheduled Time Accounting service, which comes with the VMware Tools, can help Windows VMs by correctly accounting for time spent waiting for ESX to run the VM again. Newer and upcoming Linux distributions can also account for that with their tickless clock kernel features. But it’s usually more efficient to gather the data from the hypervisor itself.

[1] Remember, though, that esxtop and vCenter’s performance charts use resources somewhere, too (usually the ESX console OS, and/or vCenter Server & SQL Server).