This is post #12 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”
Fans of the 11th Doctor Who have often heard the phrase “the Doctor lies.” The explanation for his lies is that, because he skips around in time, he knows things that others cannot know yet. Hypervisors are like that, too. Guest OSes don’t know that they aren’t the only OS on the hardware, and the hypervisor lies to them about things like CPUs, RAM, and system timers because, like the Doctor, the hypervisor is skipping a VM forward in time. And that’s the rub: only the hypervisor knows what the truth is.
Many traditional performance monitoring systems involve installing an agent on the guest OS, which then monitors OS metrics like CPU utilization, RAM usage, etc. With the hypervisor lying about execution times, RAM allocations, and such, these metrics are inaccurate from the guest OS point of view. The hypervisor keeps similar statistics, though, and because it knows the truth, those stats are correct. For example, a guest OS might report that something is using 100% of a CPU. That doesn’t mean that it is using 100% of a real CPU. The performance data from the hypervisor might show that there is contention on the parent host, or that a CPU limit is in place for the VM.
What difference does this make?
Accurate statistics are important for system troubleshooting and sizing. Using the wrong information will lead you to make bad decisions. Furthermore, the collection of statistics isn’t free. It takes CPU, RAM, disk, and network resources to collect and process that performance information. Why do it if it’s going to be wrong? You might also save some money on licensing, depending on how your guest OS agents are licensed.
So what do I do?
Remove any agents you have running on VMs that collect system performance data, or disable their ability to collect system performance stats. Gather that data directly from the hypervisor instead. If you use system tools like sysstat, you may wish to comment out the cron entries in /etc/cron.d/sysstat:
# Run system activity accounting tool every 10 minutes
##*/10 * * * * root /usr/lib64/sa/sa1 -S DISK 1 1
# 0 * * * * root /usr/lib64/sa/sa1 -S DISK 600 6 &
# Generate a daily summary of process accounting at 23:53
##53 23 * * * root /usr/lib64/sa/sa2 -A
and disable the system service:
$ sudo chkconfig sysstat off
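If you have more than a handful of VMs to fix, editing each cron file by hand gets old fast. Here is a minimal sketch of a shell function that does the commenting-out for you. It assumes GNU sed (for the -i option) and that the active entries call sa1 and sa2 from a path ending in /sa/, as in the file above. The function name disable_sysstat_cron is made up for illustration, so adapt it to your own environment and test it on a copy first.

```shell
# Hypothetical helper: comment out active sa1/sa2 entries in a sysstat cron file.
# Usage: disable_sysstat_cron [path]   (defaults to /etc/cron.d/sysstat)
# Assumes GNU sed, since it relies on the -i (in-place) option.
disable_sysstat_cron() {
    cron_file="${1:-/etc/cron.d/sysstat}"

    # Keep a backup copy next to the original before touching it.
    cp -p "$cron_file" "$cron_file.bak" || return 1

    # On lines that are not already comments, prefix '##' to any entry
    # that runs sa1 or sa2. Existing comment lines are left untouched.
    sed -i -e '/^[[:space:]]*#/!{/\/sa\/sa[12]/s/^/##/;}' "$cron_file"
}
```

Run it as root with no arguments to edit /etc/cron.d/sysstat, or pass a path to try it on a copy. The original file is saved with a .bak suffix in case you want to turn collection back on later.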