By themselves, load averages on Linux are not an indication of CPU utilization.
Classically, UNIX systems have calculated the load average by counting the number of processes that are either running on a CPU or runnable (ready and waiting for a CPU to run them). Linux does this, but it also counts the number of processes in uninterruptible sleep. Uninterruptible sleep usually means a process is blocking on I/O (waiting for disk, etc.).
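To make the counting idea concrete, here is a rough Python sketch that tallies processes in state R (running or runnable) and state D (uninterruptible sleep) by reading the state field from /proc/<pid>/stat. The function names are made up for illustration, and this is only a point-in-time snapshot of main-thread states, not how the kernel actually computes its damped averages.

```python
import glob


def state_from_stat(stat_text):
    """Return the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' in the line.
    after_comm = stat_text[stat_text.rfind(")") + 1:]
    return after_comm.split()[0]


def count_load_contributors(stat_texts):
    """Count processes in state R (runnable) and D (uninterruptible sleep)."""
    counts = {"R": 0, "D": 0}
    for text in stat_texts:
        state = state_from_stat(text)
        if state in counts:
            counts[state] += 1
    return counts


def read_proc_stats():
    """Read every /proc/<pid>/stat that is readable right now (Linux only)."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


if __name__ == "__main__":
    print(count_load_contributors(read_proc_stats()))
```

If the D count is regularly the bigger of the two, that is a hint the load is coming from blocked processes rather than from CPU contention.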
As such, you can’t really use a Linux host’s load average to determine the CPU utilization of the host. If the load is high, you might have an I/O problem instead of a CPU bottleneck. To find out which, you need tools like vmstat, top, iostat, etc. to tell you what is actually going on.
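As a rough illustration of what those tools report, the Python sketch below takes two samples of the aggregate "cpu" line from /proc/stat and works out what share of the interval went to each category, including iowait. The helper names are hypothetical, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); treat it as a sketch rather than a replacement for vmstat or iostat.

```python
import time

# Assumed order of the first eight counters on the "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  1 2 3 ...' line into a dict of cumulative tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]  # parts[0] is "cpu"
    return dict(zip(FIELDS, values))


def shares(before, after):
    """Percentage of the interval spent in each category between two samples."""
    deltas = {name: after.get(name, 0) - before.get(name, 0) for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line():
    """Read the aggregate CPU line, which is the first line of /proc/stat."""
    with open("/proc/stat") as f:
        return f.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)
    second = parse_cpu_line(read_cpu_line())
    for name, pct in shares(first, second).items():
        print(f"{name:8s} {pct:5.1f}%")
```

A high iowait share alongside a high load average points toward I/O, while high user or system shares point toward the CPU.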
This doesn’t mean load average is useless, though. From a monitoring standpoint it’s something that can tell you at a glance that something is wrong. Just remember that higher load averages aren’t necessarily bad. If your 16-CPU machine has a load of 16, that might just mean it’s being fully utilized.
Then again, it might also mean something is broken. 🙂
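One way to keep the CPU count in mind is to look at load per CPU rather than the raw number. Here is a minimal Python sketch using os.getloadavg() and os.cpu_count(); the load_per_cpu helper is a made-up name for illustration, and a value near 1.0 still tells you nothing about whether the load is CPU or I/O.

```python
import os


def load_per_cpu(load, ncpus):
    """Divide a load average by the number of CPUs."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return load / ncpus


if __name__ == "__main__":
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    ncpus = os.cpu_count() or 1  # cpu_count() can return None
    print(f"1-minute load {one_minute:.2f} on {ncpus} CPUs "
          f"= {load_per_cpu(one_minute, ncpus):.2f} per CPU")
```

By this measure a load of 16 on 16 CPUs works out to 1.0 per CPU, and a load of 4 on 8 cores to 0.5.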
“Uninterruptible sleep usually means a process is blocking on I/O (waiting for disk, etc.).”
It might be worth noting that most IO is not uninterruptible: FD-based IO, for instance (files or sockets), where sleeping coincides with entering and leaving a syscall.
AFAIK it’s mostly VM-based IO that is uninterruptible, either mmapped files or plain old swapping in or out, since the process is stopped on a page fault.
“Just remember that higher load averages aren’t necessarily bad.”
I’ve tried to beat this point into people in the past but it just hasn’t sunk in… Maybe I just need to use a heavier object. My life would be so much easier if I could find some way of hiding the load average from my users.
“No, a load average of 4 on an 8 core machine really isn’t a problem. Please stop sending me an email every time it goes over 1.”
Kudos. An excellent summary of a subject that eludes many in this field.
Don’t forget about sar from sysstat. You can get all kinds of valuable info from sar. I tried to explain how to determine if you are CPU-bound or not in a post (CPU Performance Analysis in Linux) on my blog. Anyway, it’s important to keep letting people know that high load doesn’t equal a problem. It sure may indicate there is a problem, but it’s not the golden scroll.
“As such, you can’t really use a Linux host’s load average to determine the CPU utilization of the host.”
This is probably something that needs to be stated and blogged about more often. I often have clients or see new sysadmins who don’t understand that you can have low CPU usage but high load averages.
The sysstat utilities are a great toolset that is often overlooked. I try to have them running on all of my clients’ systems. It makes it much easier to diagnose issues when you have some real data vs the “my server is slow” input.