I’ve also written about elevator=noop as part of my series on Linux performance tuning.
The Linux kernel has various ways of optimizing disk I/O. One of them is reordering requests so that, as the head moves across the disk, it can service them in an orderly, sequential manner rather than seeking back and forth a lot. This is known as an “elevator,” since it’s basically what an elevator does, too. An elevator doesn’t drop people off at floor 11, then 2, then 5, then 3. Instead, it drops people off in order: 2, 3, 5, 11. Same with I/O to disks.
This approach is great, but its fatal flaw is that it assumes a single physical disk attached to a single physical SCSI controller in a single physical host. How does the elevator algorithm know what to do when the “disk” is actually a RAID array? Does it? Or what if that one Linux kernel isn’t the only kernel running on the physical host? Does the elevator mechanism still help in virtual environments?
No, no it doesn’t. Hypervisors have elevators, too. So do disk arrays. Remember that in virtual environments the hypervisor can’t tell what is happening inside the VM[0]. It’s a black box, and all it sees is the stream of I/O requests that eventually get passed to the hypervisor. It doesn’t know if they got reordered, how they got reordered, or why. It doesn’t know how long the request has been outstanding. As a result it probably won’t make the best decision about handling those requests, adding latency and extra work for the array. Worst case, all that I/O ends up looking very random to the disk array.
Random I/O is very hard to deal with, because caching algorithms can’t predict what data a host will want next when the access pattern is random. Arrays have limited cache memory, so they want to use it to help hosts that can actually benefit from it. With enough random I/O and enough cache misses the array might decide it can’t help you, and stop caching your I/O entirely (reads, at least). Obviously, ineffective or disabled caching isn’t good for performance, and as a result all the VMs on your host suffer.
The Linux kernel has four different elevators (noop, anticipatory, deadline, and CFQ), each with different properties. One of them, noop, is essentially a first-in, first-out (FIFO) queue with no extra logic. And for virtual machines this is exactly what is needed. Each virtual machine can stop worrying about the disk and simply pass I/O requests along to the hypervisor, which is in a better position to make decisions about overall performance.
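As a quick illustration (the device name here is just an example, and the exact list varies by kernel version), you can see which elevators are available and which one is active for a given disk by reading sysfs:

```
# The scheduler shown in [brackets] is the one currently in use for this device.
$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
```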
So how do you change the elevator in your Linux VM? Simply add “elevator=noop” to the kernel parameters in your boot loader’s configuration (/etc/grub.conf, for example), and restart. Easy.
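As a rough sketch (the kernel version, root device, and other options below are placeholders and will differ on your system), the kernel line in /etc/grub.conf ends up looking something like this:

```
# Example only: everything except elevator=noop is a placeholder.
kernel /vmlinuz-<version> ro root=/dev/sda1 elevator=noop
```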
——————-
[0] Environments that do full virtualization have the “black box” problem. Environments that do paravirtualization can “see” into the VM and are able to tell what is going on. That leads to better performance, but also has other problems (security, compatibility, etc.). Regardless, paravirtualization is where we’re all headed.
I’m sure you’re aware, but you can also:
a) Change it on the fly with `echo noop > /sys/block/${DEVICE}/queue/scheduler`.
b) Set the default in your kernel config for all VM kernels you roll, with `CONFIG_DEFAULT_IOSCHED="noop"`. (A quick sketch of both follows this list.)
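Something like this, assuming /dev/sda is the disk in question (the device name and config path are examples):

```
# a) Runtime change for one device; takes effect immediately, lost on reboot.
echo noop > /sys/block/sda/queue/scheduler

# b) In a custom kernel's .config, make noop the compiled-in default:
# CONFIG_DEFAULT_IOSCHED="noop"
```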
Incidentally, I’ve seen Xen VMs change to noop of their own accord, I believe in conjunction with XFS’s write barrier tests.
Yeah, good point. I knew that but didn’t think to add it (was late at night when I wrote this). 🙂
Drivers can set their own elevator policies, too, which might be what Xen is doing. The Dell PERC RAID controller drivers appear to set it, too, which makes sense.
Another thing I wanted to mention: if you’re changing a lot of kernel options, check out /sbin/grubby, which lets you automate changes like this.
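For example (a sketch; check your grubby version for the exact options), appending the parameter to every installed kernel’s boot entry looks something like:

```
# Add elevator=noop to the kernel arguments of all boot entries grubby manages.
grubby --update-kernel=ALL --args="elevator=noop"
```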
Another use case for ‘elevator=noop’ is obviously an SSD (flash disk) in the machine.
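One way to handle that automatically (a sketch only; the rule file name is arbitrary and the sysfs attributes assume a reasonably recent kernel) is a udev rule that switches non-rotational devices to noop:

```
# /etc/udev/rules.d/60-ssd-scheduler.rules (file name is arbitrary)
# For any non-rotational (SSD/flash) disk, set the noop elevator.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
```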