This is post #4 in my December-long series on Linux VM performance tuning, Tuningmas.
One of those timeless questions in system administration has always been “how much swap space do I configure on my server?” The old rule used to be twice the amount of memory, but does a server with 256 GB of RAM really need a half terabyte of swap? And what about VMs? Swapping on VMs is a serious performance drag. Would it be a good idea to just disable swap completely?
One thing to consider is the tunable kernel parameter /proc/sys/vm/swappiness, which controls how readily Linux scavenges inactive memory pages and swaps them out. It is a number from 0 to 100, where 0 is “avoid swapping inactive pages out for as long as possible” and 100 is “be very aggressive at swapping inactive pages out.” Proponents of high swappiness values argue that by getting inactive pages out of RAM you can use the RAM for more productive things, like filesystem cache. Opponents point out that the next time you use an app whose pages were swapped out, those pages have to be swapped back in, which is slow. Desktop guys say they need lots of swap in order to hibernate. Server guys argue that a server should never swap. It’s a low-grade holy war.
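If you want to see what a given box is actually set to, a minimal sketch like the following reads the tunable straight from /proc. The function name and the `path` parameter are my own choices for illustration, and it assumes a Linux system where the file is readable.

```python
from pathlib import Path

# Standard location of the tunable on Linux.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness value as an integer.

    `path` is a parameter only so the function can be pointed at a
    different file for testing; it is not part of any kernel interface.
    """
    return int(Path(path).read_text().strip())
```

Calling `read_swappiness()` with no arguments on a Linux host returns whatever the kernel is currently using, which is worth checking before arguing about what the value should be.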
I mention the swappiness tunable as a way to say that I subscribe to the “servers should never swap” idea, especially for VMs. Spinning disk is on the order of a hundred thousand times slower than RAM to access, making it a poor substitute. In virtual environments swapping on one VM doesn’t just affect that VM’s performance, either. It takes bandwidth and IOPS away from other VMs. And while VMware has interesting ways to cause vSphere-level swapping to be redirected to local SSD, a guest OS’s swap partition is not visible to vSphere as a separate entity. Its traffic looks and acts like any other I/O coming from the VM.
The other big downside is that a swap file or partition doesn’t deduplicate well on disk arrays that do that sort of thing. The contents will be pretty random, and certainly different between hosts. You could set up a convoluted scheme where you use swapoff to disable swap, use dd to copy /dev/zero over the partition, run mkswap, and then use swapon to re-enable it. But that’s just tedious, and you have better things to do with your time.
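For the curious, the zeroing scheme above can be sketched as a dry run. This builds the four commands in order but deliberately does not execute them. The device name in the usage note is hypothetical, and the 1M block size for dd is just an assumption, not a recommendation.

```python
import shlex


def zero_swap_commands(device, block_size="1M"):
    """Build (but do not run) the commands that zero and re-create swap.

    `device` is whatever swap partition you mean to scrub; nothing here
    checks that it really is a swap partition, so treat this as a sketch.
    """
    return [
        ["swapoff", device],                                         # stop using the swap area
        ["dd", "if=/dev/zero", f"of={device}", f"bs={block_size}"],  # overwrite it with zeros
        ["mkswap", device],                                          # write a fresh swap signature
        ["swapon", device],                                          # put it back into service
    ]


def render(commands):
    """Render the command list as shell-style lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Something like `print(render(zero_swap_commands("/dev/sdb2")))` prints the sequence for review. Two things to keep in mind if you ever run it for real: swapoff has to pull everything in swap back into RAM first, and mkswap writes a new UUID unless told otherwise, so an /etc/fstab entry that refers to swap by UUID would need attention.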
Despite all the negative aspects of swap, I’m not a fan of disabling it completely. For me, swap space is a tool to keep a server alive between the time my monitoring system warns me about an out-of-RAM problem and the time I log in to look. Application memory leaks are pretty common, as are workload spikes and odd user behavior. In these situations I’m willing to trade performance for time, limping along instead of rolling the dice with the Linux out-of-memory process killer or a kernel panic and subsequent filesystem checks.
Here’s what I’ve decided to do in the face of all these considerations:
1. I configure 1 GB of swap per VM. It is a good balance between limping along and wasting expensive disk space. Occasionally I’ll need to create another swap partition and enable it as a temporary workaround for a problem like a memory leak on a system we can’t reboot right away. But if a Linux VM is consistently needing more RAM I allocate more, and I’m increasingly able to do this with hot-add capabilities.
2. I configure my monitoring systems to alarm when any swapping is occurring, so that a sysadmin can step in and take action. To me, swapping is a symptom of a configuration problem, and a clear & present danger to my virtual infrastructure’s performance.
3. I leave /proc/sys/vm/swappiness at the value my Red Hat Enterprise Linux VMs come up with, which is 10 on my systems (the upstream kernel default is 60, so check yours). I could set it to 0 but then I’d have to manage that configuration change, and 10 isn’t high enough to really do anything but foreshadow a bigger problem. And I’m fine with foreshadowing; I’ll take all the warning I can get.
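For the monitoring in item 2, one way to detect that swapping is happening is to sample the cumulative pswpin and pswpout counters in /proc/vmstat twice and compare them. The sketch below is not tied to any particular monitoring product, and the function names are mine. It only does the parsing and comparison, so reading the file and scheduling the samples are left to whatever tooling you already have.

```python
def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in counters:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two parsed samples."""
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}


def should_alarm(before, after):
    """Alarm on any swap traffic at all, matching the 'any swapping' policy."""
    delta = swap_activity(before, after)
    return delta["pswpin"] > 0 or delta["pswpout"] > 0
```

The counters are totals since boot, which is why the comparison uses the difference between two samples and not the raw values. A server that swapped once last month shouldn't page anyone today.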
As with most things in life there will be differences of opinion about swap configurations, and the answer to the question of how large you should make your swap partition is, once again, “it depends.” You might also have requirements from vendors, financial restrictions, bosses who order you to do it a certain way, etc. There is no right or wrong way as long as you understand the financial, performance, and management tradeoffs in using, or not using, swap partitions.
I love an in-depth analysis that ends up with “I leave [it] set to the default”.
Lots of OS defaults are ridiculous. I’ll never give anybody crap for stopping to think about them a little, even if they do end up concluding it’s not worth changing.
One thing to remember, particularly if you tend to stick to a fixed amount of swap, is that there is a minimum you should set on VMs.
Min VM swap = (Memory Configured – Memory Reservation) x 65%
By default the balloon driver can and will try to reclaim up to 65% of guest memory if the host is suitably resource constrained. If your guest can’t swap that much, then it can cause a kernel panic. That 65% is configurable, and memory reservations can change on the fly, so I always make sure my guest OSes have swap equal to their configured memory just in case. Disk is relatively cheap.
This regularly bites people in the ass when they increase a VM’s RAM and forget to increase the pagefile/swap to match. It’s easy to forget.
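The minimum-swap rule from this comment is simple arithmetic, and a small sketch makes it easy to rerun whenever a VM’s RAM or reservation changes. The function and parameter names are made up for illustration. The 65% default is the figure quoted in the comment, and since it’s configurable you should treat it as an assumption about your environment.

```python
def min_guest_swap_mb(configured_mb, reserved_mb, balloon_percent=65):
    """Minimum guest swap in MB: (configured - reserved) x balloon_percent.

    Integer arithmetic with the result rounded up to a whole MB.
    balloon_percent defaults to the 65% quoted above, which is an
    assumption about the environment, not a value read from the host.
    """
    if reserved_mb > configured_mb:
        raise ValueError("reservation cannot exceed configured memory")
    unreserved = configured_mb - reserved_mb
    # Ceiling division without floats: -(-a // b) rounds up.
    return -(-unreserved * balloon_percent // 100)


# A 4 GB VM with no reservation needs about 2.6 GB of swap by this rule,
# well past a flat 1 GB; a fully reserved VM needs none.
print(min_guest_swap_mb(4096, 0))     # 2663
print(min_guest_swap_mb(4096, 4096))  # 0
```

Which also shows why a flat 1 GB of swap only holds up when memory is fully reserved or the VM is small.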
Excellent point. I think I’m going to have to restructure the post a bit because I forgot that. I don’t overcommit RAM (configured – reserved = 0) so I tend to forget the consequences of other setups. DOH. Thanks, as always, Forbes.