swappiness value is being ignored (100% RAM being used)

harmonyp

Member
Nov 26, 2020
Swap is not kicking in until the server reaches 100% RAM usage. I am not sure why, but my Proxmox node is acting as if the swappiness value were set to 0.

cat /proc/sys/vm/swappiness
Code:
10

free -g
Code:
              total        used        free      shared  buff/cache   available
Mem:            503         498           3           0           1           0
Swap:           250          11         239

Some things that may be causing this that I can think of:

zfs

Code:
# Set to use 10GB Min
options zfs zfs_arc_min=10737418240
# Set to use 20GB Max
options zfs zfs_arc_max=21474836480
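
For reference, the two byte values above correspond to 10 GiB and 20 GiB. A minimal sketch of how such values can be derived; `gib_to_bytes` is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: convert a size in GiB to bytes,
# the unit expected by zfs_arc_min and zfs_arc_max.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 10   # prints 10737418240 (the zfs_arc_min value above)
gib_to_bytes 20   # prints 21474836480 (the zfs_arc_max value above)
```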

KSM

Code:
KSM_THRES_COEF=10
KSM_SLEEP_MSEC=10
 
Please tell me if I'm wrong, but this is what I think is happening (on my system as well, and I think it is working as designed): the kernel does not start swapping just to create free memory, because unused memory is wasted memory. It only swaps when it prioritizes something other than application memory, for example filesystem cache to improve I/O throughput at the cost of application latency.
There is almost no buff/cache because you use ZFS, so the kernel does not have to decide between shrinking the filesystem cache and swapping memory out (to allow for more cache and faster I/O). Because there is no reason to use swap other than actual memory allocation by applications, it only starts swapping when applications request more memory and none is available.
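
Since the ARC does not show up under buff/cache in `free`, it may help to look at its size directly. A rough sketch, assuming the OpenZFS-on-Linux kstat file `/proc/spl/kstat/zfs/arcstats` is present; `arc_size_bytes` is a made-up helper name:

```shell
# Print the current ARC size in bytes from an arcstats file.
# Defaults to the OpenZFS-on-Linux kstat path; another path can be
# passed as the first argument (useful for testing).
arc_size_bytes() {
    awk '$1 == "size" { print $3 }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# Only query the live system when the ZFS kstat file is readable.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    echo "ARC size: $(arc_size_bytes) bytes"
fi
```

Comparing that number with the "used" column of `free` gives an idea of how much of the reported usage is ARC rather than VM or container memory.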
 
I forgot to add my zfs & ksm settings in the first post, updated it now.

I don't know if what you laid out is the case, but it sounds about right to me. This is the first time I have had this issue, and it's with my ZFS node. Are there any workarounds if this is the correct design? I need free RAM at all times for short bursts where I don't want to wait for memory to be swapped out.
 
I just make sure that my containers, VMs, and ZFS ARC together don't use all memory. That does not mean that they couldn't, if they all used their allowed maximum at the same time, but I have not seen this happen in normal usage. And yes, that probably means I have 10%-20% free (and wasted) memory on average. I find it best not to overcommit memory on a hypervisor, as the performance of VMs goes down terribly when they are swapped. Containers are more flexible in that way, as the host knows how they use their memory. Have you tried setting vm.swappiness to 100, to see if it does start to swap earlier and keep some memory free?

EDIT: If only ZFS would be inside the kernel and ARC would be seen as buff/cache...
 
It has been at 100 for a while now, and it doesn't look like it's doing anything.

What is wrong with swapping nowadays with NVMe drives? It swaps memory in and out very fast from what I can see.
 
Did you limit the amount of memory ZFS is allowed to use for ARC? If not, it will use all free memory up to 50% of your total. In my experience a min of 1GB and a max of 8GB is more than enough. You might want to tune that to your VMs' I/O behavior and host memory size. This allows you to keep some memory free (if you don't overcommit VMs).
I'm fine with swapping to NVMe, although some people complain about the flash on their (non-enterprise) drives degrading too quickly, but I don't decide the Linux kernel swap policy, nor can I fix the ZFS license issue.

EDIT: Sorry, forgot that you already mentioned that you added it in the first post.
 
Yes, it's in the first post:

Code:
# Set to use 10GB Min
options zfs zfs_arc_min=10737418240
# Set to use 20GB Max
options zfs zfs_arc_max=21474836480
 
It seems to me that swappiness is ignored or that it works in a way no one comprehends (or at least I don't). My much smaller server, with 16 GB RAM and just 4 VMs using about 7 GB RAM total for themselves is swapping too much. I have 5 GB swapped out with vm.swappiness at 60 (default) and more or less the same value with 0. I have disabled swap completely and now the server runs much faster and perfectly fine with half the RAM in use and half the RAM free.

So basically my much smaller server swaps too much, your bigger one never swaps, regardless of swappiness setting.
 
I also see no swap usage. Most of the time it is in the single-digit MBs and I never saw a value greater than 300MB. The server has 64GB RAM and 64GB swap, and most of the time it runs at 80-90% RAM utilization, with KSM saving 9-11GB RAM (so without KSM I would be over 100%). I tried swappiness 1 and 100 but wasn't able to see any difference. I guess PVE won't swap out KVM processes or the ARC? Or maybe RAM can't be swapped out if KSM is running? At least that would explain why the host isn't swapping, because most of the RAM is used by VMs and ZFS.
 
That's really incredible because I have seen the exact opposite behaviour, with KVM memory space heavily swapped out.
 
Swappiness determines the balance between dropping filesystem cache and swapping process memory. When using ZFS, it appears that the ARC is treated like process memory, so there is no normal Linux filesystem cache. vm.swappiness therefore has no actual effect, because there is no demand for memory by the filesystem cache. For ballooning, ARC also appears to be counted as process memory, not as cache (which can be released/resized when needed).
On systems with only LVM (or just ext4), swappiness does have an effect, and lots of I/O does push process memory to swap. However, it is difficult to make good decisions about which memory to swap from outside a VM, and ballooning might be a better approach for VMs (without PCI passthrough).
 
When I first started using Proxmox, I disabled swap, as it was swapping at 30% usage and higher. It was killing performance.

I started swapping again after a while though as I noticed I had unexplained reboots which were possibly OOM related.

On my local machine I am testing two swaps, a high-priority zram swap backed up by a lower-priority swap on physical storage, and it starts using swap (with a high swappiness value) at around 50-60% RAM usage.
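
To check which swap device the kernel will prefer in a setup like this, the priorities can be read from `/proc/swaps`. A small sketch; `swaps_by_priority` is a hypothetical helper name, and the output depends entirely on the local setup:

```shell
# List active swap areas as "<priority> <device>", highest priority first.
# Defaults to /proc/swaps; another path can be passed for testing.
swaps_by_priority() {
    # Skip the header row; column 5 is Priority, column 1 is Filename.
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -rn
}

if [ -r /proc/swaps ]; then
    swaps_by_priority
fi
```

The kernel fills higher-priority swap areas first, so the zram device should appear at the top of the list.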

I believe the value of the sysctl affects whether cache is maintained during higher RAM utilisation. If you have little to no cache anyway, this potentially makes it swap only to avoid OOM. As far as I know, the ZFS ARC is not affected by it and does not count as cache usage.

If you want to test if swap will function ok you can use the 'stress' tool from apt to force ram load on the system.

So e.g.
Code:
stress -m 2 --vm-bytes 30G --vm-keep
This will load up 60 GB of RAM using 2 workers (30 GB each).
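
While a load test like this is running, one way to see whether the host is actually swapping is to compare the `pswpin` and `pswpout` counters in `/proc/vmstat` over a short interval. A minimal sketch; `swap_counters` is a made-up helper name and the 1-second interval is arbitrary:

```shell
# Print cumulative pages swapped in and out since boot as "<in> <out>".
# Defaults to the live /proc/vmstat; another path can be passed for testing.
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i, o }' "${1:-/proc/vmstat}"
}

if [ -r /proc/vmstat ]; then
    before=$(swap_counters)
    sleep 1
    after=$(swap_counters)
    echo "pswpin pswpout before: $before"
    echo "pswpin pswpout after:  $after"
fi
```

If both numbers stay the same between the two samples, nothing was swapped in or out during that interval.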
 