KVM processes in SWAP despite 35-55 GiB free RAM – Proxmox + ZFS

If you clear out swap regularly, you are basically doing the same as not using swap at all, so you could just as well disable it. I wouldn't recommend this in general (again, see the already linked pieces by kernel developer Chris Down for details), but if you don't want to use swap in the first place, why don't you disable it?
vm.swappiness = 1 is already set. So I really don't get it, why my proxmox is swapping.
This is working as expected, as explained by Chris Down:

What should my swappiness setting be?

First, it's important to understand what vm.swappiness does. vm.swappiness is a sysctl that biases memory reclaim either towards reclamation of anonymous pages, or towards file pages. It does this using two different attributes: file_prio (our willingness to reclaim file pages) and anon_prio (our willingness to reclaim anonymous pages). vm.swappiness plays into this, as it becomes the default value for anon_prio, and it also is subtracted from the default value of 200 for file_prio, which means for a value of vm.swappiness = 50, the outcome is that anon_prio is 50, and file_prio is 150 (the exact numbers don't matter as much as their relative weight compared to the other).

This means that, in general, vm.swappiness is simply a ratio of how costly reclaiming and refaulting anonymous memory is compared to file memory for your hardware and workload. The lower the value, the more you tell the kernel that infrequently accessed anonymous pages are expensive to swap out and in on your hardware. The higher the value, the more you tell the kernel that the cost of swapping anonymous pages and file pages is similar on your hardware. The memory management subsystem will still try to mostly decide whether it swaps file or anonymous pages based on how hot the memory is, but swappiness tips the cost calculation either more towards swapping or more towards dropping filesystem caches when it could go either way. On SSDs these are basically as expensive as each other, so setting vm.swappiness = 100 (full equality) may work well. On spinning disks, swapping may be significantly more expensive since swapping in general requires random reads, so you may want to bias more towards a lower value.

The reality is that most people don't really have a feeling about which their hardware demands, so it's non-trivial to tune this value based on instinct alone – this is something that you need to test using different values. You can also spend time evaluating the memory composition of your system and core applications and their behaviour under mild memory reclamation.

When talking about vm.swappiness, an extremely important change to consider from recent(ish) times is this change to vmscan by Satoru Moriya in 2012, which changes the way that vm.swappiness = 0 is handled quite significantly.

Essentially, the patch makes it so that we are extremely biased against scanning (and thus reclaiming) any anonymous pages at all with vm.swappiness = 0, unless we are already encountering severe memory contention. As mentioned previously in this post, that's generally not what you want, since this prevents equality of reclamation prior to extreme memory pressure occurring, which may actually lead to this extreme memory pressure in the first place. vm.swappiness = 1 is the lowest you can go without invoking the special casing for anonymous page scanning implemented in that patch.

The kernel default here is vm.swappiness = 60. This value is generally not too bad for most workloads, but it's hard to have a general default that suits all workloads. As such, a valuable extension to the tuning mentioned in the "how much swap do I need" section above would be to test these systems with differing values for vm.swappiness, and monitor your application and system metrics under heavy (memory) load. Some time in the near future, once we have a decent implementation of refault detection in the kernel, you'll also be able to determine this somewhat workload-agnostically by looking at cgroup v2's page refaulting metrics.

(Chris Down in https://chrisdown.name/2018/01/02/in-defence-of-swap.html)

In other words: even with swappiness=1 the kernel will still swap, just not as much as with higher settings. Setting swappiness=0 will mostly disable it, except under severe memory pressure.
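
To make the weighting from the quote a bit more tangible, here is a minimal Python sketch of the arithmetic Chris Down describes (anon_prio = swappiness, file_prio = 200 - swappiness). The function name reclaim_priorities is made up for illustration, and the accepted range of 0 to 200 is an assumption that matches newer kernels; older kernels cap vm.swappiness at 100. This only mirrors the description in the article, it is not the actual kernel code.

```python
from typing import Tuple


def reclaim_priorities(swappiness: int) -> Tuple[int, int]:
    """Return (anon_prio, file_prio) as described in the quoted article.

    Hypothetical helper for illustration only; the kernel's real reclaim
    logic takes many more factors into account than these two weights.
    """
    # Assumption: 0..200 is the valid range (newer kernels); older ones stop at 100.
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be between 0 and 200")
    anon_prio = swappiness        # willingness to reclaim anonymous pages
    file_prio = 200 - swappiness  # willingness to reclaim file pages
    return anon_prio, file_prio


for value in (1, 50, 60, 100):
    anon, file_ = reclaim_priorities(value)
    print(f"vm.swappiness={value:3d} -> anon_prio={anon:3d}, file_prio={file_:3d}")
```

With vm.swappiness = 1 the ratio is 1 to 199, so anonymous pages are strongly disfavoured for reclaim but not excluded, which is why some swapping still shows up.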

If, however, you want to use swap to avoid OOM errors but want to avoid the I/O, I would try out whether using zswap reduces the I/O issues:
Using only zram removes disk swap I/O, but it instead simply serves to shift pressure on to the page cache. Under memory pressure, more file cache may be dropped (leading to rereads) or written back (if dirty). With disk-backed swap (or zswap), the system can often evict cold anonymous pages instead, which may reduce cache churn, and thus reduce I/O. That means that zram can actually increase total disk I/O if not well managed.

The real goal is to keep the active working set in RAM – and disk swap, used well, helps you do that by giving cold anonymous pages somewhere to go rather than forcing cold and hot data to compete for the same pool.

We have some concrete numbers to show this in practice. On Instagram, which runs on Django and is largely memory bound, we ran a test where we moved from their existing setup (with swap entirely disabled) to a setup with disk swap and zswap tiering. Django workers accumulate significant cold heap state over their lifetime, like forked processes with duplicated memory, growing request caches, Python object overhead, you get the idea. The results were twofold:

  • We achieved roughly 5:1 compression. That's a huge benefit for such a memory bound workload, and also enables us to consider further stacking workloads.
  • Enabling zswap reduced disk writes by up to 25% compared to having no swap at all(!).
As you can imagine, as a result of this test, Instagram has been using zswap for many years now.

Now, some of you may be looking at this wondering how adding swap could ever reduce disk
(Chris Down on his experiences with zswap at Instagram, quoted from https://chrisdown.name/2026/03/24/zswap-vs-zram-when-to-use-what.html)
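
Before testing zswap it helps to know whether it is already active on the host. A small Python sketch for that, assuming the zswap module exposes its state at /sys/module/zswap/parameters/enabled (it reads "Y" or "N" on the kernels I know of). The helper names are made up for illustration, and the script only reads the value, it changes nothing.

```python
from pathlib import Path

# Assumption: this sysfs path exists when the kernel was built with zswap support.
ZSWAP_ENABLED = Path("/sys/module/zswap/parameters/enabled")


def parse_zswap_enabled(raw: str) -> bool:
    """Interpret the content of the 'enabled' parameter file ("Y" or "N")."""
    return raw.strip().upper() in ("Y", "1")


def zswap_enabled(path: Path = ZSWAP_ENABLED):
    """Return True/False, or None if the file cannot be read (no zswap support)."""
    try:
        return parse_zswap_enabled(path.read_text())
    except OSError:
        return None


print("zswap enabled:", zswap_enabled())
```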


Both pieces by Chris Down are a lot longer than my (already way too long ;) ) quotes and well worth a read. You basically need to decide whether you actually want to use swap or not. Whether enabling zswap will help you or not is something you need to try out for yourself.
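
If you want to compare before and after, it is useful to see which processes (for example the kvm ones) actually sit in swap. Here is a rough Python sketch that reads the VmSwap field from /proc/<pid>/status. The function names are my own, and it assumes a Linux /proc layout; run it as root if you want to see all processes.

```python
import re
from pathlib import Path

_NAME = re.compile(r"^Name:[ \t]*(.*)$", re.MULTILINE)
_VMSWAP = re.compile(r"^VmSwap:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_status(status_text):
    """Return (name, swap_kib) from the text of /proc/<pid>/status.

    Kernel threads have no VmSwap line; they are reported with 0.
    """
    name = _NAME.search(status_text)
    swap = _VMSWAP.search(status_text)
    return (name.group(1) if name else "?", int(swap.group(1)) if swap else 0)


def swapped_processes(proc=Path("/proc")):
    """Return (swap_kib, pid, name) for every process using swap, largest first."""
    rows = []
    for entry in proc.glob("[0-9]*"):
        try:
            text = (entry / "status").read_text(errors="replace")
        except OSError:
            continue  # process exited in the meantime or is not readable
        name, swap_kib = parse_status(text)
        if swap_kib > 0:
            rows.append((swap_kib, int(entry.name), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for swap_kib, pid, name in swapped_processes()[:10]:
        print(f"{swap_kib / 1024:10.1f} MiB  pid={pid:<8d} {name}")
```

Run it once with your current setup and once after enabling zswap (under comparable load) to get a feeling for whether anything changes for the VMs.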