Proxmox Node perpetually high SWAP usage

CZappe · Apr 20, 2022

I have a Proxmox node currently configured with 126 GB RAM, actively running 10 Linux and Windows VMs. Our users were reporting issues with some of these VMs which seemed to coincide with high SWAP usage on the node, even though total RAM usage was below 50%. To ease the SWAP load, I made two changes on the node:

I configured the swappiness on the node down to 10 from the default 60
I added additional swap capacity, increasing from 8GB to 56GB, by progressively adding 8GB swap files on the node's 'local' storage. This practice has worked on some of our other nodes to help reduce what I'll call "swap pressure"

Over the past several weeks, the SWAP usage on this node has crept up to utilize all available resources, so that 95-100% of SWAP remains in use at all times:

Digging deeper into the memory and SWAP utilization via smem, reveals the following:

Code:

root@node:~# smem -s swap -r | head -n 15
  PID User     Command                         Swap      USS      PSS      RSS
11938 root     /usr/bin/kvm -id 170 -name  42599700 22674700 23489570 27412972
13235 root     /usr/bin/kvm -id 667 -name  25685552  1296276  1611772  7527292
18462 root     /usr/bin/kvm -id 666 -name  25408420  2421920  2731459  7814792
21446 root     /usr/bin/kvm -id 126 -name   5640456   262200   306394   530272
15016 root     /usr/bin/kvm -id 555 -name   2643200  4955380  5481289  6655020
18239 root     /usr/bin/kvm -id 117 -name   2168480   610788   773351  1051332
24143 root     /usr/bin/kvm -id 175 -name   2144240   362392   384634   468244
 2668 root     /usr/bin/kvm -id 132 -name   1829140   603496   670910   850020
12291 root     /usr/bin/kvm -id 199 -name    550876  9824116 10101969 11293832
 5717 root     /usr/bin/kvm -id 128 -name    522652  1636084  1681112  1973628
 7727 www-data pveproxy worker (shutdown)    133792      344      548     4000
 2547 root     pvedaemon                     104684      644     6995    33196
15452 root     pvedaemon worker               90128    12996    20974    48856
 5143 root     pvedaemon worker               90004    12748    20876    48908

This seems crazy to me, given the utilization levels on the rest of the node's resources. I've reviewed the Proxmox PVE Admin documentation, taking particular note of the following sections:

3.8.8. SWAP on ZFS
11.5.2. Control Groups (based on discussion in Proxmox forum post: https://forum.proxmox.com/threads/swappiness-question.42295/page-2)

I also found an informative, detailed overview of memory and swapping in Linux here:
https://www.howtogeek.com/449691/what-is-swapiness-on-linux-and-how-to-change-it/

After all this, I'm still left scratching my head about how to best manage SWAP in the Proxmox environment. The discussion I linked above from 2018-19, indicates that a solution may involve scripting the swappiness to the Control Groups via rc.local, but that was under the PVE 5.3-X kernel and my node is running PVE 6.4-13.

What should I be doing to get my SWAP under control? Thanks!

leesteken · Apr 20, 2022

Usually, it is not a problem if swap fill slowly over time as it makes room for filesystem cache. If allocated memory is never really used again, then its good that those parts were swapped out. The problems start when the system needs to read back from swap often, which indicates that you put too much pressure on the memory or the system is making bad decisions about when/what to swap out. This shows as heavy I/O (high IO delay) and the whole system being slow to respond.
Please don't get me wrong but: Is there an concrete problem that you are trying to solve or is the red color of the graphics just bothering you?

CZappe · Apr 20, 2022

I suppose if I were to boil this down to a concrete problem it would be this:

The constant utilization of ~100% of any server resource indicates to me a need to either upgrade or reconfigure the server. In this case, my concern is that keeping the swap usage pegged will lead to perpetual IO delays and VM instability, and rapid degradation of the server's SSD storage.

That said, I'm still generally at a loss regarding how to properly manage swap utilization on a Proxmox node. Simply tuning the global swappiness value and adding additional swap space does not appear to be having a discernible effect on the node's status bars. I don't see anything that seems to suggest any other approach, however, so it seemed like a good time to ask.

...and yes, as a sysadmin, the presence of red status indicators does bother me, but I do try to get to the "why" beneath all that red before letting my blood pressure rise.

leesteken · Apr 20, 2022

Other than disabling swap, setting vm.swappiness to 0 (only swap on lack of memory), adding more memory or reducing VM memory allocation, I know how to change principle swap usage over long periods of time.

However, I'm also curious why more than half of the allocated memory is swapped. It suggests to me that there is a lot of file based I/O (maybe not high peaks but slowly over time) going on. Linux appears to swap stuff out in favor of more (read) cache for the filesystem(s). I see this more often with containers than VMs. What is the storage type used for the virtual disks? What caching mode is set for the virtual disks?

CZappe · Apr 20, 2022

I think you're probably right that what I'm seeing here is a slow-creep of file-based as opposed to anonymous page swapping. If the Proxmox kernel does indeed use the Control Group swappiness value rather than the global swappiness value, this would seem to make sense:

Code:

root@node:~# cat /proc/sys/vm/swappiness
1
root@node:~# cat /sys/fs/cgroup/memory/system.slice/pvedaemon.service/memory.swappiness
60

So for all of my fiddling, it may be that the Proxmox kernel is still using a swappiness value of 60. On the storage front, here's how the VMs on this node are configured.

8 of the 10 VMs are using storage based on a NAS device, connected over the NFS 3 protocol. The other two VMs (132 and 555) are stored on the local-lvm partition.
The VMs use mix of SCSI and VirtIO as the storage bus. All VMs use VirtIO as the SCSI controller.
All VMs have their storage caching modes set to Default (No Cache)

CZappe · Apr 20, 2022

In terms of memory allocation on this node, the total memory size allotted to the active VMs is greater than the available RAM on the node. However, the host memory usage is still below 50%, so it doesn't seem like reducing VM memory allocation would really have an impact here

Search

Search

Proxmox Node perpetually high SWAP usage

CZappe

Active Member

leesteken

Distinguished Member

CZappe

Active Member

leesteken

Distinguished Member

CZappe

Active Member

CZappe

Active Member

We value your privacy