Recommended amount of swap?

speck

I have a three node cluster running PVE 8.4, and things have been working well.

But two days ago, I noticed in dmesg that the OOM killer stopped one of my VMs:

Code:
Sep 02 08:57:28 proxmox-2 kernel: pvedaemon worke invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Sep 02 08:57:28 proxmox-2 kernel: CPU: 91 PID: 196197 Comm: pvedaemon worke Tainted: P           O       6.8.12-9-pve #1
Sep 02 08:57:28 proxmox-2 kernel: Hardware name: Dell Inc. PowerEdge R6725/0KRFPX, BIOS 1.1.3 02/25/2025
Sep 02 08:57:28 proxmox-2 kernel: Call Trace:
Sep 02 08:57:28 proxmox-2 kernel:  <TASK>
Sep 02 08:57:28 proxmox-2 kernel:  dump_stack_lvl+0x76/0xa0
Sep 02 08:57:28 proxmox-2 kernel:  dump_stack+0x10/0x20
Sep 02 08:57:28 proxmox-2 kernel:  dump_header+0x47/0x1f0
Sep 02 08:57:28 proxmox-2 kernel:  oom_kill_process+0x110/0x240
Sep 02 08:57:28 proxmox-2 kernel:  out_of_memory+0x26e/0x560
Sep 02 08:57:28 proxmox-2 kernel:  __alloc_pages+0x10ce/0x1320
Sep 02 08:57:28 proxmox-2 kernel:  alloc_pages_mpol+0x91/0x1f0
Sep 02 08:57:28 proxmox-2 kernel:  alloc_pages+0x54/0xb0
Sep 02 08:57:28 proxmox-2 kernel:  folio_alloc+0x15/0x40
Sep 02 08:57:28 proxmox-2 kernel:  filemap_alloc_folio+0xfd/0x110
Sep 02 08:57:28 proxmox-2 kernel:  __filemap_get_folio+0x195/0x2d0
Sep 02 08:57:28 proxmox-2 kernel:  filemap_fault+0x5d0/0xc10
Sep 02 08:57:28 proxmox-2 kernel:  __do_fault+0x3a/0x190
Sep 02 08:57:28 proxmox-2 kernel:  do_fault+0x296/0x4f0
Sep 02 08:57:28 proxmox-2 kernel:  __handle_mm_fault+0x894/0xf70
Sep 02 08:57:28 proxmox-2 kernel:  handle_mm_fault+0x18d/0x380
Sep 02 08:57:28 proxmox-2 kernel:  do_user_addr_fault+0x169/0x660
Sep 02 08:57:29 proxmox-2 kernel:  exc_page_fault+0x83/0x1b0
Sep 02 08:57:29 proxmox-2 kernel:  asm_exc_page_fault+0x27/0x30
Sep 02 08:57:29 proxmox-2 kernel: RIP: 0033:0x628c76483374
Sep 02 08:57:29 proxmox-2 kernel: Code: Unable to access opcode bytes at 0x628c7648334a.
Sep 02 08:57:29 proxmox-2 kernel: RSP: 002b:00007ffdc7820430 EFLAGS: 00010297
Sep 02 08:57:29 proxmox-2 kernel: RAX: 0000000000000002 RBX: 0000628cb07c82a0 RCX: 0000000000000000
Sep 02 08:57:29 proxmox-2 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000628cb07c82a0
Sep 02 08:57:29 proxmox-2 kernel: RBP: 0000628cb8eb85c8 R08: 0000000000000000 R09: 0000628cb8e9da68
Sep 02 08:57:29 proxmox-2 kernel: R10: 0000628cb2d4c150 R11: 00007ffdc7820950 R12: 0000628c766a38e0
Sep 02 08:57:29 proxmox-2 kernel: R13: 0000628cb07c83e0 R14: 0000000000000052 R15: 0000628c76528630
Sep 02 08:57:29 proxmox-2 kernel:  </TASK>
Sep 02 08:57:29 proxmox-2 kernel: Mem-Info:
Sep 02 08:57:29 proxmox-2 kernel: active_anon:20052567 inactive_anon:44204516 isolated_anon:0
                                   active_file:0 inactive_file:1773 isolated_file:0
                                   unevictable:52120 dirty:0 writeback:0
                                   slab_reclaimable:124971 slab_unreclaimable:387891
                                   mapped:25830 shmem:23126 pagetables:143465
                                   sec_pagetables:15638 bounce:0
                                   kernel_misc_reclaimable:0
                                   free:153117 free_pcp:733 free_cma:0
Sep 02 08:57:29 proxmox-2 kernel: Node 0 active_anon:60206284kB inactive_anon:67036072kB active_file:0kB inactive_file:1184kB unevictable:67276kB isolated(anon):0kB isolated(file):0kB mapped:39100kB dirty:0kB writeback:0kB shmem:25240kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:86706176kB writeback_tmp:0kB kernel_stack:18984kB p>
Sep 02 08:57:29 proxmox-2 kernel: Node 1 active_anon:20003984kB inactive_anon:109781992kB active_file:0kB inactive_file:5908kB unevictable:141204kB isolated(anon):0kB isolated(file):0kB mapped:64220kB dirty:0kB writeback:0kB shmem:67264kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:75548672kB writeback_tmp:0kB kernel_stack:11032kB>
Sep 02 08:57:29 proxmox-2 kernel: Node 0 DMA free:11264kB boost:0kB min:4kB low:16kB high:28kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 02 08:57:29 proxmox-2 kernel: lowmem_reserve[]: 0 735 128190 128190 128190
Sep 02 08:57:29 proxmox-2 kernel: Node 0 DMA32 free:509584kB boost:0kB min:256kB low:1008kB high:1760kB reserved_highatomic:0KB active_anon:151808kB inactive_anon:142068kB active_file:0kB inactive_file:12kB unevictable:0kB writepending:0kB present:878632kB managed:813092kB mlocked:0kB bounce:0kB free_pcp:500kB local_pcp:0kB free_>
Sep 02 08:57:29 proxmox-2 kernel: lowmem_reserve[]: 0 0 127455 127455 127455
Sep 02 08:57:29 proxmox-2 kernel: Node 0 Normal free:46692kB boost:0kB min:44668kB low:175180kB high:305692kB reserved_highatomic:2048KB active_anon:60054480kB inactive_anon:66894000kB active_file:0kB inactive_file:1064kB unevictable:67276kB writepending:0kB present:132620224kB managed:130514484kB mlocked:64204kB bounce:0kB free_>
Sep 02 08:57:29 proxmox-2 kernel: lowmem_reserve[]: 0 0 0 0 0
Sep 02 08:57:29 proxmox-2 kernel: Node 1 Normal free:44928kB boost:0kB min:45180kB low:177188kB high:309196kB reserved_highatomic:0KB active_anon:20003984kB inactive_anon:109781992kB active_file:0kB inactive_file:5908kB unevictable:141204kB writepending:0kB present:134181632kB managed:132019088kB mlocked:141204kB bounce:0kB free_>
Sep 02 08:57:29 proxmox-2 kernel: lowmem_reserve[]: 0 0 0 0 0
Sep 02 08:57:29 proxmox-2 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Sep 02 08:57:29 proxmox-2 kernel: Node 0 DMA32: 2*4kB (UM) 12*8kB (UME) 13*16kB (UME) 13*32kB (ME) 13*64kB (ME) 15*128kB (UME) 13*256kB (UME) 8*512kB (UME) 7*1024kB (UM) 4*2048kB (U) 118*4096kB (UME) = 509592kB
Sep 02 08:57:29 proxmox-2 kernel: Node 0 Normal: 734*4kB (UME) 710*8kB (ME) 431*16kB (UME) 486*32kB (UME) 203*64kB (UME) 24*128kB (UM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47384kB
Sep 02 08:57:29 proxmox-2 kernel: Node 1 Normal: 9*4kB (UME) 141*8kB (UME) 981*16kB (UM) 788*32kB (UME) 17*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 0*4096kB = 45212kB
Sep 02 08:57:29 proxmox-2 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 02 08:57:29 proxmox-2 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 02 08:57:29 proxmox-2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 02 08:57:29 proxmox-2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 02 08:57:29 proxmox-2 kernel: 58626 total pagecache pages
Sep 02 08:57:29 proxmox-2 kernel: 29370 pages in swap cache
Sep 02 08:57:29 proxmox-2 kernel: Free swap  = 44kB
Sep 02 08:57:29 proxmox-2 kernel: Total swap = 8388604kB
Sep 02 08:57:29 proxmox-2 kernel: 66924121 pages RAM
Sep 02 08:57:29 proxmox-2 kernel: 0 pages HighMem/MovableOnly
Sep 02 08:57:29 proxmox-2 kernel: 1083615 pages reserved
Sep 02 08:57:29 proxmox-2 kernel: 0 pages hwpoisoned

Today, I see that this host is still using 100% of the 8GB swap space, even though there is plenty of free RAM:
[screenshot: host summary showing swap at 100% usage while plenty of RAM remains free]

Searching these forums, it sounds like swap space may be important for coalescing pages, while other people suggest swap should be avoided altogether. But many of these posts are somewhat dated (5+ years), so the current "best practice" may not be reflected in them.

So a couple of questions:
  1. Is the 8GB default that the Proxmox PVE installer set up the right amount for a host with 256GB of physical RAM? Short of manual partitioning I don't see a way to change that during setup. I'm currently investigating upgrading the cluster from PVE8 to 9; perhaps an ideal time to make the change?
  2. Should I increase the amount of swap available on the host? The local drive is a 400GB SSD so I could allocate more space to swap if it would be helpful, but I worry that there is some underlying mechanism that would push the swap usage to fill all available space. Is this a "buffer bloat"-like situation?
  3. What's the best way to free up this swap space? The UI is presenting the swap usage graph in red, which tells me "unhealthy", but I don't see how to nudge PVE to do whatever is required to make it "healthy" again.
  4. The hosts' value of vm.swappiness is the default of 60; is this still the recommended setting?

-Cheers,

speck
 
  1. Is the 8GB default that the Proxmox PVE installer set up the right amount for a host with 256GB of physical RAM? Short of manual partitioning I don't see a way to change that during setup. I'm currently investigating upgrading the cluster from PVE8 to 9; perhaps an ideal time to make the change?

Yes, in fact it's the maximum the installer will use, see https://pve.proxmox.com/wiki/Installation. If you install to ZFS, no swap will be set up at all, since swap on a ZFS volume is known to potentially cause trouble:
https://pve.proxmox.com/wiki/ZFS_on_Linux#zfs_swap


Now you might wonder why you would even need swap with as much (free) RAM as you have. Here is a blog post by one of the Linux kernel developers working on memory management: https://chrisdown.name/2018/01/02/in-defence-of-swap.html
In short: yes, swap is still a good thing to have, even with as much RAM as you have ;)

Whether that needs to be physical swap is a different story, though. You could use zram to set up part of your RAM as a compressed RAM disk that acts as a swap device. I recommend zram-tools for this because of its easy-to-understand configuration (you just tell it to use a fixed percentage of system RAM), but the other methods explained in the wiki work as well. The effect (at the cost of a higher CPU load, of which you have plenty if your screenshot is to be believed) is that the part of RAM backing the RAM-based swap is compressed, and this works even with ZFS.
The result is that you can fit more VMs/LXCs into your RAM (OK, this might be a problem you don't have) and get shorter I/O times; see @LnxBil's comment in this thread on the pros and cons of zram: https://forum.proxmox.com/threads/zram-why-bother.151712/

He also mentioned that LXCs seem to have a tendency to swap more than VMs, even if (in theory) more than enough RAM is available.

  2. Should I increase the amount of swap available on the host? The local drive is a 400GB SSD so I could allocate more space to swap if it would be helpful, but I worry that there is some underlying mechanism that would push the swap usage to fill all available space. Is this a "buffer bloat"-like situation?

Imho not needed, but I would add a zram disk set to around 10% of your system RAM. Then in most cases the RAM disk would be used for swap, with the partition as a last resort.
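
For reference, here is a minimal sketch of what that could look like with the Debian zram-tools package (the values are only examples, and I'm assuming the stock /etc/default/zramswap layout from that package; adjust to taste):
Code:
# apt install zram-tools
# then edit /etc/default/zramswap:
ALGO=zstd        # compression algorithm used for the zram device
PERCENT=10       # size the compressed swap at ~10% of physical RAM
PRIORITY=100     # higher priority than the disk swap, so the kernel uses it first
# apply the change
systemctl restart zramswap.service

With the higher priority, the kernel fills the compressed RAM swap first and only falls back to the SSD partition once that is full.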
  3. What's the best way to free up this swap space? The UI is presenting the swap usage graph in red, which tells me "unhealthy", but I don't see how to nudge PVE to do whatever is required to make it "healthy" again.

Imho this is not really a problem but more in the " linux ate my ram" space, see https://www.linuxatemyram.com/ and my less in-depth, more generic https://forum.proxmox.com/threads/f...ram-usage-of-the-vms-in-the-dashboard.165943/ Note it's on RAM not swap but as far I know (like RAM) it's normal for Linux to not free swap (like caches in memory ) if not needed. To disable swap completly you would edit /etc/fstab, remove or comment the swap line and reboot. For a temporary removal of the swap you would disable it with swapoff /path_to_swap_disc and reenable with swapon /path_to_swap. Alternatively with swapoff -a and swapon -a to do this with all your swap partitions or swap files.
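
As a quick sketch of the temporary route (assuming there is enough free RAM to take back the currently swapped pages, otherwise swapoff will be very slow or may fail):
Code:
# drain all swap devices/files and re-enable them; pages are pulled back into RAM
swapoff -a && swapon -a
# verify the result
swapon --show
free -h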
  4. The hosts' value of vm.swappiness is the default of 60; is this still the recommended setting?

This default is indeed a good fit for most use cases, including servers, workstations and user desktops. It's also (see Chris Down's piece) not what many people imagine it to be: it's not a knob that changes the probability that the system will swap; it's actually more complicated than that.

Now, depending on the available system RAM, whether swap sits on HDDs or SSDs, and your use case, it might be a good idea to use a higher value (some desktop Linux distributions use zram swap together with a higher swappiness, so the RAM disk gets used earlier) or a lower value (many servers use 10 to still use swap, although to a lesser degree than with 60). A value of 1 means the kernel should swap only when it considers it strictly necessary, and a value of 0 makes it avoid swapping as much as possible; note that 0 does not truly disable swap, for that you would use swapoff and remove the entry from /etc/fstab. Chris Down wrote on this subject (what swappiness actually means and how to choose a value fitting your use case) in his piece linked above.
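
For completeness, a short sketch of how to inspect and change the value (the 10 used here is only an example, not a recommendation for this particular host):
Code:
# show the current value
sysctl vm.swappiness
# change it at runtime (lost on reboot)
sysctl -w vm.swappiness=10
# make it persistent across reboots
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
sysctl --system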

Back to your problem: imho you have more than enough RAM, so I don't even get why the OOM killer was activated in the first place. Was a VM or LXC killed by it? If not, do you have any idea which other process was killed? Imho the best course of action would be to find out which process was killed and why, and then to find and fix the root cause of the OOM killer's activation.
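
A quick way to check which process was killed (a rough sketch; the exact wording of the messages varies between kernel versions):
Code:
# OOM events in the kernel log of the current boot
journalctl -k | grep -iE 'out of memory|oom-kill|killed process'
# same for the previous boot, if the event happened before a reboot
journalctl -k -b -1 | grep -iE 'out of memory|oom-kill|killed process'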
 
Thanks for the insightful comments so far.


What's the best way to free up this swap space? The UI is presenting the swap usage graph in red, which tells me "unhealthy", but I don't see how to nudge PVE to do whatever is required to make it "healthy" again.
Imho this is not really a problem but more in the "Linux ate my RAM" space, see https://www.linuxatemyram.com/ and my less in-depth, more generic https://forum.proxmox.com/threads/f...ram-usage-of-the-vms-in-the-dashboard.165943/ Note that it's about RAM, not swap, but as far as I know it's normal for Linux not to free swap (just like caches in memory) unless the space is needed. To disable swap completely you would edit /etc/fstab, remove or comment out the swap line, and reboot. To remove swap temporarily you would disable it with swapoff /path_to_swap_disc and re-enable it with swapon /path_to_swap. Alternatively, swapoff -a and swapon -a do this for all of your swap partitions or swap files.

I'm fine not freeing the swap space and letting the system manage it; there's no bonus check for having resources left unused. ;)

The reason I ask: at some point the Proxmox PVE team made a conscious design choice to dedicate dashboard space to displaying swap usage and to color it yellow or red once usage rises above certain thresholds. To my eyes, that means it's a resource worth keeping an eye on and keeping within a "healthy" range. If 100% swap usage is fine and healthy, I don't understand why it would be monitored like this.


Back to your problem: imho you have more than enough RAM, so I don't even get why the OOM killer was activated in the first place. Was a VM or LXC killed by it? If not, do you have any idea which other process was killed? Imho the best course of action would be to find out which process was killed and why, and then to find and fix the root cause of the OOM killer's activation.

From the dmesg/systemd journal message, it was a "kvm" process that was killed, so I believe that was a VM and not a container.

Also check what's using the SWAP. Make KSM start sooner and/or change the ballooning target. Investigate and potentially change the ZFS ARC size.
Then check top -co%MEM and Datacenter > Search for what uses memory. Also look at the memory history graphs of your guests.
Don't over-allocate too much. If you want to purge the SWAP for some reason, the simplest way is probably swapoff -a && swapon -a.
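
For the KSM part, a minimal sketch assuming the stock ksmtuned setup that PVE ships (the threshold is just an example value):
Code:
# /etc/ksmtuned.conf -- let KSM start merging pages earlier
KSM_THRES_COEF=30    # start when free memory drops below 30% (shipped default is 20)
# apply the change
systemctl restart ksmtuned.service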
Please use code blocks when sharing logs.

I've updated my original post to move the logs from a spoiler tag to a code block; thanks for the tip.

No ZFS on the system, and the VMs and containers are stored on separate storage; we are evaluating both LVM on iSCSI and NFS. I will have to check for over-allocation of memory; right now it looks like there is a generous amount of free RAM.
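
(As a rough, hypothetical sketch of such an over-allocation check: the following sums the memory: lines of the VM configs on this node. It assumes the values are in MiB and ignores containers, ballooning minimums, and any snapshot sections in the config files, which could skew the total.)
Code:
# total RAM assigned to VMs on this node, in MiB
grep -h '^memory' /etc/pve/qemu-server/*.conf | awk '{sum += $2} END {print sum " MiB assigned to VMs"}'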

Using the smem -atkr -s swap command, I see the following, with one VM using the lion's share (7.3G) of the 8GB swap total:
Code:
root@proxmox-2:~# smem -atkr -s swap
    PID User       Command                                                   Swap     USS     PSS     RSS
 594437 root       /usr/bin/kvm -id 125 -name 2025-test,debug-threads=on     7.3G   15.3G   15.4G   24.9G
   4711 root       pvedaemon                                               116.0M    1.0M    6.9M   26.3M
1457443 root       /usr/bin/kvm -id 117 -name Win2025-Test2,debug-thre     113.8M    3.8G    3.8G    4.2G
   5727 root       /usr/bin/kvm -id 106 -name Mattermost,debug-threads=on  113.5M    3.6G    3.7G    3.9G
 195631 root       pvedaemon worker                                        100.7M   12.2M   20.4M   48.0M
 187027 root       pvedaemon worker                                        100.1M   12.7M   21.1M   49.0M
   4718 root       pve-ha-crm                                               95.1M   17.8M   18.9M   22.8M
 182550 root       pvedaemon worker                                         95.0M   21.5M   29.6M   58.6M
   4737 root       pvescheduler                                             89.4M   27.0M   27.1M   29.4M
   4731 root       pve-ha-lrm                                               69.6M   42.9M   44.0M   48.1M
1026509 100103     /usr/bin/java -Djava.awt.headless=true -jar /usr/share   51.2M    1.7G    1.7G    1.7G
1566825 nobody     /usr/libexec/pve-esxi-import-tools/esxi-folder-fuse --   20.3M  114.8M  115.3M  122.0M
   5476 root       /usr/bin/kvm -id 101 -name dc4,debug-threads=on -no-sh   19.5M    4.0G    4.0G    4.2G
3305170 root       /usr/bin/kvm -id 104 -name proxmox-backup,debug-thread    4.3M   31.9G   31.9G   31.9G
2889827 root       (sd-pam)                                                  1.1M    2.7M    3.0M    4.4M
3323409 100000     /sbin/init                                              852.0K    4.2M    5.1M    7.0M
      1 root       /sbin/init                                              840.0K    2.0M    3.8M   11.4M
   4629 www-data   nginx: worker process                                   800.0K  808.0K  848.0K    6.8M
   4630 www-data   nginx: worker process                                   796.0K  880.0K  920.0K    6.9M
[...all normal system processes below here...]

The two biggest culprits, VM IDs 125 and 117, are both running Windows Server 2025. Why it's doing that is another question (really the main question. ;))

EDIT: As for the vm.swappiness setting, the documentation suggests that 10 might be a better value than the default of 60 "if sufficient memory exists in the system".
 