OOM - Shut down VM

Softwald

Hi!
A few weeks ago the pve-firewall process on my PVE host invoked the oom-killer, which shut down one of my VMs, and I couldn't start it again without rebooting the whole PVE host.
Sep 18 01:55:11 pve01 kernel: pve-firewall invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0

The OOM killer killed process 1779 (kvm); here is the log:
Code:
Sep 18 01:55:11 pve01 kernel: Node 0 active_anon:22493748kB inactive_anon:9459004kB active_file:6130884kB inactive_file:17267728kB unevictable:3072kB isolated(anon):0kB isolated(file):0kB mapped:42240kB dirty:4105512kB writeback:45440kB shmem:45008kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:552960kB writeback_tmp:0kB kernel_stack:8288kB pagetables:107948kB sec_pagetables:81468kB all_unreclaimable? no
Sep 18 01:55:11 pve01 kernel: Node 0 DMA free:11264kB boost:0kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 18 01:55:11 pve01 kernel: lowmem_reserve[]: 0 2332 64135 64135 64135
Sep 18 01:55:11 pve01 kernel: Node 0 DMA32 free:455808kB boost:10848kB min:13304kB low:15692kB high:18080kB reserved_highatomic:0KB active_anon:832432kB inactive_anon:186084kB active_file:6780kB inactive_file:414600kB unevictable:0kB writepending:105608kB present:2513916kB managed:2447820kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 18 01:55:11 pve01 kernel: lowmem_reserve[]: 0 0 61802 61802 61802
Sep 18 01:55:11 pve01 kernel: Node 0 Normal free:1199736kB boost:0kB min:65108kB low:128392kB high:191676kB reserved_highatomic:14336KB active_anon:21661132kB inactive_anon:9272924kB active_file:6121924kB inactive_file:16855484kB unevictable:3072kB writepending:4044948kB present:64487424kB managed:63293856kB mlocked:0kB bounce:0kB free_pcp:1420kB local_pcp:0kB free_cma:0kB
Sep 18 01:55:11 pve01 kernel: lowmem_reserve[]: 0 0 0 0 0
Sep 18 01:55:11 pve01 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Sep 18 01:55:11 pve01 kernel: Node 0 DMA32: 36246*4kB (UE) 38849*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 455776kB
Sep 18 01:55:11 pve01 kernel: Node 0 Normal: 48177*4kB (UME) 126251*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1202716kB
Sep 18 01:55:11 pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 18 01:55:11 pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 18 01:55:11 pve01 kernel: 1085442 total pagecache pages
Sep 18 01:55:11 pve01 kernel: 0 pages in swap cache
Sep 18 01:55:11 pve01 kernel: Free swap  = 0kB
Sep 18 01:55:11 pve01 kernel: Total swap = 0kB
Sep 18 01:55:11 pve01 kernel: 16754333 pages RAM
Sep 18 01:55:11 pve01 kernel: 0 pages HighMem/MovableOnly
Sep 18 01:55:11 pve01 kernel: 315074 pages reserved
Sep 18 01:55:11 pve01 kernel: 0 pages hwpoisoned
Sep 18 01:55:11 pve01 kernel: Tasks state (memory values in pages):
Sep 18 01:55:11 pve01 kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
Sep 18 01:55:11 pve01 kernel: [    599]     0   599    14414      480      224      256         0   143360        0          -250 systemd-journal
Sep 18 01:55:11 pve01 kernel: [    615]     0   615     6966      704      512      192         0    73728        0         -1000 systemd-udevd
Sep 18 01:55:11 pve01 kernel: [   1166]   103  1166     1970      352       96      256         0    57344        0             0 rpcbind
Sep 18 01:55:11 pve01 kernel: [   1195]   101  1195     2314      256       96      160         0    57344        0          -900 dbus-daemon
Sep 18 01:55:11 pve01 kernel: [   1199]     0  1199    69539      256       64      192         0   102400        0             0 pve-lxc-syscall
Sep 18 01:55:11 pve01 kernel: [   1202]     0  1202     2994      576      416      160         0    69632        0             0 smartd
Sep 18 01:55:11 pve01 kernel: [   1205]     0  1205     1766      211       83      128         0    53248        0             0 ksmtuned
Sep 18 01:55:11 pve01 kernel: [   1207]     0  1207     4172      416      224      192         0    77824        0             0 systemd-logind
Sep 18 01:55:11 pve01 kernel: [   1208]     0  1208      583       96        0       96         0    40960        0         -1000 watchdog-mux
Sep 18 01:55:11 pve01 kernel: [   1216]     0  1216    60167      416      256      160         0   106496        0             0 zed
Sep 18 01:55:11 pve01 kernel: [   1217]     0  1217    38189      256       64      192         0    73728        0         -1000 lxcfs
Sep 18 01:55:11 pve01 kernel: [   1378]     0  1378     2207      224       64      160         0    57344        0             0 lxc-monitord
Sep 18 01:55:11 pve01 kernel: [   1391]     0  1391     1468      192       32      160         0    49152        0             0 agetty
Sep 18 01:55:11 pve01 kernel: [   1401]     0  1401     3855      576      320      256         0    69632        0         -1000 sshd
Sep 18 01:55:11 pve01 kernel: [   1423]   100  1423     4715      266      138      128         0    69632        0             0 chronyd
Sep 18 01:55:11 pve01 kernel: [   1433]   100  1433     2633      402      114      288         0    65536        0             0 chronyd
Sep 18 01:55:11 pve01 kernel: [   1486]     0  1486   218708      463      276      187         0   241664        0             0 rrdcached
Sep 18 01:55:11 pve01 kernel: [   1576]     0  1576    10665      293      133      160         0    73728        0             0 master
Sep 18 01:55:11 pve01 kernel: [   1578]   104  1578    10774      352      160      192         0    73728        0             0 qmgr
Sep 18 01:55:11 pve01 kernel: [   1584]     0  1584     1652      224       32      192         0    53248        0             0 cron
Sep 18 01:55:11 pve01 kernel: [   1595]     0  1595    39788    24696    24065      352       279   307200        0             0 pve-firewall
Sep 18 01:55:11 pve01 kernel: [   1607]     0  1607    38426    25622    24534      672       416   348160        0             0 pvestatd
Sep 18 01:55:11 pve01 kernel: [   1621]     0  1621    58747    34250    33962      288         0   454656        0             0 pvedaemon
Sep 18 01:55:11 pve01 kernel: [   1630]    33  1630    59090    34584    34296      288         0   466944        0             0 pveproxy
Sep 18 01:55:11 pve01 kernel: [   1637]    33  1637    20194    12896    12576      320         0   200704        0             0 spiceproxy
Sep 18 01:55:11 pve01 kernel: [   1657]     0  1657     3289      391      199      192         0    61440        0             0 swtpm
Sep 18 01:55:11 pve01 kernel: [   1665]     0  1665  2696513  2148776  2148392      384         0 20398080        0             0 kvm
Sep 18 01:55:11 pve01 kernel: [   1772]     0  1772     3289      423      231      192         0    61440        0             0 swtpm
Sep 18 01:55:11 pve01 kernel: [   1779]     0  1779 10206524  8450693  8450213      480         0 80039936        0             0 kvm
Sep 18 01:55:11 pve01 kernel: [   1914]     0  1914    54131    28524    28172      352         0   421888        0             0 pvescheduler
Sep 18 01:55:11 pve01 kernel: [ 617059]     0 617059   144978    14465     4825      256      9384   372736        0             0 pmxcfs
Sep 18 01:55:11 pve01 kernel: [ 617127]     0 617127     1298      224       96      128         0    49152        0             0 proxmox-firewal
Sep 18 01:55:11 pve01 kernel: [ 628823]     0 628823     1328      288       32      256         0    53248        0             0 qmeventd
Sep 18 01:55:11 pve01 kernel: [ 628969]     0 628969    55044    27976    27304      288       384   372736        0             0 pve-ha-lrm
Sep 18 01:55:11 pve01 kernel: [ 628982]     0 628982    55181    28112    27440      352       320   368640        0             0 pve-ha-crm
Sep 18 01:55:11 pve01 kernel: [1133067]     0 1133067    61030    34809    34393      320        96   454656        0             0 pvedaemon worke
Sep 18 01:55:11 pve01 kernel: [1152694]     0 1152694    78038    34943    34527      320        96   466944        0             0 pvedaemon worke
Sep 18 01:55:11 pve01 kernel: [1159536]     0 1159536    61003    34602    34282      288        32   450560        0             0 pvedaemon worke
Sep 18 01:55:11 pve01 kernel: [3380439]    33 3380439    20252    12868    12612      256         0   180224        0             0 spiceproxy work
Sep 18 01:55:11 pve01 kernel: [3380446]    33 3380446    59123    34601    34345      256         0   434176        0             0 pveproxy worker
Sep 18 01:55:11 pve01 kernel: [3380447]    33 3380447    59123    34601    34345      256         0   434176        0             0 pveproxy worker
Sep 18 01:55:11 pve01 kernel: [3380448]    33 3380448    59123    34601    34345      256         0   434176        0             0 pveproxy worker
Sep 18 01:55:11 pve01 kernel: [3247947]     0 3247947    19796      256       32      224         0    61440        0             0 pvefw-logger
Sep 18 01:55:11 pve01 kernel: [3259936]   104 3259936    10765      256      160       96         0    73728        0             0 pickup
Sep 18 01:55:11 pve01 kernel: [3264850]     0 3264850    58148    28869    28517      352         0   401408        0             0 task UPID:pve01
Sep 18 01:55:11 pve01 kernel: [3274823]     0 3274823  4390864    13485    13069      416         0   618496        0             0 kvm
Sep 18 01:55:11 pve01 kernel: [3274893]     0 3274893    69685    13120    13024       96         0   188416        0             0 zstd
Sep 18 01:55:11 pve01 kernel: [3283058]     0 3283058     1366      160        0      160         0    49152        0             0 sleep
Sep 18 01:55:11 pve01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=pve-firewall.service,mems_allowed=0,global_oom,task_memcg=/qemu.slice/102.scope,task=kvm,pid=1779,uid=0
Sep 18 01:55:11 pve01 kernel: Out of memory: Killed process 1779 (kvm) total-vm:40826096kB, anon-rss:33800852kB, file-rss:1920kB, shmem-rss:0kB, UID:0 pgtables:78164kB oom_score_adj:0
Sep 18 01:55:11 pve01 systemd[1]: 102.scope: A process of this unit has been killed by the OOM killer.
Sep 18 01:55:11 pve01 systemd[1]: 102.scope: Failed with result 'oom-kill'.
Sep 18 01:55:11 pve01 systemd[1]: 102.scope: Consumed 1w 21h 25min 50.101s CPU time.
Sep 18 01:55:11 pve01 kernel:  zd16: p1 p2
Sep 18 01:55:11 pve01 kernel:  zd112: p1 p2 p3 p4
Sep 18 01:55:11 pve01 kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Sep 18 01:55:11 pve01 kernel: tap102i0 (unregistering): left allmulticast mode
Sep 18 01:55:11 pve01 kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Sep 18 01:55:12 pve01 qmeventd[3283191]: Starting cleanup for 102
Sep 18 01:55:12 pve01 kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Sep 18 01:55:12 pve01 kernel: vmbr0: port 3(fwpr102p0) entered disabled state
Sep 18 01:55:12 pve01 kernel: fwln102i0 (unregistering): left allmulticast mode
Sep 18 01:55:12 pve01 kernel: fwln102i0 (unregistering): left promiscuous mode
Sep 18 01:55:12 pve01 kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Sep 18 01:55:12 pve01 kernel: fwpr102p0 (unregistering): left allmulticast mode
Sep 18 01:55:12 pve01 kernel: fwpr102p0 (unregistering): left promiscuous mode
Sep 18 01:55:12 pve01 kernel: vmbr0: port 3(fwpr102p0) entered disabled state
Sep 18 01:55:12 pve01 qmeventd[3283191]: Finished cleanup for 102
Sep 18 01:55:14 pve01 kernel: oom_reaper: reaped process 1779 (kvm), now anon-rss:0kB, file-rss:364kB, shmem-rss:0kB

Is there anything I can do to prevent this from happening again, or was it a one-time thing?
 
@SkyZoThreaD Thanks for your reply. I checked the ZFS ARC max size, and since it is a PVE 8.2 installation the value is set to 10% of the installed RAM (6.27 GiB), so there should still be about 14 GiB of RAM left.
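For reference, this is roughly how the current ARC limit and usage can be checked (assuming a standard PVE 8.x install with OpenZFS; a zfs_arc_max of 0 means the built-in default is in effect):
Code:
# Configured ARC maximum in bytes (0 = ZFS built-in default)
cat /sys/module/zfs/parameters/zfs_arc_max
# Current ARC size vs. target maximum
grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats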

@LnxBil I'm not sure about swap; I read that it produces a lot of extra load and causes problems when you back up to external storage, which is the case here.

The OOM happened around 02:00 a.m., when no work was being done on the servers and no backups were running, so there shouldn't have been much load on the PVE host.
 
How much memory does your system have? Do you allow at least 2 GB for Proxmox and enough for ZFS (if you use that)? When the host runs out of memory, the OOM killer kills the biggest memory consumer, which is unfortunately usually one of your important VMs.
 
@leesteken
The system has 62.7 GiB of total memory.
I have two Windows Server VMs, one with 8 GiB and one with 32 GiB of RAM.
2 GiB for Proxmox and 6 GiB for ZFS ARC.
So there should still be 14 GiB of RAM left.
 
@LnxBil I'm not sure about swap; I read that it produces a lot of extra load and causes problems when you back up to external storage, which is the case here.
Every OS on the planet has built-in swap support; even Windows 3.1 used a virtual memory page file. Swap (compressed in memory as with zram, or on disk) will solve the problem, or at least push the OOM way down the line. Performance-wise, you'll need to monitor not the swap usage itself but the swap-in/swap-out rate, which is what actually impacts performance.
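As a rough illustration of watching the rate rather than the usage (vmstat ships with procps on a standard PVE/Debian install):
Code:
# si/so columns show KiB swapped in/out per interval; sustained non-zero values indicate memory pressure
vmstat 5
# Total swap usage is less telling, but can be checked with
free -h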

Needing a lot of memory while backing up is also common. You'll read data that would not otherwise be read, and the OS tries to keep as much of it as possible in the cache. ZFS claims to be a bit smarter about this unnecessary cache eviction caused by backups, but in the end you'll always have memory pressure during a backup.


Is it possible that 14 GiB of RAM could become fragmented in that time?
As always "it depends". In the worst case, you can fragment it very quickly in a few minutes if your really want it hard enough. You should monitor your slab allocator via /proc/buddyinfo in order to really answer this for yourself.

Again: swap will solve this for you, because anything "in the way" of a new allocation will be swapped out and you will not see the OOM. If the data is needed again, it will be read back from disk.


I have two Windows Server VMs, one with 8 GiB and one with 32 GiB of RAM.
2 GiB for Proxmox and 6 GiB for ZFS ARC.
So there should still be 14 GiB of RAM left.
Sadly, it's not that simple. Every VM also has virtualized hardware that needs memory of its own, e.g. emulated GPU memory, buffers, etc., and a VM can easily need 10-15% more RAM than you configured (there was a thread about this a few years ago, but I cannot find it again). If you've configured a disk cache mode (anything except 'none') in the disk settings dialog, that will also need memory. The same is true for PCIe passthrough, ballooning, etc. All of this influences how much memory you actually use. There could also be a memory leak somewhere that fills up your memory over time. Anything is possible.
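If you want a feeling for the real overhead on your host, you can compare a VM's configured memory with what its whole scope actually uses; a rough check, using VMID 102 from your log as an example (the cgroup path matches the task_memcg shown in the OOM message):
Code:
# Memory configured for the VM (value in MiB)
qm config 102 | grep ^memory
# Memory currently used by the whole VM scope, in bytes (QEMU process plus its helpers)
cat /sys/fs/cgroup/qemu.slice/102.scope/memory.current
In your log, the killed kvm process already shows anon-rss:33800852kB, i.e. about 32.2 GiB, slightly above the 32 GiB you configured, before page tables and the rest of QEMU's overhead are even counted.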
 
@LnxBil Thanks for the explanation!
The OOM has only occurred once so far. I will monitor fragmentation via /proc/buddyinfo, and if it happens again, I will add swap to see if that solves the problem.

I have two ZFS pools, both mirrored:
One where Proxmox is installed - 500 GB
One for the VMs - 2 TB
So, for adding swap: can I just create an 8 GiB swap file (dd if=/dev/zero of=/swapfile bs=1M count=8192) on the 500 GB disk, run mkswap & swapon, and everything will work fine? Or is there more to it?
 
So, for adding swap: can I just create an 8 GiB swap file (dd if=/dev/zero of=/swapfile bs=1M count=8192) on the 500 GB disk, run mkswap & swapon, and everything will work fine? Or is there more to it?
I would not put swap on ZFS. I have experienced the problems with swap-on-ZFS in the past, and I do not know whether a swap file instead of a zvol works any better. I use a portion of the disk that holds my SLOG for swap as well.
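If you do set up a swap file on a non-ZFS filesystem (or a dedicated swap partition), the usual sequence is roughly the following sketch; /swapfile is just an example path:
Code:
# Create an 8 GiB file without holes, restrict permissions, then format and enable it
dd if=/dev/zero of=/swapfile bs=1M count=8192 status=progress
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab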
 
