How to avoid another pve kernel "Out of memory"?

Thatoo

Member
Jun 11, 2021
Hello,

I have a pve running on an 8 GB RAM computer with the ARC max size set to 3 GB and a VM set to 5 GB (4 GB min, ballooning enabled), but I got an oom-kill last night (so the backup failed and the VM remained down until I checked it) and I wonder how I can avoid this happening again. Here are the journal logs:

Code:
Jan 16 01:07:16 pve qmeventd[2679]: Finished cleanup for XXXXX
Jan 16 01:07:16 pve kernel: vmbr0: port 2(fwprXXXXXp0) entered disabled state
Jan 16 01:07:16 pve kernel: fwprXXXXXp0 (unregistering): left promiscuous mode
Jan 16 01:07:16 pve kernel: fwprXXXXXp0 (unregistering): left allmulticast mode
Jan 16 01:07:16 pve kernel: fwbrXXXXXi0: port 1(fwlnXXXXXi0) entered disabled state
Jan 16 01:07:16 pve kernel: fwlnXXXXXi0 (unregistering): left promiscuous mode
Jan 16 01:07:16 pve kernel: fwlnXXXXXi0 (unregistering): left allmulticast mode
Jan 16 01:07:16 pve kernel: vmbr0: port 2(fwprXXXXXp0) entered disabled state
Jan 16 01:07:16 pve kernel: fwbrXXXXXi0: port 1(fwlnXXXXXi0) entered disabled state
Jan 16 01:07:16 pve qmeventd[2679]: Starting cleanup for XXXXX
Jan 16 01:07:15 pve pvescheduler[4194276]: INFO: Backup job finished with errors
Jan 16 01:07:15 pve pvescheduler[4194276]: ERROR: Backup of VM XXXXX failed - VM XXXXX not running
Jan 16 01:07:15 pve pvescheduler[4194276]: VM XXXXX qmp command failed - VM XXXXX not running
Jan 16 01:07:15 pve pvescheduler[4194276]: VM XXXXX qmp command failed - VM XXXXX not running
Jan 16 01:07:15 pve pvescheduler[4194276]: VM XXXXX qmp command failed - VM XXXXX not running
Jan 16 01:07:15 pve systemd[1]: qemu.slice: A process of this unit has been killed by the OOM killer.
Jan 16 01:07:15 pve kernel: fwbrXXXXXi0: port 2(tapXXXXXi0) entered disabled state
Jan 16 01:07:15 pve kernel: tapXXXXXi0 (unregistering): left allmulticast mode
Jan 16 01:07:15 pve kernel: fwbrXXXXXi0: port 2(tapXXXXXi0) entered disabled state
Jan 16 01:07:15 pve systemd[1]: removable-device-attach@1y9iV0-HAjp-fIc5-bZJf-FNEq-rEPM-uINSEf.service: Deactivated successfully.
Jan 16 01:07:15 pve systemd[1]: removable-device-attach@4cf433a2-ebb2-4195-8823-97ed713c0ddb.service: Deactivated successfully.
Jan 16 01:07:15 pve systemd[1]: Started removable-device-attach@1y9iV0-HAjp-fIc5-bZJf-FNEq-rEPM-uINSEf.service - Try to mount the removable device of a dat>
Jan 16 01:07:15 pve systemd[1]: Started removable-device-attach@4cf433a2-ebb2-4195-8823-97ed713c0ddb.service - Try to mount the removable device of a datas>
Jan 16 01:07:15 pve lvm[2665]: /dev/zd16p5 excluded: device is rejected by filter config.
Jan 16 01:07:15 pve kernel:  zd16: p1 p2 < p5 >
Jan 16 01:07:15 pve proxmox-backup-proxy[11665]: removing failed backup
Jan 16 01:07:15 pve proxmox-backup-proxy[11665]: backup failed: connection error: connection reset
Jan 16 01:07:15 pve systemd[1]: XXXXX.scope: Consumed 18h 43min 54.880s CPU time.
Jan 16 01:07:15 pve systemd[1]: XXXXX.scope: Failed with result 'oom-kill'.
Jan 16 01:07:15 pve systemd[1]: XXXXX.scope: A process of this unit has been killed by the OOM killer.
Jan 16 01:07:15 pve kernel: Out of memory: Killed process 1263431 (kvm) total-vm:7772612kB, anon-rss:4777928kB, file-rss:2152kB, shmem-rss:0kB, UID:0 pgtab>
Jan 16 01:07:15 pve kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=proxmox-backup-proxy.service,mems_allowed=0,global_oom,task_memcg=/q>
Jan 16 01:07:15 pve kernel: [   2631]     0  2631     1366      192        0      192         0    49152        0             0 sleep
Jan 16 01:07:15 pve kernel: [4194276]     0 4194276    60373    30020    29764      256         0   397312        0             0 task UPID:pve:0
Jan 16 01:07:15 pve kernel: [4190228]   104 4190228    10764      192      160       32         0    77824        0             0 pickup
Jan 16 01:07:15 pve kernel: [4177396]    33 4177396    60845    35759    35567      192         0   430080        0             0 pveproxy worker
Jan 16 01:07:15 pve kernel: [4177395]    33 4177395    60845    35727    35535      192         0   430080        0             0 pveproxy worker
Jan 16 01:07:15 pve kernel: [4177394]    33 4177394    60845    35727    35535      192         0   430080        0             0 pveproxy worker
Jan 16 01:07:15 pve kernel: [4177393]    33 4177393    22038    13012    12884      128         0   200704        0             0 spiceproxy work
Jan 16 01:07:15 pve kernel: [4177389]     0 4177389    19796       96       32       64         0    61440        0             0 pvefw-logger
Jan 16 01:07:15 pve kernel: [1280750]     0 1280750    62587    35734    35510      160        64   434176        0             0 pvedaemon worke
Jan 16 01:07:15 pve kernel: [1280657]     0 1280657    62585    35702    35478      192        32   434176        0             0 pvedaemon worke
Jan 16 01:07:15 pve kernel: [1275499]     0 1275499    62617    35702    35478      128        96   434176        0             0 pvedaemon worke
Jan 16 01:07:15 pve kernel: [1263431]     0 1263431  1943153  1195020  1194482      538         0 12320768        0             0 kvm
Jan 16 01:07:15 pve kernel: [  11665]    34 11665   612108    25521    25167      354         0  1409024        0             0 proxmox-backup-
Jan 16 01:07:15 pve kernel: [  11655]     0 11655   180891     3062     2767      295         0   253952        0             0 proxmox-backup-
Jan 16 01:07:15 pve kernel: [   7862]     0  7862   190652     7003     6939       64         0   311296        0             0 fail2ban-server
Jan 16 01:07:15 pve kernel: [   5872]     0  5872     3858      384      320       64         0    73728        0         -1000 sshd
Jan 16 01:07:15 pve kernel: [   1069]     0  1069    56058    29286    29094      192         0   372736        0             0 pvescheduler
Jan 16 01:07:15 pve kernel: [   1064]     0  1064    56996    28909    28205      224       480   376832        0             0 pve-ha-lrm
Jan 16 01:07:15 pve kernel: [   1062]    33  1062    21807    12832    12704      128         0   217088        0             0 spiceproxy
Jan 16 01:07:15 pve kernel: [   1051]    33  1051    60641    35524    35332      192         0   475136        0             0 pveproxy
Jan 16 01:07:15 pve kernel: [   1050]     0  1050    57110    28946    28338      160       448   393216        0             0 pve-ha-crm
Jan 16 01:07:15 pve kernel: [   1042]     0  1042    60279    35189    35029      160         0   413696        0             0 pvedaemon
Jan 16 01:07:15 pve kernel: [   1017]     0  1017    50297    25739    25003      256       480   352256        0             0 pve-firewall
Jan 16 01:07:15 pve kernel: [   1016]     0  1016    51715    27195    26331      416       448   372736        0             0 pvestatd
Jan 16 01:07:15 pve kernel: [   1006]     0  1006     1597      224       96      128         0    57344        0             0 proxmox-firewal
Jan 16 01:07:15 pve kernel: [   1005]     0  1005     1652      160       32      128         0    57344        0             0 cron
Jan 16 01:07:15 pve kernel: [   1000]   104  1000    10858      224      160       64         0    81920        0             0 qmgr
Jan 16 01:07:15 pve kernel: [    998]     0   998    10665      165      133       32         0    73728        0             0 master
Jan 16 01:07:15 pve kernel: [    927]     0   927   203893    14152     2997      192     10963   442368        0             0 pmxcfs
Jan 16 01:07:15 pve kernel: [    905]     0   905   200275      403      307       96         0   225280        0             0 rrdcached
Jan 16 01:07:15 pve kernel: [    871]   100   871     2633      179      115       64         0    65536        0             0 chronyd
Jan 16 01:07:15 pve kernel: [    866]   100   866     4715      202      138       64         0    69632        0             0 chronyd
Jan 16 01:07:15 pve kernel: [    817]     0   817     1468       96       32       64         0    57344        0             0 agetty
Jan 16 01:07:15 pve kernel: [    800]     0   800     2207      160       64       96         0    61440        0             0 lxc-monitord
Jan 16 01:07:15 pve kernel: [    690]     0   690    38189       96       32       64         0    65536        0         -1000 lxcfs
Jan 16 01:07:15 pve kernel: [    687]     0   687    60170      352      320       32         0   106496        0             0 zed
Jan 16 01:07:15 pve kernel: [    683]     0   683      583       32        0       32         0    40960        0         -1000 watchdog-mux
Jan 16 01:07:15 pve kernel: [    682]     0   682    12494      288      256       32         0   106496        0             0 systemd-logind
Jan 16 01:07:15 pve kernel: [    681]     0   681     1327      128       32       96         0    49152        0             0 qmeventd
Jan 16 01:07:15 pve kernel: [    675]     0   675     2958      448      384       64         0    69632        0             0 smartd
Jan 16 01:07:15 pve kernel: [    671]     0   671     1766      179       51      128         0    53248        0             0 ksmtuned
Jan 16 01:07:15 pve kernel: [    669]     0   669    69539      128       64       64         0   106496        0             0 pve-lxc-syscall
Jan 16 01:07:15 pve kernel: [    664]   101   664     2319      192      160       32         0    65536        0          -900 dbus-daemon
Jan 16 01:07:15 pve kernel: [    642]   103   642     1970      160       96       64         0    57344        0             0 rpcbind
Jan 16 01:07:15 pve kernel: [    426]     0   426     7027      665      576       89         0    81920        0         -1000 systemd-udevd
Jan 16 01:07:15 pve kernel: [    401]     0   401    16497      320      256       64         0   135168        0          -250 systemd-journal
Jan 16 01:07:15 pve kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
Jan 16 01:07:15 pve kernel: Tasks state (memory values in pages):
Jan 16 01:07:15 pve kernel: 0 pages hwpoisoned
Jan 16 01:07:15 pve kernel: 63434 pages reserved
Jan 16 01:07:15 pve kernel: 0 pages HighMem/MovableOnly
Jan 16 01:07:15 pve kernel: 2073250 pages RAM
Jan 16 01:07:15 pve kernel: Total swap = 0kB
Jan 16 01:07:15 pve kernel: Free swap  = 0kB
Jan 16 01:07:15 pve kernel: 0 pages in swap cache
Jan 16 01:07:15 pve kernel: 12566 total pagecache pages
Jan 16 01:07:15 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 16 01:07:15 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 16 01:07:15 pve kernel: Node 0 Normal: 5926*4kB (UMH) 10264*8kB (UMEH) 1761*16kB (UMH) 177*32kB (UMH) 56*64kB (UMH) 27*128kB (UMH) 0*256kB 0*512kB 0*10>
Jan 16 01:07:15 pve kernel: Node 0 DMA32: 4022*4kB (UMH) 3950*8kB (UMH) 785*16kB (UMH) 623*32kB (UMH) 490*64kB (UM) 132*128kB (UM) 1*256kB (U) 0*512kB 0*10>
Jan 16 01:07:15 pve kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14336kB
Jan 16 01:07:15 pve kernel: lowmem_reserve[]: 0 0 0 0 0
Jan 16 01:07:15 pve kernel: Node 0 Normal free:149736kB boost:0kB min:38208kB low:47760kB high:57312kB reserved_highatomic:18432KB active_anon:1301964kB in>
Jan 16 01:07:15 pve kernel: lowmem_reserve[]: 0 0 4401 4401 4401
Jan 16 01:07:15 pve kernel: Node 0 DMA32 free:128256kB boost:0kB min:29240kB low:36548kB high:43856kB reserved_highatomic:6144KB active_anon:1159112kB inac>
Jan 16 01:07:15 pve kernel: lowmem_reserve[]: 0 3368 7770 7770 7770
Jan 16 01:07:15 pve kernel: Node 0 DMA free:14336kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB acti>
Jan 16 01:07:15 pve kernel: Node 0 active_anon:2444864kB inactive_anon:2820300kB active_file:1492kB inactive_file:2728kB unevictable:124kB isolated(anon):0>
Jan 16 01:07:15 pve kernel: active_anon:725661 inactive_anon:590630 isolated_anon:0
                             active_file:486 inactive_file:762 isolated_file:0
                             unevictable:31 dirty:6 writeback:3
                             slab_reclaimable:2421 slab_unreclaimable:76508
                             mapped:12206 shmem:11572 pagetables:5673
                             sec_pagetables:2411 bounce:0
                             kernel_misc_reclaimable:0
                                                          free:62183 free_pcp:36 free_cma:0
Jan 16 01:07:15 pve kernel: Mem-Info:
Jan 16 01:07:15 pve kernel:  </TASK>
Jan 16 01:07:15 pve kernel: R13: 000073e5a4036b38 R14: 0000000000004000 R15: 000073e5d3a700b8
Jan 16 01:07:15 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jan 16 01:07:15 pve kernel: RBP: 0000000000000000 R08: 000073e5d3a700b8 R09: 0000000000000000
Jan 16 01:07:15 pve kernel: RDX: 0000000000004000 RSI: 000073e5fc0721a0 RDI: 000073e5d3a71000
Jan 16 01:07:15 pve kernel: RAX: 000073e5d3a700b8 RBX: 0000000000390000 RCX: 00000000000030b8
Jan 16 01:07:15 pve kernel: RSP: 002b:000073e562df2a68 EFLAGS: 00010202
Jan 16 01:07:15 pve kernel: Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 2>
Jan 16 01:07:15 pve kernel: RIP: 0033:0x73e611a51c4a
Jan 16 01:07:15 pve kernel:  asm_exc_page_fault+0x27/0x30
Jan 16 01:07:15 pve kernel:  exc_page_fault+0x83/0x1b0
Jan 16 01:07:15 pve kernel:  do_user_addr_fault+0x169/0x660
Jan 16 01:07:15 pve kernel:  handle_mm_fault+0x18d/0x380
Jan 16 01:07:15 pve kernel:  __handle_mm_fault+0xbf1/0xf20
Jan 16 01:07:15 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
Jan 16 01:07:15 pve kernel:  do_anonymous_page+0x21e/0x740
Jan 16 01:07:15 pve kernel:  vma_alloc_folio+0x64/0xe0
Jan 16 01:07:15 pve kernel:  ? __mod_lruvec_state+0x36/0x50
Jan 16 01:07:15 pve kernel:  alloc_pages_mpol+0x91/0x1f0
Jan 16 01:07:15 pve kernel:  __alloc_pages+0x10ce/0x1320
Jan 16 01:07:15 pve kernel:  out_of_memory+0x26e/0x560
Jan 16 01:07:15 pve kernel:  oom_kill_process+0x110/0x240
Jan 16 01:07:15 pve kernel:  dump_header+0x47/0x1f0
Jan 16 01:07:15 pve kernel:  dump_stack+0x10/0x20
Jan 16 01:07:15 pve kernel:  dump_stack_lvl+0x76/0xa0
Jan 16 01:07:15 pve kernel:  <TASK>
Jan 16 01:07:15 pve kernel: Call Trace:
Jan 16 01:07:15 pve kernel: Hardware name: LENOVO 10M8S93W00/3102, BIOS M16KT53A 11/27/2018
Jan 16 01:07:15 pve kernel: CPU: 1 PID: 501 Comm: tokio-runtime-w Tainted: P           O       6.8.12-5-pve #1
Jan 16 01:07:15 pve kernel: tokio-runtime-w invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
Jan 16 01:04:41 pve proxmox-backup-proxy[11665]: rrd journal successfully committed (25 files in 0.008 seconds)
Jan 16 01:00:02 pve proxmox-backup-proxy[11665]: add blob "/mnt/datastore/PBS_local/vm/XXXXX/2025-01-16T00:00:01Z/qemu-server.conf.blob" (467 bytes, comp:>
Jan 16 01:00:02 pve proxmox-backup-proxy[11665]: created new fixed index 1 ("vm/XXXXX/2025-01-16T00:00:01Z/drive-scsi0.img.fidx")
Jan 16 01:00:01 pve proxmox-backup-proxy[11665]: download 'drive-scsi0.img.fidx' from previous backup.
Jan 16 01:00:01 pve proxmox-backup-proxy[11665]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
Jan 16 01:00:01 pve proxmox-backup-proxy[11665]: download 'index.json.blob' from previous backup.
Jan 16 01:00:01 pve proxmox-backup-proxy[11665]: starting new backup on datastore 'PBS_local' from ::ffff:192.168.XXX.XXX: "vm/XXXXX/2025-01-16T00:00:01Z"
 
I have a pve running on an 8 GB RAM computer with the ARC max size set to 3 GB and a VM set to 5 GB (4 GB min, ballooning enabled), but I got an oom-kill last night (so the backup failed and the VM remained down until I checked it) and I wonder how I can avoid this happening again.
Proxmox itself also needs about 1 GB. Reduce the VM's maximum memory (ballooning won't help in your case) and reduce the ARC further (to something like 1 GB, or don't use ZFS and keep 1 GB free for cache), or install more memory.
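
Before changing anything you can check where the 8 GB actually goes (arc_summary ships with the ZFS tools; these are just suggested checks, not something from your logs):

Code:
free -h                                                   # overall host memory usage
awk '$1 == "size" {print $3/1048576 " MiB ARC"}' /proc/spl/kstat/zfs/arcstats
arc_summary | head -n 30                                  # detailed ARC statistics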
 
What would you recommend instead of ZFS in order to still benefit from snapshots and Proxmox backups?
I don't understand the backup part of your question, but LVM-thin also supports snapshots: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_storage. Or plain qcow2 files on a directory storage.
I personally prefer ZFS (since BTRFS has the same strict coupling of CRC and CoW) and a small ARC might be enough for you: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_limit_memory_usage. But the RAM size suggests you have very weak consumer hardware in general and SSD in particular.
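
For reference, a minimal sketch of capping the ARC as described in the linked docs (the 1 GiB value is only an example, adjust it for your workload):

Code:
# /etc/modprobe.d/zfs.conf - cap the ZFS ARC at 1 GiB (value in bytes)
options zfs zfs_arc_max=1073741824

# apply at runtime without a reboot
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

# if your root filesystem is on ZFS, also refresh the initramfs so the limit survives reboots
update-initramfs -u -k all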
 
Sorry, I meant Proxmox Backup Server.

Thank you for your answer.

I don't understand what you meant by
"you have very weak consumer hardware in general and SSD in particular"
pve and the VM are indeed running on a 1TB SSD. The VM has a 900GB disk on the ZFS pool.
Do you mean that with such a config an ARC max size of 2 GB would be enough and I can keep the VM at 4 GB/5 GB, or that I should reduce both (the ARC max size to 2 GB and the VM to 3 GB/4 GB) and it will then probably work well given the "very weak consumer hardware"?

How can you tell it is "very weak consumer hardware"? I'd like to understand so I can spot it by myself next time.
 
Sorry, I meant Proxmox Backup Server.
That's not related to storage type, so I still don't understand.
I don't understand what you meant by
"you have very weak consumer hardware in general and SSD in particular"
I mean that you are not using an enterprise SSD with PLP but probably a consumer QLC SSD.
Do you mean that with such a config an ARC max size of 2 GB would be enough
I think 512 MB or 1 GB is enough, but just test it yourself for your workload.
and I can keep the VM at 4 GB/5 GB
Since you stated that the minimum for the VM is 4 GB, I would set it to a fixed 4 GB. Ballooning won't really do anything for you since it only kicks in above 80% host memory usage, which is probably all the time for your system. And you are only running one VM (which somewhat defeats the purpose of virtualization), so it cannot share memory with other VMs.
it will then probably work well given the "very weak consumer hardware"?
How can you tell it is "very weak consumer hardware"? I'd like to understand so I can spot it by myself next time.
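
As a sketch, with a hypothetical VM ID of 100, that would be:

Code:
# give the VM a fixed 4 GiB and disable the ballooning device
qm set 100 --memory 4096 --balloon 0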
ZFS write performance and write amplification are terrible on consumer QLC drives. Proxmox VE eats QLC drives quite quickly and there are enough threads about that already on this forum. You did not share any information about the drive, but I assume (since you only have 8 GB of RAM) that you use "very weak consumer hardware".
 
You can turn off CoW and CRC with mount options on BTRFS but not on ZFS. So it's the other way around.
Yes, but I want CRC without CoW, and that is not possible on BTRFS, so I might as well stick with ZFS.

EDIT: I guess I'm not able to explain to you what I want and/or why BTRFS is not better than ZFS for what I want. But that's why I stay on ZFS.
 
If it is a simple computer (and not a server), with little RAM and a recent consumer disk (which, to me, seem to be getting worse in actual performance under sustained use such as virtualization and databases), and since you need snapshots, I would recommend lvm-thin.
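
For illustration, setting up an lvm-thin storage could look roughly like this (the volume group, thin pool and storage names are placeholders, not taken from this setup):

Code:
# create a thin pool inside an existing volume group
lvcreate -L 400G -T pve/data

# register it as a storage in Proxmox VE
pvesm add lvmthin local-thin --vgname pve --thinpool data --content images,rootdir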

EDIT:
From the posted log I saw:
Code:
Jan 16 01:07:15 pve kernel: Hardware name: LENOVO 10M8S93W00/3102, BIOS M16KT53A 11/27/2018
From a quick search: https://browser.geekbench.com/v6/cpu/4979506
Is it really an old i3?
 
You're way overcommitting with that tiny amount of RAM available.

https://github.com/kneutron/ansitest/blob/master/proxmox/oom-killer-adjust.sh
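
I haven't gone through that script line by line, but the general idea of such scripts is to lower the OOM-killer priority of the kvm processes so the host kills something else first. A minimal hand-rolled sketch of that idea (not the script itself; -500 is an arbitrary value, and this only changes which process gets killed, it does not free any memory):

Code:
# make the OOM killer prefer other processes over the running VMs
for pid in $(pidof kvm); do
    echo -500 > /proc/"$pid"/oom_score_adj
done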

Reduce the ZFS ARC size to 1 GB; you can add an L2ARC cache device to a spinner zpool with something as cheap as a 64 GB USB3 PNY thumbdrive. If it dies, it's disposable / easily replaced / won't kill your pool.

Code:
zpool add zpoolname cache /dev/disk/by-id/blah


Upgrade your RAM to at least 16 GB; if that hardware is incapable of what you are trying to achieve, then you arguably should look into investing in a better potato.

https://www.aliexpress.us/item/3256806259128837.html#nav-specification
 
