VM getting OOM killed by proxmox

aveekkumar

New Member
Dec 3, 2022
Hi Community,

We have a Proxmox node with 1.48 TiB of RAM running two VMs with 750.00 GiB of memory each. We have been seeing the VMs' kvm processes get OOM-killed on the host.

The VMs run Ubuntu 22.04 and have the following settings in place to protect against overcommitting memory.

Code:
root@dhq5:~# sysctl -a | grep vm.overcommit
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
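
For context: with vm.overcommit_memory = 2, the guest kernel refuses allocations once committed memory would exceed CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100. A rough sketch of that arithmetic for one of these guests (assuming 750 GiB of RAM and no swap, as described above):

```shell
# Commit limit under vm.overcommit_memory=2 (strict accounting):
#   CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# Worked through for one of these guests: 750 GiB RAM, no swap, ratio 100.
ram_kib=$((750 * 1024 * 1024))   # 750 GiB expressed in KiB
swap_kib=0
ratio=100
commit_limit_kib=$(( swap_kib + ram_kib * ratio / 100 ))
echo "CommitLimit = ${commit_limit_kib} kB"   # all of RAM, since ratio=100 and no swap

# On a live system, compare the limit against actual commitments with:
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

Note that this only protects the guest against its own overcommit; it does nothing about the host running out of memory, which is what the log below shows.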

However, the Proxmox host is invoking the oom-killer. Can you please advise whether we should reduce the memory on each VM by 5-10 GiB to leave enough memory for Proxmox itself?
Code:
[Mon Jan  8 22:44:18 2024] pve-firewall invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0

[Mon Jan  8 22:44:18 2024] CPU: 47 PID: 2502 Comm: pve-firewall Tainted: P           O      5.4.106-1-pve #1

[Mon Jan  8 22:44:18 2024] Hardware name: Penguin Computing Relion XE1112/MR91-FS0-ZB, BIOS R08 11/18/2019

[Mon Jan  8 22:44:18 2024] Call Trace:

[Mon Jan  8 22:44:18 2024]  dump_stack+0x6d/0x8b

[Mon Jan  8 22:44:18 2024]  dump_header+0x4f/0x1e1

[Mon Jan  8 22:44:18 2024]  oom_kill_process.cold.33+0xb/0x10

[Mon Jan  8 22:44:18 2024]  out_of_memory+0x1ad/0x490

[Mon Jan  8 22:44:18 2024]  __alloc_pages_slowpath+0xd40/0xe30

[Mon Jan  8 22:44:18 2024]  ? dmu_buf_rele_array.part.6+0x52/0x60 [zfs]

[Mon Jan  8 22:44:18 2024]  __alloc_pages_nodemask+0x2df/0x330

[Mon Jan  8 22:44:18 2024]  alloc_pages_current+0x81/0xe0

[Mon Jan  8 22:44:18 2024]  __get_free_pages+0x11/0x40

[Mon Jan  8 22:44:18 2024]  pgd_alloc+0x36/0x1e0

[Mon Jan  8 22:44:18 2024]  mm_init+0x195/0x280

[Mon Jan  8 22:44:18 2024]  dup_mm+0x68/0x5c0

[Mon Jan  8 22:44:18 2024]  ? __lock_task_sighand+0x28/0x70

[Mon Jan  8 22:44:18 2024]  copy_process+0x18a9/0x1b60

[Mon Jan  8 22:44:18 2024]  _do_fork+0x85/0x350

[Mon Jan  8 22:44:18 2024]  ? recalc_sigpending+0x1b/0x60

[Mon Jan  8 22:44:18 2024]  ? __set_task_blocked+0x72/0x90

[Mon Jan  8 22:44:18 2024]  __x64_sys_clone+0x8f/0xb0

[Mon Jan  8 22:44:18 2024]  do_syscall_64+0x57/0x190

[Mon Jan  8 22:44:18 2024]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Mon Jan  8 22:44:18 2024] RIP: 0033:0x7ff1f95117be

[Mon Jan  8 22:44:18 2024] Code: db 0f 85 25 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b6 00 00 00 41 89 c4 85 c0 0f 85 c3 00 00

[Mon Jan  8 22:44:18 2024] RSP: 002b:00007ffe524dccb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038

[Mon Jan  8 22:44:18 2024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff1f95117be

[Mon Jan  8 22:44:18 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011

[Mon Jan  8 22:44:18 2024] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff1f94101c0

[Mon Jan  8 22:44:18 2024] R10: 00007ff1f9410490 R11: 0000000000000246 R12: 000056331b1e5bc8

[Mon Jan  8 22:44:18 2024] R13: 00007ffe524dccf0 R14: 000056331a741260 R15: 0000000000000000

[Mon Jan  8 22:44:18 2024] Mem-Info:

[Mon Jan  8 22:44:18 2024] active_anon:385572393 inactive_anon:288200 isolated_anon:0

                            active_file:175 inactive_file:203 isolated_file:0

                            unevictable:40548 dirty:3 writeback:13 unstable:0

                            slab_reclaimable:62903 slab_unreclaimable:3854645

                            mapped:26907 shmem:686369 pagetables:769206 bounce:0

                            free:446142 free_pcp:614 free_cma:0

[Mon Jan  8 22:44:18 2024] Node 0 active_anon:772684752kB inactive_anon:780872kB active_file:468kB inactive_file:388kB unevictable:3316kB isolated(anon):0kB isolated(file):0kB mapped:57340kB dirty:0kB writeback:40kB shmem:1766588kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 646621184kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no

[Mon Jan  8 22:44:18 2024] Node 1 active_anon:769605428kB inactive_anon:371928kB active_file:232kB inactive_file:440kB unevictable:158876kB isolated(anon):0kB isolated(file):0kB mapped:50356kB dirty:12kB writeback:12kB shmem:978888kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 69619712kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no

[Mon Jan  8 22:44:18 2024] Node 0 DMA free:15884kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

[Mon Jan  8 22:44:18 2024] lowmem_reserve[]: 0 1563 772617 772617 772617

[Mon Jan  8 22:44:18 2024] Node 0 DMA32 free:1648720kB min:88kB low:1688kB high:3288kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1717652kB managed:1648724kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

[Mon Jan  8 22:44:18 2024] lowmem_reserve[]: 0 0 771054 771054 771054

[Mon Jan  8 22:44:18 2024] Node 0 Normal free:54112kB min:44920kB low:834476kB high:1624032kB active_anon:772685256kB inactive_anon:780872kB active_file:468kB inactive_file:388kB unevictable:3316kB writepending:40kB present:802160640kB managed:789567488kB mlocked:3316kB kernel_stack:6088kB pagetables:1786456kB bounce:0kB free_pcp:2212kB local_pcp:0kB free_cma:0kB

[Mon Jan  8 22:44:18 2024] lowmem_reserve[]: 0 0 0 0 0

[Mon Jan  8 22:44:18 2024] Node 1 Normal free:63900kB min:45096kB low:837812kB high:1630528kB active_anon:769605932kB inactive_anon:371928kB active_file:232kB inactive_file:440kB unevictable:158876kB writepending:24kB present:805306368kB managed:792716652kB mlocked:158876kB kernel_stack:10728kB pagetables:1290368kB bounce:0kB free_pcp:2168kB local_pcp:0kB free_cma:0kB

[Mon Jan  8 22:44:18 2024] lowmem_reserve[]: 0 0 0 0 0

[Mon Jan  8 22:44:18 2024] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB

[Mon Jan  8 22:44:18 2024] Node 0 DMA32: 8*4kB (UM) 6*8kB (UM) 6*16kB (UM) 5*32kB (UM) 8*64kB (UM) 6*128kB (UM) 6*256kB (UM) 6*512kB (UM) 8*1024kB (UM) 6*2048kB (UM) 396*4096kB (M) = 1648720kB

[Mon Jan  8 22:44:18 2024] Node 0 Normal: 2980*4kB (UMH) 2431*8kB (UMH) 907*16kB (UMEH) 287*32kB (UMEH) 5*64kB (H) 3*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55768kB

[Mon Jan  8 22:44:18 2024] Node 1 Normal: 5328*4kB (UMH) 262*8kB (UMH) 6*16kB (UE) 1320*32kB (UEH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65744kB

[Mon Jan  8 22:44:18 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB

[Mon Jan  8 22:44:18 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

[Mon Jan  8 22:44:18 2024] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB

[Mon Jan  8 22:44:18 2024] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

[Mon Jan  8 22:44:18 2024] 689899 total pagecache pages

[Mon Jan  8 22:44:18 2024] 0 pages in swap cache

[Mon Jan  8 22:44:18 2024] Swap cache stats: add 0, delete 0, find 0/0

[Mon Jan  8 22:44:18 2024] Free swap  = 0kB

[Mon Jan  8 22:44:18 2024] Total swap = 0kB

[Mon Jan  8 22:44:18 2024] 402300162 pages RAM

[Mon Jan  8 22:44:18 2024] 0 pages HighMem/MovableOnly

[Mon Jan  8 22:44:18 2024] 6312971 pages reserved

[Mon Jan  8 22:44:18 2024] 0 pages cma reserved

[Mon Jan  8 22:44:18 2024] 0 pages hwpoisoned

[Mon Jan  8 22:44:18 2024] Tasks state (memory values in pages):

[Mon Jan  8 22:44:18 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name

[Mon Jan  8 22:44:18 2024] [   1651]     0  1651    40709    25201   360448        0             0 systemd-journal

[Mon Jan  8 22:44:18 2024] [   1660]     0  1660     5775      637    65536        0         -1000 systemd-udevd

[Mon Jan  8 22:44:18 2024] [   1966]   106  1966     1727      428    49152        0             0 rpcbind

[Mon Jan  8 22:44:18 2024] [   2024]   104  2024     2386      664    53248        0          -900 dbus-daemon

[Mon Jan  8 22:44:18 2024] [   2025]     0  2025    37717      267    57344        0             0 lxcfs

[Mon Jan  8 22:44:18 2024] [   2028]     0  2028     3060      703    69632        0             0 smartd

[Mon Jan  8 22:44:18 2024] [   2031]     0  2031      535      227    40960        0         -1000 watchdog-mux

[Mon Jan  8 22:44:18 2024] [   2034]     0  2034     4880      578    77824        0             0 systemd-logind

[Mon Jan  8 22:44:18 2024] [   2036]     0  2036    41547      712    77824        0             0 zed

[Mon Jan  8 22:44:18 2024] [   2041]     0  2041     1022      329    45056        0             0 qmeventd

[Mon Jan  8 22:44:18 2024] [   2042]     0  2042   406938      415   249856        0             0 pve-lxc-syscall

[Mon Jan  8 22:44:18 2024] [   2053]     0  2053     1681      312    53248        0             0 ksmtuned

[Mon Jan  8 22:44:18 2024] [   2147]     0  2147     1823      201    53248        0             0 lxc-monitord

[Mon Jan  8 22:44:18 2024] [   2161]     0  2161      568      141    45056        0             0 none

[Mon Jan  8 22:44:18 2024] [   2163]     0  2163     3962      610    65536        0         -1000 sshd

[Mon Jan  8 22:44:18 2024] [   2165]     0  2165     1722      388    53248        0             0 iscsid

[Mon Jan  8 22:44:18 2024] [   2167]     0  2167     1848     1321    57344        0           -17 iscsid

[Mon Jan  8 22:44:18 2024] [   2187]     0  2187     1402      379    45056        0             0 agetty

[Mon Jan  8 22:44:18 2024] [   2250]     0  2250   146324      780   176128        0             0 rrdcached

[Mon Jan  8 22:44:18 2024] [   2391]     0  2391    10868      640    86016        0             0 master

[Mon Jan  8 22:44:18 2024] [   2399]     0  2399     2125      513    53248        0             0 cron

[Mon Jan  8 22:44:18 2024] [   2502]     0  2502    76527    21776   303104        0             0 pve-firewall

[Mon Jan  8 22:44:18 2024] [   2503]     0  2503    76173    21553   315392        0             0 pvestatd

[Mon Jan  8 22:44:18 2024] [   2530]     0  2530    88822    30700   442368        0             0 pvedaemon

[Mon Jan  8 22:44:18 2024] [   2538]     0  2538    84515    24505   376832        0             0 pve-ha-crm

[Mon Jan  8 22:44:18 2024] [   2540]    33  2540    89241    31314   438272        0             0 pveproxy

[Mon Jan  8 22:44:18 2024] [   2547]    33  2547    17627    12845   180224        0             0 spiceproxy

[Mon Jan  8 22:44:18 2024] [   2549]     0  2549    84420    24406   368640        0             0 pve-ha-lrm

[Mon Jan  8 22:44:18 2024] [  29203]     0 29203   181764     2522   184320        0             0 node_exporter

[Mon Jan  8 22:44:18 2024] [  29766]     0 29766    73879      940    94208        0             0 rsyslogd

[Mon Jan  8 22:44:18 2024] [  27452]   110 27452     7564      725    94208        0             0 lldpd

[Mon Jan  8 22:44:18 2024] [  27454]   110 27454     7594      619    90112        0             0 lldpd

[Mon Jan  8 22:44:18 2024] [  31464]     0 31464     4941      673    81920        0             0 wpa_supplicant

[Mon Jan  8 22:44:18 2024] [  31738]     0 31738    83282     1332   135168        0             0 NetworkManager

[Mon Jan  8 22:44:18 2024] [  31788]     0 31788    58956      847    94208        0             0 polkitd

[Mon Jan  8 22:44:18 2024] [  31824]     0 31824    79589      879   118784        0             0 ModemManager

[Mon Jan  8 22:44:18 2024] [  12199]   107 12199    10969      672    77824        0             0 qmgr

[Mon Jan  8 22:44:18 2024] [  47409]     0 47409   143516    44653   425984        0             0 corosync

[Mon Jan  8 22:44:18 2024] [  47418]     0 47418   205668    12530   405504        0             0 pmxcfs

[Mon Jan  8 22:44:18 2024] [   8260]   108  8260    14488    12827   151552        0             0 rpc.statd

[Mon Jan  8 22:44:18 2024] [   1067]     0  1067 197602803 195926247 1574490112        0             0 kvm

[Mon Jan  8 22:44:18 2024] [  46759]     0 46759     1028      602    45056        0             0 ptp4l

[Mon Jan  8 22:44:18 2024] [  46762]     0 46762     1019      591    53248        0             0 phc2sys

[Mon Jan  8 22:44:18 2024] [  47602]     0 47602    91283    31601   450560        0             0 pvedaemon worke

[Mon Jan  8 22:44:18 2024] [   4426]     0  4426 197519747 195905039 1574113280        0             0 kvm

[Mon Jan  8 22:44:18 2024] [  12063]     0 12063    90846    31129   442368        0             0 pvedaemon worke

[Mon Jan  8 22:44:18 2024] [  43655]     0 43655    90847    30875   442368        0             0 pvedaemon worke

[Mon Jan  8 22:44:18 2024] [  30266]    33 30266    17688    12484   176128        0             0 spiceproxy work

[Mon Jan  8 22:44:18 2024] [  30268]     0 30268    21543      378    65536        0             0 pvefw-logger

[Mon Jan  8 22:44:18 2024] [  30273]    33 30273    91292    31472   438272        0             0 pveproxy worker

[Mon Jan  8 22:44:18 2024] [  30274]    33 30274    91298    31685   438272        0             0 pveproxy worker

[Mon Jan  8 22:44:18 2024] [  30275]    33 30275    91292    31593   438272        0             0 pveproxy worker

[Mon Jan  8 22:44:18 2024] [  38934]   107 38934    10957      627    86016        0             0 pickup

[Mon Jan  8 22:44:18 2024] [  28277]     0 28277     1314      169    45056        0             0 sleep

[Mon Jan  8 22:44:18 2024] [  28363]     0 28363     5775      600    61440        0             0 systemd-udevd

[Mon Jan  8 22:44:18 2024] [  28472]     0 28472     5775      580    61440        0             0 systemd-udevd

[Mon Jan  8 22:44:18 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/qemu.slice/96126.scope,task=kvm,pid=1067,uid=0

[Mon Jan  8 22:44:18 2024] Out of memory: Killed process 1067 (kvm) total-vm:790411212kB, anon-rss:783702608kB, file-rss:2376kB, shmem-rss:4kB, UID:0 pgtables:1537588kB oom_score_adj:0

[Mon Jan  8 22:45:56 2024] oom_reaper: reaped process 1067 (kvm), now anon-rss:0kB, file-rss:188kB, shmem-rss:4kB
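
One detail worth reading out of the kill line above: the killed kvm process's total-vm is 790411212 kB, a few GiB more than the 750 GiB configured for the guest, the extra being QEMU's own overhead. A quick sketch of that arithmetic (numbers copied from the log; the exact overhead varies with workload):

```shell
# Compare the OOM killer's total-vm figure for the killed kvm process
# against the guest's configured memory (both from the log above).
total_vm_kib=790411212              # total-vm of pid 1067 (kvm)
guest_kib=$(( 750 * 1024 * 1024 ))  # 750 GiB configured for the VM
overhead_kib=$(( total_vm_kib - guest_kib ))
echo "total-vm:  $(( total_vm_kib / 1024 / 1024 )) GiB"
echo "overhead:  $(( overhead_kib / 1024 )) MiB beyond the configured 750 GiB"
```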
 
Hi,
We have a Proxmox node (1.48 TiB ) with 2 VMs(750.00 GiB Memory on each) . We have been seeing the VMs being OOM killed by kvm.
You have to consider that the host also needs memory for I/O and network caches, host processes need memory too, and even the QEMU processes need a bit more memory than is assigned to the guest (in particular during certain operations like backup). If you are using ZFS as storage, please also see: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_limit_memory_usage
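
Per the linked docs, the ARC limit is set via the zfs_arc_max module parameter, in bytes. A sketch of computing and persisting the value; the 16 GiB figure is only an example, not a recommendation for this host:

```shell
# ZFS ARC cap, per the linked Proxmox docs. 16 GiB is an example value only.
arc_max_bytes=$(( 16 * 1024 * 1024 * 1024 ))
echo "options zfs zfs_arc_max=${arc_max_bytes}"
# Put that "options" line into /etc/modprobe.d/zfs.conf to persist it, and/or
# apply it live with:
#   echo ${arc_max_bytes} > /sys/module/zfs/parameters/zfs_arc_max
# (If the root filesystem is on ZFS, also refresh the initramfs: update-initramfs -u)
```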
 
Thank you, Fiona, for sharing your insights. Both VMs on the node are hosted on NFS storage. Do you have any recommendations on how much memory to reserve for host processes when NFS is used as storage?

Or can we apply the same rule of thumb as for ZFS: as a general rule, allocate at least 2 GiB base + 1 GiB per TiB of storage?
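
Spelled out, that ZFS rule of thumb is simple arithmetic (whether it transfers to NFS-backed storage is exactly the open question here; the pool size below is hypothetical):

```shell
# ZFS rule of thumb from the Proxmox docs: 2 GiB base + 1 GiB per TiB of storage.
storage_tib=20                    # hypothetical pool size
reserve_gib=$(( 2 + storage_tib ))
echo "reserve about ${reserve_gib} GiB"
```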
 
Disabling overcommit can be problematic in general. Some applications allocate a large amount of virtual memory that they never use; if overcommit is disabled, you will need a lot more real memory or swap to run anything like that on the host or in a container.
 
Thank you Fiona for sharing your insights. Both the VMs on the node are hosted on NFS storage. Do you have any recommendations on how much memory to reserve for host processes when NFS is used as storage.
It's difficult to give a general suggestion because it depends on the workload. Just my personal guess, but since you have rather heavy guests and a lot of RAM available in total, I'd give at least 10 GiB to the host. If you expect much guest IO, backups, or snapshots, I'd even go for something like 30 GiB. Disclaimer: I'm not a sysadmin ;)
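
Acting on that advice would mean shrinking each guest, e.g. from 750 GiB to 735 GiB to free 30 GiB in total for the host. In Proxmox this is done with qm set, which takes the memory size in MiB; the VMID below is a placeholder:

```shell
# Shrink a guest from 750 GiB to 735 GiB (qm takes MiB). VMID 100 is a placeholder.
new_mib=$(( 735 * 1024 ))
echo "would run: qm set 100 --memory ${new_mib}"
# qm set 100 --memory 752640   # takes effect at the next VM start
#                              # (unless memory hotplug is enabled)
```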
 
