Hi there,
Please help me work out why our Proxmox host keeps OOM-killing one specific VM.
Background:
- The host has 16 CPUs, and 64 GB of RAM.
- We have 15 guests on the host.
- The VM that gets killed is the largest guest: 7 CPUs and 16 GB RAM.
- The problem started only recently and happens at random times.
- At the end of the post is the OOM message.
Where am I confused?
I don't understand from other posts on the forum which memory is running out. Is it the VM host or the VM guest? RAM or swap?
More information:
# cat /proc/sys/vm/swappiness
60
OOM message:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043642] kvm invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043646] CPU: 7 PID: 23619 Comm: kvm Tainted: P O 5.4.73-1-pve #1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043647] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 3.2 11/19/2019
Mar 25 02:11:39 proxmox1 kernel: [38030728.043648] Call Trace:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043657] dump_stack+0x6d/0x9a
Mar 25 02:11:39 proxmox1 kernel: [38030728.043661] dump_header+0x4f/0x1e1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043664] oom_kill_process.cold.33+0xb/0x10
Mar 25 02:11:39 proxmox1 kernel: [38030728.043666] out_of_memory+0x1ad/0x490
Mar 25 02:11:39 proxmox1 kernel: [38030728.043671] __alloc_pages_slowpath+0xd40/0xe30
Mar 25 02:11:39 proxmox1 kernel: [38030728.043675] ? __switch_to_asm+0x34/0x70
Mar 25 02:11:39 proxmox1 kernel: [38030728.043678] __alloc_pages_nodemask+0x2df/0x330
Mar 25 02:11:39 proxmox1 kernel: [38030728.043682] alloc_pages_current+0x81/0xe0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043685] __get_free_pages+0x11/0x40
Mar 25 02:11:39 proxmox1 kernel: [38030728.043688] __pollwait+0x94/0xd0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043692] eventfd_poll+0x32/0x70
Mar 25 02:11:39 proxmox1 kernel: [38030728.043694] do_sys_poll+0x253/0x510
Mar 25 02:11:39 proxmox1 kernel: [38030728.043698] ? poll_initwait+0x40/0x40
Mar 25 02:11:39 proxmox1 kernel: [38030728.043700] ? poll_select_finish+0x210/0x210
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043716] ? poll_select_finish+0x210/0x210
Mar 25 02:11:39 proxmox1 kernel: [38030728.043719] __x64_sys_ppoll+0xad/0xf0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043723] do_syscall_64+0x57/0x190
Mar 25 02:11:39 proxmox1 kernel: [38030728.043726] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 25 02:11:39 proxmox1 kernel: [38030728.043729] RIP: 0033:0x7f26db3ab916
Mar 25 02:11:39 proxmox1 kernel: [38030728.043731] Code: 7c 24 08 e8 5c 7e 01 00 41 b8 08 00 00 00 4c 8b 54 24 18 48 89 da 41 89 c1 48 8b 74 24 10 48 8b 7c 24 08 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 28 44 89 cf 89 44 24 08 e8 86 7e 01 00 8b 44
Mar 25 02:11:39 proxmox1 kernel: [38030728.043732] RSP: 002b:00007ffc0f54a550 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Mar 25 02:11:39 proxmox1 kernel: [38030728.043734] RAX: ffffffffffffffda RBX: 00007ffc0f54a570 RCX: 00007f26db3ab916
Mar 25 02:11:39 proxmox1 kernel: [38030728.043736] RDX: 00007ffc0f54a570 RSI: 000000000000004e RDI: 00007f26cdce5c00
Mar 25 02:11:39 proxmox1 kernel: [38030728.043737] RBP: 00007ffc0f54a5e0 R08: 0000000000000008 R09: 0000000000000000
Mar 25 02:11:39 proxmox1 kernel: [38030728.043738] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26ce3b4a00
Mar 25 02:11:39 proxmox1 kernel: [38030728.043739] R13: 00007f26ce3b4a00 R14: 00007ffc0f54a5dc R15: 0000000000000000
Mar 25 02:11:39 proxmox1 kernel: [38030728.043757] Mem-Info:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] active_anon:13333827 inactive_anon:2329387 isolated_anon:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] active_file:405 inactive_file:539 isolated_file:1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] unevictable:19055 dirty:0 writeback:3 unstable:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] slab_reclaimable:80446 slab_unreclaimable:411196
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] mapped:12018 shmem:21693 pagetables:44793 bounce:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] free:83112 free_pcp:1526 free_cma:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043768] Node 0 active_anon:53335308kB inactive_anon:9317548kB active_file:1620kB inactive_file:2156kB unevictable:76220kB isolated(anon):0kB isolated(file):4kB mapped:48072kB dirty:0kB writeback:12kB shmem:86772kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2682880kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 25 02:11:39 proxmox1 kernel: [38030728.043769] Node 0 DMA free:15888kB min:16kB low:28kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15972kB managed:15888kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043774] lowmem_reserve[]: 0 1814 64146 64146 64146
Mar 25 02:11:39 proxmox1 kernel: [38030728.043777] Node 0 DMA32 free:251172kB min:1908kB low:3764kB high:5620kB active_anon:1344576kB inactive_anon:250824kB active_file:0kB inactive_file:312kB unevictable:24kB writepending:0kB present:1965312kB managed:1899776kB mlocked:24kB kernel_stack:32kB pagetables:1268kB bounce:0kB free_pcp:1360kB local_pcp:8kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043781] lowmem_reserve[]: 0 0 62331 62331 62331
Mar 25 02:11:39 proxmox1 kernel: [38030728.043785] Node 0 Normal free:65388kB min:65656kB low:129480kB high:193304kB active_anon:51990952kB inactive_anon:9066896kB active_file:1568kB inactive_file:1348kB unevictable:76196kB writepending:8kB present:65011712kB managed:63835828kB mlocked:76196kB kernel_stack:8496kB pagetables:177904kB bounce:0kB free_pcp:4756kB local_pcp:548kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043790] lowmem_reserve[]: 0 0 0 0 0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043792] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043802] Node 0 DMA32: 6381*4kB (UMEH) 5594*8kB (UMEH) 2955*16kB (UMEH) 1052*32kB (UMEH) 445*64kB (UMH) 261*128kB (UH) 94*256kB (UMH) 20*512kB (UH) 4*1024kB (UH) 0*2048kB 0*4096kB = 251508kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043813] Node 0 Normal: 3*4kB (MH) 939*8kB (UMEH) 2718*16kB (UEH) 361*32kB (UEH) 5*64kB (H) 5*128kB (H) 3*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64292kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043825] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043826] 82105 total pagecache pages
Mar 25 02:11:39 proxmox1 kernel: [38030728.043829] 56428 pages in swap cache
Mar 25 02:11:39 proxmox1 kernel: [38030728.043830] Swap cache stats: add 13680729, delete 13624293, find 675686965/679062801
Mar 25 02:11:39 proxmox1 kernel: [38030728.043831] Free swap = 0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043832] Total swap = 8388604kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043833] 16748249 pages RAM
Mar 25 02:11:39 proxmox1 kernel: [38030728.043833] 0 pages HighMem/MovableOnly
Mar 25 02:11:39 proxmox1 kernel: [38030728.043834] 310376 pages reserved
Mar 25 02:11:39 proxmox1 kernel: [38030728.043834] 0 pages cma reserved
Mar 25 02:11:39 proxmox1 kernel: [38030728.043835] 0 pages hwpoisoned
Mar 25 02:11:39 proxmox1 kernel: [38030728.043836] Tasks state (memory values in pages):
Mar 25 02:11:39 proxmox1 kernel: [38030728.043837] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 25 02:11:39 proxmox1 kernel: [38030728.043846] [ 549] 0 549 32193 5618 290816 16988 0 systemd-journal
Mar 25 02:11:39 proxmox1 kernel: [38030728.043849] [ 557] 0 557 53185 18406 200704 0 -1000 dmeventd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043852] [ 569] 0 569 5718 625 65536 200 -1000 systemd-udevd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043855] [ 818] 106 818 1749 595 49152 100 0 rpcbind
Mar 25 02:11:39 proxmox1 kernel: [38030728.043857] [ 821] 100 821 23270 805 81920 188 0 systemd-timesyn
Mar 25 02:11:39 proxmox1 kernel: [38030728.043859] [ 849] 0 849 136554 518 118784 63 0 pve-lxc-syscall
Mar 25 02:11:39 proxmox1 kernel: [38030728.043861] [ 851] 0 851 56455 635 81920 42 0 rsyslogd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043863] [ 855] 0 855 37717 350 65536 41 0 lxcfs
Mar 25 02:11:39 proxmox1 kernel: [38030728.043865] [ 858] 104 858 2281 655 61440 58 -900 dbus-daemon
Mar 25 02:11:39 proxmox1 kernel: [38030728.043868] [ 861] 0 861 4879 273 77824 181 0 systemd-logind
Mar 25 02:11:39 proxmox1 kernel: [38030728.043870] [ 862] 0 862 3158 521 57344 313 0 smartd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043872] [ 867] 0 867 41689 560 81920 233 0 zed
Mar 25 02:11:39 proxmox1 kernel: [38030728.043874] [ 872] 0 872 535 171 40960 5 -1000 watchdog-mux
Mar 25 02:11:39 proxmox1 kernel: [38030728.043877] [ 882] 0 882 1682 367 53248 21 0 ksmtuned
Mar 25 02:11:39 proxmox1 kernel: [38030728.043879] [ 886] 0 886 1022 369 49152 4 0 qmeventd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043881] [ 950] 0 950 954 93 40960 6 0 lxc-monitord
Mar 25 02:11:39 proxmox1 kernel: [38030728.043883] [ 961] 0 961 568 170 45056 17 0 none
Mar 25 02:11:39 proxmox1 kernel: [38030728.043885] [ 965] 0 965 1722 45 57344 15 0 iscsid
Mar 25 02:11:39 proxmox1 kernel: [38030728.043888] [ 966] 0 966 1848 1253 57344 0 -17 iscsid
Mar 25 02:11:39 proxmox1 kernel: [38030728.043890] [ 972] 0 972 3962 409 73728 181 -1000 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043892] [ 1026] 0 1026 293786 306 241664 209 0 rrdcached
Mar 25 02:11:39 proxmox1 kernel: [38030728.043894] [ 1056] 0 1056 379395 7306 483328 717 0 pmxcfs
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043919] [ 1727] 0 1727 1403 318 45056 25 0 agetty
Mar 25 02:11:39 proxmox1 kernel: [38030728.043922] [ 18413] 0 18413 727057 501394 5451776 31285 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043924] [ 10483] 0 10483 1868484 1545292 14823424 39976 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043927] [ 10783] 0 10783 1276146 1053046 10002432 8710 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043929] [ 9880] 0 9880 758051 477820 5959680 54007 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043932] [ 29805] 110 29805 291594 5899 2379776 278796 0 snmpd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043934] [ 9834] 0 9834 2321128 1890371 18239488 214415 0 kvm
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043957] [ 12139] 0 12139 4458610 4188872 35303424 15771 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043959] [ 25309] 0 25309 90380 4492 434176 25873 0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043961] [ 25310] 0 25310 90381 4515 434176 25583 0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043963] [ 29536] 0 29536 21543 125 69632 0 0 pvefw-logger
Mar 25 02:11:39 proxmox1 kernel: [38030728.043966] [ 29548] 33 29548 91740 25826 446464 5686 0 pveproxy worker
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043970] [ 29550] 33 29550 88668 24087 413696 6253 0 pveproxy worker
Mar 25 02:11:39 proxmox1 kernel: [38030728.043972] [ 29551] 33 29551 17619 7438 167936 5168 0 spiceproxy work
Mar 25 02:11:39 proxmox1 kernel: [38030728.043974] [ 664] 0 664 90408 5002 434176 25297 0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043977] [ 19258] 107 19258 10958 718 81920 0 0 pickup
Mar 25 02:11:39 proxmox1 kernel: [38030728.043980] [ 23225] 0 23225 1315 98 49152 0 0 sleep
Mar 25 02:11:39 proxmox1 kernel: [38030728.043982] [ 23259] 0 23259 4176 701 73728 0 0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043984] [ 23260] 105 23260 3962 422 69632 0 0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043986] [ 23262] 0 23262 3962 464 73728 0 0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043989] [ 23264] 0 23264 5718 261 57344 196 0 systemd-udevd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043991] [ 23265] 105 23265 3962 591 73728 0 0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043992] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=12139,uid=0
Mar 25 02:11:39 proxmox1 kernel: [38030728.044023] Out of memory: Killed process 12139 (kvm) total-vm:17834440kB, anon-rss:16753760kB, file-rss:1724kB, shmem-rss:4kB, UID:0 pgtables:34476kB oom_score_adj:0
Mar 25 02:11:40 proxmox1 kernel: [38030729.660594] oom_reaper: reaped process 12139 (kvm), now anon-rss:0kB, file-rss:60kB, shmem-rss:4kB
Mar 25 02:11:41 proxmox1 kernel: [38030729.726974] fwbr100i0: port 2(tap100i0) entered disabled state
Mar 25 02:11:41 proxmox1 kernel: [38030729.727154] fwbr100i0: port 2(tap100i0) entered disabled state
Mar 25 02:11:41 proxmox1 systemd[1]: 100.scope: Succeeded.
Mar 25 02:11:42 proxmox1 qmeventd[874]: Starting cleanup for 100
Mar 25 02:11:42 proxmox1 kernel: [38030730.874109] fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.874164] vmbr0: port 2(fwpr100p0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.874417] device fwln100i0 left promiscuous mode
Mar 25 02:11:42 proxmox1 kernel: [38030730.874421] fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.894961] device fwpr100p0 left promiscuous mode
Mar 25 02:11:42 proxmox1 kernel: [38030730.894964] vmbr0: port 2(fwpr100p0) entered disabled state
Mar 25 02:11:42 proxmox1 qmeventd[874]: Finished cleanup for 100
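To make the task table above easier to read, here is a minimal sketch (not part of the original log) that converts the per-process `rss` and `swapents` columns from pages to KiB and sorts by resident memory. It assumes 4 KiB pages, which is the standard page size on x86_64. The names `parse_tasks`, `ROW`, and `sample` are made up for illustration, and `sample` contains just three rows copied from the log.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages (standard on x86_64)

# One row of the "Tasks state" table:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>.+)$"
)

def parse_tasks(log_text):
    """Return task rows sorted by resident memory, largest first."""
    tasks = []
    for line in log_text.splitlines():
        m = ROW.search(line)  # the kernel timestamp "[123.456]" does not match: it contains a dot
        if m:
            tasks.append({
                "pid": int(m["pid"]),
                "name": m["name"].strip(),
                "rss_kib": int(m["rss"]) * PAGE_KIB,        # pages resident in host RAM
                "swap_kib": int(m["swapents"]) * PAGE_KIB,  # pages swapped out on the host
            })
    return sorted(tasks, key=lambda t: t["rss_kib"], reverse=True)

# Three rows copied from the log above, for illustration only.
sample = """\
Mar 25 02:11:39 proxmox1 kernel: [38030728.043922] [ 18413] 0 18413 727057 501394 5451776 31285 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043957] [ 12139] 0 12139 4458610 4188872 35303424 15771 0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043932] [ 29805] 110 29805 291594 5899 2379776 278796 0 snmpd
"""

if __name__ == "__main__":
    for t in parse_tasks(sample):
        print(f'{t["pid"]:>6} {t["name"]:<16} rss={t["rss_kib"]} KiB swap={t["swap_kib"]} KiB')
```

For pid 12139 the converted `rss` comes to 16755488 KiB. That equals anon-rss + file-rss + shmem-rss from the "Killed process" line (16753760 + 1724 + 4), so the 4 KiB assumption looks consistent with this log. Running it over the full table would show how much host RAM and host swap each kvm process was holding at the time of the kill.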