Please help identify OOM kernel crash - Host RAM, Host swap, VM RAM, or VM Swap?

eugenevdm

Hi there,

Please help me identify why our Proxmox host keeps OOM-killing one specific VM.

Background:

- The host has 16 CPUs, and 64 GB of RAM.
- We have 15 guests on the host.
- The VM that keeps being killed uses the most resources: 7 CPUs and 16 GB RAM.
- The problem started only recently and occurs at random times.
- At the end of the post is the OOM message.

Where am I confused?

From other posts on the forum I can't work out which memory is running out. Is it the host or the guest? RAM or swap?

More information:
[Attachment: host_summary.png — screenshot of the host's Summary page]


Code:
# cat /proc/sys/vm/swappiness
60

Code:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043642] kvm invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043646] CPU: 7 PID: 23619 Comm: kvm Tainted: P           O      5.4.73-1-pve #1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043647] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 3.2 11/19/2019
Mar 25 02:11:39 proxmox1 kernel: [38030728.043648] Call Trace:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043657]  dump_stack+0x6d/0x9a
Mar 25 02:11:39 proxmox1 kernel: [38030728.043661]  dump_header+0x4f/0x1e1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043664]  oom_kill_process.cold.33+0xb/0x10
Mar 25 02:11:39 proxmox1 kernel: [38030728.043666]  out_of_memory+0x1ad/0x490
Mar 25 02:11:39 proxmox1 kernel: [38030728.043671]  __alloc_pages_slowpath+0xd40/0xe30
Mar 25 02:11:39 proxmox1 kernel: [38030728.043675]  ? __switch_to_asm+0x34/0x70
Mar 25 02:11:39 proxmox1 kernel: [38030728.043678]  __alloc_pages_nodemask+0x2df/0x330
Mar 25 02:11:39 proxmox1 kernel: [38030728.043682]  alloc_pages_current+0x81/0xe0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043685]  __get_free_pages+0x11/0x40
Mar 25 02:11:39 proxmox1 kernel: [38030728.043688]  __pollwait+0x94/0xd0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043692]  eventfd_poll+0x32/0x70
Mar 25 02:11:39 proxmox1 kernel: [38030728.043694]  do_sys_poll+0x253/0x510
Mar 25 02:11:39 proxmox1 kernel: [38030728.043698]  ? poll_initwait+0x40/0x40
Mar 25 02:11:39 proxmox1 kernel: [38030728.043700]  ? poll_select_finish+0x210/0x210
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043716]  ? poll_select_finish+0x210/0x210
Mar 25 02:11:39 proxmox1 kernel: [38030728.043719]  __x64_sys_ppoll+0xad/0xf0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043723]  do_syscall_64+0x57/0x190
Mar 25 02:11:39 proxmox1 kernel: [38030728.043726]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 25 02:11:39 proxmox1 kernel: [38030728.043729] RIP: 0033:0x7f26db3ab916
Mar 25 02:11:39 proxmox1 kernel: [38030728.043731] Code: 7c 24 08 e8 5c 7e 01 00 41 b8 08 00 00 00 4c 8b 54 24 18 48 89 da 41 89 c1 48 8b 74 24 10 48 8b 7c 24 08 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 28 44 89 cf 89 44 24 08 e8 86 7e 01 00 8b 44
Mar 25 02:11:39 proxmox1 kernel: [38030728.043732] RSP: 002b:00007ffc0f54a550 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Mar 25 02:11:39 proxmox1 kernel: [38030728.043734] RAX: ffffffffffffffda RBX: 00007ffc0f54a570 RCX: 00007f26db3ab916
Mar 25 02:11:39 proxmox1 kernel: [38030728.043736] RDX: 00007ffc0f54a570 RSI: 000000000000004e RDI: 00007f26cdce5c00
Mar 25 02:11:39 proxmox1 kernel: [38030728.043737] RBP: 00007ffc0f54a5e0 R08: 0000000000000008 R09: 0000000000000000
Mar 25 02:11:39 proxmox1 kernel: [38030728.043738] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26ce3b4a00
Mar 25 02:11:39 proxmox1 kernel: [38030728.043739] R13: 00007f26ce3b4a00 R14: 00007ffc0f54a5dc R15: 0000000000000000
Mar 25 02:11:39 proxmox1 kernel: [38030728.043757] Mem-Info:
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764] active_anon:13333827 inactive_anon:2329387 isolated_anon:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764]  active_file:405 inactive_file:539 isolated_file:1
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764]  unevictable:19055 dirty:0 writeback:3 unstable:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764]  slab_reclaimable:80446 slab_unreclaimable:411196
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764]  mapped:12018 shmem:21693 pagetables:44793 bounce:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043764]  free:83112 free_pcp:1526 free_cma:0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043768] Node 0 active_anon:53335308kB inactive_anon:9317548kB active_file:1620kB inactive_file:2156kB unevictable:76220kB isolated(anon):0kB isolated(file):4kB mapped:48072kB dirty:0kB writeback:12kB shmem:86772kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2682880kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 25 02:11:39 proxmox1 kernel: [38030728.043769] Node 0 DMA free:15888kB min:16kB low:28kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15972kB managed:15888kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043774] lowmem_reserve[]: 0 1814 64146 64146 64146
Mar 25 02:11:39 proxmox1 kernel: [38030728.043777] Node 0 DMA32 free:251172kB min:1908kB low:3764kB high:5620kB active_anon:1344576kB inactive_anon:250824kB active_file:0kB inactive_file:312kB unevictable:24kB writepending:0kB present:1965312kB managed:1899776kB mlocked:24kB kernel_stack:32kB pagetables:1268kB bounce:0kB free_pcp:1360kB local_pcp:8kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043781] lowmem_reserve[]: 0 0 62331 62331 62331
Mar 25 02:11:39 proxmox1 kernel: [38030728.043785] Node 0 Normal free:65388kB min:65656kB low:129480kB high:193304kB active_anon:51990952kB inactive_anon:9066896kB active_file:1568kB inactive_file:1348kB unevictable:76196kB writepending:8kB present:65011712kB managed:63835828kB mlocked:76196kB kernel_stack:8496kB pagetables:177904kB bounce:0kB free_pcp:4756kB local_pcp:548kB free_cma:0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043790] lowmem_reserve[]: 0 0 0 0 0
Mar 25 02:11:39 proxmox1 kernel: [38030728.043792] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043802] Node 0 DMA32: 6381*4kB (UMEH) 5594*8kB (UMEH) 2955*16kB (UMEH) 1052*32kB (UMEH) 445*64kB (UMH) 261*128kB (UH) 94*256kB (UMH) 20*512kB (UH) 4*1024kB (UH) 0*2048kB 0*4096kB = 251508kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043813] Node 0 Normal: 3*4kB (MH) 939*8kB (UMEH) 2718*16kB (UEH) 361*32kB (UEH) 5*64kB (H) 5*128kB (H) 3*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64292kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043825] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043826] 82105 total pagecache pages
Mar 25 02:11:39 proxmox1 kernel: [38030728.043829] 56428 pages in swap cache
Mar 25 02:11:39 proxmox1 kernel: [38030728.043830] Swap cache stats: add 13680729, delete 13624293, find 675686965/679062801
Mar 25 02:11:39 proxmox1 kernel: [38030728.043831] Free swap  = 0kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043832] Total swap = 8388604kB
Mar 25 02:11:39 proxmox1 kernel: [38030728.043833] 16748249 pages RAM
Mar 25 02:11:39 proxmox1 kernel: [38030728.043833] 0 pages HighMem/MovableOnly
Mar 25 02:11:39 proxmox1 kernel: [38030728.043834] 310376 pages reserved
Mar 25 02:11:39 proxmox1 kernel: [38030728.043834] 0 pages cma reserved
Mar 25 02:11:39 proxmox1 kernel: [38030728.043835] 0 pages hwpoisoned
Mar 25 02:11:39 proxmox1 kernel: [38030728.043836] Tasks state (memory values in pages):
Mar 25 02:11:39 proxmox1 kernel: [38030728.043837] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Mar 25 02:11:39 proxmox1 kernel: [38030728.043846] [    549]     0   549    32193     5618   290816    16988             0 systemd-journal
Mar 25 02:11:39 proxmox1 kernel: [38030728.043849] [    557]     0   557    53185    18406   200704        0         -1000 dmeventd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043852] [    569]     0   569     5718      625    65536      200         -1000 systemd-udevd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043855] [    818]   106   818     1749      595    49152      100             0 rpcbind
Mar 25 02:11:39 proxmox1 kernel: [38030728.043857] [    821]   100   821    23270      805    81920      188             0 systemd-timesyn
Mar 25 02:11:39 proxmox1 kernel: [38030728.043859] [    849]     0   849   136554      518   118784       63             0 pve-lxc-syscall
Mar 25 02:11:39 proxmox1 kernel: [38030728.043861] [    851]     0   851    56455      635    81920       42             0 rsyslogd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043863] [    855]     0   855    37717      350    65536       41             0 lxcfs
Mar 25 02:11:39 proxmox1 kernel: [38030728.043865] [    858]   104   858     2281      655    61440       58          -900 dbus-daemon
Mar 25 02:11:39 proxmox1 kernel: [38030728.043868] [    861]     0   861     4879      273    77824      181             0 systemd-logind
Mar 25 02:11:39 proxmox1 kernel: [38030728.043870] [    862]     0   862     3158      521    57344      313             0 smartd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043872] [    867]     0   867    41689      560    81920      233             0 zed
Mar 25 02:11:39 proxmox1 kernel: [38030728.043874] [    872]     0   872      535      171    40960        5         -1000 watchdog-mux
Mar 25 02:11:39 proxmox1 kernel: [38030728.043877] [    882]     0   882     1682      367    53248       21             0 ksmtuned
Mar 25 02:11:39 proxmox1 kernel: [38030728.043879] [    886]     0   886     1022      369    49152        4             0 qmeventd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043881] [    950]     0   950      954       93    40960        6             0 lxc-monitord
Mar 25 02:11:39 proxmox1 kernel: [38030728.043883] [    961]     0   961      568      170    45056       17             0 none
Mar 25 02:11:39 proxmox1 kernel: [38030728.043885] [    965]     0   965     1722       45    57344       15             0 iscsid
Mar 25 02:11:39 proxmox1 kernel: [38030728.043888] [    966]     0   966     1848     1253    57344        0           -17 iscsid
Mar 25 02:11:39 proxmox1 kernel: [38030728.043890] [    972]     0   972     3962      409    73728      181         -1000 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043892] [   1026]     0  1026   293786      306   241664      209             0 rrdcached
Mar 25 02:11:39 proxmox1 kernel: [38030728.043894] [   1056]     0  1056   379395     7306   483328      717             0 pmxcfs
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043919] [   1727]     0  1727     1403      318    45056       25             0 agetty
Mar 25 02:11:39 proxmox1 kernel: [38030728.043922] [  18413]     0 18413   727057   501394  5451776    31285             0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043924] [  10483]     0 10483  1868484  1545292 14823424    39976             0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043927] [  10783]     0 10783  1276146  1053046 10002432     8710             0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043929] [   9880]     0  9880   758051   477820  5959680    54007             0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043932] [  29805]   110 29805   291594     5899  2379776   278796             0 snmpd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043934] [   9834]     0  9834  2321128  1890371 18239488   214415             0 kvm
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043957] [  12139]     0 12139  4458610  4188872 35303424    15771             0 kvm
Mar 25 02:11:39 proxmox1 kernel: [38030728.043959] [  25309]     0 25309    90380     4492   434176    25873             0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043961] [  25310]     0 25310    90381     4515   434176    25583             0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043963] [  29536]     0 29536    21543      125    69632        0             0 pvefw-logger
Mar 25 02:11:39 proxmox1 kernel: [38030728.043966] [  29548]    33 29548    91740    25826   446464     5686             0 pveproxy worker
...
Mar 25 02:11:39 proxmox1 kernel: [38030728.043970] [  29550]    33 29550    88668    24087   413696     6253             0 pveproxy worker
Mar 25 02:11:39 proxmox1 kernel: [38030728.043972] [  29551]    33 29551    17619     7438   167936     5168             0 spiceproxy work
Mar 25 02:11:39 proxmox1 kernel: [38030728.043974] [    664]     0   664    90408     5002   434176    25297             0 pvedaemon worke
Mar 25 02:11:39 proxmox1 kernel: [38030728.043977] [  19258]   107 19258    10958      718    81920        0             0 pickup
Mar 25 02:11:39 proxmox1 kernel: [38030728.043980] [  23225]     0 23225     1315       98    49152        0             0 sleep
Mar 25 02:11:39 proxmox1 kernel: [38030728.043982] [  23259]     0 23259     4176      701    73728        0             0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043984] [  23260]   105 23260     3962      422    69632        0             0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043986] [  23262]     0 23262     3962      464    73728        0             0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043989] [  23264]     0 23264     5718      261    57344      196             0 systemd-udevd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043991] [  23265]   105 23265     3962      591    73728        0             0 sshd
Mar 25 02:11:39 proxmox1 kernel: [38030728.043992] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=12139,uid=0
Mar 25 02:11:39 proxmox1 kernel: [38030728.044023] Out of memory: Killed process 12139 (kvm) total-vm:17834440kB, anon-rss:16753760kB, file-rss:1724kB, shmem-rss:4kB, UID:0 pgtables:34476kB oom_score_adj:0
Mar 25 02:11:40 proxmox1 kernel: [38030729.660594] oom_reaper: reaped process 12139 (kvm), now anon-rss:0kB, file-rss:60kB, shmem-rss:4kB
Mar 25 02:11:41 proxmox1 kernel: [38030729.726974] fwbr100i0: port 2(tap100i0) entered disabled state
Mar 25 02:11:41 proxmox1 kernel: [38030729.727154] fwbr100i0: port 2(tap100i0) entered disabled state
Mar 25 02:11:41 proxmox1 systemd[1]: 100.scope: Succeeded.
Mar 25 02:11:42 proxmox1 qmeventd[874]: Starting cleanup for 100
Mar 25 02:11:42 proxmox1 kernel: [38030730.874109] fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.874164] vmbr0: port 2(fwpr100p0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.874417] device fwln100i0 left promiscuous mode
Mar 25 02:11:42 proxmox1 kernel: [38030730.874421] fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 25 02:11:42 proxmox1 kernel: [38030730.894961] device fwpr100p0 left promiscuous mode
Mar 25 02:11:42 proxmox1 kernel: [38030730.894964] vmbr0: port 2(fwpr100p0) entered disabled state
Mar 25 02:11:42 proxmox1 qmeventd[874]: Finished cleanup for 100
 
The host is running out of memory.
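The report above shows it host-wide: `Free swap = 0kB` against `Total swap = 8388604kB`, and the Node 0 Normal zone's `free:65388kB` has fallen below its `min:65656kB` watermark, so the kernel killed the largest consumer. A quick sketch for pulling those lines out of the kernel log (assumes the default Debian/Proxmox rsyslog layout; on journal-only systems use `journalctl -k` instead):

Code:
# extract the host-level memory evidence from an OOM report
grep -E 'oom-killer|Free swap|Total swap|Out of memory' /var/log/kern.log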
 
> The host is running out of memory.

Do you mean RAM? Are these then the proposed solutions?

1. Reduce some of the VMs' guest RAM usage.
2. Add more RAM to the host?

I'm just a bit unsure what you mean by "memory", as I see on the host summary there is also "KSM Sharing" and swap usage.
 
RAM should never fill up to 98%. Give the guests less RAM, enable ballooning (where the host reclaims RAM from the VMs as soon as the host's RAM usage exceeds 80%), or buy more RAM. In general you don't want to overprovision your RAM: if you only have 64 GB, don't give the VMs more than something like 60 GB in total, or even less if you use ZFS. Spread across 15 guests that would be about 4 GB of RAM per guest, and I would guess you gave them far more.
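If you want to try ballooning, the VM that keeps getting killed is VM 100 (see the `task_memcg=/qemu.slice/100.scope` line in the log). A minimal sketch of enabling it, assuming the 16 GB allocation mentioned above; the floor value here is just an example and should match the guest's real working set:

Code:
# let VM 100 use up to 16384 MB, but allow the host to balloon it down to 8192 MB under pressure
# (the Linux guest needs the virtio balloon driver, which is built into mainline kernels)
qm set 100 --memory 16384 --balloon 8192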
 
> > The host is running out of memory.
>
> Do you mean RAM? Are these then the proposed solutions?
>
> 1. Reduce some of the VMs' guest RAM usage.
> 2. Add more RAM to the host?
>
> I'm just a bit unsure what you mean by "memory", as I see on the host summary there is also "KSM Sharing" and swap usage.
Yes, memory == RAM (that's what the M in RAM stands for ;))

You have severely overprovisioned the available resources:
- actual memory is fully used
- swap space is fully used (that is also data that wants to live in memory if possible!)
- KSM has saved another 10 GB of memory

The solution is to not overprovision like that, so yes: either reduce the memory footprint (which on a hypervisor is mainly the memory allocated to guests) or increase available memory by adding more RAM.
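To see how much you have promised to guests in total, you can sum the configured memory of every VM on the node and compare it with the host's 64 GB. A minimal sketch, assuming all VMs have an explicit `memory:` line (value in MB) in the standard /etc/pve/qemu-server/ config layout:

Code:
# sum the configured RAM of all VMs on this node (MB); LXC containers not included
grep -h '^memory:' /etc/pve/qemu-server/*.conf | awk '{sum += $2} END {print sum " MB allocated to VMs"}'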
 
I have a similar issue with swap usage maxed out, but I don't think I'm overprovisioned, as you can see below:
[Attachment: screenshot of host resource usage]
Any ideas why swap is maxed out? And how much of a performance hit do I take with it like this?
 
> Any ideas why swap is maxed out? And how much of a performance hit do I take with it like this?

When the system identifies data in RAM as "swappable" (e.g., not in high demand), it will place it in swap proactively. You can control this behaviour with the "swappiness" tunable, but it's probably doing what it should and doesn't require fixing.
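If you do want the host to be less eager to swap, the usual knob is the one already shown above. A sketch, where 10 is just a common conservative value and not a universal recommendation:

Code:
# lower swappiness for the running system
sysctl -w vm.swappiness=10
# persist the setting across reboots
echo 'vm.swappiness=10' >> /etc/sysctl.conf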
 
OK, I lowered the swappiness and rebooted, and now the swap usage is 0 again. Weird.
 
