OOM reaper reaps even though there is AMPLE memory

AngryAdm

Code:
The server has 512GB RAM.
It has 5 running VMs with the following allocated memory:
1:2GB
2:4GB
3:32GB
4:64GB
5:16GB

Please explain how this server can run out of memory when it clearly has sufficient RAM for the meager load it is running.
Even with 2 ZFS pools at the default ARC settings it should never fill RAM. The graph below looks the same for average/max over approx. 3 months. No deviation, as it only runs these 5 VMs.


[3134156.023918] libceph: osd6 (1)172.16.0.3:6821 socket closed (con state OPEN)
[3157331.811755] dmesg invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=0
[3157331.812316] CPU: 7 PID: 281164 Comm: dmesg Tainted: P           O      5.11.22-5-pve #1
[3157331.812741] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
[3157331.813150] Call Trace:
[3157331.813553]  dump_stack+0x70/0x8b
[3157331.813954]  dump_header+0x4f/0x1f6
[3157331.814346]  oom_kill_process.cold+0xb/0x10
[3157331.814729]  out_of_memory+0x1cf/0x520
[3157331.815117]  __alloc_pages_slowpath.constprop.0+0xc6d/0xd60
[3157331.815518]  __alloc_pages_nodemask+0x2e0/0x310
[3157331.815926]  alloc_pages_current+0x87/0x110
[3157331.816336]  kmalloc_order+0x2d/0xa0
[3157331.816748]  kmalloc_order_trace+0x1d/0x80
[3157331.817161]  devkmsg_open+0x4f/0x130
[3157331.817577]  memory_open+0x48/0x60
[3157331.817996]  chrdev_open+0xf7/0x220
[3157331.818415]  ? cdev_device_add+0x90/0x90
[3157331.818838]  do_dentry_open+0x156/0x370
[3157331.819262]  vfs_open+0x2d/0x30
[3157331.819679]  path_openat+0xa79/0x10d0
[3157331.820099]  ? unlock_page_memcg+0x38/0x70
[3157331.820527]  ? page_add_file_rmap+0xd3/0x140
[3157331.820961]  do_filp_open+0x8c/0x130
[3157331.821387]  ? __check_object_size+0x13f/0x150
[3157331.821826]  do_sys_openat2+0x9b/0x150
[3157331.822256]  __x64_sys_openat+0x56/0x90
[3157331.822679]  do_syscall_64+0x38/0x90
[3157331.823097]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[3157331.823510] RIP: 0033:0x7f82bdea6be7
[3157331.823915] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
[3157331.824809] RSP: 002b:00007ffd2fe5dc40 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[3157331.825262] RAX: ffffffffffffffda RBX: 000055e10e774660 RCX: 00007f82bdea6be7
[3157331.825725] RDX: 0000000000000800 RSI: 000055e10e7743d7 RDI: 00000000ffffff9c
[3157331.826194] RBP: 000055e10e7743d7 R08: 0000000000090800 R09: 000055e10e77578f
[3157331.826672] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000800
[3157331.827154] R13: 000055e10e77a020 R14: 0000000000000001 R15: 0000000000000000
[3157331.827657] Mem-Info:
[3157331.828159] active_anon:30278860 inactive_anon:600647 isolated_anon:0
                  active_file:1732260 inactive_file:30438149 isolated_file:0
                  unevictable:93652 dirty:1 writeback:123
                  slab_reclaimable:190270 slab_unreclaimable:1711323
                  mapped:31102 shmem:16677 pagetables:67058 bounce:0
                  free:831798 free_pcp:601 free_cma:0
[3157331.831279] Node 0 active_anon:121115440kB inactive_anon:2402588kB active_file:6929040kB inactive_file:121754108kB unevictable:374608kB isolated(anon):0kB isolated(file):0kB mapped:124408kB dirty:4kB writeback:492kB shmem:66708kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 91336704kB writeback_tmp:0kB kernel_stack:12416kB pagetables:268232kB all_unreclaimable? no
[3157331.833124] Node 0 DMA free:11796kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15892kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[3157331.834526] lowmem_reserve[]: 0 2554 515583 515583 515583
[3157331.835308] Node 0 DMA32 free:2052348kB min:452kB low:3064kB high:5676kB reserved_highatomic:0KB active_anon:604160kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2742768kB managed:2672072kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[3157331.836964] lowmem_reserve[]: 0 0 513029 513029 513029
[3157331.837767] Node 0 Normal free:1260196kB min:91448kB low:616788kB high:1142128kB reserved_highatomic:2048KB active_anon:120511280kB inactive_anon:2402588kB active_file:6929040kB inactive_file:121756520kB unevictable:374608kB writepending:496kB present:533972992kB managed:525349464kB mlocked:374608kB bounce:0kB free_pcp:2608kB local_pcp:0kB free_cma:0kB
[3157331.839477] lowmem_reserve[]: 0 0 0 0 0
[3157331.840357] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11796kB
[3157331.841268] Node 0 DMA32: 125*4kB (UM) 481*8kB (UM) 462*16kB (UME) 417*32kB (UME) 392*64kB (UME) 396*128kB (UME) 337*256kB (UME) 243*512kB (UME) 136*1024kB (UM) 62*2048kB (UME) 360*4096kB (UME) = 2052348kB
[3157331.843064] Node 0 Normal: 9636*4kB (UE) 74662*8kB (UE) 38817*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1256912kB
[3157331.844015] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[3157331.844962] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[3157331.845911] 32193152 total pagecache pages
[3157331.846890] 0 pages in swap cache
[3157331.847832] Swap cache stats: add 0, delete 0, find 0/0
[3157331.848774] Free swap  = 0kB
[3157331.849712] Total swap = 0kB
[3157331.850646] 134182939 pages RAM
[3157331.851592] 0 pages HighMem/MovableOnly
[3157331.852559] 2173582 pages reserved
[3157331.853524] 0 pages hwpoisoned
[3157331.854495] Tasks state (memory values in pages):
[3157331.855447] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[3157331.856425] [   1200]     0  1200    88504     4764   724992        0          -250 systemd-journal
[3157331.857410] [   1315]     0  1315     5732     1014    65536        0         -1000 systemd-udevd
[3157331.858420] [   2530]   103  2530     1960      575    49152        0             0 rpcbind
[3157331.859445] [   2532]     0  2532     4547     1801    73728        0             0 ceph-crash
[3157331.860425] [   2561]   102  2561     2056      703    57344        0          -900 dbus-daemon
[3157331.861405] [   2564]     0  2564    93128      340    86016        0             0 lxcfs
[3157331.862392] [   2565]     0  2565   272369      512   188416        0             0 pve-lxc-syscall
[3157331.863403] [   2571]     0  2571    55185      880    73728        0             0 rsyslogd
[3157331.864388] [   2573]     0  2573     3013     1155    65536        0             0 smartd
[3157331.865370] [   2574]     0  2574     1742      452    49152        0             0 ksmtuned
[3157331.866350] [   2577]     0  2577     5538     1326    65536        0             0 systemd-logind
[3157331.867335] [   2580]     0  2580      543      179    36864        0         -1000 watchdog-mux
[3157331.868295] [   2585]     0  2585    40993     1059    77824        0             0 zed
[3157331.869238] [   2588]     0  2588     1051      366    49152        0             0 qmeventd
[3157331.870192] [   3268]     0  3268     1887      591    57344        0             0 lxc-monitord
[3157331.871124] [   3285]     0  3285     2873      132    57344        0             0 iscsid
[3157331.872047] [   3286]     0  3286     2999     2928    61440        0           -17 iscsid
[3157331.872986] [   3288]     0  3288     3323     1227    65536        0         -1000 sshd
[3157331.873894] [   3292]   101  3292     4743      790    61440        0             0 chronyd
[3157331.874813] [   3294]   101  3294     2695      679    61440        0             0 chronyd
[3157331.875719] [   3309]     0  3309      576      128    40960        0             0 bpfilter_umh
[3157331.876615] [   3321]     0  3321     1446      408    49152        0             0 agetty
[3157331.877508] [   3395]     0  3395   181748     1133   176128        0             0 rrdcached
[3157331.878396] [   3439]     0  3439   208388    15767   462848        0             0 pmxcfs
[3157331.879293] [   3640]     0  3640     9996      973    77824        0             0 master
[3157331.880189] [   3680]     0  3680   195241    96904   839680        0             0 corosync
[3157331.881097] [   3684]     0  3684     1671      621    53248        0             0 cron
[3157331.881960] [   3822]     0  3822    66421    21625   282624        0             0 pve-firewall
[3157331.882805] [   3824]     0  3824    67265    23457   299008        0             0 pvestatd
[3157331.883607] [   3847]     0  3847    87541    30204   405504        0             0 pvedaemon
[3157331.884378] [   3856]     0  3856    81060    24089   331776        0             0 pve-ha-crm
[3157331.885121] [   4083]    33  4083    87862    35175   430080        0             0 pveproxy
[3157331.885861] [   4090]    33  4090    17301    14318   176128        0             0 spiceproxy
[3157331.886588] [   4092]     0  4092    80982    24125   344064        0             0 pve-ha-lrm
[3157331.887276] [ 967092]   106 967092    10076     1264    69632        0             0 qmgr
[3157331.887977] [1366438]     0 1366438  1797451   770871 10240000        0             0 kvm
[3157331.888619] [1369243]     0 1369243  1034979   334976  5808128        0             0 kvm
[3157331.889243] [1703229]     0 1703229    90788    31848   425984        0             0 task UPID:pve02
[3157331.889834] [1703230]     0 1703230    22519     1284    77824        0             0 termproxy
[3157331.890406] [1703234]     0 1703234     1774      950    57344        0             0 login
[3157331.890960] [1703349]     0 1703349     4105     2192    69632        0             0 systemd
[3157331.891480] [1703350]     0 1703350    42401     1187   102400        0             0 (sd-pam)
[3157331.891986] [1703391]     0 1703391     1993      908    57344        0             0 bash
[3157331.892489] [ 641829]     0 641829  9798630  8399755 69095424        0             0 kvm
[3157331.892978] [ 662789]    33 662789    93621    34356   446464        0             0 pveproxy worker
[3157331.893463] [1095456]     0 1095456  5583853  4218138 36122624        0             0 kvm
[3157331.893952] [ 740766]     0 740766 18414525 16974551 143527936        0             0 kvm
[3157331.894441] [2536871]     0 2536871    89664    31535   421888        0             0 pvedaemon worke
[3157331.894947] [2871790]     0 2871790    89771    31617   425984        0             0 pvedaemon worke
[3157331.895445] [2985227]     0 2985227    89723    31532   425984        0             0 pvedaemon worke
[3157331.895941] [1668449]    33 1668449    17331    12929   176128        0             0 spiceproxy work
[3157331.896441] [1668451]     0 1668451    20035      435    57344        0             0 pvefw-logger
[3157331.896941] [1668455]    33 1668455    87895    31323   409600        0             0 pveproxy worker
[3157331.897446] [1668456]    33 1668456    87895    31323   409600        0             0 pveproxy worker
[3157331.897944] [1668468]    33 1668468    87895    31323   409600        0             0 pveproxy worker
[3157331.898425] [3910659]   106 3910659    10064     1349    77824        0             0 pickup
[3157331.898912] [ 278026]     0 278026     1326      129    49152        0             0 sleep
[3157331.899391] [ 281154]     0 281154     3614     2056    65536        0             0 sshd
[3157331.899863] [ 281161]     0 281161     1993     1053    53248        0             0 bash
[3157331.900331] [ 281164]     0 281164     1385      201    45056        0             0 dmesg
[3157331.900826] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/201.scope,task=kvm,pid=740766,uid=0
[3157331.901376] Out of memory: Killed process 740766 (kvm) total-vm:73658100kB, anon-rss:67886652kB, file-rss:11552kB, shmem-rss:0kB, UID:0 pgtables:140164kB oom_score_adj:0
[3157333.887835] oom_reaper: reaped process 740766 (kvm), now anon-rss:0kB, file-rss:68kB, shmem-rss:0kB
[3157333.983076] fwbr201i0: port 2(tap201i0) entered disabled state
[3157334.011347] fwbr201i0: port 2(tap201i0) entered disabled state
[3157334.538217] fwbr201i0: port 1(fwln201i0) entered disabled state
[3157334.541070] vmbr0: port 4(fwpr201p0) entered disabled state
[3157334.566469] device fwln201i0 left promiscuous mode
[3157334.567593] fwbr201i0: port 1(fwln201i0) entered disabled state
[3157334.622859] device fwpr201p0 left promiscuous mode
[3157334.624005] vmbr0: port 4(fwpr201p0) entered disabled state
[3157420.988253] device tap201i0 entered promiscuous mode
[3157421.042276] fwbr201i0: port 1(fwln201i0) entered blocking state
[3157421.043120] fwbr201i0: port 1(fwln201i0) entered disabled state
[3157421.044837] device fwln201i0 entered promiscuous mode
[3157421.045848] fwbr201i0: port 1(fwln201i0) entered blocking state
[3157421.046534] fwbr201i0: port 1(fwln201i0) entered forwarding state
[3157421.054058] vmbr0: port 4(fwpr201p0) entered blocking state
[3157421.055085] vmbr0: port 4(fwpr201p0) entered disabled state
[3157421.056020] device fwpr201p0 entered promiscuous mode
[3157421.056860] vmbr0: port 4(fwpr201p0) entered blocking state
[3157421.057508] vmbr0: port 4(fwpr201p0) entered forwarding state
[Attachment: Capture.PNG - memory usage graph, flat over ~3 months]
Code:
[3157331.811755] dmesg invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=0
...

[3157331.843064] Node 0 Normal: 9636*4kB (UE) 74662*8kB (UE) 38817*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1256912kB

This is a request for a Normal-zone allocation of order 3, i.e. 4kB * 2^3 == 32kB, and your memory is so fragmented (and you don't have swap that could help with that) that there is no contiguous 32kB block available.
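
If you want to check this yourself, /proc/buddyinfo lists how many free blocks of each order (2^0 to 2^10 pages) each zone has - in your log the Normal zone has nothing free at order 3 or above. A minimal sketch for reading it, assuming the standard buddyinfo column layout (orders 0..10 start at the fifth field):

Code:
# One line per zone; the columns after the zone name are free block counts
# for orders 0..10 (block size = 2^order * 4kB).
cat /proc/buddyinfo

# Print the Normal zone and flag when no order-3 (32kB) blocks are left.
awk '/Normal/ { print; if ($8 == 0) print "  -> no free order-3 (32kB) blocks" }' /proc/buddyinfo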
 
So you are saying that with 160GB free memory it cannot find a place to write 32kB contiguously? That is hopefully not the problem, as that would be absurd.

I am also curious why free -m showed 160GB used by "buffers/cache", considering ALL drives are in ZFS pools. I do not see how this can happen, given that ARC is not counted in buffers/cache.
 
[3157331.862392] [ 2565] 0 2565 272369 512 188416 0 0 pve-lxc-syscall

Is this odd? I do not use containers at all.
 
So you are saying that with 160GB free memory it cannot find a place to write 32kB contiguously? That is hopefully not the problem, as that would be absurd.

That is for sure the problem - the kernel tells you that this is the reason it OOM-killed. Now WHY your memory is so badly fragmented is a good question.

I am also curious why free -m showed 160GB used by "buffers/cache", considering ALL drives are in ZFS pools. I do not see how this can happen, given that ARC is not counted in buffers/cache.

The page cache is around 122GB at the time of the kill - and just because all local drives are on ZFS doesn't mean that nothing uses the page cache (e.g., NFS mounts use the page cache as well, and so do certain shmem things and tmpfs, ...).
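
If you want to see where the memory actually sits, you can compare the page-cache figures from /proc/meminfo with the ARC size from the ZFS kstats. A minimal sketch, assuming the standard ZFS-on-Linux kstat location:

Code:
# Page cache and friends (most of what free -m reports as buffers/cache)
grep -E '^(Buffers|Cached|Shmem):' /proc/meminfo

# Current ARC size in bytes (accounted separately from the page cache)
awk '$1 == "size" { printf "ARC size: %.1f GiB\n", $3 / 1024^3 }' /proc/spl/kstat/zfs/arcstats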

[3157331.862392] [ 2565] 0 2565 272369 512 188416 0 0 pve-lxc-syscall

Is this odd? I do not use containers at all.

That service is always started, but obviously it won't do anything when no containers are running.
 
Had an issue like this a long time ago on a Linux server, kernel 3.x IIRC (not Proxmox). That server had no swap and had like 32GB of RAM. It was a Java app server, so it had quite a few high-memory processes. Also, it ran some monitoring software which ran some other binaries to gather info on app and server status. Those binaries created/updated/deleted *a lot* of files every minute, causing severe fragmentation of the buffers/cache memory, leaving almost no contiguous allocation space. After some days/weeks, OOM would kill some of the Java apps as it couldn't allocate memory.

At the time this did help:

- Tell the kernel to compact memory every hour or so: echo 1 > /proc/sys/vm/compact_memory
- Set a relatively small swap file, maybe 4GB. That way the kernel will start using it before invoking the OOM killer. Set vm.swappiness=1.
- Monitor swap usage and when it was over 2GB we would either restart some Java app when convenient or drop disk caches (sync && echo 3 > /proc/sys/vm/drop_caches).
- Monitor memory fragmentation with echo m > /proc/sysrq-trigger
- Eventually modified some monitoring checks to use fewer file operations to reduce memory fragmentation.

After that the server ran mostly without issues or major changes for a few years.
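
A minimal sketch of how the first points above could look on a host like this - the cron file name and the hourly schedule are just examples, not anything from the original setup:

Code:
# /etc/cron.d/compact-memory  (example file name)
# Ask the kernel to compact all zones once per hour to reduce fragmentation.
0 * * * *  root  echo 1 > /proc/sys/vm/compact_memory

# One-off equivalents from a root shell:
#   echo 1 > /proc/sys/vm/compact_memory    # trigger compaction now
#   sysctl vm.swappiness=1                  # prefer reclaiming cache over swapping anon pages
#   echo m > /proc/sysrq-trigger            # dump per-zone fragmentation info to dmesg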

If I'm not getting the numbers wrong, you are using:

160GB buffers/cache
118GB RAM reserved for VMs
256GB ARC cache (as you mention the default config, i.e. up to half of RAM)

That is about 534GB, some 22GB over the server's 512GB total. Add the memory fragmentation issue mentioned above, and it seems the kernel either has to OOM-kill something or panic :)
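
If the ARC really is allowed to grow to the default of half the RAM, capping it is the usual tuning step on a ZFS host. A minimal sketch, where the 64GiB limit is just an example value, not a recommendation for this particular box:

Code:
# Cap the ZFS ARC (value in bytes; 64GiB here) so it leaves headroom
# for the VMs and the page cache.
echo "options zfs zfs_arc_max=68719476736" > /etc/modprobe.d/zfs.conf
update-initramfs -u

# Apply immediately without a reboot:
echo 68719476736 > /sys/module/zfs/parameters/zfs_arc_max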
 
