We are currently testing our Proxmox servers, which are going to host our groupware in the future.
During testing we have run into a problem that is reproducible.
As a test, we started a dd that writes about 400 GB from /dev/urandom into a file.
Memory usage rises continuously until it sits at the maximum of 48 GB for a few minutes, and then the VM is killed.
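To record how host memory develops while the dd runs, one option is to sample /proc/meminfo on the Proxmox host. The following is a minimal Python sketch under that assumption; the function names and the 10-second interval are hypothetical choices, not something taken from our setup.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; HugePages_* fields are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def used_gib(info: dict) -> float:
    """Memory in use as MemTotal - MemAvailable, converted from kB to GiB."""
    return (info["MemTotal"] - info["MemAvailable"]) / 1024**2


def monitor(interval_s: float = 10.0, samples: int = 360) -> None:
    """Print host memory figures every interval_s seconds, `samples` times."""
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(
            f"{time.strftime('%H:%M:%S')} "
            f"used={used_gib(info):.1f} GiB "
            f"available={info['MemAvailable'] / 1024**2:.1f} GiB "
            f"slab_unreclaimable={info.get('SUnreclaim', 0) / 1024**2:.1f} GiB"
        )
        time.sleep(interval_s)
```

Calling `monitor()` on the host while the dd runs inside the VM would log one line every 10 seconds for one hour.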
dmesg outputs the following:
Code:
[518667.057268] pvedaemon worke invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[518667.057627] CPU: 2 PID: 537093 Comm: pvedaemon worke Tainted: P O 5.13.19-5-pve #1
[518667.057929] Hardware name: Thomas-Krenn.AG 2HE AMD Single-CPU RA1208-AIEPN Server/H12SSL-NT, BIOS 2.3 10/20/2021
[518667.058236] Call Trace:
[518667.058539] <TASK>
[518667.058842] dump_stack+0x7d/0x9c
[518667.059146] dump_header+0x4f/0x1f6
[518667.059448] oom_kill_process.cold+0xb/0x10
[518667.059749] out_of_memory+0x1cf/0x530
[518667.060050] __alloc_pages_slowpath.constprop.0+0xc96/0xd80
[518667.060352] __alloc_pages+0x30e/0x330
[518667.060646] alloc_pages+0x87/0x110
[518667.060939] pagecache_get_page+0x2c2/0x560
[518667.061230] filemap_fault+0x5cd/0x880
[518667.061521] __do_fault+0x3c/0xe0
[518667.061812] __handle_mm_fault+0xfca/0x16f0
[518667.062104] handle_mm_fault+0xda/0x2c0
[518667.062394] do_user_addr_fault+0x1bb/0x660
[518667.062686] ? __x64_sys_close+0x12/0x40
[518667.063295] exc_page_fault+0x7d/0x170
[518667.063847] ? asm_exc_page_fault+0x8/0x30
[518667.064330] asm_exc_page_fault+0x1e/0x30
[518667.064797] RIP: 0033:0x55c234853ffe
[518667.065262] Code: Unable to access opcode bytes at RIP 0x55c234853fd4.
[518667.065732] RSP: 002b:00007ffeb71c2780 EFLAGS: 00010246
[518667.066202] RAX: 000055c2392a36f0 RBX: 000055c23c356350 RCX: 000055c23c347a00
[518667.066681] RDX: 000000000000001a RSI: 000055c23c356350 RDI: 000055c2351d02a0
[518667.067160] RBP: 000055c2348dafb0 R08: 0000000000000000 R09: 000055c2351d02a0
[518667.067640] R10: 000055c23c356348 R11: 0000000000000001 R12: 000055c2347584a0
[518667.068117] R13: 0000000000000000 R14: 0000000000000000 R15: 000055c2351d02a0
[518667.068595] </TASK>
[518667.069070] Mem-Info:
[518667.069532] active_anon:7619419 inactive_anon:5193264 isolated_anon:0
active_file:67 inactive_file:74 isolated_file:0
unevictable:37759 dirty:0 writeback:15
slab_reclaimable:15347 slab_unreclaimable:2132976
mapped:23731 shmem:22337 pagetables:27202 bounce:0
free:81323 free_pcp:167 free_cma:0
[518667.072370] Node 0 active_anon:30477676kB inactive_anon:20773056kB active_file:200kB inactive_file:332kB unevictable:151036kB isolated(anon):0kB isolated(file):0kB mapped:94856kB dirty:0kB writeback:60kB shmem:89348kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 30392320kB writeback_tmp:0kB kernel_stack:22352kB pagetables:108808kB all_unreclaimable? no
[518667.073563] Node 0 DMA free:11264kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[518667.074451] lowmem_reserve[]: 0 2558 64088 64088 64088
[518667.074757] Node 0 DMA32 free:247324kB min:2696kB low:5316kB high:7936kB reserved_highatomic:0KB active_anon:2351104kB inactive_anon:65556kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2742068kB managed:2673608kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[518667.075739] lowmem_reserve[]: 0 0 61529 61529 61529
[518667.076063] Node 0 Normal free:67000kB min:64868kB low:127872kB high:190876kB reserved_highatomic:2048KB active_anon:28126572kB inactive_anon:20707500kB active_file:0kB inactive_file:436kB unevictable:151036kB writepending:60kB present:64209920kB managed:63013376kB mlocked:151036kB bounce:0kB free_pcp:1396kB local_pcp:264kB free_cma:0kB
[518667.077071] lowmem_reserve[]: 0 0 0 0 0
[518667.077415] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[518667.078140] Node 0 DMA32: 11*4kB (UM) 12*8kB (UM) 13*16kB (UM) 12*32kB (M) 15*64kB (UM) 13*128kB (M) 15*256kB (UM) 15*512kB (UM) 17*1024kB (UM) 49*2048kB (UM) 28*4096kB (UM) = 247324kB
[518667.078912] Node 0 Normal: 12127*4kB (UME) 1023*8kB (ME) 502*16kB (UMEH) 5*32kB (H) 4*64kB (H) 3*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 66548kB
[518667.079715] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[518667.080128] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[518667.080538] 26711 total pagecache pages
[518667.080948] 0 pages in swap cache
[518667.081354] Swap cache stats: add 0, delete 0, find 0/0
[518667.081766] Free swap = 0kB
[518667.082176] Total swap = 0kB
[518667.082585] 16741996 pages RAM
[518667.082994] 0 pages HighMem/MovableOnly
[518667.083400] 316410 pages reserved
[518667.083807] 0 pages hwpoisoned
[518667.084213] Tasks state (memory values in pages):
[518667.084620] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[518667.085044] [ 1529] 0 1529 24258 807 208896 0 -250 systemd-journal
[518667.085462] [ 1621] 0 1621 5648 818 65536 0 -1000 systemd-udevd
[518667.085888] [ 3874] 103 3874 1960 477 53248 0 0 rpcbind
[518667.086306] [ 3883] 102 3883 2052 585 53248 0 -900 dbus-daemon
[518667.086725] [ 3887] 0 3887 37728 307 57344 0 0 lxcfs
[518667.087147] [ 3890] 0 3890 813207 524 458752 0 0 pve-lxc-syscall
[518667.087565] [ 3894] 0 3894 1742 345 45056 0 0 ksmtuned
[518667.087985] [ 3897] 0 3897 55185 852 77824 0 0 rsyslogd
[518667.088401] [ 3903] 0 3903 1051 311 45056 0 0 qmeventd
[518667.088815] [ 3905] 0 3905 2987 855 65536 0 0 smartd
[518667.089227] [ 3931] 0 3931 5530 820 73728 0 0 systemd-logind
[518667.089638] [ 3959] 0 3959 543 246 40960 0 -1000 watchdog-mux
[518667.090048] [ 3968] 0 3968 59429 681 77824 0 0 zed
[518667.090456] [ 4467] 0 4467 1137 273 49152 0 0 lxc-monitord
[518667.090866] [ 4484] 0 4484 2873 132 57344 0 0 iscsid
[518667.091257] [ 4485] 0 4485 2999 2977 61440 0 -17 iscsid
[518667.091637] [ 4490] 0 4490 3323 1018 69632 0 -1000 sshd
[518667.092007] [ 4514] 0 4514 1446 410 45056 0 0 agetty
[518667.092368] [ 4534] 101 4534 4743 578 57344 0 0 chronyd
[518667.092719] [ 4542] 101 4542 2695 442 57344 0 0 chronyd
[518667.093065] [ 4585] 0 4585 181715 681 184320 0 0 rrdcached
[518667.093404] [ 4605] 0 4605 161080 19673 471040 0 0 pmxcfs
[518667.093733] [ 4686] 0 4686 9996 606 73728 0 0 master
[518667.094051] [ 4693] 0 4693 139351 41011 401408 0 0 corosync
[518667.094368] [ 4694] 0 4694 1671 536 57344 0 0 cron
[518667.094676] [ 4774] 0 4774 67542 21316 270336 0 0 pve-firewall
[518667.094977] [ 4775] 0 4775 67214 21405 282624 0 0 pvestatd
[518667.095265] [ 4779] 0 4779 576 144 40960 0 0 bpfilter_umh
[518667.095548] [ 4803] 0 4803 80892 23994 356352 0 0 pvescheduler
[518667.095825] [ 4807] 0 4807 86020 30299 385024 0 0 pvedaemon
[518667.096092] [ 4825] 0 4825 82305 24098 348160 0 0 pve-ha-crm
[518667.096348] [ 4833] 33 4833 86357 31799 421888 0 0 pveproxy
[518667.096595] [ 4903] 33 4903 18515 13368 192512 0 0 spiceproxy
[518667.096833] [ 4907] 0 4907 82216 23710 348160 0 0 pve-ha-lrm
[518667.097064] [1831631] 33 1831631 18573 12501 180224 0 0 spiceproxy work
[518667.097290] [1831657] 0 1831657 20035 280 53248 0 0 pvefw-logger
[518667.097521] [3855034] 0 3855034 3974 1158 69632 0 0 systemd
[518667.097746] [3855035] 0 3855035 42265 1037 98304 0 0 (sd-pam)
[518667.097971] [3990206] 0 3990206 3613 788 69632 0 0 sshd
[518667.098196] [3990334] 0 3990334 2164 834 53248 0 0 bash
[518667.098420] [3008821] 0 3008821 3614 875 65536 0 0 sshd
[518667.098646] [3040597] 0 3040597 1993 746 57344 0 0 bash
[518667.098875] [3598882] 0 3598882 13050121 12533512 101351424 0 0 kvm
[518667.099104] [3904888] 33 3904888 89603 33180 430080 0 0 pveproxy worker
[518667.099338] [3933864] 0 3933864 1679 682 49152 0 0 watch
[518667.099573] [ 537093] 0 537093 88184 31357 409600 0 0 pvedaemon worke
[518667.099812] [ 538232] 0 538232 88184 31305 405504 0 0 pvedaemon worke
[518667.100044] [ 540233] 0 540233 88120 31116 401408 0 0 pvedaemon worke
[518667.100269] [ 541499] 33 541499 88529 31948 413696 0 0 pveproxy worker
[518667.100495] [ 543118] 0 543118 88120 30675 393216 0 0 task UPID:anaxe
[518667.100721] [ 543120] 0 543120 80191 24599 372736 0 0 qm
[518667.100946] [ 933165] 106 933165 10064 704 69632 0 0 pickup
[518667.101172] [ 933167] 106 933167 10076 661 77824 0 0 qmgr
[518667.101394] [1744749] 33 1744749 89696 32394 417792 0 0 pveproxy worker
[518667.101622] [1744845] 33 1744845 88517 31602 413696 0 0 pveproxy worker
[518667.101849] [2071934] 0 2071934 3648 996 73728 0 0 sshd
[518667.102077] [2072062] 0 2072062 80190 24575 376832 0 0 qm
[518667.102306] [2608672] 0 2608672 1326 119 45056 0 0 sleep
[518667.102540] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=pvedaemon.service,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=3598882,uid=0
[518667.103055] Out of memory: Killed process 3598882 (kvm) total-vm:52200484kB, anon-rss:50131896kB, file-rss:2152kB, shmem-rss:0kB, UID:0 pgtables:98976kB oom_score_adj:0
[518667.764495] oom_reaper: reaped process 3598882 (kvm), now anon-rss:0kB, file-rss:84kB, shmem-rss:0kB
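To put the log numbers into the units used above, here is a minimal Python sketch that extracts a few values from the dmesg output and converts them to GiB. It assumes 4 KiB pages (the x86_64 default) and that the Mem-Info summary values and the "pages RAM" line are page counts, which the log itself supports (active_anon:7619419 × 4 KiB = 30477676 kB, matching the Node 0 line). The function names are hypothetical.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB base pages, the x86_64 default


def kib_to_gib(kib: int) -> float:
    return kib / 1024**2


def summarize(log: str) -> dict:
    """Extract a few OOM-related figures from dmesg text and convert them to GiB."""
    out = {}
    # Final kill line: anon-rss of the victim process, reported in kB
    m = re.search(r"Killed process \d+ \([^)]*\) total-vm:\d+kB, anon-rss:(\d+)kB", log)
    if m:
        out["victim_anon_rss_gib"] = kib_to_gib(int(m.group(1)))
    # "N pages RAM" is a page count, not kB
    m = re.search(r"(\d+) pages RAM", log)
    if m:
        out["ram_gib"] = kib_to_gib(int(m.group(1)) * PAGE_KIB)
    # Mem-Info summary values are page counts as well
    m = re.search(r"slab_unreclaimable:(\d+)", log)
    if m:
        out["slab_unreclaimable_gib"] = kib_to_gib(int(m.group(1)) * PAGE_KIB)
    # Swap totals are reported in kB
    m = re.search(r"Total swap = (\d+)kB", log)
    if m:
        out["swap_gib"] = kib_to_gib(int(m.group(1)))
    return out
```

Applied to the log above, this yields roughly 47.8 GiB of anonymous memory for the killed kvm process, about 63.9 GiB of RAM in total, about 8.1 GiB of unreclaimable slab, and no swap.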
While searching for the cause, we came across the following thread: https://forum.proxmox.com/threads/oom-reaper-reaps-even-though-there-is-ample-memory.101828/
Is this the same problem, or is the cause different in our case?