PVE 7: Backup fails because the OOM killer stops smbd in an LXC container

quotengrote

New Member
Jul 10, 2021
Hi all,

Since upgrading to Proxmox 7, the nightly backup fails because the OOM killer kills smbd in an LXC container.
This didn't happen with 6.4.
The backup server is an LXC container with bind mounts on this host.

Logs Container:
Code:
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: A process of this unit has been killed by the OOM killer.
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: Failed with result 'oom-kill'.
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: Consumed 2min 59.923s CPU time.

Logs PVE:
Code:
Jul  9 23:09:00 pve2.grote.lan systemd[1]: Starting Proxmox VE replication runner...
Jul  9 23:09:01 pve2.grote.lan systemd[1]: pvesr.service: Succeeded.
Jul  9 23:09:01 pve2.grote.lan systemd[1]: Finished Proxmox VE replication runner.
Jul  9 23:10:00 pve2.grote.lan systemd[1]: Starting Proxmox VE replication runner...
Jul  9 23:10:01 pve2.grote.lan systemd[1]: pvesr.service: Succeeded.
Jul  9 23:10:01 pve2.grote.lan systemd[1]: Finished Proxmox VE replication runner.
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: #: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: 300: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: =: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: every: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: 5: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: minutes: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdc: WDC WD80EZAZ-11TDBA0: 42 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdd: WDC WD80EZAZ-11TDBA0: 44 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sde: WDC WD80EZAZ-11TDBA0: 42 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdf: MTFDDAK256MBF-1AN15ABHA: 48 C
Jul  9 23:10:26 pve2.grote.lan kernel: smbd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jul  9 23:10:26 pve2.grote.lan kernel: CPU: 5 PID: 2714208 Comm: smbd Tainted: P           O      5.11.22-1-pve #1
Jul  9 23:10:26 pve2.grote.lan kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
Jul  9 23:10:26 pve2.grote.lan kernel: Call Trace:
Jul  9 23:10:26 pve2.grote.lan kernel: dump_stack+0x70/0x8b
Jul  9 23:10:26 pve2.grote.lan kernel: dump_header+0x4f/0x1f6
Jul  9 23:10:26 pve2.grote.lan kernel: oom_kill_process.cold+0xb/0x10
Jul  9 23:10:26 pve2.grote.lan kernel: out_of_memory+0x1cf/0x520
Jul  9 23:10:26 pve2.grote.lan kernel: mem_cgroup_out_of_memory+0x139/0x150
Jul  9 23:10:26 pve2.grote.lan kernel: try_charge+0x750/0x7b0
Jul  9 23:10:26 pve2.grote.lan kernel: mem_cgroup_charge+0x8a/0x280
Jul  9 23:10:26 pve2.grote.lan kernel: __add_to_page_cache_locked+0x34b/0x3a0
Jul  9 23:10:26 pve2.grote.lan kernel: ? scan_shadow_nodes+0x30/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: add_to_page_cache_lru+0x4d/0xd0
Jul  9 23:10:26 pve2.grote.lan kernel: pagecache_get_page+0x161/0x3b0
Jul  9 23:10:26 pve2.grote.lan kernel: filemap_fault+0x6da/0xa30
Jul  9 23:10:26 pve2.grote.lan kernel: __do_fault+0x3c/0xe0
Jul  9 23:10:26 pve2.grote.lan kernel: handle_mm_fault+0x1516/0x1a70
Jul  9 23:10:26 pve2.grote.lan kernel: do_user_addr_fault+0x1a3/0x450
Jul  9 23:10:26 pve2.grote.lan kernel: exc_page_fault+0x6c/0x150
Jul  9 23:10:26 pve2.grote.lan kernel: ? asm_exc_page_fault+0x8/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: asm_exc_page_fault+0x1e/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: RIP: 0033:0x7f70aece7c6f
Jul  9 23:10:26 pve2.grote.lan kernel: Code: Unable to access opcode bytes at RIP 0x7f70aece7c45.
Jul  9 23:10:26 pve2.grote.lan kernel: RSP: 002b:00007f70ab1b5b20 EFLAGS: 00010293
Jul  9 23:10:26 pve2.grote.lan kernel: RAX: 0000000000008000 RBX: 000055c79a334300 RCX: 00007f70aece7c6f
Jul  9 23:10:26 pve2.grote.lan kernel: RDX: 0000000000008000 RSI: 000055c79a354500 RDI: 0000000000000028
Jul  9 23:10:26 pve2.grote.lan kernel: RBP: 000055c79a2aa9d0 R08: 0000000000000000 R09: 00000000ffffffff
Jul  9 23:10:26 pve2.grote.lan kernel: R10: 0000000000138000 R11: 0000000000000293 R12: 00007f70ab1b5ba0
Jul  9 23:10:26 pve2.grote.lan kernel: R13: 000055c79a2aaa08 R14: 000055c79a2ec890 R15: 00007f70ae7ea4f0
Jul  9 23:10:26 pve2.grote.lan kernel: memory: usage 2098124kB, limit 2097152kB, failcnt 121146
Jul  9 23:10:26 pve2.grote.lan kernel: swap: usage 65536kB, limit 65536kB, failcnt 6
Jul  9 23:10:26 pve2.grote.lan kernel: Memory cgroup stats for /lxc/109:
Jul  9 23:10:26 pve2.grote.lan kernel: anon 2100375552#012file 7974912#012kernel_stack 2654208#012pagetables 8921088#012percpu 1489152#012sock 4157440#012shmem 2838528#012file_mapped 6352896#012file_dirty 540672#012file_writeback 0#012anon_thp 0#012file_thp 0#012shmem_thp 0#012inactive_anon 617844736#012active_anon 1502101504#012inactive_file 1949696#012active_file 1724416#012unevictable 0#012slab_reclaimable 6146368#012slab_unreclaimable 8557584#012slab 14703952#012workingset_refault_anon 4719#012workingset_refault_file 20742#012workingset_activate_anon 4620#012workingset_activate_file 18475#012workingset_restore_anon 0#012workingset_restore_file 15869#012workingset_nodereclaim 10710#012pgfault 46305758#012pgmajfault 742136#012pgrefill 105541#012pgscan 1554942#012pgsteal 677806#012pgactivate 476127#012pgdeactivate 97780#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 0#012thp_collapse_alloc 0
Jul  9 23:10:26 pve2.grote.lan kernel: Tasks state (memory values in pages):
Jul  9 23:10:26 pve2.grote.lan kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jul  9 23:10:26 pve2.grote.lan kernel: [1303878]     0 1303878    25778      662    81920        0             0 systemd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304144]     0 1304144      716       36    49152        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304146]     0 1304146      716       37    40960        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304488]     0 1304488     9538      123    61440        0             0 master
Jul  9 23:10:26 pve2.grote.lan kernel: [1304490]   104 1304490     9618      126    69632        0             0 qmgr
Jul  9 23:10:26 pve2.grote.lan kernel: [ 720641]   104 720641     9605      125    61440        0             0 pickup
Jul  9 23:10:26 pve2.grote.lan kernel: [1303956]     0 1303956    72939     5564   548864        0          -250 systemd-journal
Jul  9 23:10:26 pve2.grote.lan kernel: [1303994]   101 1303994     6653      232    77824        0             0 systemd-network
Jul  9 23:10:26 pve2.grote.lan kernel: [1304029]   102 1304029     5990     1035    81920        0             0 systemd-resolve
Jul  9 23:10:26 pve2.grote.lan kernel: [1304104]     0 1304104    59560     1392   102400      210             0 accounts-daemon
Jul  9 23:10:26 pve2.grote.lan kernel: [1304126]     0 1304126     1006       52    49152        0             0 atd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304149]   113 1304149     3312       88    49152        0             0 chronyd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304153]   113 1304153     1267       58    49152        0             0 chronyd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304107]     0 1304107     1012       78    45056        0             0 cron
Jul  9 23:10:26 pve2.grote.lan kernel: [1304110]   103 1304110     1943      233    53248        0          -900 dbus-daemon
Jul  9 23:10:26 pve2.grote.lan kernel: [1304127]     0 1304127    98398     2585   131072      182             0 f2b/server
Jul  9 23:10:26 pve2.grote.lan kernel: [1304183]     0 1304183     4468     1610    77824      766             0 munin-node
Jul  9 23:10:26 pve2.grote.lan kernel: [1304115]     0 1304115     6613     1742    86016      196             0 networkd-dispat
Jul  9 23:10:26 pve2.grote.lan kernel: [1304116]     0 1304116     7748      526    90112        0             0 nmbd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304142]     0 1304142     3073      244    61440        0         -1000 sshd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304124]     0 1304124     4269      313    69632        0             0 systemd-logind
Jul  9 23:10:26 pve2.grote.lan kernel: [1304143]     0 1304143      716       36    45056        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304163]     0 1304163    26310     1710   106496      213             0 unattended-upgr
Jul  9 23:10:26 pve2.grote.lan kernel: [  14568]     0 14568   285600      506  1540096        0             0 rsyslogd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 212985]     0 212985    11543      646   131072      399             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213000]     0 213000    10952      290   122880      390             0 smbd-notifyd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213001]     0 213001    10956      295   122880      383             0 cleanupd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213003]     0 213003    11539      281   126976      434             0 lpqd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213158]     0 213158    23590     1356   163840     1022             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213443]     0 213443    11604      516   131072      438             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213460]     0 213460    11604      345   131072      530             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213461]     0 213461    70716     1222   180224     1292             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213474]     0 213474    11604      499   131072      451             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213483]     0 213483   965471   495890  4644864     8438             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [1306874]  1001 1306874     4539      309    69632        0             0 systemd
Jul  9 23:10:26 pve2.grote.lan kernel: [1306875]  1001 1306875     5663      578    77824      126             0 (sd-pam)
Jul  9 23:10:26 pve2.grote.lan kernel: [1307016]  1001 1307016     1248      220    49152        3             0 tmux: server
Jul  9 23:10:26 pve2.grote.lan kernel: [1307017]  1001 1307017     1379      438    53248        0             0 bash
Jul  9 23:10:26 pve2.grote.lan kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0,oom_memcg=/lxc/109,task_memcg=/lxc/109/ns/system.slice/smbd.service,task=smbd,pid=213483,uid=0
Jul  9 23:10:26 pve2.grote.lan kernel: Memory cgroup out of memory: Killed process 213483 (smbd) total-vm:3861884kB, anon-rss:1982276kB, file-rss:0kB, shmem-rss:1284kB, UID:0 pgtables:4536kB oom_score_adj:0
Jul  9 23:10:26 pve2.grote.lan kernel: oom_reaper: reaped process 213483 (smbd), now anon-rss:0kB, file-rss:0kB, shmem-rss:1284kB
Jul  9 23:10:26 pve2.grote.lan kernel: CIFS: VFS: \\192.168.2.36 Error -32 sending data on socket to server
Jul  9 23:10:27 pve2.grote.lan pvestatd[4853]: storage 'SMB_FILESERVER2' is not online
Jul  9 23:10:37 pve2.grote.lan pvestatd[4853]: storage 'SMB_FILESERVER2' is not online

Memory consumption of the host:
[attachment: 2021-07-10 10_10_57-Greenshot.png]
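
For reference, the limit the container ran into (2 GiB RAM plus 64 MiB swap, as shown in the kernel log above) can be cross-checked against its configuration and live usage roughly like this. Container ID 109 is taken from the oom_memcg path in the log; the cgroup path assumes the cgroup v2 layout that PVE 7 uses:
Code:
# Configured memory/swap limits of container 109:
pct config 109 | grep -E '^(memory|swap):'

# Live usage vs. limit as the kernel sees it (cgroup v2, values in bytes):
cat /sys/fs/cgroup/lxc/109/memory.current /sys/fs/cgroup/lxc/109/memory.max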
 
I have rebuilt the "Fileserver" LXC and set
Code:
zfs_arc_max
to 8 GB instead of 16 GB. We'll see tomorrow night whether the backup works.
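
For anyone wanting to do the same, a minimal sketch of how the ARC limit is usually set on a Proxmox host (8589934592 is simply 8 GiB in bytes):
Code:
# Apply immediately at runtime (reverts on reboot):
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

# Persist it across reboots via the module options, then rebuild the initramfs:
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
update-initramfs -u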
 
Seems to be working. I think ZFS wasn't freeing its memory fast enough, so the OOM killer had to kill some processes.
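
If someone wants to verify that theory, the ARC size vs. its configured maximum can be watched while the backup runs, e.g. (rough sketch, using the stats the ZFS module exports):
Code:
# Current ARC size and configured maximum, in bytes:
awk '/^(size|c_max) / {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# Or use the summary tool shipped with zfsutils-linux:
arc_summary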
 
After moving the VM to another node it works again...

This is the VM's config:
Code:
mg@pve3:~$ cat /etc/pve/qemu-server/116.conf
cat: /etc/pve/qemu-server/116.conf: Permission denied
mg@pve3:~$ sudo cat /etc/pve/qemu-server/116.conf
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 8192
name: docker
net0: virtio=D6:FB:A1:81:F7:BE,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: ZFS_VM_ZVOL:vm-116-disk-0,discard=on,format=raw,size=150G
scsihw: virtio-scsi-pci
smbios1: uuid=1318b43b-543e-4ba2-8286-bd2fab148f7b
sockets: 1
startup: order=200
vmgenid: f54f385c-864b-4784-9bf6-c4f037103c6d