PVE 7: Backup fails because the OOM killer stops smbd in an LXC container

quotengrote
Jul 10, 2021
Hi all,

since upgrading to Proxmox VE 7, the nightly backup fails because the OOM killer kills smbd inside an LXC container.
This did not happen with 6.4.
The backup server is an LXC container with bind mounts on this host.
Container logs:
Code:
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: A process of this unit has been killed by the OOM killer.
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: Failed with result 'oom-kill'.
Jul  9 23:10:26 fileserver2.grote.lan systemd[1]: smbd.service: Consumed 2min 59.923s CPU time.

PVE host logs:
Code:
Jul  9 23:09:00 pve2.grote.lan systemd[1]: Starting Proxmox VE replication runner...
Jul  9 23:09:01 pve2.grote.lan systemd[1]: pvesr.service: Succeeded.
Jul  9 23:09:01 pve2.grote.lan systemd[1]: Finished Proxmox VE replication runner.
Jul  9 23:10:00 pve2.grote.lan systemd[1]: Starting Proxmox VE replication runner...
Jul  9 23:10:01 pve2.grote.lan systemd[1]: pvesr.service: Succeeded.
Jul  9 23:10:01 pve2.grote.lan systemd[1]: Finished Proxmox VE replication runner.
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: #: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: 300: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: =: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: every: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: 5: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: minutes: open: No such file or directory
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdc: WDC WD80EZAZ-11TDBA0: 42 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdd: WDC WD80EZAZ-11TDBA0: 44 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sde: WDC WD80EZAZ-11TDBA0: 42 C
Jul  9 23:10:12 pve2.grote.lan hddtemp[4666]: /dev/sdf: MTFDDAK256MBF-1AN15ABHA: 48 C
Jul  9 23:10:26 pve2.grote.lan kernel: smbd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jul  9 23:10:26 pve2.grote.lan kernel: CPU: 5 PID: 2714208 Comm: smbd Tainted: P           O      5.11.22-1-pve #1
Jul  9 23:10:26 pve2.grote.lan kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
Jul  9 23:10:26 pve2.grote.lan kernel: Call Trace:
Jul  9 23:10:26 pve2.grote.lan kernel: dump_stack+0x70/0x8b
Jul  9 23:10:26 pve2.grote.lan kernel: dump_header+0x4f/0x1f6
Jul  9 23:10:26 pve2.grote.lan kernel: oom_kill_process.cold+0xb/0x10
Jul  9 23:10:26 pve2.grote.lan kernel: out_of_memory+0x1cf/0x520
Jul  9 23:10:26 pve2.grote.lan kernel: mem_cgroup_out_of_memory+0x139/0x150
Jul  9 23:10:26 pve2.grote.lan kernel: try_charge+0x750/0x7b0
Jul  9 23:10:26 pve2.grote.lan kernel: mem_cgroup_charge+0x8a/0x280
Jul  9 23:10:26 pve2.grote.lan kernel: __add_to_page_cache_locked+0x34b/0x3a0
Jul  9 23:10:26 pve2.grote.lan kernel: ? scan_shadow_nodes+0x30/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: add_to_page_cache_lru+0x4d/0xd0
Jul  9 23:10:26 pve2.grote.lan kernel: pagecache_get_page+0x161/0x3b0
Jul  9 23:10:26 pve2.grote.lan kernel: filemap_fault+0x6da/0xa30
Jul  9 23:10:26 pve2.grote.lan kernel: __do_fault+0x3c/0xe0
Jul  9 23:10:26 pve2.grote.lan kernel: handle_mm_fault+0x1516/0x1a70
Jul  9 23:10:26 pve2.grote.lan kernel: do_user_addr_fault+0x1a3/0x450
Jul  9 23:10:26 pve2.grote.lan kernel: exc_page_fault+0x6c/0x150
Jul  9 23:10:26 pve2.grote.lan kernel: ? asm_exc_page_fault+0x8/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: asm_exc_page_fault+0x1e/0x30
Jul  9 23:10:26 pve2.grote.lan kernel: RIP: 0033:0x7f70aece7c6f
Jul  9 23:10:26 pve2.grote.lan kernel: Code: Unable to access opcode bytes at RIP 0x7f70aece7c45.
Jul  9 23:10:26 pve2.grote.lan kernel: RSP: 002b:00007f70ab1b5b20 EFLAGS: 00010293
Jul  9 23:10:26 pve2.grote.lan kernel: RAX: 0000000000008000 RBX: 000055c79a334300 RCX: 00007f70aece7c6f
Jul  9 23:10:26 pve2.grote.lan kernel: RDX: 0000000000008000 RSI: 000055c79a354500 RDI: 0000000000000028
Jul  9 23:10:26 pve2.grote.lan kernel: RBP: 000055c79a2aa9d0 R08: 0000000000000000 R09: 00000000ffffffff
Jul  9 23:10:26 pve2.grote.lan kernel: R10: 0000000000138000 R11: 0000000000000293 R12: 00007f70ab1b5ba0
Jul  9 23:10:26 pve2.grote.lan kernel: R13: 000055c79a2aaa08 R14: 000055c79a2ec890 R15: 00007f70ae7ea4f0
Jul  9 23:10:26 pve2.grote.lan kernel: memory: usage 2098124kB, limit 2097152kB, failcnt 121146
Jul  9 23:10:26 pve2.grote.lan kernel: swap: usage 65536kB, limit 65536kB, failcnt 6
Jul  9 23:10:26 pve2.grote.lan kernel: Memory cgroup stats for /lxc/109:
Jul  9 23:10:26 pve2.grote.lan kernel: anon 2100375552#012file 7974912#012kernel_stack 2654208#012pagetables 8921088#012percpu 1489152#012sock 4157440#012shmem 2838528#012file_mapped 6352896#012file_dirty 540672#012file_writeback 0#012anon_thp 0#012file_thp 0#012shmem_thp 0#012inactive_anon 617844736#012active_anon 1502101504#012inactive_file 1949696#012active_file 1724416#012unevictable 0#012slab_reclaimable 6146368#012slab_unreclaimable 8557584#012slab 14703952#012workingset_refault_anon 4719#012workingset_refault_file 20742#012workingset_activate_anon 4620#012workingset_activate_file 18475#012workingset_restore_anon 0#012workingset_restore_file 15869#012workingset_nodereclaim 10710#012pgfault 46305758#012pgmajfault 742136#012pgrefill 105541#012pgscan 1554942#012pgsteal 677806#012pgactivate 476127#012pgdeactivate 97780#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 0#012thp_collapse_alloc 0
Jul  9 23:10:26 pve2.grote.lan kernel: Tasks state (memory values in pages):
Jul  9 23:10:26 pve2.grote.lan kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jul  9 23:10:26 pve2.grote.lan kernel: [1303878]     0 1303878    25778      662    81920        0             0 systemd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304144]     0 1304144      716       36    49152        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304146]     0 1304146      716       37    40960        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304488]     0 1304488     9538      123    61440        0             0 master
Jul  9 23:10:26 pve2.grote.lan kernel: [1304490]   104 1304490     9618      126    69632        0             0 qmgr
Jul  9 23:10:26 pve2.grote.lan kernel: [ 720641]   104 720641     9605      125    61440        0             0 pickup
Jul  9 23:10:26 pve2.grote.lan kernel: [1303956]     0 1303956    72939     5564   548864        0          -250 systemd-journal
Jul  9 23:10:26 pve2.grote.lan kernel: [1303994]   101 1303994     6653      232    77824        0             0 systemd-network
Jul  9 23:10:26 pve2.grote.lan kernel: [1304029]   102 1304029     5990     1035    81920        0             0 systemd-resolve
Jul  9 23:10:26 pve2.grote.lan kernel: [1304104]     0 1304104    59560     1392   102400      210             0 accounts-daemon
Jul  9 23:10:26 pve2.grote.lan kernel: [1304126]     0 1304126     1006       52    49152        0             0 atd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304149]   113 1304149     3312       88    49152        0             0 chronyd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304153]   113 1304153     1267       58    49152        0             0 chronyd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304107]     0 1304107     1012       78    45056        0             0 cron
Jul  9 23:10:26 pve2.grote.lan kernel: [1304110]   103 1304110     1943      233    53248        0          -900 dbus-daemon
Jul  9 23:10:26 pve2.grote.lan kernel: [1304127]     0 1304127    98398     2585   131072      182             0 f2b/server
Jul  9 23:10:26 pve2.grote.lan kernel: [1304183]     0 1304183     4468     1610    77824      766             0 munin-node
Jul  9 23:10:26 pve2.grote.lan kernel: [1304115]     0 1304115     6613     1742    86016      196             0 networkd-dispat
Jul  9 23:10:26 pve2.grote.lan kernel: [1304116]     0 1304116     7748      526    90112        0             0 nmbd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304142]     0 1304142     3073      244    61440        0         -1000 sshd
Jul  9 23:10:26 pve2.grote.lan kernel: [1304124]     0 1304124     4269      313    69632        0             0 systemd-logind
Jul  9 23:10:26 pve2.grote.lan kernel: [1304143]     0 1304143      716       36    45056        0             0 agetty
Jul  9 23:10:26 pve2.grote.lan kernel: [1304163]     0 1304163    26310     1710   106496      213             0 unattended-upgr
Jul  9 23:10:26 pve2.grote.lan kernel: [  14568]     0 14568   285600      506  1540096        0             0 rsyslogd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 212985]     0 212985    11543      646   131072      399             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213000]     0 213000    10952      290   122880      390             0 smbd-notifyd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213001]     0 213001    10956      295   122880      383             0 cleanupd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213003]     0 213003    11539      281   126976      434             0 lpqd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213158]     0 213158    23590     1356   163840     1022             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213443]     0 213443    11604      516   131072      438             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213460]     0 213460    11604      345   131072      530             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213461]     0 213461    70716     1222   180224     1292             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213474]     0 213474    11604      499   131072      451             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [ 213483]     0 213483   965471   495890  4644864     8438             0 smbd
Jul  9 23:10:26 pve2.grote.lan kernel: [1306874]  1001 1306874     4539      309    69632        0             0 systemd
Jul  9 23:10:26 pve2.grote.lan kernel: [1306875]  1001 1306875     5663      578    77824      126             0 (sd-pam)
Jul  9 23:10:26 pve2.grote.lan kernel: [1307016]  1001 1307016     1248      220    49152        3             0 tmux: server
Jul  9 23:10:26 pve2.grote.lan kernel: [1307017]  1001 1307017     1379      438    53248        0             0 bash
Jul  9 23:10:26 pve2.grote.lan kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0,oom_memcg=/lxc/109,task_memcg=/lxc/109/ns/system.slice/smbd.service,task=smbd,pid=213483,uid=0
Jul  9 23:10:26 pve2.grote.lan kernel: Memory cgroup out of memory: Killed process 213483 (smbd) total-vm:3861884kB, anon-rss:1982276kB, file-rss:0kB, shmem-rss:1284kB, UID:0 pgtables:4536kB oom_score_adj:0
Jul  9 23:10:26 pve2.grote.lan kernel: oom_reaper: reaped process 213483 (smbd), now anon-rss:0kB, file-rss:0kB, shmem-rss:1284kB
Jul  9 23:10:26 pve2.grote.lan kernel: CIFS: VFS: \\192.168.2.36 Error -32 sending data on socket to server
Jul  9 23:10:27 pve2.grote.lan pvestatd[4853]: storage 'SMB_FILESERVER2' is not online
Jul  9 23:10:27 pve2.grote.lan pvestatd[4853]: storage 'SMB_FILESERVER2' is not online
Jul  9 23:10:37 pve2.grote.lan pvestatd[4853]: storage 'SMB_FILESERVER2' is not online

Memory consumption of the host: [screenshot: 2021-07-10 10_10_57-Greenshot.png]
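Side note: the kernel log above shows the container's memory cgroup right at its limit (usage 2098124kB vs. limit 2097152kB, i.e. 2 GiB plus 64 MiB of swap), so the cgroup OOM killer picked the largest smbd inside CT 109. One option would simply be to give the container more memory; a minimal sketch using the standard pct CLI on the host, with 4096/1024 as purely example values:
Code:
# show the current memory/swap limits of the container (CT 109 as in the log above)
pct config 109 | grep -E '^(memory|swap)'

# raise the limits, e.g. to 4 GiB RAM and 1 GiB swap (example values, adjust to taste)
pct set 109 --memory 4096 --swap 1024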
 
I have rebuilt the "Fileserver" LXC and set
Code:
zfs zfs_arc_max
to 8 GB instead of 16 GB. We'll see next night whether it works.
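For anyone wanting to do the same: a minimal sketch of how the ARC cap can be set on the PVE host via the usual ZFS module parameter (8 GiB = 8589934592 bytes; adjust to your setup):
Code:
# apply at runtime (8 GiB = 8589934592 bytes)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

# make it persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u   # only needed if the root filesystem is on ZFS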
 
Seems to be working. I think ZFS wasn't freeing the ARC memory fast enough, so the OOM killer had to kill some processes.
 
After moving the VM to another node, it works again.

This is the VM:
Code:
mg@pve3:~$ cat /etc/pve/qemu-server/116.conf
cat: /etc/pve/qemu-server/116.conf: Permission denied
mg@pve3:~$ sudo cat /etc/pve/qemu-server/116.conf
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 8192
name: docker
net0: virtio=D6:FB:A1:81:F7:BE,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: ZFS_VM_ZVOL:vm-116-disk-0,discard=on,format=raw,size=150G
scsihw: virtio-scsi-pci
smbios1: uuid=1318b43b-543e-4ba2-8286-bd2fab148f7b
sockets: 1
startup: order=200
vmgenid: f54f385c-864b-4784-9bf6-c4f037103c6d
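For completeness, the move can also be done from the CLI; a minimal sketch assuming the VM is currently on pve2 and should go to pve3 (node names taken from the logs and the shell prompt above):
Code:
# run on the node that currently hosts VM 116
qm migrate 116 pve3 --online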
 
