The server freezes when LXC backup is running

Hi, all!
The server freezes while the LXC backup is running. It's not a network issue, and the disk I/O / CPU load is not high.

[Attachments: syslog.png, summary.png]

# pveversion -v
proxmox-ve: 5.1-41 (running kernel: 4.13.8-3-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13.13-6-pve: 4.13.13-41
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-1-pve: 4.4.35-77
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-3
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9


Storage: NAS, 10 Gbit/s.
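
For reference, vzdump's I/O impact can be throttled globally in /etc/vzdump.conf while debugging this (a sketch, not a confirmed fix; the values below are only examples to tune):

Code:
# /etc/vzdump.conf -- global backup defaults (example values)
bwlimit: 51200   # cap backup I/O at ~50 MB/s (value is in KiB/s)
ionice: 7        # run the backup at the lowest best-effort I/O priority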
 
The kernel pve-kernel-4.13.13-6-pve: 4.13.13-41 is installed, but apparently the reboot was forgotten?
(running kernel: 4.13.8-3-pve)
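
For reference, a quick way to compare the running kernel with what is installed, and to confirm whether a reboot is still pending (standard Debian/Proxmox tooling, nothing host-specific assumed):

Code:
uname -r                           # kernel currently running
dpkg -l 'pve-kernel-*' | grep ^ii  # pve kernels currently installed
# if the newest installed kernel is not the running one, plan a reboot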
 
I think this is a kernel issue with LXC, not the backup.
I have also had several crashes on an LXC container that runs a high-load VPN, with the latest software and kernels.
I tried moving it to a different server, with no change.
High-bandwidth load on LXC can crash the host kernel.
Now why does this sound so familiar?
 
It crashed again last night.

Here is one of the last log entries from the crashed VPN server's log. I'm wondering why the client's log talks about KVM when it runs in an LXC container?

Code:
vpn kernel: kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
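
A plausible explanation (an assumption, not something confirmed by these logs): an LXC container shares the host kernel, so kernel ring buffer messages emitted on the host, including this warning from the host's kvm module when a KVM guest starts, can surface in the container's syslog as well. A quick way to check whether the message originates on the host (<CTID> is a placeholder for the container ID):

Code:
# on the Proxmox host: does the warning appear in the host kernel log?
dmesg -T | grep -i 'unstable TSC'

# inside the container (same kernel, so the same ring buffer entry may be visible;
# dmesg can be restricted inside unprivileged containers):
pct exec <CTID> -- dmesg | grep -i 'unstable TSC'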
 
I'm witnessing a lot of crashes on my primary LXC host recently too, but they seem to be fully random, following phases of high I/O load (e.g. vzdump). I get almost daily crash-reboots (freezes) without any indication of the error; everything just stops and a few minutes later => reboot (like in the initial post's screenshots).
Albeit running nested in a KVM, this has never been an issue in past years under various hypervisors (e.g. KVM, Hyper-V).
I suspect it could be related to ZFS / ARC: I see high SLAB / unreclaimable memory usage with a constant increase of ecryptfs_auth_tok_list_item objects. I have already limited the ARC max size to prevent any memory shortage. It could possibly also be a corosync topic, but that's pure speculation.
The KVM host is patched against Meltdown, and Proxmox is running the latest kernel.
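
For reference, the slab growth and the ARC cap can both be checked from the host. A minimal sketch, assuming ZFS on Linux 0.7.x; the 4 GiB cap below is only an example value:

Code:
# which slab caches keep growing? (ecryptfs_auth_tok_list_item showed up here)
slabtop -o | head -n 20
grep ecryptfs /proc/slabinfo

# current ARC size vs. configured maximum
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# persist an ARC cap of 4 GiB (example value, adjust to the host's RAM):
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
update-initramfs -u    # reboot afterwards so the module option takes effect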

pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13: 5.1-42
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
pve-zsync: 1.6-15
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9
 
Both servers I tested the VPN LXC on had plenty of free resources, so it's not that. Even the bandwidth was not maxing out (barely 20M/s). I'm thinking it's once again a kernel/bridge issue mishandling TX packets.
It wouldn't be the first time!
The odd thing about this is that there are no kernel panic messages anywhere. When I look at the dead host via iLO there is nothing exceptional on the screen, just a dead Proxmox login.
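
Since nothing reaches the iLO console or the disk, one option is to stream the kernel log to another machine with netconsole, so a panic that never hits the disk is still captured. A sketch; the IPs, interface name and MAC below are placeholders, not values from this thread:

Code:
# on the crashing host: forward kernel messages over UDP to a log receiver
modprobe netconsole netconsole=6665@192.168.1.10/eno1,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
#   format: local-port@local-ip/interface,remote-port@remote-ip/remote-mac

# on the receiving machine: capture whatever the dying kernel still manages to send
nc -u -l 6666 | tee pve-crash.log   # netcat syntax varies between flavours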
 
I'm still experiencing many "reboots" (crashes) on my KVM machine, even when the server is idling.
As I have seen many similar topics in this forum (unintended reboots) in the last couple of days, there could be a broader issue.
I see that a new kernel was released earlier today and will update to it. I'm also considering moving to 4.15; I can't really handle almost daily random reboots.
I'm still suspecting ZFS in combination with memory management / Meltdown issues, but I lack evidence.
https://github.com/zfsonlinux/zfs/issues/7335
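
For the move to 4.15, assuming the opt-in kernel series is already published in the configured repository as pve-kernel-4.15 (an assumption; check the repository first), the switch would be roughly:

Code:
apt update
apt install pve-kernel-4.15   # opt-in 4.15 kernel series, if available in the repo
# reboot, then confirm with: uname -r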
 
I'm using Proxmox inside a KVM, but that's nothing new; I have been running it nested on Parallels, KVM and Hyper-V for years, self-hosted and even hosted within OVH's public cloud (mostly container-based, of course).
CPU: Xeon E5-2680 v4
Network: virtio (could change this to e1000 or RTL; see the sketch below)
HDD: SCSI
Host and Proxmox are patched against Meltdown/Spectre.
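
If the virtio NIC is a suspect, a quick test is to switch the nested Proxmox VM to an emulated e1000 NIC on the outer hypervisor. A sketch assuming the outer hypervisor is also Proxmox/KVM and the VM ID is 100 (both assumptions; vmbr0 is an example bridge):

Code:
# on the outer host: check the current NIC config, then switch virtio -> e1000
qm config 100 | grep ^net0
qm set 100 --net0 e1000,bridge=vmbr0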
 
I moved the VPN LXC container to a nested (KVM) Proxmox for better results.
There were no problems for many days, until last night.
[Attachment: NestedProxmox.png]
 
Hmm, I do have 2 LXC containers running OpenVPN on my affected host. My other Proxmox servers are stable (nested and HW).
So maybe it's something with the VPN?
BTW: I upgraded to the 4.15 kernel line 2 days ago, let's see.
 
