Restarting an LXC is crashing PVE

FuriousGeorge

Renowned Member
Sep 25, 2012
I've noticed that PVE can be somewhat fickle when I restart or start containers.

I entered a container's console just now and ran "reboot". The container did not come back, and its status became inconsistent: the icon shows it as running, but the stop and shutdown buttons are grayed out, as if it were not. Right-clicking on it, however, does show active "stop" and "shutdown" options, so I tried to stop it.

At this point the node became unavailable. Other nodes show it grayed out with a "?" next to it and next to the containers/VMs/storage within it. If I try to log into the node's web GUI directly, I get a "Login Failed. Please try again." message. I can SSH in via PuTTY, but I can't seem to do anything to remedy the situation or get better diagnostic info.
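
From the SSH session the obvious things to check are the PVE daemons and the container's own journal; something like the following (just a sketch, container ID 105 as in the log below):

Code:
# daemons behind the web GUI and the grayed-out "?" status
systemctl status pveproxy pvedaemon pvestatd pve-cluster
# recent log entries for the container unit
journalctl -u pve-container@105 -b --no-pager | tail -n 50
# try stop/start from the CLI instead of the GUI
pct stop 105
pct start 105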

Code:
Apr 07 16:30:38 ads-main-proxmox-3 systemd[1]: Starting PVE LXC Container: 105...
Apr 07 16:30:41 ads-main-proxmox-3 ovs-vsctl[11138]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port veth105i0
Apr 07 16:30:41 ads-main-proxmox-3 ovs-vsctl[11139]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Start operation timed out. Terminating.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: Failed to start PVE LXC Container: 105.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Unit entered failed state.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Failed with result 'timeout'.



lxc-start -F 105 just hangs with no output, as does trying to restart the pve-container@105 service.
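
To get more output from the hanging start, lxc-start can be pointed at a debug log file (just a sketch; /tmp/lxc-105.log is an arbitrary path):

Code:
lxc-start -n 105 -F -l DEBUG -o /tmp/lxc-105.log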


Code:
# pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9


I also have a kworker thread pinned at 100%, just like in:

https://bugzilla.proxmox.com/show_bug.cgi?id=1943
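
To see what that kworker is actually doing, its kernel stack can be dumped a few times and compared (a sketch; <PID> is a placeholder for the PID found in the first step, and reading /proc/<PID>/stack needs root):

Code:
# find the spinning kworker's PID
top -b -n 1 | grep kworker
# dump its in-kernel stack (replace <PID> with the number from above)
cat /proc/<PID>/stack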


I'm going to try a dist-upgrade. Assuming my issue is the same as the bug, does it persist on all versions of Proxmox 5?
 
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)

That's stone-age old and has tons of security issues (e.g., the whole Spectre/Meltdown mitigation work only went into the 4.15 kernel, which was released at the start of 2018!). Get yourself to the latest Proxmox VE 5.4 first, else we cannot help at all.


Also note that Proxmox VE 6.1 is already released and the 5.x stable branch will be EOL in a few months.
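
For reference, upgrading within the 5.x branch is a plain apt dist-upgrade, assuming a stretch PVE repository (enterprise or pve-no-subscription) is already configured:

Code:
apt update
apt dist-upgrade
# verify the new package versions afterwards and reboot into the new kernel
pveversion -v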
 
