I've noticed that PVE can be somewhat fickle when I restart or start containers.
I opened a container's console just now and ran "reboot". The container did not come back, and its status became inconsistent: the icon shows it as running, but the stop and shutdown buttons are grayed out, as if it were stopped. Right-clicking on it, however, does show active "stop" and "shutdown" options, so I tried to stop it.
At this point the node became unavailable. Other nodes show it grayed out with a "?" next to it, and the same goes for the containers/VMs/storage on it. If I try to log into the node directly, I get a "Login Failed. Please try again." message. I can still SSH in with PuTTY, but I can't seem to do anything to remedy the situation or get better diagnostic info.
Code:
Apr 07 16:30:38 ads-main-proxmox-3 systemd[1]: Starting PVE LXC Container: 105...
Apr 07 16:30:41 ads-main-proxmox-3 ovs-vsctl[11138]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port veth105i0
Apr 07 16:30:41 ads-main-proxmox-3 ovs-vsctl[11139]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Start operation timed out. Terminating.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: Failed to start PVE LXC Container: 105.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Unit entered failed state.
Apr 07 16:32:08 ads-main-proxmox-3 systemd[1]: pve-container@105.service: Failed with result 'timeout'.
Running lxc-start -F 105 just hangs with no output, as does trying to restart the pve-container@105 service.
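As a sanity check on the journal above, here is a minimal sketch that works out how long systemd waited between "Starting PVE LXC Container: 105..." and "Start operation timed out". It assumes GNU date and that both entries are from the same day; the function name elapsed_seconds is just something I made up for this post. The gap comes to 90 seconds, which matches systemd's stock default start timeout, so the start looks like it hung and was killed rather than failing on its own. Whether the pve-container unit overrides that timeout is not something the log shows.

```shell
#!/bin/sh
# Elapsed seconds between two journal timestamps (GNU date assumed).
# -u treats both stamps as UTC so a daylight-saving change cannot skew the result.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# Timestamps copied from the journal: unit start, then the timeout message.
elapsed_seconds "Apr 07 16:30:38" "Apr 07 16:32:08"
```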
Code:
# pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9
I also have a kworker thread pinned at 100% CPU, just like in this bug report:
https://bugzilla.proxmox.com/show_bug.cgi?id=1943
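For anyone who wants to compare against their own node, this is a rough sketch of how the busy kworker could be picked out of ps output. The helper name busiest_kworker is hypothetical, and it assumes input in the "pid pcpu comm" column order produced by the ps call shown in the comment. Reading the thread's kernel stack from /proc afterwards needs root and only works if the kernel exposes it, so treat that part as an assumption too.

```shell
#!/bin/sh
# Read "pid pcpu comm" lines on stdin and print the PID and CPU share
# of the kworker thread with the highest CPU usage (nothing if none found).
busiest_kworker() {
    awk '$3 ~ /^kworker/ && ($2 + 0) >= max { max = $2 + 0; pid = $1; cpu = $2 }
         END { if (pid != "") print pid, cpu }'
}

# Typical use on the affected node:
#   ps -eo pid,pcpu,comm --no-headers | busiest_kworker
# Then, as root, look at what that thread is doing in the kernel:
#   cat /proc/<pid>/stack
```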
I'm going to try a dist-upgrade. Assuming my issue is the same as the bug, does it persist on all versions of Proxmox 5?