Rebooting a CT makes the whole thing stall

neuron

Mar 15, 2019
I have an urgent issue: one of my nodes has stalled (at least in the web UI). All I did was restart one of my containers, a simple cPanel-DNSONLY box, because of an upgrade. Now the whole node is stalled: none of the VM or CT names show and everything is grayed out. Thankfully 'qm list' shows all the VMs and 'qm status <vmid>' shows 'running' for all of them. However, 'pct list' and 'pct status 105' just hang. I've tried the following:

root@pve1:~# pct stop 105
trying to acquire lock...
can't lock file '/run/lock/lxc/pve-config-105.lock' - got timeout
root@pve1:~# pct stop 105 --skiplock
trying to acquire lock...
can't lock file '/run/lock/lxc/pve-config-105.lock' - got timeout
root@pve1:~# pct unlock 105
trying to acquire lock...
can't lock file '/run/lock/lxc/pve-config-105.lock' - got timeout

Is there anything else I can try short of rebooting the node? Ideally I would like to avoid that, since production VMs are running on it.
 
Package versions:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-11 (running version: 5.3-11/d4907f84)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-47
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-23
pve-cluster: 5.0-33
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
Hi,

First, update your system. We can only help with the current version and do not support outdated ones.
But I guess you have a process in your container that is uninterruptible, for example one waiting for I/O.
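
One way to check this guess on the node is to look for processes in uninterruptible sleep (state D in `ps`). The following is a minimal sketch, not an official Proxmox procedure. The function names `filter_dstate` and `list_dstate` are made up for illustration, and it assumes the procps `ps` that ships with Debian.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Proxmox tool.

# Keep only rows whose second column (STAT) starts with "D",
# i.e. processes in uninterruptible sleep. Reads ps-style rows on stdin.
# The header row is dropped automatically because "STAT" does not start with "D".
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# List D-state processes on the live host, including the kernel
# function each one is sleeping in (WCHAN column).
list_dstate() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}

# Usage on the host, as root:
#   list_dstate
```

Any process this prints cannot be killed until its I/O completes. That would be consistent with the lock timeouts shown earlier in the thread, though nothing here confirms it as the cause.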
 
I ended up rebooting the whole node after hours. Had no choice. I have yet to update it too.

I have never felt that containers are reliable on Proxmox. I wish I had built all my Linux CTs as VMs instead. Am I wrong?
 
Normally programs should not block.
So if your programs often do this, a KVM guest is the better-fitting technology.
But this does not mean PVE CTs are unreliable.
 
