Issues with getting LXC container to start again.

mediocre

New Member
Nov 27, 2024
I had an unresponsive container and wasn't able to stop it, so I killed the lxc-start process. Since then it will not boot again and I cannot figure out why. Rebooting the host machine doesn't seem to help either. This is a cluster setup, and the storage is on a Ceph pool.
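For reference, the sequence I went through before killing it was roughly this (CT 221; the exact lxc-start arguments on your system may differ, and <pid> is whatever ps shows):
Code:
pct stop 221                          # hung, container never stopped
pct unlock 221                        # clear any stale lock left behind
ps aux | grep 'lxc-start.*221'        # find the leftover lxc-start process
kill -9 <pid>                         # what I ended up doing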

Here's the debug log when trying to start it:
Code:
lxc-start 221 20241127215714.642 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 221 20241127215714.642 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 221 20241127215714.642 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 221 20241127215714.643 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "221", config section "lxc"
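(That log came from starting the container in the foreground with debug logging, roughly like this; the log path is just an example:)
Code:
lxc-start -n 221 -F -l DEBUG -o /tmp/lxc-221.log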

Code:
 pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-7
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
pve-kernel-5.4: 6.4-5
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.15.126-1-pve: 5.15.126-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.60-1-pve: 5.15.60-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-4.15: 5.4-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 17.2.7-pve3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.3
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.1
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.5
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

Here is the lxc.conf:
Code:
cat /etc/pve/lxc/221.conf
arch: amd64
cores: 8
hostname: hostname
memory: 16384
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=D6:32:EC:B5:F1:F3,ip=dhcp,ip6=dhcp,type=veth
ostype: ubuntu
rootfs: nvme:vm-221-disk-0,size=32G
swap: 4096
unprivileged: 1

Does anyone have any ideas on what to try here?
 
So after playing with this for a while, I found that the host machine it was on still had its RBD volume mapped, and it wouldn't unmap; it just showed as being in use/busy. Trying to force-unmap it hung forever and never let go. I ended up having to reboot the host node that had the volume mapped. Once that node went down, I was able to start the CT on another node right away.
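For anyone hitting the same thing, what I ran on the node that still had the volume mapped was roughly this (the /dev/rbd0 device name is just an example; check rbd showmapped for the real one):
Code:
rbd showmapped                  # vm-221-disk-0 was still listed as mapped
rbd unmap /dev/rbd0             # refused, device reported as busy
rbd unmap -o force /dev/rbd0    # this just hung and never returned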

So the issue seems to be that the RBD volume gets stuck for whatever reason and the host can't unmap it. Does anyone have any idea what might cause that, or a fix going forward that doesn't require rebooting the host node?
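One thing I plan to check next time before rebooting (untested, and assuming the Ceph pool behind the "nvme" storage is also named nvme) is whether a stale watcher is still holding the image:
Code:
rbd status nvme/vm-221-disk-0    # lists clients currently watching the image
rbd info nvme/vm-221-disk-0      # shows image features such as exclusive-lock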
 