LXC can't start after force stop

voarsh
Like in this post: https://forum.proxmox.com/threads/u...ced-to-reboot-node-manually.57148/post-263512
I had to actually kill the LXC start process myself, after finding it with ps faxuw.
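Roughly what that looked like (a sketch from memory; the grep pattern and the PID placeholder are illustrative, not copied from my actual session):

# locate the hung lxc-start / [lxc monitor] process for the container (140 in my case)
ps faxuw | grep "lxc.*140"
# kill it using the PID from the second column of the ps output
kill -9 <PID>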

Unfortunately, now it won't turn back on.
Any ideas for me?

debug log:
lxc-start 140 20210222202600.401 INFO lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 140 20210222202600.401 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "140", config section "lxc"
lxc-start 140 20210222202601.996 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 140 lxc pre-start produced output: failed to remove directory '/sys/fs/cgroup/devices/lxc/140/ns/docker/153f724dd7b304a4a9042b652ff11c4c0bb238ecfeb2de94626bcbd13e646704': Device or resource busy
lxc-start 140 20210222202602.185 ERROR conf - conf.c:run_buffer:323 - Script exited with status 16
lxc-start 140 20210222202602.187 ERROR start - start.c:lxc_init:797 - Failed to run lxc.hook.pre-start for container "140"
lxc-start 140 20210222202602.188 ERROR start - start.c:__lxc_start:1896 - Failed to initialize container "140"
lxc-start 140 20210222202602.191 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "140", config section "lxc"
lxc-start 140 20210222202602.524 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "140", config section "lxc"
lxc-start 140 20210222202603.984 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 140 lxc post-stop produced output: umount: /var/lib/lxc/140/rootfs: not mounted
lxc-start 140 20210222202603.984 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 140 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/140/rootfs' failed: exit code 1
lxc-start 140 20210222202604.728 ERROR conf - conf.c:run_buffer:323 - Script exited with status 1
lxc-start 140 20210222202604.739 ERROR start - start.c:lxc_end:964 - Failed to run lxc.hook.post-stop for container "140"
lxc-start 140 20210222202604.746 ERROR lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 140 20210222202604.750 ERROR lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
 
hi,

lxc-start 140 20210222202601.996 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 140 lxc pre-start produced output: failed to remove directory '/sys/fs/cgroup/devices/lxc/140/ns/docker/153f724dd7b304a4a9042b652ff11c4c0bb238ecfeb2de94626bcbd13e646704': Device or resource busy

here's the error message that seems most relevant.

could you also post the container configuration? pct config CTID
 
Yes, I was thinking that too.
I tried to remove /sys/fs/cgroup/devices/lxc/140/* but got 'Operation not permitted'.
I might have to reboot the host, which is not really something I want to do at this time.

arch: amd64
cores: 32
features: nesting=1
hostname: API
memory: 3412
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.100.1,hwaddr=2E:F6:27:66:51:34,ip=192.168.100.15/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: FourTBpveIPC2Expansion:140/vm-140-disk-0.raw,size=25G
swap: 512
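As a side note on why the plain rm failed: on a cgroup filesystem the control files cannot be unlinked at all; only empty cgroup directories can be removed, with rmdir, and even that fails with 'Device or resource busy' while any process is still a member of the cgroup. A rough sketch (the long directory name is the leftover Docker sub-cgroup from the log above):

# fails with 'Operation not permitted': cgroupfs control files cannot be unlinked
rm -rf /sys/fs/cgroup/devices/lxc/140/*
# empty cgroup directories can be removed with rmdir, but this still fails with
# 'Device or resource busy' while any process remains a member of the cgroup
rmdir /sys/fs/cgroup/devices/lxc/140/ns/docker/153f724dd7b304a4a9042b652ff11c4c0bb238ecfeb2de94626bcbd13e646704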
 
are you sure the container is dead?

please post:
* pct list
* ps aux | grep CTID
 
root@HPProliantDL360PGen8:~# pct list
VMID       Status     Lock         Name
100        running                 bitwarden
102        stopped                 test
106        stopped                 test2
111        stopped                 gitlab
116        running                 photoprism
123        stopped                 nfsspeedtestFourteenTBExpansionUSB4
124        stopped                 nfsspeedtest2
128        stopped                 Seagate1TBSpeedtest
132        stopped                 python
135        running                 zab2
136        stopped                 power
137        stopped                 tt-unity
138        stopped                 prometheus
140        stopped                 API
141        running                 fileserver
143        running                 homelabos
145        running                 nextcloud
146        running                 mayan
151        stopped                 kibitzr
152        running                 beehive
153        running                 api2
1299       stopped                 HPBay8speedtest

ps aux | grep CTID

root@HPProliantDL360PGen8:~# ps aux | grep 140
root       140  0.0  0.0      0     0 ?        S    Feb13   2:47 [ksoftirqd/21]
root      1138  0.0  0.0   2140  1220 ?        Ss   Feb13   1:07 /usr/sbin/watchdog-mux
root      1161  0.0  0.0 166756  2140 ?        Ssl  Feb13   0:00 /usr/sbin/zed -F
100111    6187  0.1  0.0 114088  6364 ?        S    Feb20   4:08 /usr/sbin/zabbix_server: preprocessing worker #2 started
root     11140  0.3  0.0  22936  2072 ?        S    13:03   0:04 /lib/systemd/systemd-udevd
root     42355  0.0  0.0   6072  2452 pts/12   S+   13:30   0:00 grep 140
root     43331  0.0  0.0   9512  2140 ?        S    12:00   0:00 /usr/sbin/CRON -f
root     45243  0.0  0.0  15848  1408 ?        Ss   Feb18   0:00 /usr/sbin/sshd -D
daemon   62918  7.4  0.0 404140 15196 ?        Sl   Feb18 519:01 /usr/bin/python2.7 /usr/bin/pagekite --pidfile /var/run/pagekite.pid --clean --runas=daemon:daemon --logfile=/var/log/pagekite/pagekite.log --optdir=/etc/pagekite.d --noloop
100000   63578  0.0  0.0 108700   584 ?        Sl   Feb18   0:53 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/ecbf5b165181b3ac46def49f00cfc59be90ef62afd14005404d3fbf1cc7bfd33 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
100000   64955  0.0  0.0 169588  5140 ?        Ss   Feb20   2:51 /sbin/init
 
ok, can you also show the output of find /sys/fs/cgroup/devices/lxc/140?

you could probably remove these directories with find /sys/fs/cgroup/*/lxc/140* -depth -type d -print -delete if they're causing trouble for the container start
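A cautious way to apply that (a sketch; it's the same command as above, just split into a dry run first):

# preview which leftover cgroup directories would be removed
find /sys/fs/cgroup/*/lxc/140* -depth -type d -print
# then remove them; deletion only succeeds for cgroups with no remaining member processes
find /sys/fs/cgroup/*/lxc/140* -depth -type d -print -delete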
 
I solved that specific issue by restarting the host machine, which is not ideal by any means.

However, I am now having the same issue again (this time with container 159), and running find /sys/fs/cgroup/*/lxc/159* -depth -type d -print -delete does not help:
I keep getting 'Device or resource busy'.
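One way to narrow that down (a sketch, assuming cgroup v1 as in the log earlier in the thread; the 159 paths simply follow the command above): 'Device or resource busy' on a cgroup directory usually means some process is still a member of it, and those PIDs can be read from the cgroup.procs file inside each leftover directory.

# show which PIDs are still attached to the leftover cgroups of container 159
for d in $(find /sys/fs/cgroup/*/lxc/159* -depth -type d); do
    echo "== $d"
    cat "$d/cgroup.procs"
done
# once those processes are gone (e.g. killed with kill -9), the find ... -delete should succeed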
 
I have one particular Ubuntu container (my Plex media server) which, after more than two years of flawless performance, now just stops working at random. I can SSH into the container and run some basic commands, and everything is responsive, but when I tried an apt update/upgrade it just stopped responding. I SSH'd in again and issued 'sudo reboot'; it terminated my SSH session, yet 30 minutes later I still couldn't SSH back in. I fired up the PVE web portal, selected CT 101, opened the console... there's a cursor that moves around if you type, but the machine is deader than a doornail. I left it and came back 22 hours later; it still hadn't restarted. Three hours of googling, trying to figure out how the hell you can force-kill a container and successfully restart it without rebooting the entire node: zero success. Fine. Rebooted the node.

Cloned the container, got it up and running again, updated it, etc. Three days go by and it does exactly the same thing, AGAIN. Here I sit in the exact same predicament with absolutely no idea how to begin fixing this. Hours of looking through the PVE and container logs have turned up nothing. This box is a 2-CPU, 12-core Xeon with 128 GB of RAM (>50% free) that basically just idles all day long. Bare metal runs on two 2 TB SAS disks in a zpool mirror with 92% free, plus a 50 TB array of 8 TB SAS disks in mirrored vdevs with >10% free space.

Anyone have even a shot-in-the-dark clue how to fix this? If rebuilding the entire container from scratch is the only way to resolve it, I swear by Zeus I'm scrapping this entire goddamn homelab.
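For what it's worth, the usual escalation path before rebooting the whole node looks roughly like this (a sketch using CT 101 from the post above; the PID is a placeholder, and none of this is guaranteed to help if the processes are stuck in uninterruptible I/O):

# ask Proxmox to stop the container
pct stop 101
# if that hangs, try LXC's hard stop directly
lxc-stop -n 101 --kill
# if even that hangs, find the container's monitor / start process and kill it,
# as described earlier in this thread
ps faxuw | grep "lxc.*101"
kill -9 <PID>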
 
I've never had this happen. Do you have any backups from an earlier point where this doesn't happen?
 
Unfortunately, I migrated from an older node to this new machine a couple of months ago and forgot to configure periodic backups for this particular container. To be honest, the data is all still there, so it shouldn't be too difficult to migrate the databases from the existing container to a new one, but it's still something I'd rather not have to deal with. I guess I'll be monitoring this particular container for a bit and see what happens.
 
