Container freezes and cannot be started again - rootfs not mounted

Dr.A.Colian

Hey Community,

I have a container (Rocky Linux) for my smart home system. Unfortunately, the container kind of freezes after an internal backup - I cannot log in to the console, and the system is not reachable. Shutdown or reboot stop with errors. pct stop 105 stops the container, but it's not possible to power it on again, because the startup script cannot mount the rootfs.

Output from lxc-start -lDEBUG -o lxc105start.log -F -n 105
Code:
root@vm-server:~# cat lxc105start.log
lxc-start 105 20220723231434.955 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2267 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 105 20220723231434.963 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2267 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 105 20220723231434.963 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 105 20220723231434.963 INFO     conf - ../src/lxc/conf.c:run_script_argv:337 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "105", config section "lxc"
lxc-start 105 20220723231435.320 DEBUG    conf - ../src/lxc/conf.c:run_buffer:310 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: mount: /var/lib/lxc/.pve-staged-mounts/rootfs: cannot mount /dev/loop0 read-only.

lxc-start 105 20220723231435.397 DEBUG    conf - ../src/lxc/conf.c:run_buffer:310 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: command 'mount /dev/loop0 /var/lib/lxc/.pve-staged-mounts/rootfs' failed: exit code 32

lxc-start 105 20220723231435.404 ERROR    conf - ../src/lxc/conf.c:run_buffer:321 - Script exited with status 255
lxc-start 105 20220723231435.404 ERROR    start - ../src/lxc/start.c:lxc_init:847 - Failed to run lxc.hook.pre-start for container "105"
lxc-start 105 20220723231435.404 ERROR    start - ../src/lxc/start.c:__lxc_start:2008 - Failed to initialize container "105"
lxc-start 105 20220723231435.404 INFO     conf - ../src/lxc/conf.c:run_script_argv:337 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "105", config section "lxc"
lxc-start 105 20220723231435.677 DEBUG    conf - ../src/lxc/conf.c:run_buffer:310 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 105 lxc post-stop produced output: umount: /var/lib/lxc/105/rootfs: not mounted

lxc-start 105 20220723231435.677 DEBUG    conf - ../src/lxc/conf.c:run_buffer:310 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 105 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/105/rootfs' failed: exit code 1

lxc-start 105 20220723231435.682 ERROR    conf - ../src/lxc/conf.c:run_buffer:321 - Script exited with status 1
lxc-start 105 20220723231435.682 ERROR    start - ../src/lxc/start.c:lxc_end:988 - Failed to run lxc.hook.post-stop for container "105"
lxc-start 105 20220723231435.682 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:main:306 - The container failed to start
lxc-start 105 20220723231435.682 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:main:311 - Additional information can be obtained by setting the --logfile and --logpriority options

1. How can I get the container to start again without a complete server reboot?
2. Any idea how to find out why the container crashes? I've seen in the summary that CPU and network traffic dropped directly after the backup, although they had been at a steady level until then.

Thanks for your support!
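Edit: regarding question 1, would something along these lines be a safe way to clean up without rebooting the whole node? This is only a rough sketch - the loop device number is just taken from the log above, and I'm not sure pct fsck applies to every storage type, so please correct me if this is wrong.
Code:
# see which loop devices are still attached and which image backs them
losetup -a

# if a stale loop device is still holding the CT's rootfs image, detach it
# (loop0 is only an example taken from the log above)
losetup -d /dev/loop0

# let Proxmox run a filesystem check on the container volume
pct fsck 105

# then try to start the container again
pct start 105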
 
Ok, tonight it happened again, and I looked at the dmesg -T output. The first part, up to about 14 seconds after midnight, also shows up during the day; what comes after that seems to be the problem. I don't really know what to check or do next, so any help is appreciated.

Thanks.

P.S.: Does anyone have an idea how to export/pipe the dmesg output with colors? :)
 

Attachments

  • dmesg1.jpeg
  • dmesg2.jpeg
  • dmesg3.jpeg
  • dmesg4.jpeg
Hi,
the log output seems to indicate problems with the sda disk. Please check if it's fine (e.g. using smartctl) and if it's physically connected properly.
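A quick check could look something like this (a rough sketch; replace /dev/sda with the actual device if it differs):
Code:
# overall SMART health, attributes and error log
smartctl -a /dev/sda

# optionally run a long self-test and read the results once it has finished
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda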
 
Ok, it seems there are 5 bad sectors on sda1 (1 TB SSD). So I tried to move my containers and VMs. Unfortunately, three of them can't be moved.

One (CT) throws a ton of errors during the copy process.

One (VM) just goes to around 83% and states:
Code:
qemu-img: error while reading at byte 28839116800: Input/output error
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f qcow2 -O qcow2 /mnt/pve/VM_storage/images/104/vm-104-disk-0.qcow2 zeroinit:/mnt/pve/data/images/104/vm-104-disk-0.qcow2' failed: exit code 1

And one (CT) just says:
Code:
Formatting '/mnt/pve/data/images/108/vm-108-disk-0.raw', fmt=raw size=8589934592 preallocation=off
Creating filesystem with 2097152 4k blocks and 524288 inodes
Filesystem UUID: ba2fb8e0-a68a-4c45-bd12-baa63e23b892
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
mount: /var/lib/lxc/108/.copy-volume-2: cannot mount /dev/loop0 read-only.
Specified filename /var/lib/lxc/108/.copy-volume-1 does not exist.
TASK ERROR: command 'mount -o ro /dev/loop0 /var/lib/lxc/108/.copy-volume-2//' failed: exit code 32

Any chance I can rescue them?

Regards
 
I'd suggest not actively using the disk anymore and making an image copy of the whole disk with a tool like ddrescue. From there, you can try again to copy your guest images, and hope they are not too corrupted.
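A rough sketch of what that could look like (the device name and target path below are assumptions; make sure the target has enough free space and double-check the device names before running anything):
Code:
# first pass: copy everything that reads cleanly, skip problem areas quickly
ddrescue -n /dev/sda /mnt/pve/data/sda-rescue.img /mnt/pve/data/sda-rescue.map

# second pass: retry the bad areas a few more times, reusing the same map file
ddrescue -r3 /dev/sda /mnt/pve/data/sda-rescue.img /mnt/pve/data/sda-rescue.map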
 
