Proxmox crash during backup leading to bind mount not working

Bubbagump210

Oct 1, 2020
I woke up today to an odd cascade of events. First, one node of a cluster showed gray question marks for all objects. Interestingly enough, the node showing all gray was also the one whose web console I was connected to. I rebooted the node and that issue resolved itself. Next, one LXC container showed it was locked and would not boot. I ran `pct unlock ###`, which did not help. I then removed the vzdump lines in the lxc.conf, after which the LXC would at least try to start. At that point it started to complain that one of the bind mounts would not mount. The LXC in question has two bind mounts. Removing the failing one allows the LXC to start, so it is definitely an issue with that specific bind mount. The bind mount source exists, I can look at the files, and zpool status and a scrub show no issues. I haven't run any updates for quite some time, so nothing has changed that I can think of. Any ideas how I can get this container to boot?


Code:
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2444 - Mountflags already were 4096, skipping remount
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2479 - Mounted "/T340-HDD/bindmounts/downloader" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/downloader" with filesystem type "none"
ERROR    utils - ../src/lxc/utils.c:safe_mount:1221 - Invalid argument - Failed to mount "/T340-HDD/bindmounts/transientdata" onto "/usr/lib/x86_64-linux-gnu/lxc/rootfs/transientdata"
ERROR    conf - ../src/lxc/conf.c:mount_entry:2410 - Invalid argument - Failed to mount "/T340-HDD/bindmounts/transientdata" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/transientdata"
ERROR    conf - ../src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
ERROR    start - ../src/lxc/start.c:do_start:1275 - Failed to setup container "101"
ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
DEBUG    network - ../src/lxc/network.c:lxc_delete_network:4173 - Deleted network devices
ERROR    start - ../src/lxc/start.c:__lxc_start:2074 - Failed to spawn container "101"
WARN     start - ../src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 16 for process 2744719
startup for container '101' failed
 
Interestingly enough, if I create a new ZFS dataset and migrate the data to it, it mounts fine (my current quick and dirty fix). What could possibly "taint" a dataset so that it causes LXC to fail?

Config, for what it's worth:

Code:
arch: amd64
cores: 2
features: nesting=1
hostname: Downloader
memory: 4096
nameserver: 192.168.30.10 192.168.30.11
net0: name=eth0,bridge=vmbr0,gw=192.168.10.1,hwaddr=9A:63:2B:8C:16:77,ip=192.168.10.5/24,tag=10,type=veth
onboot: 1
ostype: debian
rootfs: T340-SSD:subvol-101-disk-1,size=20G
searchdomain: localdomain
swap: 512
unprivileged: 1
lxc.mount.entry: /T340-HDD/bindmounts/downloader downloader none bind,create=dir 0 0
lxc.mount.entry: /T340-HDD/bindmounts/transientdata transientdata none bind,create=dir 0 0
 
I figured my issue out. Somehow a snapshot related to the dataset was mounted for reasons I can’t explain.
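
For anyone hitting the same error, here is a rough sketch of how one might check for a stray snapshot mount under a bind mount source. It assumes that mounted ZFS snapshots show up in /proc/mounts with filesystem type "zfs" and a source of the form pool/dataset@snapname. The function names are made up for illustration, and the path in the usage note is just the one from the config above.

```python
def find_snapshot_mounts(mounts_text, base_path):
    """Return (source, mountpoint) pairs for ZFS snapshots mounted at or below base_path.

    mounts_text is the content of /proc/mounts (or a file in the same format).
    Assumption: a mounted snapshot has fstype "zfs" and an "@" in its source.
    """
    base = base_path.rstrip("/")
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        if fstype != "zfs" or "@" not in source:
            continue  # not a ZFS snapshot mount
        if mountpoint == base or mountpoint.startswith(base + "/"):
            hits.append((source, mountpoint))
    return hits


def check_host(base_path, mounts_file="/proc/mounts"):
    """Read the host's mount table and report snapshot mounts under base_path."""
    with open(mounts_file) as f:
        return find_snapshot_mounts(f.read(), base_path)
```

Calling `check_host("/T340-HDD/bindmounts/transientdata")` on the Proxmox host would list any snapshot mounted beneath that dataset. Note that /proc/mounts escapes spaces in paths as `\040`, which this sketch does not decode.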
 
