Proxmox VE 8 does not seem to map all RBD images upon container start

nib_sib

Jan 28, 2025
Good day all,

Short description of the issue: inability to start containers consistently.
Context in which the problem arose: 7-node cluster recently upgraded from version 7 to 8 (I believe the issue did not occur before the upgrade); no PVE subscription.
Scope: All the containers (three in number, each of them having an RBD-based root FS and an RBD-based additional volume mp0) present on the fifth node.

After some tests, I would describe the issue as Proxmox VE 8 seemingly not mapping all RBD images upon container start on my fifth node. Here are samples of the two most relevant messages I managed to obtain:
Code:
lxc pre-start produced output: /dev/rbd0
lxc pre-start produced output: failed to get device path

Code:
lxc pre-start produced output: /dev/rbd1
lxc pre-start produced output: mount: /var/lib/lxc/.pve-staged-mounts/mp0:
  special device /dev/rbd-pve/${cluster_id}/VirtualDisks/vm-122-disk-1 does not exist.

More details follow. Please note that in all the code blocks I have substituted ${cluster_id} for the literal cluster ID.

Thank you in advance for your help.
 
Reproducer when starting the container from the web UI:

Code:
root@node15:~# systemctl list-units | grep pve-container
root@node15:~#

Code:
root@node15:~# rbd device list
root@node15:~#

Code:
# I click on the start button of CT 122 in the web UI.
# The container fails to start and the task viewer yields:

run_buffer: 571 Script exited with status 2
lxc_init: 845 Failed to run lxc.hook.pre-start for container "122"
__lxc_start: 2034 Failed to initialize container "122"
TASK ERROR: startup for container '122' failed

Code:
root@node15:~# systemctl list-units | grep pve-container
● pve-container@122.service    loaded failed     failed    PVE LXC Container: 122

Code:
root@node15:~# rbd device list
id  pool          namespace  image          snap  device
0   VirtualDisks             vm-122-disk-0  -     /dev/rbd0


At this point I can try to start the container again from the web UI: the start often fails and sometimes succeeds. However, if I instead map the RBD device manually before trying to start it, the container always starts successfully:

Code:
root@node15:~# rbd device map --image vm-120-disk-1 --pool VirtualDisks
/dev/rbd1

Code:
root@node15:~# rbd device list
id  pool          namespace  image          snap  device
0   VirtualDisks             vm-122-disk-0  -     /dev/rbd0
1   VirtualDisks             vm-120-disk-1  -     /dev/rbd1

Code:
# I click on the start button of CT 122 in the web UI.
# The container starts properly.
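
The manual check-then-map step above could be scripted. The following is only a minimal sketch: the helper name rbd_is_mapped is hypothetical, and it assumes the plain-text table layout of `rbd device list` shown above, where an empty namespace column leaves five whitespace-separated fields per row instead of six.

```shell
# rbd_is_mapped POOL IMAGE   (hypothetical helper, not part of Ceph or PVE)
# Reads `rbd device list` output on stdin.
# Exits 0 if IMAGE in POOL appears as mapped, 1 otherwise.
rbd_is_mapped() {
    awk -v pool="$1" -v image="$2" '
        $1 == "id" { next }        # skip the header row
        NF == 5 { img = $3 }       # namespace column is empty
        NF >= 6 { img = $4 }       # namespace column is present
        NF >= 5 && $2 == pool && img == image { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

It could then be used as `rbd device list | rbd_is_mapped VirtualDisks vm-122-disk-1 || rbd device map --image vm-122-disk-1 --pool VirtualDisks` before starting the container. This only automates the workaround; it does not explain why the pre-start hook fails.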


Reproducer when starting the container from the CLI (with -l DEBUG):


Code:
root@node15:~# rbd device list
root@node15:~#

Code:
root@node15:~# lxc-start -n 122 -F -l DEBUG -o /tmp/lxc-CT122.log
lxc-start: 122: ../src/lxc/utils.c: run_buffer: 571 Script exited with status 2
lxc-start: 122: ../src/lxc/start.c: lxc_init: 845 Failed to run lxc.hook.pre-start for container "122"
lxc-start: 122: ../src/lxc/start.c: __lxc_start: 2034 Failed to initialize container "122"
lxc-start: 122: ../src/lxc/utils.c: run_buffer: 571 Script exited with status 1
lxc-start: 122: ../src/lxc/start.c: lxc_end: 986 Failed to run lxc.hook.post-stop for container "122"
lxc-start: 122: ../src/lxc/tools/lxc_start.c: lxc_start_main: 307 The container failed to start
lxc-start: 122: ../src/lxc/tools/lxc_start.c: lxc_start_main: 312 Additional information can be obtained by setting the --logfile and --logpriority options


Code:
root@node15:~# cat /tmp/lxc-CT122.log
lxc-start 122 20250129215136.418 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 122 20250129215136.418 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 122 20250129215136.419 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 122 20250129215136.419 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "122", config section "lxc"
lxc-start 122 20250129215137.115 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 122 lxc pre-start produced output: /dev/rbd0
lxc-start 122 20250129215137.121 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 122 lxc pre-start produced output: failed to get device path
lxc-start 122 20250129215137.140 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 2
lxc-start 122 20250129215137.140 ERROR    start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "122"
lxc-start 122 20250129215137.140 ERROR    start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "122"
lxc-start 122 20250129215137.140 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "122", config section "lxc"
lxc-start 122 20250129215137.684 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/122/rootfs: not mounted
lxc-start 122 20250129215137.684 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/122/rootfs' failed: exit code 1
lxc-start 122 20250129215137.701 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc-start 122 20250129215137.701 ERROR    start - ../src/lxc/start.c:lxc_end:986 - Failed to run lxc.hook.post-stop for container "122"
lxc-start 122 20250129215137.701 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:307 - The container failed to start
lxc-start 122 20250129215137.701 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:312 - Additional information can be obtained by setting the --logfile and --logpriority options

Code:
root@node15:~# rbd device list
id  pool          namespace  image          snap  device
0   VirtualDisks             vm-122-disk-0  -     /dev/rbd0


At this point I can try to start the container again from the CLI: the start often fails and sometimes succeeds. I assume that, as with the web UI, mapping the RBD device manually and then retrying from the CLI would succeed every time, but I have not confirmed this yet (my aim here is to share more diagnostic information).

Starting the container again (the output below is an example of this second start failing):

Code:
root@node15:~# rm /tmp/lxc-CT122.log

Code:
root@node15:~# lxc-start -n 122 -F -l DEBUG -o /tmp/lxc-CT122.log
lxc-start: 122: ../src/lxc/utils.c: run_buffer: 571 Script exited with status 32
lxc-start: 122: ../src/lxc/start.c: lxc_init: 845 Failed to run lxc.hook.pre-start for container "122"
lxc-start: 122: ../src/lxc/start.c: __lxc_start: 2034 Failed to initialize container "122"
lxc-start: 122: ../src/lxc/tools/lxc_start.c: lxc_start_main: 307 The container failed to start
lxc-start: 122: ../src/lxc/tools/lxc_start.c: lxc_start_main: 312 Additional information can be obtained by setting the --logfile and --logpriority options


Code:
root@node15:~# cat /tmp/lxc-CT122.log
lxc-start 122 20250129215906.467 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 122 20250129215906.467 INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 122 20250129215906.467 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 122 20250129215906.467 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "122", config section "lxc"
lxc-start 122 20250129215907.166 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 122 lxc pre-start produced output: /dev/rbd1
lxc-start 122 20250129215907.176 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 122 lxc pre-start produced output: mount: /var/lib/lxc/.pve-staged-mounts/mp0: special device /dev/rbd-pve/${cluster_id}/VirtualDisks/vm-122-disk-1 does not exist.
       dmesg(1) may have more information after failed mount system call.
lxc-start 122 20250129215907.176 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 122 lxc pre-start produced output: command 'mount /dev/rbd-pve/${cluster_id}/VirtualDisks/vm-122-disk-1 /var/lib/lxc/.pve-staged-mounts/mp0' failed: exit code 32
lxc-start 122 20250129215907.194 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 32
lxc-start 122 20250129215907.194 ERROR    start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "122"
lxc-start 122 20250129215907.194 ERROR    start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "122"
lxc-start 122 20250129215907.194 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "122", config section "lxc"
lxc-start 122 20250129215907.751 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp0: not mounted.
lxc-start 122 20250129215907.751 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp0' failed: exit code 32
lxc-start 122 20250129215907.778 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp1: not mounted.
lxc-start 122 20250129215907.778 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp1' failed: exit code 32
lxc-start 122 20250129215907.781 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp3: not mounted.
lxc-start 122 20250129215907.782 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp3' failed: exit code 32
lxc-start 122 20250129215907.785 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp4: not mounted.
lxc-start 122 20250129215907.785 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp4' failed: exit code 32
lxc-start 122 20250129215907.789 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp2: not mounted.
lxc-start 122 20250129215907.789 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 122 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp2' failed: exit code 32
lxc-start 122 20250129215908.156 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "122", config section "lxc"
lxc-start 122 20250129215908.658 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:307 - The container failed to start
lxc-start 122 20250129215908.658 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:312 - Additional information can be obtained by setting the --logfile and --logpriority options

Code:
root@node15:~# rbd device list
root@node15:~#
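
One way to gather more data on the "does not exist" message would be to poll for the /dev/rbd-pve/... path right after a manual `rbd device map`, to see whether it ever appears and how long it takes. The sketch below is a generic helper under that assumption; the name wait_for_path and the 10-second default are hypothetical, and the logs above do not establish that timing is the cause.

```shell
# wait_for_path PATH [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls once per second until PATH exists.
# Returns 0 as soon as PATH exists, 1 if the timeout runs out first.
wait_for_path() {
    path=$1
    remaining=${2:-10}             # assumed default of 10 seconds
    while [ ! -e "$path" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, with cluster_id set to the real cluster ID: `wait_for_path "/dev/rbd-pve/${cluster_id}/VirtualDisks/vm-122-disk-1" 5 && echo present || echo missing`.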


Relevant version and configuration details, including those of one of the affected containers:

Code:
root@node15:~# pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
ceph: 17.2.7-pve3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1


Code:
root@node15:~# cat /etc/pve/storage.cfg | grep --after 4 '^rbd:'
rbd: VirtualDisks
        content images,rootdir
        krbd 0
        pool VirtualDisks

root@node15:~#

Code:
root@node15:~# pvesm status | grep rbd
VirtualDisks         rbd     active     55970596997     26822232197     29148364800   47.92%

Code:
root@node15:~# pvesm list VirtualDisks | grep vm-122
VirtualDisks:vm-122-disk-0 raw     rootdir    268435456000 122
VirtualDisks:vm-122-disk-1 raw     rootdir   6597069766656 122

Code:
root@node15:~# rbd list --pool VirtualDisks --long | grep vm-122
vm-122-disk-0   250 GiB            2
vm-122-disk-1     6 TiB            2

Code:
root@node15:~# cat /etc/pve/lxc/122.conf
arch: amd64
cores: 5
features: nesting=1
hostname: XX-Y-ZZZ-BCK-2025-01
memory: 102400
mp0: VirtualDisks:vm-122-disk-1,mp=/media/1-mount,size=6T
net0: name=eth0,bridge=vmbr2,firewall=1,hwaddr=DB:2F:11:29:8E:64,ip=172.17.10.88/24,ip6=auto,type=veth
net1: name=eth1,bridge=vmbr3,firewall=1,hwaddr=DB:24:11:A7:EC:54,ip=dhcp,ip6=auto,type=veth
ostype: debian
rootfs: VirtualDisks:vm-122-disk-0,size=250G
searchdomain: sub11.dc-xyz.ca
swap: 5120
unprivileged: 1
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/nvidia6 dev/nvidia6 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia7 dev/nvidia7 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia8 dev/nvidia8 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia9 dev/nvidia9 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
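
To compare what this configuration expects against what `rbd device list` reports, the volumes referenced by the rootfs and mpN entries can be extracted from the config. This is a minimal sketch: the helper name ct_volumes is hypothetical, and it assumes the storage ID contains no regular-expression metacharacters and that sed supports -E.

```shell
# ct_volumes STORAGE   (hypothetical helper)
# Reads a container config on stdin and prints the volume name of every
# rootfs/mpN entry stored on STORAGE, one per line, in file order.
ct_volumes() {
    sed -n -E 's/^(rootfs|mp[0-9]+): *'"$1"':([^,]+).*/\2/p'
}
```

For example, `ct_volumes VirtualDisks < /etc/pve/lxc/122.conf` should list vm-122-disk-1 and vm-122-disk-0 for the config above, which are the two images expected to be mapped while the container runs.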
 