Issue with starting container - can't map rbd volume

d2600hz

New Member
Mar 12, 2024
To preface: I was doing scheduled maintenance and migrating all VMs, plus the single container we have on one of our clusters. The container failed to stop properly and, in the interest of getting the maintenance done, the box was rebooted soon afterwards. Now, no matter what I do, I cannot get this container to run (it's the only container we have).

We were also in the middle of fixing our network. We had a mix of true public and private addresses and were moving our monitors and managers to run specifically on the private network (by adding the private network as the first network in ceph.conf, then destroying and recreating the monitors and managers). This was followed by an upgrade to 8.41 on all hosts and a reboot. This container is the only thing that isn't working.
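For reference, a minimal sketch of how the network-related settings can be pulled out of ceph.conf for comparison across hosts. The function name is hypothetical, and the default path assumes the usual Proxmox location of `/etc/pve/ceph.conf`; pass another path if yours differs.

```shell
# Print the network-related lines from a ceph.conf file.
# Keys may be written with spaces or underscores, so both are matched.
# The default path is an assumption (typical Proxmox layout).
show_ceph_network_settings() {
    grep -E '^[[:space:]]*((public|cluster)[ _]network|mon[ _]host)' \
        "${1:-/etc/pve/ceph.conf}"
}

# Example: show_ceph_network_settings /etc/pve/ceph.conf
```

The order of the subnets on the public network line is the part changed by the method described above, so that is the line worth comparing first.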

This is what I get on startup:
Code:
# pct start 105 --debug
run_buffer: 571 Script exited with status 110
lxc_init: 845 Failed to run lxc.hook.pre-start for container "105"
__lxc_start: 2034 Failed to initialize container "105"
0 hostid 100000 range 65536
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "105", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: In some cases useful info is found in syslog - try "dmesg | tail".

DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: rbd: sysfs write failed

DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: can't map rbd volume vm-105-disk-0: rbd: sysfs write failed

ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 110
ERROR    start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "105"
ERROR    start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "105"
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "105", config section "lxc"

And rbd info:

Code:
# rbd info ewr-pool/vm-105-disk-0
rbd image 'vm-105-disk-0':
        size 20 GiB in 5120 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 4cbfb3bc45c7ae
        block_name_prefix: rbd_data.4cbfb3bc45c7ae
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Wed Apr 24 01:48:04 2024
        access_timestamp: Wed Apr 24 01:48:04 2024
        modify_timestamp: Wed Apr 24 01:48:04 2024

And configuration of the CT:

Code:
# cat /etc/pve/lxc/105.conf
arch: amd64
cores: 2
features: nesting=1
hostname: shipyard-couch
memory: 1024
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.10.8.1,hwaddr=BC:24:11:22:37:80,ip=10.10.8.100/24,tag=108,type=veth
ostype: centos
rootfs: ewr-pool:vm-105-disk-0,size=20G,mountoptions=discard
swap: 512
unprivileged: 1

I also see this error in dmesg whenever I try to start it, but my Google-fu is failing me today:

Code:
[ 2990.348616] libceph: another match of type 1 in addrvec
[ 2990.348621] libceph: problem decoding monmap, -22
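From what I can tell from the kernel's libceph source, "another match of type 1 in addrvec" is logged when a single monitor entry in the monmap carries two addresses of the same type, and type 1 appears to be the legacy v1 protocol. That is my reading, not something confirmed here. Below is a rough sketch for spotting such an entry in `ceph mon dump` output. The function name is made up for illustration, and it assumes monitor lines have the usual `N: [v2:...,v1:...] mon.name` shape.

```shell
# Flag monitors whose address vector lists the same protocol type twice.
# Reads "ceph mon dump" style text on stdin and prints one line per offender.
find_duplicate_mon_addr_types() {
    awk '
        # Monitor lines look like: 0: [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0] mon.a
        /^[0-9]+: \[/ {
            line = $0
            v1 = gsub(/v1:/, "", line)   # gsub returns the number of replacements
            line = $0
            v2 = gsub(/v2:/, "", line)
            if (v1 > 1 || v2 > 1)
                printf "%s has %d v1 and %d v2 addresses\n", $NF, v1, v2
        }
    '
}

# On a cluster node (assumes the ceph CLI and an admin keyring are available):
#   ceph mon dump 2>/dev/null | find_duplicate_mon_addr_types
```

If a monitor shows up here, that would point back at the monitor recreation step rather than at the RBD image itself.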

Any clues where I can look next?