[Workaround] PVE 9.2 / Ceph Tentacle: LXC containers fail to start with "rbd: sysfs write failed" — OSDs advertise v1-only public addr

MrJoshuaP

Jun 27, 2026
Posting this in case it helps others on PVE 9.2 + Ceph Tentacle, and to ask whether the underlying behavior is a bug. Full disclosure: I relied heavily on AI to troubleshoot the problem, resolve it (albeit with a workaround), and draft this post.

After a cluster-wide cold boot, all my LXC containers backed by Ceph RBD refused to start with rbd: sysfs write failed / exit status 110. VMs on the same pool were completely unaffected. Root cause turned out to be a messenger-protocol mismatch: my OSDs publish a v1-only public address in the OSDMap, while Tentacle's rbd map now defaults to msgr2 — so krbd can't find a v2 address and aborts. A ceph.conf workaround fixes it; I think the v1-only public advertisement may be a Tentacle bug.

My environment is:
  • Proxmox VE 9.2.3, kernel 7.0.12-1-pve
  • Ceph 20.2.1-pve1 Tentacle (hyperconverged), 5-node cluster, 3 nodes running Ceph
  • Separate public (192.168.111.0/24) and cluster (192.168.123.0/24) networks
  • Cluster originally built on an earlier Ceph (Squid) release and upgraded to Tentacle
After a power failure, everything came back online as expected, with the exception of my LXC containers. When I tried to start them, I was presented with the following error:

Code:
pct start <vmid>
...
rbd: sysfs write failed
can't map rbd volume vm-<vmid>-disk-0: rbd: sysfs write failed
Script exited with status 110

dmesg showed:

Code:
libceph: mon1 (2)192.168.111.11:3300 session established
libceph: no match of type 2 in addrvec
libceph: corrupt full osdmap (-2) epoch <N> off <X>
libceph: osdc handle_map corrupt msg

The kernel connects to the mon over msgr2 (the (2)…:3300), then fails decoding the OSDMap because it can't find a type-2 (v2) address for an OSD.
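For anyone checking whether they are hitting the same failure, a small filter over the kernel log can look for this signature. This is only a sketch: the function name krbd_addrvec_mismatch is made up for illustration, and the matched strings are copied from the dmesg excerpt above, so a different kernel version may word them differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph or Proxmox): reads kernel log text on
# stdin and reports whether it contains the addrvec failure quoted above.
krbd_addrvec_mismatch() {
    # Buffer stdin so it can be scanned twice.
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'libceph: no match of type 2 in addrvec' &&
       printf '%s\n' "$log" | grep -q 'libceph: corrupt full osdmap'; then
        echo "match: krbd could not find a v2 address in the OSDMap"
        return 0
    fi
    echo "no match"
    return 1
}
```

With the function defined in the current shell, piping dmesg into krbd_addrvec_mismatch should print the match line on an affected node.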

The cluster is healthy and the config is standard. The OSDs bind and listen on v2 sockets on the public interface, and the OSD metadata reports v2 — but the published addrvec in the OSDMap is v1-only:

Code:
# ceph osd metadata 5 | grep front_addr
"front_addr": "[v2:192.168.111.10:6802/...,v1:192.168.111.10:6803/...]",   <-- v2 present

# ceph osd find 5
"addrs": { "addrvec": [ { "type": "v1", "addr": "192.168.111.10:6803", ... } ] }   <-- v2 missing

ceph osd dump confirms it for every OSD: the public-network address is a bare v1: entry, while the cluster-network address is a full [v2:…,v1:…] addrvec. ms_bind_msgr2 and ms_bind_ipv4 are true, ms_bind_ipv6 is false, the mons advertise both v1 and v2 correctly, and there are no stray public_addr lines on the OSDs (only the standard per-mon ones).
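To check every OSD at once rather than one at a time, the plain-text ceph osd dump output can be filtered with awk. This is a rough sketch under one assumption: on each osd.N line, the first address field is the public address and the second is the cluster address. The function name list_v1_only_public is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads plain-text `ceph osd dump` output on stdin and
# prints the name of every OSD whose public address has no v2 entry.
# Assumption: on each "osd.N" line the first address field is the public
# address and the second is the cluster address.
list_v1_only_public() {
    awk '
        /^osd\.[0-9]+ / {
            for (i = 2; i <= NF; i++) {
                # Address fields look like "[v2:ip:port/nonce,v1:...]"
                # or a bare "v1:ip:port/nonce".
                if ($i ~ /^\[?v[12]:/) {
                    if ($i !~ /v2:/) print $1
                    break   # only inspect the first (public) address field
                }
            }
        }
    '
}
```

Piping ceph osd dump into list_v1_only_public prints one OSD name per affected OSD; on a cluster where every public addrvec includes v2 it prints nothing.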

Confirmed it's purely a messenger mismatch — a manual map with legacy mode connects over v1 and works fine:

Code:
# rbd map <pool>/vm-<vmid>-disk-0 -o ms_mode=legacy
/dev/rbd0       <-- success

Why only LXC, and why after a reboot? VMs map via librbd (userspace), which tolerates the v1-only public addrs. LXC uses kernel krbd, which — combined with Tentacle's rbd device map now defaulting to msgr2 — strictly requires v2 and aborts. It only surfaced after the cold boot because that's when the OSDs first restarted onto Tentacle and (re)published the v1-only addrvec; already-running containers had been coasting on maps made before the upgrade.

The workaround to resolve the issue is to use legacy mode.

Add rbd_default_map_options = ms_mode=legacy to /etc/pve/ceph.conf under [client]:

INI:
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_map_options = ms_mode=legacy

This makes every rbd map (including the ones Proxmox issues for containers) default to msgr1. After this, both a bare rbd map and pct start succeed. No daemon restart needed; it only affects krbd maps (LXC), not VMs.
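To double-check that the option landed in the right section (and not under [global] or a mon section by accident), the file can be parsed with a few lines of awk. This is a minimal sketch; client_map_options is a made-up name, and it only recognizes the key spelled with underscores as shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads a ceph.conf-style file on stdin and prints the
# value of rbd_default_map_options from the [client] section, if present.
client_map_options() {
    awk '
        # Track whether the current section header is exactly [client].
        /^[ \t]*\[/ { in_client = ($0 ~ /^[ \t]*\[client\][ \t]*$/); next }
        in_client && /^[ \t]*rbd_default_map_options[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop the key and the first "="
            sub(/[ \t]+$/, "")         # trim trailing whitespace
            print
        }
    '
}
```

Feeding /etc/pve/ceph.conf to client_map_options on stdin should print ms_mode=legacy once the workaround is in place, and nothing if the line is missing or sits in another section.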

Is the v1-only public addrvec expected on Tentacle, or a bug? The OSDs clearly bind v2 on the public network and report it in metadata, yet the OSDMap publishes only v1 for the public address — which is what breaks krbd now that rbd map defaults to msgr2.

Has anyone else on PVE 9.2 + Tentacle seen this, and is there a way to make the OSDs publish the full v2+v1 public addrvec so the legacy workaround isn't needed?

Happy to provide full ceph osd dump, ceph mon dump, and ss output if useful.
 
Hi @MrJoshuaP

thanks for posting on the forum!

This is indeed unexpected and not a known issue.
I just tried reproducing it on my test cluster by upgrading from PVE 8 to 9 and then from Squid to Tentacle, as you described, but I couldn't see any missing addresses.

Were there any error messages during your upgrades?

Could you try restarting a single OSD service (provided your cluster is healthy) and then collect its log for that timeframe?
systemctl restart ceph-osd@<OSD-ID>.service
cat /var/log/ceph/ceph-osd.<OSD-ID>.log
The monitor logs would also be of interest:
cat /var/log/ceph/ceph-mon.<node-name>.log
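Since these logs can be long, it may help to trim them to the window around the restart before posting. The sketch below assumes each log entry starts with an ISO 8601 timestamp and that all entries use the same UTC offset; log_window is a hypothetical name, and the bounds are compared as plain strings.

```shell
#!/bin/sh
# Hypothetical helper: prints only the log lines on stdin whose leading
# timestamp falls between $1 and $2 (compared as plain strings).
# Assumption: each entry starts with an ISO 8601 timestamp, for example
# 2026-06-27T10:15:03.123+0000, and all entries share the same UTC offset.
log_window() {
    awk -v start="$1" -v end="$2" '
        # Skip continuation lines that do not begin with a date.
        $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ &&
        $1 >= start && $1 <= end
    '
}
```

The two arguments can be truncated timestamps (date plus hour and minute), and the log file is supplied on stdin. Because the comparison is textual, an end bound given only to the minute excludes entries later within that same minute.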

Yours sincerely
Jonas
 
@j.theisen

The live upgrade completed without any issue or error that I noticed, though I'm not entirely sure what I should have been looking for.

Everything was fine until a power outage forced a cold boot of all my Ceph nodes. The problem appeared once all nodes had been powered off, started back up, and come back online. It seems the previous maps were still in use until that cold boot.

In your testing, did you power off the cluster completely and turn it back on?

Originally, I thought there might have been some problem with the config, so I tried:
Code:
ceph config set osd ms_bind_msgr1 true
ceph config set osd ms_bind_msgr2 true

My cluster is healthy and things are working -- aside from the OSD v2 issue...

Code:
root@master-0:~# ceph status
  cluster:
    id:     c25e7160-ad62-49dd-95bb-71ffbb0d7067
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum master-0,master-1,master-2 (age 23h) [leader: master-0]
    mgr: master-1(active, since 23h), standbys: master-0, master-2
    mds: 1/1 daemons up, 2 standby
    osd: 6 osds: 6 up (since 24m), 6 in (since 4M)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 128.63k objects, 480 GiB
    usage:   1.3 TiB used, 9.6 TiB / 11 TiB avail
    pgs:     97 active+clean
 
  io:
    client:   741 B/s rd, 558 KiB/s wr, 1 op/s rd, 61 op/s wr

Attached are some troubleshooting outputs I ran as well as tailed ceph-osd and ceph-mon logs for a specific OSD's restart.
 
