RBD Mirror daemons stuck in warning state with broken rbd replication

Feb 24, 2025
I have 2 clusters with rbd mirroring set up between them, following the documentation. Mirroring worked for a while, until both clusters were shut down for 2 months. One was brought back up first, and the other a few weeks later. Since then, even after removing and reconfiguring rbd mirroring, the daemons are stuck in the warning state, as shown in the command outputs below.



Code:
# rbd mirror pool status --verbose prod1
health: WARNING
daemon health: WARNING
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 16719889:
  instance_id:
  client_id: pvesite1
  hostname: pvesite101
  version: 19.2.0
  leader: false
  health: OK


IMAGES
vm-10004-disk-0:
  global_id:   fb058f1d-a499-48bb-96dc-cbc076a9b731
  state:       down+unknown
  description: status not found
  last_update:

Code:
# rbd mirror pool status  prod1 --verbose
health: WARNING
daemon health: WARNING
image health: OK
images: 0 total

DAEMONS
service 21050634:
  instance_id:
  client_id: pvesite2
  hostname: pvesite201
  version: 19.2.0
  leader: false
  health: OK


IMAGES
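
To compare the two outputs field by field, a small parser along the following lines could be used. It is only a sketch under stated assumptions: it expects the plain-text layout exactly as pasted above, with the indentation preserved, and the function names `parse_mirror_status` and `flag_oddities` are hypothetical helpers, not part of any Ceph tooling. The things it flags (no daemon reporting `leader: true`, an empty `instance_id`, images in a `down` state) are simply the fields that look unusual in these outputs; whether any of them is the actual cause is not established here.

```python
def parse_mirror_status(text):
    """Parse plain-text `rbd mirror pool status --verbose` output (layout as pasted above)."""
    summary, daemons, images = {}, [], []
    section, current = "summary", None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("DAEMONS", "IMAGES"):
            section, current = line.lower(), None
            continue
        if ":" not in line:
            # Skips blank lines, the "# rbd ..." prompt line and the "1 unknown" count.
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if section == "summary":
            summary[key] = value
        elif raw[0] not in " \t":
            # An unindented line inside a section starts a new daemon or image entry.
            current = {"name": key}
            (daemons if section == "daemons" else images).append(current)
        elif current is not None:
            current[key] = value
    return summary, daemons, images


def flag_oddities(summary, daemons, images):
    """Return human-readable notes about fields that look unusual (heuristic only)."""
    notes = []
    if daemons and not any(d.get("leader") == "true" for d in daemons):
        notes.append("no daemon reports leader: true")
    for d in daemons:
        if not d.get("instance_id"):
            notes.append(f"{d['name']}: empty instance_id")
    for img in images:
        if img.get("state", "").startswith("down"):
            notes.append(f"{img['name']}: state {img['state']}")
    return notes


# Sample copied from the first output above.
SAMPLE = """\
health: WARNING
daemon health: WARNING
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 16719889:
  instance_id:
  client_id: pvesite1
  hostname: pvesite101
  version: 19.2.0
  leader: false
  health: OK

IMAGES
vm-10004-disk-0:
  global_id:   fb058f1d-a499-48bb-96dc-cbc076a9b731
  state:       down+unknown
  description: status not found
  last_update:
"""

for note in flag_oddities(*parse_mirror_status(SAMPLE)):
    print(note)
```

Running the same functions over the second cluster's output gives a directly comparable list, which makes it easier to see that both sides report `leader: false` with an empty `instance_id`.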
 
This is the output of ceph -s. I'm running 19.2.0 with IPv6 on the public and private networks, so I'm hitting the IPv6 subnet calculation bug. However, the mirroring issue started back in v18, so I don't think that bug is the cause here.

  cluster:
    id:     0a8145be-7e2c-4fcc-baef-2c292cd624b7
    health: HEALTH_ERR
            9 osds(s) are not reachable

  services:
    mon:        3 daemons, quorum pvesite101,pvesite102,pvesite103 (age 4d)
    mgr:        pvesite101(active, since 5d), standbys: pvesite102, pvesite103
    mds:        1/1 daemons up, 2 standby
    osd:        9 osds: 9 up (since 4d), 9 in (since 3M)
    rbd-mirror: 1 daemon active (1 hosts)

  data:
    volumes: 1/1 healthy
    pools:   6 pools, 161 pgs
    objects: 300.79k objects, 886 GiB
    usage:   2.6 TiB used, 13 TiB / 16 TiB avail
    pgs:     161 active+clean

  io:
    client:   1.6 KiB/s rd, 306 KiB/s wr, 1 op/s rd, 24 op/s wr
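
Since all 9 OSDs are up and every PG is active+clean while the health check still says the OSDs are not reachable, one rough way to sanity-check the addresses independently of Ceph is to test whether each OSD address falls inside the configured public network. The sketch below uses Python's standard `ipaddress` module. It is not Ceph's own check, and the addresses and the `/64` network are hypothetical placeholders from the IPv6 documentation prefix; the real values would come from the cluster configuration.

```python
import ipaddress


def in_network(addr, cidr):
    """Return True if addr lies inside the cidr network (IPv4 or IPv6)."""
    # strict=False accepts a CIDR whose host bits are set, e.g. 2001:db8:0:1::1/64.
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr, strict=False)


def outside_network(osd_addrs, public_network):
    """Return the sorted names of OSDs whose address is NOT inside public_network."""
    return sorted(
        name for name, addr in osd_addrs.items()
        if not in_network(addr, public_network)
    )


# Hypothetical example values, not taken from the cluster above.
PUBLIC_NETWORK = "2001:db8:0:1::/64"
OSD_ADDRS = {
    "osd.0": "2001:db8:0:1::15",
    "osd.1": "2001:db8:0:1::16",
    "osd.2": "2001:db8:0:2::17",  # deliberately in a different /64
}

print(outside_network(OSD_ADDRS, PUBLIC_NETWORK))
```

If a check like this reports every real OSD address as inside the public network, that would support the reading that the HEALTH_ERR comes from the subnet calculation bug rather than from an actual addressing problem.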