Ceph rbd mirroring Snapshot-based not working :'(

opt-jgervais

New Member
Apr 9, 2021
Hello,

I'm trying to set up snapshot-based Ceph RBD mirroring between two Ceph clusters, each installed from a different PVE cluster.
I call them pve-c1 and pve-c2. Has anyone here already set this up successfully? At the moment I am only trying one-way replication from pve-c1 to pve-c2.

Proxmox VE 6.3-2
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)

It seems some people have said that having two Ceph clusters with the same name can be a problem, but the official documentation says the opposite:
Note that rbd-mirror does not require the source and destination clusters to have unique internal names; both can and should call themselves ceph. The config files that rbd-mirror needs for local and remote clusters can be named arbitrarily, and containerizing the daemon is one strategy for maintaining them outside of /etc/ceph to avoid confusion.
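Concretely, arbitrary config-file names work through the --cluster option: --cluster NAME makes the ceph/rbd tools load /etc/ceph/NAME.conf and expand $cluster to NAME in keyring paths, so both clusters can keep the internal name "ceph" while the copied files are named per site. A hedged illustration, assuming the file copies done in the steps below:

Code:
# loads /etc/ceph/pve-c1.conf, i.e. the config copied from the pve-c1 cluster
rbd --cluster pve-c1 mirror pool info c1-pool-hdd-1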

Infrastructure POC
Cluster pve-c1

3 nodes :
node 1 pve-c1-n1
node 2 pve-c1-n2
node 3 pve-c1-n3

2 volumes :
c1-pool-hdd-1
c1-pool-ssd-1

Cluster pve-c2
3 nodes :
node 1 pve-c2-n1
node 2 pve-c2-n2
node 3 pve-c2-n3

3 volumes :
c2-pool-hdd-1
c2-pool-ssd-1
c1-pool-hdd-1

The replication is set up on the pool c1-pool-hdd-1, from pve-c1-n1 to pve-c2-n1.
Here are the steps I followed :

## Auth
# pve-c1
Code:
ceph auth get-or-create client.rbd-mirror-peer."$HOSTNAME" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$HOSTNAME".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$HOSTNAME".keyring root@192.168.2.31:/etc/pve/priv/pve-c1.client.rbd-mirror-peer."$HOSTNAME".keyring
scp /etc/ceph/ceph.conf root@192.168.2.31:/etc/ceph/pve-c1.conf

# pve-c2
Code:
ceph auth get-or-create client.rbd-mirror-peer."$HOSTNAME" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$HOSTNAME".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$HOSTNAME".keyring root@192.168.2.11:/etc/pve/priv/pve-c2.client.rbd-mirror-peer."$HOSTNAME".keyring
scp /etc/ceph/ceph.conf root@192.168.2.11:/etc/ceph/pve-c2.conf
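Before going further, it may be worth verifying that the copied key actually authenticates against the remote cluster. A sketch, run on pve-c2-n1, relying on the [client] keyring path (/etc/pve/priv/$cluster.$name.keyring) from the configs shown below:

Code:
# should list the images in the remote pool if the copied conf and keyring are picked up
rbd --cluster pve-c1 -n client.rbd-mirror-peer.pve-c1-n1 ls c1-pool-hdd-1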

## rbd-mirror
# pve-c2
Code:
systemctl enable ceph-rbd-mirror.target
cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service
sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service
systemctl enable ceph-rbd-mirror@rbd-mirror-peer."$HOSTNAME".service
systemctl start ceph-rbd-mirror@rbd-mirror-peer."$HOSTNAME".service
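To confirm the daemon actually came up under the peer identity, something like the following can help:

Code:
systemctl status ceph-rbd-mirror@rbd-mirror-peer."$HOSTNAME".service
journalctl -u ceph-rbd-mirror@rbd-mirror-peer."$HOSTNAME".service -n 50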

## setup Image mode
# pve-c1
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c1
# pve-c2
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c2

## Peer setup
# pve-c2
Code:
rbd mirror pool peer add c1-pool-hdd-1 client.rbd-mirror-peer.pve-c1-n1@pve-c1 -n client.rbd-mirror-peer.pve-c2-n1
# The direction is set to rx-tx by default on pve-c2, so I changed it to rx-only (it also fails if I leave rx-tx)
Code:
rbd mirror pool peer set c1-pool-hdd-1 {ID} direction rx-only
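The {ID} placeholder here is the peer UUID; it can be read back from the pool info (the UUID field under "Peer Sites", as in the output further down):

Code:
# the UUID printed under "Peer Sites" is the peer ID to pass to "peer set"
rbd mirror pool info c1-pool-hdd-1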

## Enable snapshot replication of the image c1-pool-hdd-1/vm-500-disk-0 and c1-pool-hdd-1/vm-501-disk-0
# pve-c1
Code:
rbd mirror image enable c1-pool-hdd-1/vm-500-disk-0 snapshot
rbd mirror image enable c1-pool-hdd-1/vm-501-disk-0 snapshot

I also took some manual snapshots of these images:
Code:
rbd mirror image snapshot c1-pool-hdd-1/vm-500-disk-0
rbd mirror image snapshot c1-pool-hdd-1/vm-501-disk-0
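For several images, the enable + snapshot pair can be scripted; a sketch assuming the image names from this pool:

Code:
for img in vm-500-disk-0 vm-501-disk-0; do
    rbd mirror image enable c1-pool-hdd-1/$img snapshot
    rbd mirror image snapshot c1-pool-hdd-1/$img
done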

That's all.
Here are the logs and information after my setup.

Cluster pve-c1
Node pve-c1-n1

Code:
root@pve-c1-n1:~# ls /etc/ceph/
ceph.conf  pve-c1.conf    pve-c2.conf  rbdmap

Code:
root@pve-c1-n1:/etc/ceph# cat pve-c1.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 169.254.3.11/24
     fsid = b749dcbe-95a8-4e53-9c64-7aef7078cca0
     mon_allow_pool_delete = true
     mon_host = 169.254.2.11 169.254.2.12 169.254.2.13
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 169.254.2.11/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve-c1-n1]
     public_addr = 169.254.2.11

[mon.pve-c1-n2]
     public_addr = 169.254.2.12

[mon.pve-c1-n3]
     public_addr = 169.254.2.13

Code:
root@pve-c1-n1:~# ls /etc/pve/priv/
acme                           lock
authkey.key                       pve-c1.client.admin.keyring
authorized_keys                       pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
ceph                           pve-c1.mon.keyring
ceph.client.admin.keyring               pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.rbd-mirror-peer.pve-c1-n1.keyring  pve-root-ca.key
ceph.mon.keyring                   pve-root-ca.srl
known_hosts

Code:
root@pve-c1-n1:/etc/ceph# ceph auth get client.rbd-mirror-peer.pve-c1-n1
exported keyring for client.rbd-mirror-peer.pve-c1-n1
[client.rbd-mirror-peer.pve-c1-n1]
    key = AQAtUNRg53BkDBAAB2KHc050bkYGDp0jhZzk3A==
    caps mon = "profile rbd-mirror-peer"
    caps osd = "profile rbd"

Code:
root@pve-c1-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c1-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c1-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd

Code:
root@pve-c1-n1:~# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0
vm-502-disk-0

Code:
root@pve-c1-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: rbd-mirror.pve-c1

Peer Sites:
UUID: c935a06f-b77d-4671-8f12-4a71d856c56e
Name: pve-c2
Mirror UUID: dc437fdd-7ec9-4c26-93bb-616d7b0f0832
Direction: tx-only

Code:
root@pve-c1-n1:~# rbd mirror pool status c1-pool-hdd-1  --verbose
health: WARNING
daemon health: UNKNOWN
image health: WARNING
images: 2 total
    2 starting_replay

DAEMONS
  none

IMAGES
vm-500-disk-0:
  global_id:   778ba6a3-9f55-468b-9c25-0c8f1d060a73

vm-501-disk-0:
  global_id:   6ee71be5-bdba-4e84-8040-6b2ceb164ee6

Code:
root@pve-c1-n1:~# rbd info c1-pool-hdd-1/vm-500-disk-0
rbd image 'vm-500-disk-0':
    size 8 GiB in 2048 objects
    order 22 (4 MiB objects)
    snapshot_count: 1
    id: bb7a2d40c0ff03
    block_name_prefix: rbd_data.bb7a2d40c0ff03
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Tue Jun 22 08:55:54 2021
    access_timestamp: Thu Jun 24 14:08:59 2021
    modify_timestamp: Thu Jun 24 16:31:11 2021
    mirroring state: enabled
    mirroring mode: snapshot
    mirroring global id: 778ba6a3-9f55-468b-9c25-0c8f1d060a73
    mirroring primary: true

Cluster pve-c2
Node pve-c2-n1

Code:
root@pve-c2-n1:~# ls /etc/ceph/
ceph.conf  pve-c1.conf    pve-c2.conf  rbdmap  token

Code:
root@pve-c2-n1:/etc/ceph# cat pve-c2.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 169.254.3.31/24
     fsid = 60c3b071-5d57-48fc-8eaa-f1da68a3d6ed
     mon_allow_pool_delete = true
     mon_host = 169.254.2.31 169.254.2.32 169.254.2.33
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 169.254.2.31/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve-c2-n1]
     public_addr = 169.254.2.31

[mon.pve-c2-n2]
     public_addr = 169.254.2.32

[mon.pve-c2-n3]
     public_addr = 169.254.2.33

Code:
root@pve-c2-n1:/etc/ceph# ls /etc/pve/priv/
acme                           lock
authkey.key                       pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
authorized_keys                       pve-c2.client.admin.keyring
ceph                           pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.admin.keyring               pve-c2.mon.keyring
ceph.client.rbd-mirror-peer.pve-c2-n1.keyring  pve-root-ca.key
ceph.mon.keyring                   pve-root-ca.srl
known_hosts

Code:
root@pve-c2-n1:~# ceph auth get client.rbd-mirror-peer.pve-c2-n1
exported keyring for client.rbd-mirror-peer.pve-c2-n1
[client.rbd-mirror-peer.pve-c2-n1]
    key = AQA3UNRggugSGBAAqvbOZDTGRqu9amLoIoEB7g==
    caps mon = "profile rbd-mirror-peer"
    caps osd = "profile rbd"

Code:
root@pve-c1-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c2-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c2-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd
pool 10 'c1-pool-hdd-1' replicated size 1 min_size 1 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode warn last_change 833 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

For the moment, I have set up the replicated pool from the pve-c1 cluster with size 1 and min_size 1.

Code:
root@pve-c2-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: pve-c2

Peer Sites:
UUID: 5def7721-bb6a-44e8-8cd9-33c15c9ab1c3
Name: pve-c1
Direction: rx-only
Client: client.rbd-mirror-peer.pve-c1-n1

Code:
root@pve-c2-n1:~# rbd mirror pool status c1-pool-hdd-1  --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 2 total
    2 starting_replay

DAEMONS
service 8115656:
  instance_id: 8148072
  client_id: rbd-mirror-peer.pve-c2-n1
  hostname: pve-c2-n1
  version: 15.2.13
  leader: true
  health: OK

IMAGES
vm-500-disk-0:
  global_id:   778ba6a3-9f55-468b-9c25-0c8f1d060a73
  state:       down+unknown
  description: status not found
  last_update:

vm-501-disk-0:
  global_id:   6ee71be5-bdba-4e84-8040-6b2ceb164ee6
  state:       down+unknown
  description: status not found
  last_update:

As you can see, the state value is "down+unknown".

Code:
root@pve-c2-n1:/etc/ceph# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0

Log extract:
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status: replay not running
...
Earlier I also had the following message, but changing the direction to rx-only on pve-c2-n1 fixed it (see above):
init: failed to retrieve mirror peer uuid from remote pool

I also tried to set up snapshot-based RBD mirroring the other way, with the bootstrap method, but the import command failed.
# pve-c1
Bash:
root@pve-c1-n1:~# rbd mirror pool peer bootstrap create --site-name rbd-mirror.pve-c1 c1-pool-ssd-1 > token
root@pve-c1-n1:~# scp token root@192.168.2.31:/etc/ceph/

# pve-c2
Code:
root@pve-c2-n1:/etc/ceph# rbd --cluster pve-c2 mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: failed to import peer bootstrap token
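One hedged workaround for the missing-keyring error: since --cluster pve-c2 makes rbd search /etc/ceph/ for a pve-c2 keyring, point it explicitly at the PVE keyring location instead (the pve-c2 admin keyring lives under /etc/pve/priv/ per the listing above):

Code:
rbd --cluster pve-c2 -n client.admin \
    --keyring /etc/pve/priv/pve-c2.client.admin.keyring \
    mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token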


I also saw the Red Hat documentation explaining how to set this up when the two clusters have the same name, but changing the CLUSTER variable does not seem possible... the Ceph cluster will fail.
root@pve-c2-n1:/etc/ceph# cat /etc/default/ceph
# /etc/default/ceph
#
# Environment file for ceph daemon systemd unit files.
#

# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
#CLUSTER=pve-c2

Any suggestions?

I followed several sources: https://docs.ceph.com/en/latest/rbd/rbd-mirroring/ , https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NOFX6TXZ7WRUV2ZSTI4N6EP73YN6JKQQ/

Jérémy.
 
Here is the log file /var/log/ceph/ceph-client.rbd-mirror-peer.pve-c2-n1.log (log level 20/20):
2021-06-24T18:33:32.519+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:32.519+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:32.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651335, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:32.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd3becf60)
2021-06-24T18:33:32.523+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:35.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] handle_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] handle_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status: replay not running
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] operator(): replay status ready: r=-11
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: waiting for replay status
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status: replay not running
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] operator(): replay status ready: r=-11
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] set_mirror_image_status_update: waiting for replay status
2021-06-24T18:33:36.607+0200 7fb1188d6700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_task:
2021-06-24T18:33:36.607+0200 7fb1190d7700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 get_mirror_uuid:
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_get_mirror_uuid: r=0
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_get_mirror_uuid: remote_mirror_uuid=37272deb-bc36-481f-a4c8-7dc1bf44a703
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 mirror_peer_ping:
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_ping: r=0
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 mirror_peer_list:
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_list: r=0
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_list: remote_mirror_peer_uuid=
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 schedule_task:
2021-06-24T18:33:37.251+0200 7fb11c949580 20 rbd::mirror::Mirror: 0x55fbd22c6f80 run_cache_manager: tune memory
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 execute_timer_task:
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651336, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:37.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd4a8f7d0)
2021-06-24T18:33:37.523+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:42.251+0200 7fb11c949580 20 rbd::mirror::Mirror: 0x55fbd22c6f80 run_cache_manager: tune memory
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 execute_timer_task:
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.527+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651337, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:42.527+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd4a8ef00)
2021-06-24T18:33:42.527+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:45.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] handle_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] handle_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status: replay not running
 
Did you manage to get snapshot mirroring working with the PVE wiki howto?
I'm facing the same issue and the same result:
Code:
1 starting_replay
 
I've got a similar problem between two clusters in one-way replication.
This used to work with Ceph Pacific before 16.2.11, then it started crashing...
And now with Ceph Reef, it only crashes.

Please see the attached file for details.
 

Attachments