Ceph RBD snapshot-based mirroring not working :'(

opt-jgervais

Hello,

I'm trying to set up snapshot-based Ceph RBD mirroring between two Ceph clusters, each installed from a different PVE cluster.
I call them pve-c1 and pve-c2. Has anyone here already set this up successfully? At the moment I'm only trying a one-way replication from pve-c1 to pve-c2.

Proxmox VE 6.3-2
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)

Some people seem to say that having two Ceph clusters with the same internal name can be a problem, but the official documentation says the opposite:
Note that rbd-mirror does not require the source and destination clusters to have unique internal names; both can and should call themselves ceph. The config files that rbd-mirror needs for local and remote clusters can be named arbitrarily, and containerizing the daemon is one strategy for maintaining them outside of /etc/ceph to avoid confusion.
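If I understand that correctly, the "arbitrary" config file name is simply what gets passed as --cluster (the client resolves --cluster NAME to /etc/ceph/NAME.conf). With the files copied as shown further below, a command like this run from pve-c2-n1 should therefore reach the remote cluster:
Code:
# on pve-c2-n1: --cluster pve-c1 makes rbd read /etc/ceph/pve-c1.conf
rbd --cluster pve-c1 \
    -n client.rbd-mirror-peer.pve-c1-n1 \
    --keyring /etc/pve/priv/pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring \
    mirror pool info c1-pool-hdd-1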

Infrastructure POC
Cluster pve-c1

3 nodes:
node 1 pve-c1-n1
node 2 pve-c1-n2
node 3 pve-c1-n3

2 pools:
c1-pool-hdd-1
c1-pool-ssd-1

Cluster pve-c2

3 nodes:
node 1 pve-c2-n1
node 2 pve-c2-n2
node 3 pve-c2-n3

3 pools:
c2-pool-hdd-1
c2-pool-ssd-1
c1-pool-hdd-1

The replication is set up for the pool c1-pool-hdd-1, from pve-c1-n1 to pve-c2-n1.
Here are the steps I followed:

## Auth
# pve-c1
Code:
ceph auth get-or-create client.rbd-mirror-peer."$(echo $HOSTNAME)" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring root@192.168.2.31:/etc/pve/priv/pve-c1.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/ceph/ceph.conf root@192.168.2.31:/etc/ceph/pve-c1.conf

# pve-c2
Code:
ceph auth get-or-create client.rbd-mirror-peer."$(echo $HOSTNAME)" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring root@192.168.2.11:/etc/pve/priv/pve-c2.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/ceph/ceph.conf root@192.168.2.11:/etc/ceph/pve-c2.conf

## rbd-mirror
# pve-c2
Code:
systemctl enable ceph-rbd-mirror.target
cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service
sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service
systemctl enable ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
systemctl start ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
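To check that the daemon comes up with the expected id, the unit status and its journal can be inspected:
Code:
systemctl status ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
journalctl -u ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service -n 50 --no-pager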

## setup Image mode
# pve-c1
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c1
# pve-c2
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c2

## Peer setup
# pve-c2
Code:
rbd mirror pool peer add c1-pool-hdd-1 client.rbd-mirror-peer.pve-c1-n1@pve-c1 -n client.rbd-mirror-peer.pve-c2-n1
# Because the direction is set to rx-tx by default on pve-c2, I change it to rx-only (it also fails if I leave rx-tx)
Code:
rbd mirror pool peer set c1-pool-hdd-1 {ID} direction rx-only
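For reference, the {ID} placeholder is the peer UUID reported by rbd mirror pool info on pve-c2 (see the output further below):
Code:
rbd mirror pool info c1-pool-hdd-1
# Peer Sites -> UUID: 5def7721-bb6a-44e8-8cd9-33c15c9ab1c3   <- value used as {ID}
rbd mirror pool peer set c1-pool-hdd-1 5def7721-bb6a-44e8-8cd9-33c15c9ab1c3 direction rx-only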

## Enable snapshot replication for the images c1-pool-hdd-1/vm-500-disk-0 and c1-pool-hdd-1/vm-501-disk-0
# pve-c1
Code:
rbd mirror image enable c1-pool-hdd-1/vm-500-disk-0 snapshot
rbd mirror image enable c1-pool-hdd-1/vm-501-disk-0 snapshot

I also created a few manual mirror snapshots of these images:
Code:
rbd mirror image snapshot c1-pool-hdd-1/vm-500-disk-0
rbd mirror image snapshot c1-pool-hdd-1/vm-501-disk-0
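To verify on the primary side that the mirror snapshots really exist, listing snapshots across all namespaces and checking the per-image mirror status should show them (I believe both commands are available in Octopus):
Code:
# on pve-c1-n1
rbd snap ls --all c1-pool-hdd-1/vm-500-disk-0
rbd mirror image status c1-pool-hdd-1/vm-500-disk-0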

That's all.
Here are the logs and information gathered after my setup.

Cluster pve-c1
Node pve-c1-n1

Code:
root@pve-c1-n1:~# ls /etc/ceph/
ceph.conf  pve-c1.conf    pve-c2.conf  rbdmap

Code:
root@pve-c1-n1:/etc/ceph# cat pve-c1.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 169.254.3.11/24
     fsid = b749dcbe-95a8-4e53-9c64-7aef7078cca0
     mon_allow_pool_delete = true
     mon_host = 169.254.2.11 169.254.2.12 169.254.2.13
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 169.254.2.11/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve-c1-n1]
     public_addr = 169.254.2.11

[mon.pve-c1-n2]
     public_addr = 169.254.2.12

[mon.pve-c1-n3]
     public_addr = 169.254.2.13

Code:
root@pve-c1-n1:~# ls /etc/pve/priv/
acme                           lock
authkey.key                       pve-c1.client.admin.keyring
authorized_keys                       pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
ceph                           pve-c1.mon.keyring
ceph.client.admin.keyring               pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.rbd-mirror-peer.pve-c1-n1.keyring  pve-root-ca.key
ceph.mon.keyring                   pve-root-ca.srl
known_hosts

Code:
root@pve-c1-n1:/etc/ceph# ceph auth get client.rbd-mirror-peer.pve-c1-n1
exported keyring for client.rbd-mirror-peer.pve-c1-n1
[client.rbd-mirror-peer.pve-c1-n1]
    key = AQAtUNRg53BkDBAAB2KHc050bkYGDp0jhZzk3A==
    caps mon = "profile rbd-mirror-peer"
    caps osd = "profile rbd"

Code:
root@pve-c1-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c1-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c1-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd

Code:
root@pve-c1-n1:~# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0
vm-502-disk-0

Code:
root@pve-c1-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: rbd-mirror.pve-c1

Peer Sites:
UUID: c935a06f-b77d-4671-8f12-4a71d856c56e
Name: pve-c2
Mirror UUID: dc437fdd-7ec9-4c26-93bb-616d7b0f0832
Direction: tx-only

Code:
root@pve-c1-n1:~# rbd mirror pool status c1-pool-hdd-1  --verbose
health: WARNING
daemon health: UNKNOWN
image health: WARNING
images: 2 total
    2 starting_replay

DAEMONS
  none

IMAGES
vm-500-disk-0:
  global_id:   778ba6a3-9f55-468b-9c25-0c8f1d060a73

vm-501-disk-0:
  global_id:   6ee71be5-bdba-4e84-8040-6b2ceb164ee6

Code:
root@pve-c1-n1:~# rbd info c1-pool-hdd-1/vm-500-disk-0
rbd image 'vm-500-disk-0':
    size 8 GiB in 2048 objects
    order 22 (4 MiB objects)
    snapshot_count: 1
    id: bb7a2d40c0ff03
    block_name_prefix: rbd_data.bb7a2d40c0ff03
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Tue Jun 22 08:55:54 2021
    access_timestamp: Thu Jun 24 14:08:59 2021
    modify_timestamp: Thu Jun 24 16:31:11 2021
    mirroring state: enabled
    mirroring mode: snapshot
    mirroring global id: 778ba6a3-9f55-468b-9c25-0c8f1d060a73
    mirroring primary: true

Cluster pve-c2
Node pve-c2-n1

Code:
root@pve-c2-n1:~# ls /etc/ceph/
ceph.conf  pve-c1.conf    pve-c2.conf  rbdmap  token

Code:
root@pve-c2-n1:/etc/ceph# cat pve-c2.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 169.254.3.31/24
     fsid = 60c3b071-5d57-48fc-8eaa-f1da68a3d6ed
     mon_allow_pool_delete = true
     mon_host = 169.254.2.31 169.254.2.32 169.254.2.33
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 169.254.2.31/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve-c2-n1]
     public_addr = 169.254.2.31

[mon.pve-c2-n2]
     public_addr = 169.254.2.32

[mon.pve-c2-n3]
     public_addr = 169.254.2.33

Code:
root@pve-c2-n1:/etc/ceph# ls /etc/pve/priv/
acme                           lock
authkey.key                       pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
authorized_keys                       pve-c2.client.admin.keyring
ceph                           pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.admin.keyring               pve-c2.mon.keyring
ceph.client.rbd-mirror-peer.pve-c2-n1.keyring  pve-root-ca.key
ceph.mon.keyring                   pve-root-ca.srl
known_hosts

Code:
root@pve-c2-n1:~# ceph auth get client.rbd-mirror-peer.pve-c2-n1
exported keyring for client.rbd-mirror-peer.pve-c2-n1
[client.rbd-mirror-peer.pve-c2-n1]
    key = AQA3UNRggugSGBAAqvbOZDTGRqu9amLoIoEB7g==
    caps mon = "profile rbd-mirror-peer"
    caps osd = "profile rbd"

Code:
root@pve-c2-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c2-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c2-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd
pool 10 'c1-pool-hdd-1' replicated size 1 min_size 1 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode warn last_change 833 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

For the moment, I have set the replicated pool from the pve-c1 cluster (c1-pool-hdd-1 on pve-c2) to size 1 and min_size 1.

Code:
root@pve-c2-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: pve-c2

Peer Sites:
UUID: 5def7721-bb6a-44e8-8cd9-33c15c9ab1c3
Name: pve-c1
Direction: rx-only
Client: client.rbd-mirror-peer.pve-c1-n1

Code:
root@pve-c2-n1:~# rbd mirror pool status c1-pool-hdd-1  --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 2 total
    2 starting_replay

DAEMONS
service 8115656:
  instance_id: 8148072
  client_id: rbd-mirror-peer.pve-c2-n1
  hostname: pve-c2-n1
  version: 15.2.13
  leader: true
  health: OK

IMAGES
vm-500-disk-0:
  global_id:   778ba6a3-9f55-468b-9c25-0c8f1d060a73
  state:       down+unknown
  description: status not found
  last_update:

vm-501-disk-0:
  global_id:   6ee71be5-bdba-4e84-8040-6b2ceb164ee6
  state:       down+unknown
  description: status not found
  last_update:

As you can see, the state value is "down+unknown".

Code:
root@pve-c2-n1:/etc/ceph# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0

Log extract:
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status: replay not running
...
Earlier I also had the following message, but changing the direction to rx-only on pve-c2-n1 fixed it (see above):
init: failed to retrieve mirror peer uuid from remote pool

I also tried to set up RBD snapshot-based mirroring with the alternative peer bootstrap approach, but the import command failed.
# pve-c1
Bash:
root@pve-c1-n1:~# rbd mirror pool peer bootstrap create --site-name rbd-mirror.pve-c1 c1-pool-ssd-1 > token
root@pve-c1-n1:~# scp token root@192.168.2.31:/etc/ceph/

# pve-c2
Code:
root@pve-c2-n1:/etc/ceph# rbd --cluster pve-c2 mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: failed to import peer bootstrap token
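It looks like rbd only searches the default /etc/ceph keyring locations here; maybe pointing it at the admin keyring explicitly would get around that (an untested guess on my side):
Code:
rbd --cluster pve-c2 -n client.admin \
    --keyring /etc/pve/priv/pve-c2.client.admin.keyring \
    mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token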


I also saw the Red Hat documentation explaining how to set this up when both clusters have the same name, but changing the CLUSTER variable does not seem possible... the Ceph cluster fails if I do.
Code:
root@pve-c2-n1:/etc/ceph# cat /etc/default/ceph
# /etc/default/ceph
#
# Environment file for ceph daemon systemd unit files.
#

# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
#CLUSTER=pve-c2
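If I read the packaged ceph-rbd-mirror@.service correctly, it already starts rbd-mirror with --cluster ${CLUSTER}, so maybe a drop-in scoped to that unit alone (instead of the global /etc/default/ceph) would be enough. Untested sketch:
Code:
# limit the CLUSTER override to the rbd-mirror unit only
mkdir -p /etc/systemd/system/ceph-rbd-mirror@.service.d
cat > /etc/systemd/system/ceph-rbd-mirror@.service.d/cluster.conf <<'EOF'
[Service]
Environment=CLUSTER=pve-c2
EOF
systemctl daemon-reload
systemctl restart ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service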

Any suggestions?

I followed several sources, including https://docs.ceph.com/en/latest/rbd/rbd-mirroring/, https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NOFX6TXZ7WRUV2ZSTI4N6EP73YN6JKQQ/

Jérémy.
 
Here is the log file /var/log/ceph/ceph-client.rbd-mirror-peer.pve-c2-n1.log (log level 20/20):
Code:
2021-06-24T18:33:32.519+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:32.519+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:32.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651335, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:32.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:32.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd3becf60)
2021-06-24T18:33:32.523+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:35.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] handle_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] handle_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status: replay not running
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] operator(): replay status ready: r=-11
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: waiting for replay status
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status: replay not running
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] operator(): replay status ready: r=-11
2021-06-24T18:33:35.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] set_mirror_image_status_update: waiting for replay status
2021-06-24T18:33:36.607+0200 7fb1188d6700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_task:
2021-06-24T18:33:36.607+0200 7fb1190d7700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 get_mirror_uuid:
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_get_mirror_uuid: r=0
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_get_mirror_uuid: remote_mirror_uuid=37272deb-bc36-481f-a4c8-7dc1bf44a703
2021-06-24T18:33:36.607+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 mirror_peer_ping:
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_ping: r=0
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 mirror_peer_list:
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_list: r=0
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 handle_mirror_peer_list: remote_mirror_peer_uuid=
2021-06-24T18:33:36.931+0200 7fb1068b2700 10 rbd::mirror::RemotePollPoller: 0x55fbd22d61a0 schedule_task:
2021-06-24T18:33:37.251+0200 7fb11c949580 20 rbd::mirror::Mirror: 0x55fbd22c6f80 run_cache_manager: tune memory
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 execute_timer_task:
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:37.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651336, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:37.523+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:37.523+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd4a8f7d0)
2021-06-24T18:33:37.523+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:42.251+0200 7fb11c949580 20 rbd::mirror::Mirror: 0x55fbd22c6f80 run_cache_manager: tune memory
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 execute_timer_task:
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 notify_heartbeat:
2021-06-24T18:33:42.523+0200 7fb1188d6700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.527+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: notify_id=3723736651337, handle=94540090344960, notifier_id=8148072
2021-06-24T18:33:42.527+0200 7fb10d8c0700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify: our own notification, ignoring
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: r=0
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 is_leader: 1
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 handle_notify_heartbeat: 1 acks received, 0 timed out
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::Instances: 0x55fbd3c1bb00 acked: instance_ids=[8148072]
2021-06-24T18:33:42.527+0200 7fb1190d7700 10 rbd::mirror::LeaderWatcher: 0x55fbd47b1500 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55fbd4a8ef00)
2021-06-24T18:33:42.527+0200 7fb1190d7700 5 rbd::mirror::Instances: 0x55fbd3c1bb00 handle_acked: instance_ids=[8148072]
2021-06-24T18:33:45.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] handle_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1188d6700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] handle_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] update_mirror_image_status: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::ImageReplayer: 0x55fbd4839680 [13/778ba6a3-9f55-468b-9c25-0c8f1d060a73] schedule_update_mirror_image_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 15 rbd::mirror::ImageReplayer: 0x55fbd4839180 [13/6ee71be5-bdba-4e84-8040-6b2ceb164ee6] set_mirror_image_status_update: force=0, state=--
2021-06-24T18:33:45.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status:
2021-06-24T18:33:45.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969000 get_replay_status: replay not running
 
Did you manage to get snapshot mirroring working with the PVE wiki howto?
I'm facing the same issue and the same result:
Code:
1 starting_replay
 
I have a similar problem between two clusters with one-way replication.
This used to work with Ceph Pacific before 16.2.11, then it started crashing...
And now with Ceph Reef, it only crashes.

Please see the attached file for the crash log.
 

Attachments

  • ceph-reef-rbd-mirror-crash.txt (139.5 KB)
