Hello,
I'm trying to set up snapshot-based Ceph RBD mirroring between 2 Ceph clusters, each installed from a different PVE cluster.
I call them pve-c1 and pve-c2. Has anyone here already set this up successfully? At the moment I'm only trying one-way replication from pve-c1 to pve-c2.
Proxmox VE 6.3-2
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)
Some people have said that having 2 Ceph clusters with the same name can be a problem, but the official documentation says the opposite:
Note that rbd-mirror does not require the source and destination clusters to have unique internal names; both can and should call themselves ceph. The config files that rbd-mirror needs for local and remote clusters can be named arbitrarily, and containerizing the daemon is one strategy for maintaining them outside of /etc/ceph to avoid confusion.
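As far as I understand (my assumption, not something stated above), the --cluster option on a client command only selects which /etc/ceph/<name>.conf is read and what $cluster expands to in paths like the keyring setting; nothing inside either cluster is renamed. A minimal sketch of how I use it with my file layout:
Code:
# default            -> /etc/ceph/ceph.conf     (local cluster)
# --cluster pve-c1   -> /etc/ceph/pve-c1.conf   (remote config copied under another name)
# --cluster pve-c2   -> /etc/ceph/pve-c2.conf
# e.g. query the remote pve-c1 cluster from a pve-c2 node, once its config and a
# matching keyring (see the Auth step below) have been copied over:
rbd --cluster pve-c1 -n client.rbd-mirror-peer.pve-c1-n1 mirror pool info c1-pool-hdd-1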
Infrastructure POC
Cluster pve-c1
3 nodes :
node 1 pve-c1-n1
node 2 pve-c1-n2
node 3 pve-c1-n3
2 volumes :
c1-pool-hdd-1
c1-pool-ssd-1
Cluster pve-c2
3 nodes :
node 1 pve-c2-n1
node 2 pve-c2-n2
node 3 pve-c2-n3
3 volumes :
c2-pool-hdd-1
c2-pool-ssd-1
c1-pool-hdd-1
The replication is set up for the pool c1-pool-hdd-1, from pve-c1-n1 to pve-c2-n1.
Here are the steps I followed:
## Auth
# pve-c1
Code:
ceph auth get-or-create client.rbd-mirror-peer."$(echo $HOSTNAME)" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring root@192.168.2.31:/etc/pve/priv/pve-c1.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/ceph/ceph.conf root@192.168.2.31:/etc/ceph/pve-c1.conf
# pve-c2
Code:
ceph auth get-or-create client.rbd-mirror-peer."$(echo $HOSTNAME)" mon 'profile rbd-mirror-peer' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/pve/priv/ceph.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring root@192.168.2.11:/etc/pve/priv/pve-c2.client.rbd-mirror-peer."$(echo $HOSTNAME)".keyring
scp /etc/ceph/ceph.conf root@192.168.2.11:/etc/ceph/pve-c2.conf
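To check that the copied config and keyring actually let pve-c2-n1 reach the pve-c1 cluster, I believe a quick test like this should work (my own sketch; the keyring is found through the keyring = /etc/pve/priv/$cluster.$name.keyring line in pve-c1.conf):
Code:
# On pve-c2-n1: talk to the remote pve-c1 cluster with the peer identity
rbd --cluster pve-c1 -n client.rbd-mirror-peer.pve-c1-n1 ls c1-pool-hdd-1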
## rbd-mirror
# pve-c2
Code:
systemctl enable ceph-rbd-mirror.target
cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service
sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service
systemctl enable ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
systemctl start ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
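To verify the daemon really started under its new unit, I just check systemd and the journal (nothing Ceph-specific, only the unit name is mine):
Code:
systemctl status ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service
journalctl -u ceph-rbd-mirror@rbd-mirror-peer."$(echo $HOSTNAME)".service -n 50 --no-pager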
## setup Image mode
# pve-c1
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c1
# pve-c2
Code:
rbd mirror pool enable c1-pool-hdd-1 image --site-name pve-c2
## Peer setup
# pve-c2
Code:
rbd mirror pool peer add c1-pool-hdd-1 client.rbd-mirror-peer.pve-c1-n1@pve-c1 -n client.rbd-mirror-peer.pve-c2-n1
# Because the direction is set to rx-tx by default on pve-c2, I change it to rx-only (it is also KO if I leave rx-tx)
Code:
rbd mirror pool peer set c1-pool-hdd-1 {ID} direction rx-only
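For the {ID} placeholder I use the peer UUID; as far as I know it is printed by the peer add command above, and it also shows up in the pool info, e.g.:
Code:
# The peer UUID to use as {ID} is listed under "Peer Sites" (e.g. 5def7721-...)
rbd mirror pool info c1-pool-hdd-1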
## Enable snapshot replication of the images c1-pool-hdd-1/vm-500-disk-0 and c1-pool-hdd-1/vm-501-disk-0
# pve-c1
Code:
rbd mirror image enable c1-pool-hdd-1/vm-500-disk-0 snapshot
rbd mirror image enable c1-pool-hdd-1/vm-501-disk-0 snapshot
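Instead of only taking manual snapshots (next step), I understand Octopus can also create mirror snapshots on a schedule through the rbd_support mgr module; something like this (untested on my side, just from the docs):
Code:
# Take a mirror snapshot of vm-500-disk-0 every 30 minutes, then list the schedules
rbd mirror snapshot schedule add --pool c1-pool-hdd-1 --image vm-500-disk-0 30m
rbd mirror snapshot schedule ls --pool c1-pool-hdd-1 --recursive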
I also created manual mirror snapshots of these images:
Code:
rbd mirror image snapshot c1-pool-hdd-1/vm-500-disk-0
rbd mirror image snapshot c1-pool-hdd-1/vm-501-disk-0
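To confirm these mirror snapshots really exist on the primary (they are not user snapshots, so a plain snap ls hides them), I think this should list them:
Code:
# Mirror snapshots live in a separate namespace; --all also shows those
rbd snap ls --all c1-pool-hdd-1/vm-500-disk-0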
That's all.
Here are the logs and information after my setup:
Cluster pve-c1
Node pve-c1-n1
Code:
root@pve-c1-n1:~# ls /etc/ceph/
ceph.conf pve-c1.conf pve-c2.conf rbdmap
Code:
root@pve-c1-n1:/etc/ceph# cat pve-c1.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 169.254.3.11/24
fsid = b749dcbe-95a8-4e53-9c64-7aef7078cca0
mon_allow_pool_delete = true
mon_host = 169.254.2.11 169.254.2.12 169.254.2.13
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 169.254.2.11/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.pve-c1-n1]
public_addr = 169.254.2.11
[mon.pve-c1-n2]
public_addr = 169.254.2.12
[mon.pve-c1-n3]
public_addr = 169.254.2.13
Code:
root@pve-c1-n1:~# ls /etc/pve/priv/
acme lock
authkey.key pve-c1.client.admin.keyring
authorized_keys pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
ceph pve-c1.mon.keyring
ceph.client.admin.keyring pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.rbd-mirror-peer.pve-c1-n1.keyring pve-root-ca.key
ceph.mon.keyring pve-root-ca.srl
known_hosts
Code:
root@pve-c1-n1:/etc/ceph# ceph auth get client.rbd-mirror-peer.pve-c1-n1
exported keyring for client.rbd-mirror-peer.pve-c1-n1
[client.rbd-mirror-peer.pve-c1-n1]
key = AQAtUNRg53BkDBAAB2KHc050bkYGDp0jhZzk3A==
caps mon = "profile rbd-mirror-peer"
caps osd = "profile rbd"
Code:
root@pve-c1-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c1-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c1-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd
Code:
root@pve-c1-n1:~# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0
vm-502-disk-0
Code:
root@pve-c1-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: rbd-mirror.pve-c1
Peer Sites:
UUID: c935a06f-b77d-4671-8f12-4a71d856c56e
Name: pve-c2
Mirror UUID: dc437fdd-7ec9-4c26-93bb-616d7b0f0832
Direction: tx-only
Code:
root@pve-c1-n1:~# rbd mirror pool status c1-pool-hdd-1 --verbose
health: WARNING
daemon health: UNKNOWN
image health: WARNING
images: 2 total
2 starting_replay
DAEMONS
none
IMAGES
vm-500-disk-0:
global_id: 778ba6a3-9f55-468b-9c25-0c8f1d060a73
vm-501-disk-0:
global_id: 6ee71be5-bdba-4e84-8040-6b2ceb164ee6
Code:
root@pve-c1-n1:~# rbd info c1-pool-hdd-1/vm-500-disk-0
rbd image 'vm-500-disk-0':
size 8 GiB in 2048 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: bb7a2d40c0ff03
block_name_prefix: rbd_data.bb7a2d40c0ff03
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Tue Jun 22 08:55:54 2021
access_timestamp: Thu Jun 24 14:08:59 2021
modify_timestamp: Thu Jun 24 16:31:11 2021
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 778ba6a3-9f55-468b-9c25-0c8f1d060a73
mirroring primary: true
Cluster pve-c2
Node pve-c2-n1
Code:
root@pve-c2-n1:~# ls /etc/ceph/
ceph.conf pve-c1.conf pve-c2.conf rbdmap token
Code:
root@pve-c2-n1:/etc/ceph# cat pve-c2.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 169.254.3.31/24
fsid = 60c3b071-5d57-48fc-8eaa-f1da68a3d6ed
mon_allow_pool_delete = true
mon_host = 169.254.2.31 169.254.2.32 169.254.2.33
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 169.254.2.31/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.pve-c2-n1]
public_addr = 169.254.2.31
[mon.pve-c2-n2]
public_addr = 169.254.2.32
[mon.pve-c2-n3]
public_addr = 169.254.2.33
Code:
root@pve-c2-n1:/etc/ceph# ls /etc/pve/priv/
acme lock
authkey.key pve-c1.client.rbd-mirror-peer.pve-c1-n1.keyring
authorized_keys pve-c2.client.admin.keyring
ceph pve-c2.client.rbd-mirror-peer.pve-c2-n1.keyring
ceph.client.admin.keyring pve-c2.mon.keyring
ceph.client.rbd-mirror-peer.pve-c2-n1.keyring pve-root-ca.key
ceph.mon.keyring pve-root-ca.srl
known_hosts
Code:
root@pve-c2-n1:~# ceph auth get client.rbd-mirror-peer.pve-c2-n1
exported keyring for client.rbd-mirror-peer.pve-c2-n1
[client.rbd-mirror-peer.pve-c2-n1]
key = AQA3UNRggugSGBAAqvbOZDTGRqu9amLoIoEB7g==
caps mon = "profile rbd-mirror-peer"
caps osd = "profile rbd"
Code:
root@pve-c2-n1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 855 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 7 'c2-pool-hdd-1' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 771 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 8 'c2-pool-ssd-1' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 769 flags hashpspool stripe_width 0 application rbd
pool 10 'c1-pool-hdd-1' replicated size 1 min_size 1 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode warn last_change 833 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
For the moment, I have set the pool replicated from the pve-c1 cluster (c1-pool-hdd-1 on pve-c2) to size 1 and min_size 1.
Code:
root@pve-c2-n1:~# rbd mirror pool info c1-pool-hdd-1
Mode: image
Site Name: pve-c2
Peer Sites:
UUID: 5def7721-bb6a-44e8-8cd9-33c15c9ab1c3
Name: pve-c1
Direction: rx-only
Client: client.rbd-mirror-peer.pve-c1-n1
Code:
root@pve-c2-n1:~# rbd mirror pool status c1-pool-hdd-1 --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 2 total
2 starting_replay
DAEMONS
service 8115656:
instance_id: 8148072
client_id: rbd-mirror-peer.pve-c2-n1
hostname: pve-c2-n1
version: 15.2.13
leader: true
health: OK
IMAGES
vm-500-disk-0:
global_id: 778ba6a3-9f55-468b-9c25-0c8f1d060a73
state: down+unknown
description: status not found
last_update:
vm-501-disk-0:
global_id: 6ee71be5-bdba-4e84-8040-6b2ceb164ee6
state: down+unknown
description: status not found
last_update:
As you can see, the state value is "down+unknown".
Code:
root@pve-c2-n1:/etc/ceph# rbd list c1-pool-hdd-1
vm-500-disk-0
vm-501-disk-0
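To get more detail on why the images stay in starting_replay, I raise the rbd-mirror debug level; a sketch of what I would try (I believe the daemon picks up the central config option at startup):
Code:
# On the pve-c2 cluster: more verbose rbd-mirror logging, then follow the journal
ceph config set client.rbd-mirror-peer.pve-c2-n1 debug_rbd_mirror 20
systemctl restart ceph-rbd-mirror@rbd-mirror-peer.pve-c2-n1.service
journalctl -u ceph-rbd-mirror@rbd-mirror-peer.pve-c2-n1.service -f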
Log extract:
Before, I also had this message, but changing the direction to rx-only on pve-c2-n1 fixed it (see above):
Code:
2021-06-24T18:33:35.651+0200 7fb1190d7700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status:
2021-06-24T18:33:35.651+0200 7fb1190d7700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55fbd4969400 get_replay_status: replay not running
...
init: failed to retrieve mirror peer uuid from remote pool
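For the "failed to retrieve mirror peer uuid from remote pool" part, the only check I know of is to compare the peer entries on both clusters; --all should print the extra peer attributes (client name, mon host, key) the daemon relies on (sketch, to run on both sides):
Code:
# Compare the registered peers on pve-c1-n1 and pve-c2-n1
rbd mirror pool info c1-pool-hdd-1 --all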
I also tried to set up RBD snapshot-based mirroring the other way, with the bootstrap token, but the import command failed.
# pve-c1
Bash:
root@pve-c1-n1:~# rbd mirror pool peer bootstrap create --site-name rbd-mirror.pve-c1 c1-pool-ssd-1 > token
root@pve-c1-n1:~# scp token root@192.168.2.31:/etc/ceph/
# pve-c2
Code:
root@pve-c2-n1:/etc/ceph# rbd --cluster pve-c2 mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-24T18:40:39.909+0200 7f1d0f4b73c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: failed to import peer bootstrap token
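My guess (just an assumption) is that --cluster pve-c2 changes the keyring search path, so rbd no longer finds the local admin keyring. Two variants I would try instead, either dropping --cluster (the local cluster is the default ceph.conf anyway) or pointing at the conf and keyring explicitly:
Code:
# variant 1: use the default local /etc/ceph/ceph.conf and its admin keyring
rbd mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token
# variant 2: spell out the conf and keyring explicitly
rbd -c /etc/ceph/ceph.conf --keyring /etc/pve/priv/ceph.client.admin.keyring mirror pool peer bootstrap import --site-name pve-c2 --direction rx-only c2-pool-ssd-1 token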
I also saw the Red Hat documentation explaining how to set this up when the 2 clusters have the same name, but changing the CLUSTER variable does not seem possible... the Ceph cluster fails if I do.
Code:
root@pve-c2-n1:/etc/ceph# cat /etc/default/ceph
# /etc/default/ceph
#
# Environment file for ceph daemon systemd unit files.
#
# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
#CLUSTER=pve-c2
Any suggestions ?
I followed several sources, such as https://docs.ceph.com/en/latest/rbd/rbd-mirroring/ , https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NOFX6TXZ7WRUV2ZSTI4N6EP73YN6JKQQ/
Jérémy.