Like I said in some of my previous posts, I had tried Ceph RBD mirroring before, following the instructions available here.
This worked great, but only for virtual machines, not containers: that method requires the journaling feature on the disks, and containers won't start if it is enabled on theirs.
The only remaining option was snapshot-based Ceph mirroring.
So these are the instructions I used to make it work.
Required elements:
- both clusters should be running at least Octopus
- both clusters should have a pool with the same name (the one used for mirroring)
- the nodes of both clusters should be able to reach each other (VPN, LAN, etc.)
Names used:
- master: the name of the main cluster
- backup: the name of the backup cluster, where we want to have everything mirrored for DR
- node1-master: the node used for setup, from the main cluster
- node1-backup: the node used for setup, from the backup cluster
- hdd_pool: the name of the pool that has the images we want to be mirrored
- 10.200.100.1: the IP of node1-backup
So let's start.
We enable mirroring in image mode on both clusters, for every pool whose images we want mirrored. This command needs to be run on both node1-master and node1-backup:
rbd mirror pool enable hdd_pool image
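To confirm that the pool really is in image mode on each cluster, a small helper like the one below could be run on node1-master and on node1-backup. It is only a sketch: it assumes the plain-text output of `rbd mirror pool info` contains a line of the form `Mode: <mode>` (compare with what your version prints), and `pool_mirror_mode` and the `RBD` override variable are made-up names, the variable existing only so the rbd binary can be stubbed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mirroring mode of a pool.
# Assumes `rbd mirror pool info` prints a line like "Mode: image".
RBD="${RBD:-rbd}"   # override (e.g. with a stub function) for testing

pool_mirror_mode() {
    local pool="$1"
    # Keep only the value after "Mode: " from the first matching line.
    "$RBD" mirror pool info "$pool" | awk -F': ' '/^Mode:/ {print $2; exit}'
}

# Example, on node1-master and on node1-backup:
#   [ "$(pool_mirror_mode hdd_pool)" = "image" ] && echo "hdd_pool is in image mode"
```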
We create a user (and its keyring) on node1-master, and then copy the keyring to node1-backup, along with the Ceph config of the master cluster.
On node1-master:
ceph auth get-or-create client.rbd-mirror.master mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/master.client.rbd-mirror.master.keyring
scp /etc/ceph/ceph.conf root@10.200.100.1:/etc/ceph/master.conf
scp /etc/pve/priv/master.client.rbd-mirror.master.keyring root@10.200.100.1:/etc/pve/priv/
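Before going further, it can help to check from node1-backup that the copied config and keyring actually let you talk to the master cluster. The sketch below assumes the master config is available to the rbd tool as cluster `master` (that is, `/etc/ceph/master.conf`) and that the `[client]` keyring setting in that config points at the copied keyring; `check_master_access` and the `RBD` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical check, run on node1-backup after copying the master config and keyring.
RBD="${RBD:-rbd}"   # override for testing

check_master_access() {
    local pool="$1"
    # List the pool's images on the *master* cluster, authenticating as
    # client.rbd-mirror.master; a failure usually means a wrong path or key.
    if "$RBD" --cluster master --id rbd-mirror.master ls "$pool"; then
        echo "OK: master cluster reachable from this node"
    else
        echo "FAILED: cannot list $pool on the master cluster" >&2
        return 1
    fi
}

# Example: check_master_access hdd_pool
```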
On node1-backup, we create a new user.
ceph auth get-or-create client.rbd-mirror.backup mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror.backup.keyring
On node1-backup we configure the rbd-mirror daemon.
systemctl enable ceph-rbd-mirror.target
cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service
sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service
systemctl enable ceph-rbd-mirror@rbd-mirror.backup.service
systemctl start ceph-rbd-mirror@rbd-mirror.backup.service
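To check that the daemon actually came up on node1-backup, something like the following could be used. It is a sketch: `check_rbd_mirror_daemon` and the `SYSTEMCTL` variable are made-up names (the variable only exists so the command can be stubbed), and the unit name is the one enabled above. Note that if you edit the unit file in /etc/systemd/system again later, systemd only picks up the change after `systemctl daemon-reload` and a restart of the service.

```shell
#!/usr/bin/env bash
# Hypothetical check for node1-backup: is the rbd-mirror instance running?
SYSTEMCTL="${SYSTEMCTL:-systemctl}"   # override for testing
UNIT="ceph-rbd-mirror@rbd-mirror.backup.service"

check_rbd_mirror_daemon() {
    if "$SYSTEMCTL" is-active --quiet "$UNIT"; then
        echo "$UNIT is active"
    else
        # Point at the logs, which usually say why the daemon exited.
        echo "$UNIT is not active, see: journalctl -u $UNIT" >&2
        return 1
    fi
}

# Example: check_rbd_mirror_daemon
```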
On node1-backup, we add a peer for the pool, with the mirroring direction set to rx-only (receive only):
rbd mirror pool peer add hdd_pool client.rbd-mirror.master@master --direction rx-only
We then change the peer on the main cluster to transmit only (tx-only), using the peer UUID shown by the first command below.
On node1-master:
rbd mirror pool info hdd_pool
rbd mirror pool peer set hdd_pool <UUID_FROM_THE_CMD_ABOVE> direction tx-only
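The two commands above can also be chained so the UUID doesn't have to be copied by hand. This is a sketch under assumptions: it expects the plain-text output of `rbd mirror pool info` to list each peer with a line starting `UUID: ` (there is also a separate `Mirror UUID:` line, which the anchored pattern skips), and it only handles a pool with a single peer. The function names and `RBD` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for node1-master: take the first peer UUID from
# `rbd mirror pool info` and set that peer's direction to tx-only.
RBD="${RBD:-rbd}"   # override for testing

first_peer_uuid() {
    # "^UUID:" skips the "Mirror UUID:" line; print the value of the first match.
    "$RBD" mirror pool info "$1" | awk '/^UUID:/ {print $2; exit}'
}

set_peer_tx_only() {
    local pool="$1" uuid
    uuid="$(first_peer_uuid "$pool")"
    if [ -z "$uuid" ]; then
        echo "no peer found for pool $pool" >&2
        return 1
    fi
    "$RBD" mirror pool peer set "$pool" "$uuid" direction tx-only
}

# Example: set_peer_tx_only hdd_pool
```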
We now need to enable mirroring, on the main cluster, for each image we want mirrored.
So, on node1-master:
rbd mirror image enable hdd_pool/<image_name> snapshot
To enable mirroring on all the images in the pool, we can run this:
rbd ls hdd_pool | while read -r image ; do rbd mirror image enable "hdd_pool/$image" snapshot ; done
Right here I got some timeout errors. From what I saw, mirroring was actually enabled on the image (the command succeeded); it just timed out. If you run the same command again, it reports that mirroring is already enabled.
On some images I got another error, something about not being able to create the initial snapshot. After I shut down the VM/container using those disks, the command worked and mirroring was enabled.
Right after enabling mirroring on the images, you should see them start to sync. You can check this on the backup cluster.
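Given the timeouts and snapshot errors described above, a variant of the loop that double-checks every failed image and lists the ones still not mirrored might save some manual re-running. This is a sketch, assuming `rbd info` prints a `mirroring state: enabled` line for mirrored images and that image names contain no whitespace (true for Proxmox's `vm-<id>-disk-<n>` names); `SETTLE_DELAY` is an arbitrary pause, and all function and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the enable loop for node1-master.
RBD="${RBD:-rbd}"                      # override for testing
SETTLE_DELAY="${SETTLE_DELAY:-10}"     # seconds to wait after an error (arbitrary)

is_mirroring_enabled() {
    "$RBD" info "$1" | grep -q 'mirroring state: enabled'
}

enable_snapshot_mirroring() {
    local pool="$1" image failed=""
    for image in $("$RBD" ls "$pool"); do
        "$RBD" mirror image enable "$pool/$image" snapshot && continue
        # The command may have timed out after doing its job, so check again.
        sleep "$SETTLE_DELAY"
        is_mirroring_enabled "$pool/$image" && continue
        failed="$failed $image"
    done
    if [ -n "$failed" ]; then
        # These typically need a retry by hand, e.g. after shutting down the guest.
        echo "mirroring NOT enabled for:$failed" >&2
        return 1
    fi
}

# Example: enable_snapshot_mirroring hdd_pool
```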
On node1-backup:
rbd mirror pool status hdd_pool --verbose
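Instead of re-running the status command by hand, a simple polling loop could wait for the pool to report healthy. Treat it as a coarse signal, not proof that the first full copy finished, and as a sketch: it assumes the summary printed by `rbd mirror pool status` has a line starting with `health:` (the `daemon health:` and `image health:` lines are skipped by the anchored pattern). The function name, the `RBD` variable, and the default tries/interval are all illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical polling loop for node1-backup.
RBD="${RBD:-rbd}"   # override for testing

wait_for_pool_health() {
    local pool="$1" tries="${2:-60}" interval="${3:-30}" health i
    for i in $(seq 1 "$tries"); do
        health="$("$RBD" mirror pool status "$pool" | awk -F': ' '/^health:/ {print $2; exit}')"
        echo "attempt $i: health=${health:-unknown}"
        [ "$health" = "OK" ] && return 0
        sleep "$interval"
    done
    return 1
}

# Example: wait_for_pool_health hdd_pool 120 60
```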
After mirroring is enabled, we need to schedule the snapshots. This can be done at the pool level or per image.
You could set a longer interval at the pool level, and a shorter one for the disks that change more often. For example, a general 6h (6 hours) interval for all the images, and 5m (5 minutes) for disks whose files change frequently (or that hold databases). It all depends on your needs.
So on node1-master:
rbd mirror snapshot schedule add -p hdd_pool 6h
rbd mirror snapshot schedule add -p hdd_pool --image <image_name> 5m
To delete a snapshot schedule, you can run:
rbd mirror snapshot schedule remove -p hdd_pool <defined_time>
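When several disks need the shorter interval, the schedule commands can be wrapped in a small function that takes the list of busy images. This is only a sketch: `apply_schedules`, the `RBD` variable, and the example image names are hypothetical, and the intervals are the example values from above. Removing an image-level schedule takes the same `--image` option and interval that were used to add it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for node1-master: one pool-wide schedule plus a
# shorter schedule for each image passed on the command line.
RBD="${RBD:-rbd}"   # override for testing

apply_schedules() {
    local pool="$1" pool_interval="$2" image_interval="$3" image
    shift 3
    "$RBD" mirror snapshot schedule add --pool "$pool" "$pool_interval"
    for image in "$@"; do
        "$RBD" mirror snapshot schedule add --pool "$pool" --image "$image" "$image_interval"
    done
}

# Example (hypothetical image names):
#   apply_schedules hdd_pool 6h 5m vm-105-disk-0 vm-110-disk-1
# Removing one of the image-level schedules later:
#   rbd mirror snapshot schedule remove --pool hdd_pool --image vm-105-disk-0 5m
```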
Check schedule status on node1-master:
rbd mirror snapshot schedule ls --pool hdd_pool --recursive
rbd mirror snapshot schedule status
If we want to use an image on the backup cluster, we need to demote it on the main cluster and promote it on the backup cluster.
On node1-master:
rbd mirror image demote hdd_pool/<image_name>
On node1-backup:
rbd mirror image promote hdd_pool/<image_name>
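For a planned switch of a single image, the demote and promote steps can be driven from node1-backup. This is a sketch under assumptions: `MASTER_HOST` is a placeholder for however node1-backup reaches node1-master over SSH, the retry loop assumes the promote may be refused until the demotion has reached the backup cluster, and `failover_image`, `PROMOTE_DELAY`, and the `RBD`/`SSH` overrides are made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical planned failover of one image, run on node1-backup.
RBD="${RBD:-rbd}"                              # override for testing
SSH="${SSH:-ssh}"                              # override for testing
MASTER_HOST="${MASTER_HOST:-root@node1-master}"  # placeholder SSH target

failover_image() {
    local spec="$1" tries="${2:-10}" i
    # 1. Demote the image on the main cluster.
    "$SSH" "$MASTER_HOST" rbd mirror image demote "$spec" || return 1
    # 2. Promote it here, retrying while the demotion propagates (assumption).
    for i in $(seq 1 "$tries"); do
        "$RBD" mirror image promote "$spec" && return 0
        sleep "${PROMOTE_DELAY:-15}"
    done
    echo "promote of $spec still failing after $tries tries" >&2
    return 1
}

# Example: failover_image hdd_pool/vm-100-disk-0
```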
If we later go back to the main cluster (promote the images there again) and get split-brain errors on the backup cluster, we need to force a resync:
On node1-backup:
rbd mirror image resync hdd_pool/<image_name>
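To avoid resyncing images that don't need it, the resync can be made conditional on the image status. A sketch, assuming the output of `rbd mirror image status` mentions `split-brain` when that is the problem; `resync_if_split_brain` and the `RBD` variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper for node1-backup: resync an image only if its
# mirror status mentions split-brain.
RBD="${RBD:-rbd}"   # override for testing

resync_if_split_brain() {
    local spec="$1"
    if "$RBD" mirror image status "$spec" | grep -qi 'split-brain'; then
        echo "split-brain on $spec, requesting resync"
        "$RBD" mirror image resync "$spec"
    else
        echo "no split-brain reported for $spec"
    fi
}

# Example: resync_if_split_brain hdd_pool/vm-100-disk-0
```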
To get this working I had some help from mysterysmith over on the Ceph subreddit. He has been very helpful.
Hope this helps someone.