[TUTORIAL] Proxmox Ceph Mirroring - Snapshot Mode

fxandrei

Renowned Member
Jan 10, 2013
Like I said in some of my previous posts, I have tried Ceph mirroring before, following the instructions available here.

This worked great, but only for virtual machines, not containers. It does not work with container disks, because you would need to enable the journaling feature on them, and they won't start if you do.

The only method that remained was snapshot-based Ceph mirroring.

So these are the instructions I used to make it work.


Required elements:
  • both clusters should be running at least Ceph Octopus
  • both clusters should have pools with the same name (the one used for mirroring)
  • the nodes in each cluster should be able to reach each other (VPN, LAN, etc.; see the quick check below)
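
A quick way to verify the last point from node1-backup is to check that the master cluster's monitors are reachable (the monitor address is a placeholder here, and this assumes the default messenger ports):

nc -zv <master_mon_ip> 3300
nc -zv <master_mon_ip> 6789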


Names used:
  • master: the name of the main cluster
  • backup: the name of the backup cluster, where we want to have everything mirrored for DR
  • node1-master: the node used for setup, from the main cluster
  • node1-backup: the node used for setup, from the backup cluster
  • hdd_pool: the name of the pool that has the images we want to be mirrored
  • 10.200.100.1: the IP of node1-backup

So let's start ;)


We activate mirroring on both clusters, for every pool whose images we want mirrored. This command needs to be executed on both node1-master and node1-backup:

rbd mirror pool enable hdd_pool image
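
To confirm that the pool is now in image mode (and, later on, to see the configured peers), you can check the pool info on either node:

rbd mirror pool info hdd_pool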


We create a user and its keyring on node1-master, and then transfer the keyring to node1-backup, along with the Ceph config from the master cluster.

On node1-master:
ceph auth get-or-create client.rbd-mirror.master mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/master.client.rbd-mirror.master.keyring

scp /etc/ceph/ceph.conf root@10.200.100.1:/etc/ceph/master.conf

scp /etc/pve/priv/master.client.rbd-mirror.master.keyring root@10.200.100.1:/etc/pve/priv/
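
To verify that node1-backup can now reach the master cluster with the copied config and keyring, something like this should work (assuming the copied master config points the keyring at /etc/pve/priv/$cluster.$name.keyring, which is the usual Proxmox layout and matches where we just copied it):

rbd --cluster master -n client.rbd-mirror.master mirror pool info hdd_pool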


On node1-backup, we create a new user.

ceph auth get-or-create client.rbd-mirror.backup mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror.backup.keyring


On node1-backup we configure the rbd-mirror daemon. The keyring lives in /etc/pve/priv, which only root can read, so we override the service unit to run the daemon as root instead of the ceph user.

systemctl enable ceph-rbd-mirror.target

cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service

sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service

systemctl enable ceph-rbd-mirror@rbd-mirror.backup.service

systemctl start ceph-rbd-mirror@rbd-mirror.backup.service
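
To make sure the daemon actually came up (and to watch its log while images sync later), you can check:

systemctl status ceph-rbd-mirror@rbd-mirror.backup.service
journalctl -u ceph-rbd-mirror@rbd-mirror.backup.service -f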


On node1-backup we add a peer for the pool, with the mirroring direction set to rx-only (receive only).

rbd mirror pool peer add hdd_pool client.rbd-mirror.master@master --direction rx-only

We then edit the peer on the main cluster so it only transmits (tx-only).

On node1-master:

rbd mirror pool info hdd_pool
rbd mirror pool peer set hdd_pool <UUID_FROM_THE_CMD_ABOVE> direction tx-only
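
To double-check, running the pool info again on each node should now list the peer with the expected direction (rx-only on node1-backup, tx-only on node1-master):

rbd mirror pool info hdd_pool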


We now need to enable mirroring for each image we want mirrored, on the main cluster.

So, on node1-master:

rbd mirror image enable hdd_pool/<image_name> snapshot

If we want to enable mirroring on all the images in the pool we can run this:
rbd ls hdd_pool | while read line ; do rbd mirror image enable hdd_pool/$line snapshot ; done


Right here I had some timeout errors. From what I saw, mirroring did get enabled on the image (the command was successful), the command just timed out. So if you execute the same command again it will say that mirroring is already enabled.

I did get another error on some images, something about not being able to create the initial snapshot. But after I shut down the VM/LXC that used the disks, the command worked and mirroring was enabled.


Right after you enable mirroring on the images you should see them start to sync. You can check this on the backup cluster.

On node1-backup:
rbd mirror pool status hdd_pool --verbose


After mirroring is enabled we need to schedule the snapshots. This can be done at the pool level or per image.

You could set a longer interval at the pool level, and then a shorter one for the disks that change more frequently. For example, a general 6h (6 hour) interval for all images, plus a 5m (5 minute) interval for disks that hold frequently changing files (or databases). It all depends on your needs.

So on node1-master:
rbd mirror snapshot schedule add -p hdd_pool 6h
rbd mirror snapshot schedule add -p hdd_pool --image <image_name> 5m


To delete a snapshot schedule you can run:
rbd mirror snapshot schedule remove -p hdd_pool <defined_time>
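
The same works for an image-level schedule, just add --image; for example, to remove the 5m schedule set earlier:

rbd mirror snapshot schedule remove -p hdd_pool --image <image_name> 5m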


Check schedule status on node1-master:
rbd mirror snapshot schedule ls --pool hdd_pool --recursive
rbd mirror snapshot schedule status

If we want to use an image in the backup cluster we need to demote the image on the main cluster and promote it in the backup cluster.

On node1-master:
rbd mirror image demote hdd_pool/<image_name>

On node1-backup:
rbd mirror image promote hdd_pool/<image_name>
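
As a side note (not one of the steps above): if the main cluster is completely down and you cannot demote first, rbd also allows a forced promotion on the backup side; this is typically what leads to the split-brain situation described below.

rbd mirror image promote --force hdd_pool/<image_name>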


If we go back with the images (promote the images on the main cluster again) and get split-brain errors on the backup cluster, we need to force a resync:

On node1-backup:
rbd mirror image resync hdd_pool/<image_name>



In order to do this I had some help from mysterysmith over on the Ceph subreddit. He has been very helpful.

Hope this helps someone :)
 
Hey fxandrei,

thanks for the post, this is definitely something I will look into in the future. So to have fully working VMs on the backup cluster you still need to sync the VM config files, right? Are there any other things that might not be covered in your tutorial? Can you give more explanation for this:

If we go back with the images (promote the images on the main cluster again) and get split-brain errors on the backup cluster, we need to force a resync:

Meaning: if you promote an image to node1-backup, use it for a while, then node1-master comes online again and you promote it there again, node1-backup will throw some errors? Which ones? And then you force a resync, so the current state of the VM image gets synced from backup to master again?

  • If you demote an image on master, will the sync be disabled until you promote it again?
  • If you aren't able to demote images on master, is there a problem with promoting them on the still-available backup cluster?

So the sync only works in one direction, and if you want to go back to master with a specific image you need that resync from backup to master, and after that you promote it on master again and demote it on backup?

Is there any way to get bidirectional sync and active-active usage of both clusters, so you don't have a full 3-node cluster on "standby"?

Edit: It would be so cool to get this implemented in the GUI, like the replication feature for ZFS.


Thanks Jonas
 
So to have fully working VMs on the backup cluster you still need to sync the VM config files, right? Are there any other things that might not be covered in your tutorial?
Yes, you would need to sync the config files of the VMs and containers that you are mirroring.
You could do this with a simple cron job that runs an rsync command (see the sketch below).
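
As a rough sketch (the interval and destination path are just an example; I would sync into a staging directory on the backup side rather than straight into /etc/pve, and you need root SSH access like the scp commands above), a cron entry on node1-master could look like this:

# /etc/cron.d/sync-guest-configs (example only)
*/15 * * * * root rsync -a /etc/pve/qemu-server/ root@10.200.100.1:/root/mirrored-configs/qemu-server/
*/15 * * * * root rsync -a /etc/pve/lxc/ root@10.200.100.1:/root/mirrored-configs/lxc/

During a failover you would then copy the configs you need from the staging directory into /etc/pve on the backup cluster.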

Meaning: if you promote an image to node1-backup, use it for a while, then node1-master comes online again and you promote it there again, node1-backup will throw some errors? Which ones? And then you force a resync, so the current state of the VM image gets synced from backup to master again?

If node1-master comes online again and you want to keep the images that ran on the backup cluster, then you would need to get those images back onto the master cluster pool. The setup I use mirrors in just one direction (from the main cluster to the backup cluster). If you wanted rx-tx you would probably need the same setup on node1-master: the daemon running there, and rx-tx on both ends. But I don't want that (I want to be sure that the mirroring only happens in one direction).
So once you get the images back on the main cluster, you promote them, and demote the ones on the backup cluster.

Now, in both cases (whether you copy the images back to the main cluster or not), once the daemon sees that something has changed on the main cluster, I think it will report split-brain errors. But at that point you know you want the mirroring to start again (mirroring the image from the main cluster to the backup cluster).
So you then force the resync, and it will mirror the images from the master cluster again.
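
Put together, the failback for one image would look roughly like this (same commands as in the tutorial, just in sequence):

On node1-master (once the image is back in hdd_pool):
rbd mirror image promote hdd_pool/<image_name>

On node1-backup (demote, then force the resync if the status reports split-brain):
rbd mirror image demote hdd_pool/<image_name>
rbd mirror image resync hdd_pool/<image_name>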

So the sync only works in one direction, and if you want to go back to master with a specific image you need that resync from backup to master, and after that you promote it on master again and demote it on backup?
Mirroring in both directions would only work if you configure it as rx-tx, and I think this needs the same config on both ends (with the mirroring daemon on both).
Like I said above, once you get your image back on the master cluster, you can promote that image and force the resync.

Is there any way to get bidirectional sync and active-active usage of both clusters, so you don't have a full 3-node cluster on "standby"?
I think bidirectional mirroring with RBD mirroring only means that you can mirror images in both directions, but not the same image at the same time. One image can only go in one direction at a time, so you cannot use the image that is not the primary one.
So I don't think you could have an active-active setup.
 
Hi, first of all many thanks for the careful explanation, this is really helpful.
I've got a question: on the official Ceph doc page, at a certain point in the section on manual snapshot creation, it says "The most recent mirror-snapshot is automatically pruned if the limit is reached". I'd like to know, given I use the default rbd_mirroring_max_mirroring_snapshots=5, whether with scheduled snapshots you don't have this limitation, and/or whether Ceph handles cleaning up old snapshots so you always have the most recent ones.
I would not like to need my DR cluster a year from now and find my images are one year old :p

Thanks in advance!
 
