Best way to have hot spare + backup at the same time with PVE on ZFS (with pvesr & sanoid)

H25E

Member
Nov 5, 2020
Syncoid (dataset synchronization between two machines) and Sanoid (snapshot creation and pruning) are two high-level ZFS tools. Combined, they give you hot spares (syncoid) and backups you can roll back to if necessary (sanoid).

Proxmox has a storage replication feature (pvesr) that overlaps with syncoid for creating hot spares. The advantage of pvesr over syncoid is that it's integrated with PVE and adapts when a guest is migrated, reversing the direction of the replication as needed.

The problem is that pvesr alone isn't enough for proper backups you can roll back to, because it keeps no snapshot history: pvesr destroys its previous replication snapshot once a new replication run ends. It would be really great if those snapshots could survive, combined with an auto-prune task. That way pvesr would provide hot spare and backup at the same time. But at the moment that isn't the case...

So we have to fall back on sanoid, which works great but isn't aware of guest migrations. The point is that only the node where a guest is running should run sanoid on that guest's datasets. That way, pvesr will also replicate the sanoid snapshots when it replicates the guest. If the guest that owns a dataset isn't running locally, sanoid should skip that dataset to avoid duplicate snapshots, because the snapshots will arrive through pvesr from the node where the guest is actually running. Some extra work is needed to make sanoid aware of whether the owner of a dataset is running on the local node, but it shouldn't be too difficult, because sanoid can run a script before taking each snapshot.
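A minimal sketch of what such a pre-snapshot check could look like, in Python. It is untested against a real cluster and rests on a few assumptions: that the datasets use the default PVE volume names (vm-<id>-disk-N, subvol-<id>-disk-N, base-<id>-disk-N), that sanoid passes the dataset name to the script in the SANOID_TARGETS environment variable, and that a guest counts as "local" when its config file exists under /etc/pve/qemu-server or /etc/pve/lxc (which PVE links to the local node's directory). The function names are made up for this example.

```python
#!/usr/bin/env python3
"""Hypothetical sanoid pre-snapshot hook: succeed only if the guest lives on this node."""
import os
import re
import sys
from pathlib import Path

# PVE links these directories to /etc/pve/nodes/<this-node>/..., so a config
# file found here belongs to a guest currently owned by the local node.
LOCAL_CONF_DIRS = (Path("/etc/pve/qemu-server"), Path("/etc/pve/lxc"))

# Assumed default PVE volume names: vm-100-disk-0, subvol-100-disk-0, base-100-disk-0.
VOLUME_RE = re.compile(r"^(?:vm|subvol|base)-(\d+)-")


def vmid_from_dataset(dataset):
    """Return the guest ID encoded in the last path component, or None."""
    match = VOLUME_RE.match(dataset.rsplit("/", 1)[-1])
    return int(match.group(1)) if match else None


def guest_is_local(vmid, conf_dirs=LOCAL_CONF_DIRS):
    """True if <vmid>.conf exists in one of the local guest config directories."""
    return any((Path(d) / f"{vmid}.conf").is_file() for d in conf_dirs)


def main():
    # Assumption: sanoid exports the dataset(s) about to be snapshotted
    # as a comma-separated list in SANOID_TARGETS.
    targets = [t for t in os.environ.get("SANOID_TARGETS", "").split(",") if t]
    for dataset in targets:
        vmid = vmid_from_dataset(dataset)
        # Datasets that don't look like guest volumes are left alone.
        if vmid is not None and not guest_is_local(vmid):
            return 1  # non-zero exit: guest is owned by another node
    return 0


# Only act when sanoid actually invoked us with a target.
if __name__ == "__main__" and "SANOID_TARGETS" in os.environ:
    sys.exit(main())
```

The script would be referenced from the pre_snapshot_script option in sanoid.conf. As far as I understand, a failing pre-snapshot script only makes sanoid skip the snapshot if no_inconsistent_snapshot is also enabled for that dataset, so check that against the sanoid documentation before relying on it.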

I'm tinkering with pvesr and sanoid right now on a two-node cluster, and I think it works great as long as you don't migrate any guest. I have a main node with pvesr and sanoid installed and configured. There, sanoid creates (and prunes) the backup snapshots, and pvesr synchronizes the guest and its backup snapshots to the secondary node. This way you have a main node with local backups and a hot spare with the backups synced too. If you screw up or some guest gets hit by ransomware, you can roll back locally to a safe snapshot. If your main node breaks, your data and your backups(!) are still alive on the hot spare.

For the moment I only have two nodes, but I was wondering how this would work, or whether it's even feasible, in a bigger cluster (3, 4 or 5 nodes) with HA enabled, where guests can be spread across the different nodes for "load balancing" and can be migrated automatically if a node goes down. My guess is something like this:
[Attached diagram: 1671810423623.png]
(The green and red nodes also have the SANOID and pvesr boxes; they just aren't drawn, to save time and space.)
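To make the multi-node idea a bit more concrete, here is a rough Python sketch that works out which node should be taking sanoid snapshots for which guest. It assumes the usual cluster filesystem layout, /etc/pve/nodes/<node>/qemu-server/<vmid>.conf and /etc/pve/nodes/<node>/lxc/<vmid>.conf; the function names are invented for this example and nothing here is part of PVE or sanoid.

```python
from pathlib import Path


def guest_owners(nodes_dir="/etc/pve/nodes"):
    """Map guest ID -> node name from <nodes_dir>/<node>/{qemu-server,lxc}/<vmid>.conf."""
    owners = {}
    for conf in Path(nodes_dir).glob("*/*/*.conf"):
        kind = conf.parent.name
        if kind in ("qemu-server", "lxc") and conf.stem.isdigit():
            owners[int(conf.stem)] = conf.parent.parent.name
    return owners


def snapshot_plan(owners):
    """Group guest IDs by the node that should run sanoid for them."""
    plan = {}
    for vmid, node in sorted(owners.items()):
        plan.setdefault(node, []).append(vmid)
    return plan


if __name__ == "__main__":
    for node, vmids in snapshot_plan(guest_owners()).items():
        print(f"{node}: sanoid should snapshot guests {vmids}")
```

Because each guest has exactly one config file in the cluster, every guest ends up assigned to exactly one node, and the assignment follows the guest automatically after a manual or HA migration.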

OK, so my questions are:
  • Does all of this make any sense, i.e. combining pvesr with sanoid to have hot spare and backup on the same nodes?
  • If it makes sense, why isn't it already shipped with PVE? I would hate to invest in a backup approach that doesn't make sense. Is something like this on the PVE roadmap?
  • Any improvements or advice on the presented scheme?

Thanks for your time
 
