I would like to suggest some ideas to help guide the development of this while it is still not final. My use case is that I have two PVE nodes I use to run VMs for development support purposes (NFS file share for common objects, image cache servers, Jira, etc.) and to run our general office server and PBX software.
It is not worth the extra cost and maintenance complexity to me to have HA with automatic failover. I can live with short downtime to move the machines around.
I did configure my two nodes with ZFS and named the storage the same on both nodes. This lets me use the UI to move machines around when they're small. But I have two VMs with a significant amount of disk space attached to them, and copying those across machines takes hours. This is where I would like to suggest a change in how you expect recovery to happen with zfs-sync.
Instead of needing yet another send/recv step, I think it would be ideal if the sync could drop the replicated disk images directly into an available ZFS storage on the remote node. Recovery would then simply be moving the QEMU config file to the new node and starting the machine (possibly changing the storage name), without a copy that could take hours. Also, the remote node I have does not necessarily have enough disk space for two whole copies of the disks (one for the zfs-send copy, and another for the VM to use).
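To make the proposal concrete, here is a rough sketch of what that recovery could look like once the disks are already sitting in the remote node's pool. The node names (pve1/pve2) and VM ID 104 are just from my setup; the config path is the standard pmxcfs location, but treat the exact steps as an assumption, not a tested procedure:

```shell
# Assumes VM 104 is stopped and its disks are already replicated into
# PVE2's local "tank". Relocate the VM's config inside the pmxcfs
# cluster filesystem so PVE2 owns the VM (run on any cluster node):
mv /etc/pve/nodes/pve1/qemu-server/104.conf \
   /etc/pve/nodes/pve2/qemu-server/104.conf

# If the storage were named differently on PVE2, the disk references in
# the config would need a matching edit (tank:vm-104-disk-1 -> ...).
# Since both pools are named "tank" here, nothing changes.

# On PVE2: start the VM against the already-present disks.
qm start 104
```

That is the whole point: no multi-hour data copy sits between "node failed" and "VM running again".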
My only question is what confusion, if any, would be caused on the remote node by extra VM disks sitting unused in its ZFS storage.
Here is how I would anticipate using it:
Storage on PVE1 called "tank" holds the disk images vm-104-disk-1, vm-104-disk-2, and vm-104-disk-3, totaling about 1TB of data. I would like to zfs-sync these directly to node PVE2 into its "tank" (each "tank" is private and local to its node, just named the same). I don't have room to sync them to another location on PVE2 and then send/recv into "tank" when I need them; there is only one ZFS data pool on that machine, and it does not have enough spare space for that duplication. And if I sync the disks to a third node (just a file server), the recovery send/recv takes about 6 hours over the LAN. With the disk images already in the ZFS storage that PVE2 wants to use, recovering the machine becomes trivial and fast.
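For reference, this is roughly what I have to script by hand today with the current flat layout — one snapshot and one send per zvol. A sketch, assuming root SSH from PVE1 to PVE2 and that the target datasets don't exist yet; the snapshot name "sync1" is made up:

```shell
# Every disk has to be handled individually.
for disk in vm-104-disk-1 vm-104-disk-2 vm-104-disk-3; do
    zfs snapshot "tank/${disk}@sync1"
    # Full initial send; later runs would use an incremental stream:
    #   zfs send -i @sync0 "tank/${disk}@sync1" | ...
    zfs send "tank/${disk}@sync1" | ssh pve2 zfs receive "tank/${disk}"
done
```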
Also, if PVE made a ZFS dataset per VM inside the pool, the send/recv would be much easier: you would "zfs snapshot -r tank/vm-104@xx" and "zfs send -R" instead of having to handle each disk individually. This would assume a structure like "tank/vm-104/disk-1". It would also simplify setting ZFS properties per VM, like disabling compression when the VM's file system is already compressed, for example.
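Under that hypothetical per-VM layout, the whole loop above collapses to two commands (note that the recursive flag on the send side is capital -R, the replication-stream flag; plain -r only exists for zfs snapshot):

```shell
# Snapshot every dataset under tank/vm-104 atomically.
zfs snapshot -r tank/vm-104@sync1

# -R sends the parent and all descendant datasets (with their
# properties) as one replication stream; -F on the receive side
# rolls the target back to match before applying it.
zfs send -R tank/vm-104@sync1 | ssh pve2 zfs receive -F tank/vm-104

# Per-VM tuning also becomes a single command on the parent dataset,
# inherited by every disk, e.g.:
zfs set compression=off tank/vm-104
```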