Replication non cluster pve-zsync

bishoptf

New Member
Jun 7, 2025
I have read through the documentation for replication between two nodes that are not in a cluster using pve-zsync. I have this working in a lab setup, but I want to verify what I am doing and make sure it is the best way to accomplish what I need. I have a 2-node ESXi environment that I need to migrate to something else, and I am looking at XCP-ng and Proxmox. This supports a small business, not located in any datacenter, with the two servers in two different on-site closets. For a small business it is not really practical, from an architecture standpoint, to run a cluster. I think a lot of folks simplify this and run clusters with QDevices, but the reality is that for a small business with nodes separated for redundancy reasons, there are so many single points of failure that it is impossible to guarantee two nodes are always reachable.

Currently I have two ESXi nodes and use Veeam to replicate from one node to the other and back up all the VMs. This has served me well and works for what this small business needs. If I were to lose a closet (major power issue, network switch, etc.), most services are duplicated across nodes, except for one file share, which could then be brought up via the Veeam-replicated VM. This is what I need to implement with the alternatives I am looking into, and I have it working with Proxmox.

I am currently running pve-zsync on one node, replicating to the other, and it appears to work well.

pve-zsync list
SOURCE  NAME        STATE  LAST SYNC            TYPE  CON
100     testzsync1  ok     2025-07-06_09:15:01  qemu  ssh
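
For reference, the recurring job was set up roughly like this (the destination IP and dataset are placeholders for my actual values):

pve-zsync create --source 100 --dest x.x.x.x:vmstorage-zfs --name testzsync1 --maxsnap 2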

The data is copied and then synced to the other node every 15 minutes, but only the disk is copied. From reading, it appears that you also need to copy the VM configuration to the other node, which I did with scp:

scp /etc/pve/qemu-server/100.conf root@x.x.x.x:/etc/pve/qemu-server/100.conf

One of the issues I see with the scp command is that, unless I add it to a nightly cron job, it is a one-time thing; if I make any changes to the VM config options, while rare, those changes would not be copied unless I somehow add it to a nightly/daily cron job. The other thing I read is that if I need to bring up the VM on the other host, I would need to ensure that the replication job has been stopped. Depending on the kind of failure that forced the outage, this may be a simple thing to do, or something I need to make sure I have done before I bring the other node back online.
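
If I do end up going the cron route, something as simple as this in root's crontab on the source node would push the config nightly (the time and IP are just placeholders):

0 2 * * * scp /etc/pve/qemu-server/100.conf root@x.x.x.x:/etc/pve/qemu-server/100.conf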

I just wanted to make sure what I have above sounds correct, and I have an additional question. Once I am able to bring the original node back up, I assume that I could do a one-time sync BACK to the original node. I haven't tried that, but I assume it would be possible, since I would want the application back on the original node.

Some comments, which I know have been brought up before, but I want to echo their statements concerning the VMIDs. I partially understand why they took the approach they did, but it would have been nice for the storage to be named vmid+name, etc. When you replicate to the other node without the configuration part, you only have the VMID-labeled storage, and unless I am missing something, there is no way on that node to know what name is associated with it. I know this probably is not an issue with clusters, but for non-cluster environments (smaller businesses) this just makes it more difficult to track what is what when replicating. Just my commentary, it is what it is, but it really would make managing things for us small guys much easier.
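
The only workaround I have for that right now is to copy the config over (as above) and check its name field on the target node, e.g.:

grep '^name' /etc/pve/qemu-server/100.conf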

I still have to get UPS and shutdown working, but since Veeam now officially supports Proxmox, that satisfies the backup requirement. Any input on what I am doing with the replication would be great, thanks!
 
Looks like when I try to run pve-zsync from the replica node back to the original node, I get a permission denied error. I have verified that I am able to log in with root, so I am not sure what the issue may be:

pve-zsync sync --source 100 --dest x.x.x.x:100 --verbose --maxsnap 2 --name restorezsync
root@x.x.x.x's password:
full send of vmstorage-zfs/vm-100-disk-0@rep_restorezsync_2025-07-06_11:36:18 estimated size is 17.4G
total estimated size is 17.4G
TIME SENT SNAPSHOT vmstorage-zfs/vm-100-disk-0@rep_restorezsync_2025-07-06_11:36:18
Job --source 100 --name restorezsync got an ERROR!!!
ERROR Message:
COMMAND:
zfs send -v -- vmstorage-zfs/vm-100-disk-0@rep_restorezsync_2025-07-06_11:36:18 | ssh -o 'BatchMode=yes' root@x.x.x.x.x -- zfs recv -F -- 100/vm-100-disk-0
GET ERROR:
root@x.x.x.x: Permission denied (publickey,password).
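
Guessing this is because the pipe runs ssh with -o BatchMode=yes (visible in the command above), which never falls back to a password prompt, so key-based root SSH from this node to the other one is probably what is missing; something like this should sort it (IP is a placeholder):

ssh-copy-id root@x.x.x.x
ssh -o BatchMode=yes root@x.x.x.x true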

Does this mean that I need to delete the disk on the original node and it has to re-sync the whole image? That's what I believe it is telling me, so I am not able to just sync the delta part...
 
Looks like you have to create a job rather than do a one-time sync; I thought you could just run a one-time sync instead of the recurring job, but it doesn't appear so. Using the create option allowed the job to complete...
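
Roughly what worked for the reverse direction (IP is a placeholder; I assume the job can be removed with destroy once the data is back on the original node):

pve-zsync create --source 100 --dest x.x.x.x:vmstorage-zfs --name restorezsync --maxsnap 2
pve-zsync destroy --source 100 --name restorezsync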
 
I have been playing with pve-zsync most of the day and still feel like I am not sure how it should work. Copying the main disk appears to work fine, but it doesn't appear to copy snapshots; maybe those are excluded, but I cannot find any documentation that really says whether that is the case or not. Here is the one node:

root@dpcpver330:~# zfs list
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
rpool                                         2.23G   226G    96K  /rpool
rpool/ROOT                                    2.20G   226G    96K  /rpool/ROOT
rpool/ROOT/pve-1                              2.20G   226G  2.20G  /
rpool/data                                      96K   226G    96K  /rpool/data
rpool/var-lib-vz                                96K   226G    96K  /var/lib/vz
vmstorage-zfs                                 22.1G  3.49T   112K  /vmstorage-zfs
vmstorage-zfs/vm-100-disk-0                   13.5G  3.49T  13.0G  -
vmstorage-zfs/vm-100-state-Wireshark-removal  8.62G  3.50T  1.16G  -

Here is the other node:

root@pve1:~# zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
rpool                        2.30G  69.4G   104K  /rpool
rpool/ROOT                   2.22G  69.4G    96K  /rpool/ROOT
rpool/ROOT/pve-1             2.22G  69.4G  2.22G  /
rpool/data                     96K  69.4G    96K  /rpool/data
rpool/var-lib-vz              112K  69.4G   112K  /var/lib/vz
vmstorage-zfs                16.5G  3.50T  3.46G  /vmstorage-zfs
vmstorage-zfs/vm-100-disk-0  13.0G  3.50T  13.0G  -

I have the main disk, vm-100-disk-0, but not the snapshot that exists on the other node, which was the source. Here is the command that I was using:

pve-zsync sync --source 100 --dest x.x.x.x:vmstorage-zfs --name restoretest --verbose --maxsnap 5
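
If it matters for answering, the ZFS-level snapshots on each side can be listed with:

zfs list -t snapshot -r vmstorage-zfs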

Can anyone verify how pve-zsync should work with snapshots?

Thanks