Replication failure

Jun 9, 2025
Hi,

I'm trying to set up simple replication between two hosts in a cluster that use local storage. I've set up the ZFS pools with the same pool name, added them to the datacenter storage, and created a replication job (see attached images). The replication fails after 1.3s with this error:

Code:
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-232' -o 'UserKnownHostsFile=/etc/pve/nodes/pve-232/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.120.232 -- pvesr prepare-local-job 119-0 Pool156-1:vm-119-disk-0 --last_sync 0' failed: exit code 255

I think I've followed the steps in the documentation - what am I missing?

Thanks - Ed
 

Attachments

  • zfs_pools.JPG (24.3 KB)
  • zfs_datacenter_storage.JPG (32.6 KB)
  • zfs_datacenter_storage_details.JPG (35.2 KB)
  • rep_job.JPG (33.4 KB)
You should have a single storage.cfg entry, not two entries each limited to a single node.
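A single entry covering both nodes might look like this in /etc/pve/storage.cfg (a sketch - the pool name is taken from this thread, while the content and sparse lines are illustrative):
Code:
zfspool: Pool156-1
        pool Pool156-1
        content images,rootdir
        nodes pve-156,pve-232
        sparse 1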
 
OK, so if I understand you correctly: in the Datacenter > Storage section, the "Pool156-1" pool should be assigned to both of the nodes (see image) that I want to replicate between - is that correct? Also, since there is already data on the 1st node, will assigning this pool to the 2nd node cause any issues?
 

Attachments

  • zfs_datacenter_storage_details2.JPG (18.1 KB)
The pools need to have the same name. So if the pool on the second node is empty, you could rename it:
Code:
zpool export {pool}
zpool import {pool} {new name of pool, same as on other node}
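For example, assuming the pool on node 2 is currently named Pool232-1 (a made-up name - substitute the real one) and should match node 1's Pool156-1:
Code:
# export the pool on node 2, then re-import it under the name used on node 1
zpool export Pool232-1
zpool import Pool232-1 Pool156-1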
Or destroy and recreate it under the same name.

If there is already data on the pool on the second node, try to move it away / live-migrate the guests to node 1.

Proxmox VE has no problem with local storage having the same name on multiple nodes - and the replication feature actually requires it :)
 
Could you please post the full storage.cfg, and the output of "pvesm list <storage>" for all storages on all nodes?
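For example, on each node (Pool156-1 being the storage name from this thread):
Code:
cat /etc/pve/storage.cfg
pvesm list Pool156-1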
 
OK, I can provide that info if still needed, but I went ahead and added the "Pool156-1" pool to both nodes and now it replicates successfully, so that's working now.

From a workflow perspective, how does replication work for failover? For example, I have a test VM on my "parent node" (pve-156) successfully replicated to my 2nd node (pve-232), and I can see the disk when I click the "Pool156-1" object under the pve-232 node. I do not see a VM under pve-232 (not sure if I'm supposed to?), and if I simply stop the VM on the parent node, nothing happens - the replicated VM doesn't appear, start, or anything like that. Does the host itself have to be down for the replicated systems to spin up?

Sorry if this is all expected; I'm new to things like shared storage, replication, etc., so I may be asking questions that don't make sense or are already answered in the docs and I just haven't interpreted the info correctly.

Thanks
 
There are two ways:
- with HA, the HA stack will handle the recovery (https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_recover_fenced_services) - but if replication was behind, you will lose data (potentially a lot)
- without HA, you need to recover manually (https://pve.proxmox.com/pve-docs/chapter-pvesr.html#_migrating_a_guest_in_case_of_error) after having verified that the original node is no longer running - see the sketch below
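For this thread's setup, the manual recovery described in the linked docs would look roughly like this (a sketch - VM 119 and the node names are taken from earlier posts; only do this once pve-156 is confirmed down):
Code:
# on pve-232 (the remaining cluster part must still have quorum):
# move the guest config from the failed node to the surviving node
mv /etc/pve/nodes/pve-156/qemu-server/119.conf /etc/pve/nodes/pve-232/qemu-server/
# then start the guest from the last replicated disk state
qm start 119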
Thanks for the info. OK, so looking at my Datacenter > HA - is there anything I need to do there? In this scenario, where these VMs are running on a local ZFS pool (vs. shared storage) and are being replicated to another node, do I need to create a group to keep HA confined to these two nodes?
 
Ah, forget that I posted this - I found the info and I think I've got it configured correctly. Will test next week, thanks.
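For reference, a restricted HA group confined to the two replication nodes can be created from the CLI roughly like this (a sketch - the group name "zfs-pair" is made up, and vm:119 is the test VM from this thread):
Code:
# create a group restricted to the two nodes that hold the replicated pool
ha-manager groupadd zfs-pair --nodes pve-156,pve-232 --restricted 1
# manage the VM with HA and pin it to that group
ha-manager add vm:119 --group zfs-pair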