Replication failure

Jun 9, 2025
Hi,

I'm trying to set up simple replication between two hosts in a cluster that use local storage. I've set up the ZFS pools with the same pool name, added them to the datacenter storage, and created a replication job (see attached images). The replication fails after 1.3s with this error:

Code:
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-232' -o 'UserKnownHostsFile=/etc/pve/nodes/pve-232/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.120.232 -- pvesr prepare-local-job 119-0 Pool156-1:vm-119-disk-0 --last_sync 0' failed: exit code 255

I think I've followed the steps in the documentation - what am I missing?

Thanks - Ed
 

Attachments

  • zfs_pools.JPG (24.3 KB)
  • zfs_datacenter_storage.JPG (32.6 KB)
  • zfs_datacenter_storage_details.JPG (35.2 KB)
  • rep_job.JPG (33.4 KB)
You should have a single storage.cfg entry, not two entries each limited to a single node.
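A single entry covering both nodes might look like this in /etc/pve/storage.cfg (a sketch - the pool name is taken from this thread, while the content and sparse lines are illustrative):
Code:
zfspool: Pool156-1
        pool Pool156-1
        content images,rootdir
        nodes pve-156,pve-232
        sparse 1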
 
OK, so if I understand you correctly: in the Datacenter > Storage section, the "Pool156-1" pool should be assigned to both of the nodes (see image) that I want to replicate between - is that correct? Also, since there is already data on the 1st node, will assigning this pool to the 2nd node cause any issues?
 

Attachments

  • zfs_datacenter_storage_details2.JPG (18.1 KB)
The pools need to have the same name. So if the pool on the second node is empty, you could rename it:
Code:
zpool export {pool}
zpool import {pool} {new name of pool, same as on other node}
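For example, assuming the pool on node 2 is currently named Pool232-1 (a made-up name - substitute the real one) and should match node 1's Pool156-1:
Code:
# export the pool on node 2, then re-import it under the name used on node 1
zpool export Pool232-1
zpool import Pool232-1 Pool156-1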
Or destroy and recreate it under the same name.

If there is already data on the pool on the second node, try to move it away / live-migrate the guests to node 1.

Proxmox VE has no problem with local storage having the same name on multiple nodes - and the replication feature actually requires it :)
 
Could you please post the full storage.cfg, and the output of "pvesm list <storage>" for all storages on all nodes?
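For example, on each node (Pool156-1 being the storage name from this thread):
Code:
cat /etc/pve/storage.cfg
pvesm list Pool156-1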
 
OK, I can provide that info if still needed, but I went ahead and added the "Pool156-1" pool to both nodes and now it replicates successfully, so that's working now.

From a workflow perspective, how does replication work for failover? For example, I have a test VM on my "parent node" (pve-156) successfully replicated to my 2nd node (pve-232), and I can see the disk when I click the "Pool156-1" object under the pve-232 node. I do not see a VM under pve-232 (not sure if I'm supposed to?), and if I simply stop the VM on the parent node, nothing happens - the replicated VM doesn't appear, start, or anything like that. Does the host itself have to be down for the replicated systems to spin up?

Sorry if this is all expected; I'm new to things like shared storage, replication, etc., so I may be asking questions that don't make sense or are already answered in the docs and I just haven't interpreted the info correctly.

Thanks
 
There are two ways:
- with HA, the HA stack will handle the recovery (https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_recover_fenced_services) - but if replication was behind, you will lose data (potentially a lot)
- without HA, you need to recover manually (https://pve.proxmox.com/pve-docs/chapter-pvesr.html#_migrating_a_guest_in_case_of_error) after having verified that the original node is no longer running - see the sketch below
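For this thread's setup, the manual recovery described in the linked docs would look roughly like this (a sketch - VM 119 and the node names are taken from earlier posts; only do this once pve-156 is confirmed down):
Code:
# on pve-232 (the remaining cluster part must still have quorum):
# move the guest config from the failed node to the surviving node
mv /etc/pve/nodes/pve-156/qemu-server/119.conf /etc/pve/nodes/pve-232/qemu-server/
# then start the guest from the last replicated disk state
qm start 119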
Thanks for the info. OK, so looking at my Datacenter > HA - is there anything I need to do there? In this scenario, where these VMs are running on a local ZFS pool (vs. shared storage) and are being replicated to another node, do I need to create a group to keep HA confined to these two nodes?
 
Ah, forget that I posted this - I found the info and I think I've got it configured correctly. Will test next week, thanks.
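For reference, a restricted HA group confined to the two replication nodes can be created from the CLI roughly like this (a sketch - the group name "zfs-pair" is made up, and vm:119 is the test VM from this thread):
Code:
# create a group restricted to the two nodes that hold the replicated pool
ha-manager groupadd zfs-pair --nodes pve-156,pve-232 --restricted 1
# manage the VM with HA and pin it to that group
ha-manager add vm:119 --group zfs-pair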