newbie questions about replication

hspindel

Member
Aug 6, 2025
First of all, I am absolutely thrilled with Proxmox. Have set up two PVE (call them PVE1 and PVE2) and one PBS, and they all work great.

I decided to learn about replication. Was able to cluster my two PVEs. Ran into two things:

1. I expected that a VM I replicated from PVE1 to PVE2 would then show up as a VM I could start on PVE2. But it doesn't show up in the list of VMs. The disk for the replicated VM does show up in my storage on PVE2. I did read that in order to launch the VM on PVE2 I should copy the .conf file for the VM from PVE1 to PVE2. If PVE1 is down, will that .conf file be available?
I tried to copy the .conf file, but I get a message that the copy can't be completed because the file exists (and since I'm root, I should be able to overwrite anything). The file does not exist if I use "ls -a", so I'm very puzzled by that.

2. The hardware on PVE1 and PVE2 is different, with PVE2 being less capable. So sometimes if I restore a VM that originated on PVE1 to PVE2, it won't start on PVE2 until I adjust the hardware requirements (e.g., RAM) to better match PVE2. What's the best practice for handling this? When I create a VM on PVE1, am I supposed to keep the hardware limitations of PVE2 in mind? That would mean not taking advantage of all the hardware on PVE1!
 
I expected that a VM I replicated from PVE1 to PVE2 would then show up as a VM I could start on PVE2. But it doesn't show up in the list of VMs. The disk for the replicated VM does show up in my storage on PVE2. I did read that in order to launch the VM on PVE2 I should copy the .conf file for the VM from PVE1 to PVE2. If PVE1 is down, will that .conf file be available?
I tried to copy the .conf file, but I get a message that the copy can't be completed because the file exists (and since I'm root, I should be able to overwrite anything). The file does not exist if I use "ls -a", so I'm very puzzled by that.
copy does not work on the filesystem behind /etc/pve, but a move will do.
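A minimal sketch of what that looks like, assuming the config was staged somewhere outside /etc/pve first (the source path below is only a placeholder):

Code:
# per the note above, cp into /etc/pve can fail where a move succeeds
mv /root/vm-backup/106.conf /etc/pve/qemu-server/106.conf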

The hardware on PVE1 and PVE2 is different, with PVE2 being less capable. So sometimes if I restore a VM that originated on PVE1 to PVE2, it won't start on PVE2 until I adjust the hardware requirements (e.g., RAM) to better match PVE2. What's the best practice for handling this? When I create a VM on PVE1, am I supposed to keep the hardware limitations of PVE2 in mind? That would mean not taking advantage of all the hardware on PVE1!
There is no solution for this besides adding more RAM to your second node. If a single VM is too big to be started on the second node, you will not have fun with this.

Keep in mind that a two-node "cluster" will not work if you lose one node, so you will not have what you're planning for. You need at least a qdevice on your PBS, so you have an odd number of machines and can still start the VMs if your main node fails. Maybe look into not clustering the nodes and setting up a simple replication job. You will not have problems with moving QM configs, because they're already present.
 
copy does not work on the filesystem behind /etc/pve, but a move will do.

mv source/106.conf /etc/pve/qemu-server reports:

Cannot create regular file /etc/pve/qemu-server/106.conf: File exists

The file definitely doesn't exist according to ls.

There is no solution for this besides adding more RAM to your second node. If a single VM is too big to be started on the second node, you will not have fun with this.

That's what I suspected. Thanks for confirming.

Keep in mind that a two-node "cluster" will not work if you lose one node, so you will not have what you're planning for. You need at least a qdevice on your PBS, so you have an odd number of machines and can still start the VMs if your main node fails. Maybe look into not clustering the nodes and setting up a simple replication job. You will not have problems with moving QM configs, because they're already present.

That's not what I was looking to do. I'm not after automatic failover. I just want to replicate a VM to a second PVE and manually start it on the second PVE if I ever need it. I'm pretty sure I read somewhere that I needed to create a cluster in order to replicate, so that's why I created one; I'm not using the cluster for anything else. If I try to replicate without having a cluster, the replication dialog tells me that replication needs at least two nodes.

So I ran the replication job, and the VM disk does show up in the proper storage on the second PVE, but no QM config files anywhere.
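A quick way to double-check that state, assuming a ZFS-backed store and the example VMID 106 used later in this thread (adjust names to your setup):

Code:
# on the source node: show the state of the configured replication jobs
pvesr status
# on the target node: the replicated volume is there ...
zfs list | grep vm-106
# ... but no config file exists under any node directory
find /etc/pve/nodes -name 106.conf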



Thank you for your help.
 
Cannot create regular file /etc/pve/qemu-server/106.conf: File exists
That file (that VM with the ID 106) can only exist once in the cluster.

To verify whether the old one is still present: ~# find /etc/pve/nodes -name 106.conf - you'll probably find it there. Then move that one.
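Roughly like this, assuming the stale copy turns up under the other node's directory (node names here are placeholders):

Code:
# locate the existing config for VMID 106 anywhere in the cluster
find /etc/pve/nodes -name 106.conf
# then move it to the node that should own the VM, e.g.:
mv /etc/pve/nodes/pve1/qemu-server/106.conf /etc/pve/nodes/pve2/qemu-server/106.conf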
 
Isn't it normal for conf to only be enabled on one server?

"/etc/pve" is a fuse-mounted database. Proxmox does a lot of work to make sure that it is identical on all cluster members. Basically it reflects the current state of the cluster, including all configuration.

That's also the reason why /etc/pve is only writable as long as quorum is reached.

Edit: you talk about two nodes. If one is down, the other one is no longer quorate. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support to implement a third vote!
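A minimal check of the current quorum state, using only the stock cluster tool:

Code:
# look at the "Quorate" flag and the expected vs. total votes in the output
pvecm status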
 
I did not choose for VMID 106 to be present twice in the cluster. Replication from the first PVE to the second PVE automatically created it that way.

As above, what I am trying to achieve is automatic replication from PVE1 to PVE2 with no automatic failover. If I try to replicate without a cluster present, the replication config dialog on PVE1 tells me that two PVE nodes are necessary for replication and provides no way to specify PVE2 as a replication target.

The feedback seems to indicate that what I want is not possible with replication unless I add a third node for a quorum. Is that correct? Seems to me that makes replication pretty useless without a quorum. I could potentially do that since I have a Proxmox Backup server running and I think that can be used as a quorum member. But it's not what I'm really after.

Backup plan would be to have PVE2 restore VMs from PVE1 using PBS as a source. But I have to do that manually, and I was hoping for a more automatic solution using replication.

Thank you for the feedback.
 
Was unaware of that package, and it sounds like exactly what I want. Is it necessary to split the cluster before using PVE-zsync?
Nope, it's not. But: a cluster brings additional complexity and (thus) additional potential failures.
For example: you should have at least three nodes, as explained by Udo above. That can be mitigated with a qdevice (a Raspberry Pi would work, or a Proxmox Backup Server on an old PC). Another thing is that the corosync cluster service needs low network latency, so it's recommended to have at least one dedicated network just for cluster communication. That means at least one additional network adapter on both nodes just for cluster traffic (see https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network ). Of course Proxmox VE will still allow you to build a cluster without this, but if the network load gets too high it might cause issues.
I myself have a two-node + qdevice cluster in my homelab without a dedicated cluster network. But they are just two mini PCs and don't run anything important; they are my playground for trying stuff ;) The important stuff (my NAS VM, Paperless for important documents, etc.) I run on a dedicated single-node server. The benefit is that I can break my cluster without doing harm to anything important. Just last week I managed to break the networking on one of my mini PCs. It was annoying (because I was not even at home, only on a VPN, so I couldn't fix it), but nothing of value was lost :)
So: if you don't need the functionality of the cluster (e.g. high availability), then you are better off without it. If you would like to have one management interface for all your nodes, the Proxmox Datacenter Manager (although still in alpha status) might be worth a look.
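For what it's worth, a dedicated corosync link can be given when the cluster is formed; a rough sketch assuming a separate 10.10.10.0/24 network on both nodes (addresses are made up):

Code:
# on the first node: create the cluster with the dedicated network as link0
pvecm create mycluster --link0 10.10.10.1
# on the second node: join via the first node's address and announce this node's own link0 address
pvecm add 10.10.10.1 --link0 10.10.10.2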
 
If all you are looking for is a periodic differential copy of the ZFS dataset, I think you can do it by creating a script.

The conf file would not need to be changed.

It should be possible to send to another host via ssh.

Only if it is not HA, since the same VMID cannot be active on more than one node at the same time.

Code:
# initial full send of the snapshot (pv only shows progress and is optional)
zfs send proxmox/proxmox/w2k25/vmtest1@setting | pv | zfs receive hdd_pool/proxmox/w2k25/vmtest1

# incremental send of the changes between the two snapshots; -F rolls the target back to the last common snapshot before receiving
zfs send -i proxmox/proxmox/w2k25/vmtest1@setting proxmox/proxmox/w2k25/vmtest1@test1 | pv | zfs receive -F hdd_pool/proxmox/w2k25/vmtest1
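As mentioned above, the send can also go to the other host over SSH; a sketch assuming root SSH access to the second node and matching pool names there (hostname and datasets are placeholders):

Code:
zfs send -i proxmox/proxmox/w2k25/vmtest1@setting proxmox/proxmox/w2k25/vmtest1@test1 | ssh root@pve2 zfs receive -F hdd_pool/proxmox/w2k25/vmtest1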
 
Thank you all for your help. I now have replication fully working and can fire up equivalent VMs on my second PVE.

I thought I would document some of the steps I went through in case anyone else reads this because it took some work for me to figure it out.

I decided I would implement a quorum device on my PBS. Following the directions, this all went very smoothly, and pvecm status immediately showed expected results.
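For reference, the steps from the admin guide section linked above boil down to a few commands; a sketch assuming the PBS box is reachable at 192.168.1.3 (address is a placeholder) and root SSH access between the machines is set up:

Code:
# on the external vote host (here: the PBS machine)
apt install corosync-qnetd
# on every cluster node
apt install corosync-qdevice
# then, from any one cluster node, register the qdevice
pvecm qdevice setup 192.168.1.3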

So I started up replication jobs on my primary PVE. I saw that the VMs were successfully replicated to the secondary PVE. But the VMs did not show up as startable on the second node. I theorized (correctly) that the .conf files were not replicated, which surprised me.

So I figured out that what I needed to do was create a dummy VM on the secondary PVE with attributes identical to the primary PVE's VM, except that the storage was configured to be only 1GB and the VMID needed to be different. Then I found I needed to go to /etc/pve/qemu-server and modify the conf file so that the scsi0: line pointed to storage that was replicated from the primary PVE instead of the dummy 1GB storage.

This worked fine for most of my VMs, and I could fire them up on the secondary PVE and they worked great. But I still had trouble with one of the VMs. I eventually figured out it was because that VM booted from UEFI storage, and I had to add to the conf file a line with efidisk0: pointing to the replicated storage. It was a little tricky to know which of the replicated stores was scsi0: and which was efidisk0:. I was able to tell by the relative sizes of the replicated storage. After applying these changes, the VM booted successfully from UEFI.
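For illustration only, the edited config on the second node ended up with its disk lines pointing at the replicated volumes; a hypothetical example (storage name, volume names and sizes are made up and will differ per setup):

Code:
scsi0: local-zfs:vm-106-disk-0,size=64G
efidisk0: local-zfs:vm-106-disk-1,efitype=4m,size=1M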

----------------

I did not find a single place on the web where this process was documented, and some of it was guesswork on my part. But I got it working, and with help from all of you I am now a very happy replicated Proxmox camper.

Perhaps I solved this in a roundabout way and there was an easier approach. If so, it would be useful to post it for other readers.
 
If all you are looking for is a periodic differential copy of the ZFS dataset, I think you can do it by creating a script.

He would still also have to copy the VM and LXC configs. And pve-zsync already does this based on ZFS, so there is no need to reinvent the wheel.
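A sketch of what such a job could look like with pve-zsync, assuming VMID 106, a target pool named hdd_pool on a second host at 192.168.1.2, and the default schedule (names and addresses are placeholders):

Code:
# creates a recurring sync job for VM 106 to the remote pool, keeping the
# last two snapshots; the VM config is carried along as noted in this thread
pve-zsync create --source 106 --dest 192.168.1.2:hdd_pool --name nightly106 --maxsnap 2 --verbose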
 
I theorized (correctly) that the .conf files were not replicated, which surprised me.
This is working as expected: storage replication inside a cluster is for continuing a VM in case of a node or network failure together with high availability (see https://pve.proxmox.com/wiki/High_Availability ) or to reduce the migration time. Thus you can only ever start one copy of the VM or LXC, because otherwise you would end up with multiple machines with the same MAC address in your network. The config lives in /etc/pve/nodes/<nodename>, present on all nodes inside the cluster, so in case of a migration or HA event it is moved to the new host. So your whole manual configuration was in fact not needed and might bite you back later.

As already said: if you don't need high availability or another cluster-specific use case, you are better off with a combination of pve-zsync (for the replication including the configuration) and the Proxmox Datacenter Manager (for a unified management interface and a GUI to the remote-migrate options of the pct and qm commands, for migration between single nodes without a cluster). The PBS (of course) could still be used for both; I would recommend setting up a dedicated namespace for each independent node then. Thanks to deduplication, the data identical on both nodes (i.e. the duplicated VMs etc.) will still only be backed up once.
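If the Datacenter Manager route is taken, the underlying command for a one-off cross-node move without a cluster is qm remote-migrate; a heavily simplified sketch, assuming an API token exists on the target node (all values below are placeholders; check the qm man page for the exact endpoint format):

Code:
qm remote-migrate 106 106 'host=192.168.1.2,apitoken=PVEAPIToken=root@pam!migrate=<SECRET>,fingerprint=<FINGERPRINT>' --target-bridge vmbr0 --target-storage local-zfs --online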
 