Replication Without HA

Dihaxu

Feb 17, 2023
Hi, I am trying to understand the principles of replication and how it can be used for failover, without high availability.

I have this configured in a home lab, and I notice that no "standby" VM is created on the secondary node after replication. It looks like this is because only one node can "own" a VM, which prevents multiple identical VMs from trying to run at the same time.
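That "ownership" is visible in the cluster file system layout: each VM's configuration lives in exactly one node's directory, assumed here to follow the usual /etc/pve/nodes/<node>/qemu-server/<vmid>.conf pattern. The sketch below (the function name find_owner is hypothetical, not a Proxmox API) looks up which node currently owns a VMID. The root directory is a parameter so it can be tried against a scratch directory instead of a live /etc/pve.

```python
from pathlib import Path
from typing import List, Optional


def find_owner(pve_root: Path, vmid: int) -> Optional[str]:
    """Return the name of the node whose directory holds <vmid>.conf, or None.

    Assumes the layout <pve_root>/nodes/<node>/qemu-server/<vmid>.conf.
    """
    nodes_dir = pve_root / "nodes"
    owners: List[str] = sorted(
        # conf.parent is qemu-server/, conf.parent.parent is the node directory
        conf.parent.parent.name
        for conf in nodes_dir.glob(f"*/qemu-server/{vmid}.conf")
    )
    if len(owners) > 1:
        # A healthy cluster never has this; report it instead of picking one.
        raise RuntimeError(f"VM {vmid} is configured on several nodes: {owners}")
    return owners[0] if owners else None
```

On a real node this would be called as find_owner(Path("/etc/pve"), 100), where 100 is a placeholder VMID.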

Going by the cluster file system documentation, am I right in thinking that, if the primary node fails, the way to bring up the replicated VM is to move its conf file, on the secondary node, from the primary node's directory to the secondary node's directory? This would work because the conf files are shared across the cluster: it doesn't matter that the failed node is down, since its configuration also exists on the surviving node and can be moved into that node's directory so that node can run the VM.
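The manual step being asked about amounts to a single rename inside /etc/pve, for example from nodes/pve1/qemu-server/100.conf to nodes/pve2/qemu-server/100.conf (node names and VMID are placeholders). A minimal sketch of that step, assuming the same directory layout; conf_path and move_vm_config are hypothetical helpers, not Proxmox tools. Two caveats: on a two-node cluster with one node down, /etc/pve is read-only until quorum is restored (for example with a temporary `pvecm expected 1`), and because storage replication runs on a schedule, writes made after the last replication run are lost.

```python
import os
from pathlib import Path


def conf_path(pve_root: Path, node: str, vmid: int) -> Path:
    """Path of a QEMU VM config in the assumed pmxcfs layout."""
    return pve_root / "nodes" / node / "qemu-server" / f"{vmid}.conf"


def move_vm_config(pve_root: Path, vmid: int, src_node: str, dst_node: str) -> Path:
    """Hand ownership of a VM from src_node to dst_node by renaming its config.

    Only the config moves; the disks must already exist on dst_node,
    which is what storage replication provides.
    """
    src = conf_path(pve_root, src_node, vmid)
    dst = conf_path(pve_root, dst_node, vmid)
    if not src.exists():
        raise FileNotFoundError(f"no config for VM {vmid} on {src_node}")
    if dst.exists():
        raise FileExistsError(f"VM {vmid} already configured on {dst_node}")
    # One rename, rather than copy followed by delete, never leaves two configs.
    os.rename(src, dst)
    return dst
```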

If so, what happens when the primary server comes back online (for example, after repair or reconnection)? Won't its file system still have that VM conf file in its own directory, causing a conflict if the VM is set to start at boot?

Thank you for any help in understanding this.
 
Dihaxu said:
If so, what happens when the primary server comes back online (for example, after repair or reconnection)? Won't its file system still have that VM conf file in its own directory, causing a conflict if the VM is set to start at boot?
This is absolutely a valid question :)
We are talking about an existing cluster, right?

When the primary server comes back online, it immediately replaces its own (old!) database with the current state of the cluster. This way it learns that one VM was moved away...
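A toy model of that order of operations, not Proxmox code: the dictionaries and the function name rejoin_and_pick_autostart are hypothetical. The rejoining node discards its stale view, adopts the cluster's, and only then selects the guests it owns that are flagged to start at boot.

```python
from typing import Dict, List, Tuple

# vmid -> (owning node, start-at-boot flag); a stand-in for the configs in /etc/pve
ClusterView = Dict[int, Tuple[str, bool]]


def rejoin_and_pick_autostart(node: str, stale_local: ClusterView,
                              cluster: ClusterView) -> Tuple[ClusterView, List[int]]:
    # Step 1: the cluster's state wins; stale_local is deliberately ignored.
    view = dict(cluster)
    # Step 2: only guests this node still owns are candidates for autostart.
    autostart = sorted(vmid for vmid, (owner, onboot) in view.items()
                       if owner == node and onboot)
    return view, autostart
```

For example, if VM 100 was moved from pve1 to pve2 while pve1 was down, pve1's stale copy still lists it, but after adopting the cluster view pve1 only autostarts the guests it still owns.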

This is one reason you NEED at least three nodes, with two of them up and running at all times: simply to know which information is valid when the third node comes back up. (Alternatively: a quorum device.)
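The majority rule behind "at least three nodes" is simple arithmetic: a partition is quorate only with floor(total / 2) + 1 votes. This sketch assumes one vote per node and one for a QDevice (the default, not the only possible configuration). Without quorum, a Proxmox node treats /etc/pve as read-only, which matters for the manual config move asked about in the first post.

```python
from typing import List, Tuple


def quorum_threshold(total_votes: int) -> int:
    """Smallest strict majority of the configured votes: floor(total / 2) + 1."""
    return total_votes // 2 + 1


def is_quorate(votes_online: int, total_votes: int) -> bool:
    """True if the votes that can still see each other form a majority."""
    return votes_online >= quorum_threshold(total_votes)


# (setup, total votes, votes left after one PVE node fails); one vote each assumed
SCENARIOS: List[Tuple[str, int, int]] = [
    ("2 nodes", 2, 1),
    ("3 nodes", 3, 2),
    ("2 nodes + QDevice", 3, 2),
]

for name, total, online in SCENARIOS:
    print(f"{name}: need {quorum_threshold(total)}, have {online}, "
          f"quorate={is_quorate(online, total)}")
```

With one node down, only the plain two-node setup loses quorum (it prints quorate=False); a third node or a QDevice keeps the survivor at 2 of 3 votes.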

Best regards
 
It is an existing cluster, yes, but... it is a two-node cluster. I was thinking that with two nodes I could still do "manual" failover without the three nodes required for automated HA, but it looks like this is just inviting trouble.

I have a third server running PBS, so my plan is to rebuild it as a PVE server that also runs PBS. That way the PVE instance can provide a third quorum vote, and it may also be useful for test-restoring backups.

So in that situation, when the failed node comes back online, it first checks the cluster status and configuration and updates its copy, before it considers booting any VMs? It will then see that the VMs it was hosting are now elsewhere and will not try to run them itself.