Replication Without HA

Dihaxu · Apr 8, 2023

Hi, I am trying to understand the principles of replication and how it can be used for failover, without high availability.

I have this configured in a home lab, and I notice that no "standby" VM is created on the secondary node after replication. It looks like this is because only one node can "own" a VM, which prevents multiple identical VMs from trying to come online at the same time.

Going by the cluster file system documentation, am I right in thinking that if the primary node fails, the way to bring up the replica is to move the VM conf file from the primary node's directory to the secondary node's directory, working on the secondary node? This should be possible because the conf files are shared across the cluster, so it doesn't matter that the failed node is down: its configuration also exists on the surviving node and can be moved into that node's own directory so that it can run the VM.
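The move described above can be sketched in Python. This is only an illustration under stated assumptions: it assumes the usual cluster file system layout of `/etc/pve/nodes/<node>/qemu-server/<vmid>.conf`, the node names `pve1` and `pve2` and VM ID 100 are hypothetical, and `move_vm_config` is a made-up helper, not a Proxmox tool. The demo runs against a temporary directory rather than a real `/etc/pve`.

```python
import shutil
import tempfile
from pathlib import Path


def move_vm_config(pve_root: Path, vmid: int, failed_node: str, target_node: str) -> Path:
    """Move <vmid>.conf from the failed node's directory to the target node's.

    Hypothetical helper: pve_root stands in for /etc/pve.
    """
    src = pve_root / "nodes" / failed_node / "qemu-server" / f"{vmid}.conf"
    dst_dir = pve_root / "nodes" / target_node / "qemu-server"
    dst = dst_dir / src.name

    if not src.is_file():
        raise FileNotFoundError(f"no config for VM {vmid} under node {failed_node}")
    if dst.exists():
        # Refuse to overwrite: a VM ID should have exactly one config file.
        raise FileExistsError(f"{dst} already exists")

    dst_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


if __name__ == "__main__":
    # Demo on a throwaway directory tree, not on a real cluster.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old_dir = root / "nodes" / "pve1" / "qemu-server"  # hypothetical failed node
        new_dir = root / "nodes" / "pve2" / "qemu-server"  # hypothetical surviving node
        old_dir.mkdir(parents=True)
        new_dir.mkdir(parents=True)
        (old_dir / "100.conf").write_text("memory: 2048\n")

        moved = move_vm_config(root, 100, "pve1", "pve2")
        print("config now at:", moved.relative_to(root))
```

On a real cluster the surviving node must still have quorum for this to work, because `/etc/pve` becomes read-only without it. In a two-node cluster that typically means lowering the expected votes first (for example with `pvecm expected 1`), which should only be done when the other node is known to be down.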

If so, what then happens when the primary server comes back online (for example after repair or reconnection)? Will its filesystem not still have that VM conf file in its own directory, causing a conflict if the VM is set to power on after boot?

Thank you for any help in understanding this.

UdoB · Apr 8, 2023

Dihaxu said:
If so, what then happens when the primary server comes back online (for example after repair or reconnection)? Will its filesystem not still have that VM conf file in its own directory, causing a conflict if the VM is set to power on after boot?

This is absolutely a valid question.

We are talking about an existing cluster, right?

When the primary server comes back online, it will immediately replace its own (old!) database with the current state of the cluster. This way it learns that the VM was moved away...

This is one reason you NEED to have at least three nodes, with two of them up and running at all times: simply to know which information is valid when the third node comes up. (Alternatively: a quorum device.)
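The arithmetic behind the three-node requirement can be sketched as follows. This assumes simple majority voting with one vote per node (strictly more than half of all votes must be present); the function names are made up for illustration and are not part of any Proxmox or corosync API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, online_votes: int) -> bool:
    """True if the online members together hold a strict majority."""
    return online_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # Compare what happens when exactly one node fails.
    for total in (2, 3):
        survivors = total - 1
        print(
            f"{total} nodes, one down: need {votes_needed(total)} votes, "
            f"have {survivors}, quorum={has_quorum(total, survivors)}"
        )
```

With two nodes the majority is two votes, so a single failure leaves the survivor without quorum. With three nodes (or two nodes plus a quorum device) the majority is still two, so one failure is tolerated and the two remaining members decide which state is valid.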

Best regards

Dihaxu · Apr 8, 2023

It is an existing cluster, yes, but... it is a two-node cluster. I was thinking that with two nodes I could still allow for "manual" failover without having the three nodes needed for automated HA, but it looks like this is just inviting trouble.

I have a third server running PBS, so my thought is to rebuild that as a PVE server also running PBS. That way the PVE instance can be a third quorum vote, and maybe also be useful for test backup restores.

So in that situation, when the failed node comes back online, it checks the cluster status and config first and updates its own copy before it thinks about booting any VMs? Then it will see that the VMs it was hosting are now elsewhere, and will not try to run them itself.
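As a toy model of that reasoning (not how Proxmox implements startup): if each VM's conf file lives in exactly one node's directory, a node that has synced the current cluster state can tell whether a VM is still its own by checking where the file sits. The layout `nodes/<node>/qemu-server/<vmid>.conf`, the node names `pve1` and `pve2`, and both function names are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def owning_node(pve_root: Path, vmid: int) -> Optional[str]:
    """Return the node whose directory holds <vmid>.conf, or None if absent."""
    matches = sorted(pve_root.glob(f"nodes/*/qemu-server/{vmid}.conf"))
    if not matches:
        return None
    if len(matches) > 1:
        # In this model a VM ID must belong to exactly one node.
        raise RuntimeError(f"VM {vmid} has config files under several nodes")
    # .../nodes/<node>/qemu-server/<vmid>.conf -> <node>
    return matches[0].parent.parent.name


def should_start_locally(pve_root: Path, vmid: int, local_node: str) -> bool:
    """A node only considers starting VMs whose config is in its own directory."""
    return owning_node(pve_root, vmid) == local_node


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # State after manual failover: the config now sits under pve2.
        (root / "nodes" / "pve1" / "qemu-server").mkdir(parents=True)
        (root / "nodes" / "pve2" / "qemu-server").mkdir(parents=True)
        (root / "nodes" / "pve2" / "qemu-server" / "100.conf").write_text("onboot: 1\n")

        print("pve1 starts VM 100:", should_start_locally(root, 100, "pve1"))
        print("pve2 starts VM 100:", should_start_locally(root, 100, "pve2"))
```

The point of the sketch is only that ownership follows the location of the conf file in the synced cluster state, so the returning node has nothing of its own to start for a VM that was moved away.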
