QDevice VM on both nodes for quorum votes

fabitaly67

New Member
Mar 4, 2023
Hello,
I have 2 PVE nodes in a cluster.
Currently I'm using each node with its own VMs, and when I need to switch off one node I migrate its VMs to the other node first. This works pretty well.
I tried to put one of the VMs in HA and disconnect its node. Result: the other node restarted without starting the VM (due to missing quorum/votes).

So my idea is to run a QDevice VM (2 QDevice VMs in total, one per node, using a minimal Linux distro) on each node to reach a quorum of 4, or in the worst case (one node off), a quorum of 3.

I wouldn't want to add additional hardware.

Could this be a good idea?

Any suggestions are welcome. Thanks!

Regards
FAB
 
So my idea is to run a QDevice VM (2 QDevice VMs in total, one per node, using a minimal Linux distro) on each node to reach a quorum of 4, or in the worst case (one node off), a quorum of 3.
That's not how it works. You need 3 real physical machines: either 3 full PVE nodes, or 2 PVE nodes + 1 QDevice on some third machine like a physical NAS, SBC or whatever.
Quorum is there to prevent screwing everything up (incl. data loss) in case of a split brain, and the only way to prevent this is to have an uneven number of real physical machines (which isn't possible with only two machines).

I wouldn't want to add additional hardware.
Then you shouldn't run a cluster, and especially not HA. Best then would be two stand-alone PVE nodes.
 
As @Dunuin said.

You need more than 50% of ALL existing/expected votes. You should add only 1 (!) QDevice if your number of PVE nodes is even.
A QDevice should not be a VM on the same cluster/hardware.
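To sketch the vote math (illustrative only, not a Proxmox command): corosync grants quorum only to a partition holding a strict majority, i.e. floor(total/2) + 1 votes.

```shell
#!/bin/sh
# Strict-majority quorum: a partition needs floor(total/2) + 1 votes.
for total in 2 3 4 5; do
    echo "$total total votes -> quorum at $(( total / 2 + 1 ))"
done
# With 2 nodes (2 votes), quorum is 2: neither node may fail.
# With 2 nodes + 1 external QDevice (3 votes), quorum is still 2: one node may fail.
# With 4 votes (the 2-QDevice-VM idea), quorum is 3 -- but each node only ever
# controls 2 of those votes (itself + its own QDevice VM), so a single
# surviving node still loses quorum.
```

This is why adding a QDevice VM per node gains nothing: the extra vote dies together with its host.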
 
As @Dunuin said.

You need more than 50% of ALL existing/expected votes. You should add only 1 (!) QDevice if your number of PVE nodes is even.
A QDevice should not be a VM on the same cluster/hardware.
Thanks for both replies.
This is not a real production environment (all VM backups are available and transferred nightly to a NAS). Currently both nodes are at home for testing.
In the near future, each node will be placed in a different location (one at my home, the other at my workplace), connected by a WiFi link with 120 Mbit/s of bandwidth.
Either of the two nodes (and any QDevice at the same site) may be switched off, e.g. during vacation. So my idea was to put a QDevice on each side, as a VM or as physical hardware (if a QDevice VM is not allowed on the same node).
Consider that the WiFi connection can break, and if that happens the QDevice (serving as the third quorum vote) couldn't be reached by the surviving node.

Correct?
 
That's not how it works. You need 3 real physical machines: either 3 full PVE nodes, or 2 PVE nodes + 1 QDevice on some third machine like a physical NAS, SBC or whatever.
Quorum is there to prevent screwing everything up (incl. data loss) in case of a split brain, and the only way to prevent this is to have an uneven number of real physical machines (which isn't possible with only two machines).


Then you shouldn't run a cluster, and especially not HA. Best then would be two stand-alone PVE nodes.
Please see my further reply/clarification above.
Thanks
 
Please see my further reply/clarification above.
We saw this. But how should this prevent a split-brain situation with only two nodes if you are running one qdevice VM on each of those two nodes? See here what a split-brain is: https://en.wikipedia.org/wiki/Split-brain_(computing)

If you want HA you need three physical machines. It's mathematically impossible to have quorum with only two nodes where any of those two should be able to fail. Like you can't form a triangle with only two lines.
 
We saw this. But how should this prevent a split-brain situation with only two nodes if you are running one qdevice VM on each of those two nodes? See here what a split-brain is: https://en.wikipedia.org/wiki/Split-brain_(computing)

If you want HA you need three physical machines. It's mathematically impossible to have quorum with only two nodes where any of those two should be able to fail. Like you can't form a triangle with only two lines.
Many thanks Dunuin,
In the past days I studied the documentation about "split brain" and about the Proxmox cluster philosophy and best practices in depth, even though my setup is a home lab.

I understood that the third vote (QDevice) is better placed off-site (at a site where neither node 1 nor node 2 of the cluster is located), on a network reachable from both main nodes (even without low latency). This applies especially in my case, where it would be risky and not recommended to place the QDevice at Site1 or Site2 (together with one of the two main PVE nodes)!

Then, my final idea for setting up my two cluster nodes + the QDevice (as the third quorum vote) is the following:

- PVE node 1 of cluster (at my home - Site1) IP 10.10.40.81
- PVE node 2 of cluster (at workplace - Site2) IP 10.10.40.82
- PBS in a VM (IP 10.10.60.10) at my son's home (Site3; he has a stand-alone PVE node), acting "AS THE THIRD vote" (QDevice), with NFS storage/share on my QNAP (IP 10.10.40.1) at the same site as node 1

My son's PVE will never be part of my cluster (it will remain a stand-alone PVE node), so the idea of putting a PBS (Proxmox Backup Server) virtual machine on it as QDevice would provide the third vote and at the same time allow incremental backups of the VMs at Site1 and Site2 to a QNAP NAS placed at my home (Site1).

Site1 and Site2 are connected by a WiFi bridge link with 120 Mbit/s of bandwidth. Site3 is connected by a site-to-site IPsec VPN to Site1 and by another site-to-site IPsec VPN to Site2, with 100 Mbit/s of bandwidth. So all 3 members of the cluster are connected and reachable from each other.

The goal is to have HA between Site1 and Site2 and at the same time incremental backups on my home NAS.

Appreciate comments and suggestions.

Thanks in advance,
FAB
 
In the past days I studied the documentation about "split brain" and about the Proxmox cluster philosophy and best practices in depth, even though my setup is a home lab.
So you know that it's a bad idea to cluster over different physical locations? See this recent comment for example: https://forum.proxmox.com/threads/multiple-pve-backup-server-set-up.144701/post-651458
- PVE node 1 of cluster (at my home - Site1) IP 10.10.40.81
- PVE node 2 of cluster (at workplace - Site2) IP 10.10.40.82
Home office? Otherwise make sure that the latency is low enough and the connection reliable enough for quorum to work.
- PBS in a VM (IP 10.10.60.10) at my son's home (Site3; he has a stand-alone PVE node), acting "AS THE THIRD vote" (QDevice), with NFS storage/share on my QNAP (IP 10.10.40.1) at the same site as node 1
That will probably make sure everything works at home (if there are no hardware breakdowns) but the workplace node might reboot and go read-only more often than you expect.
 
So you know that it's a bad idea to cluster over different physical locations? See this recent comment for example: https://forum.proxmox.com/threads/multiple-pve-backup-server-set-up.144701/post-651458

Home office? Otherwise make sure that the latency is low enough and the connection reliable enough for quorum to work.

That will probably make sure everything works at home (if there are no hardware breakdowns) but the workplace node might reboot and go read-only more often than you expect.
Hi leesteken,
The main reason for the different locations of node1 and node2 is that at Site1 (my home, where node1 is located) I must be able to switch off this node completely (when I go on vacation I switch off my whole home lab), after first migrating all its VMs to node2 (hosted in the workplace server room). This is already tested with the current 2-node cluster, even though it is not completely functional due to the missing quorum/QDevice. Migration of my node1 VMs off-site is a "must" for me. HA is not a must, but it would be appreciated (as VM failover).
In the current setup, if I switch off node2 (the workplace node), node1 at home goes offline (in reality node1 is online, but its VMs are switched off or down). Instead, if I unplug node2's Ethernet cable and then switch it off, node1 keeps its VMs up (an unusual/strange behaviour confirmed by other users).
Currently the latency between node1 and node2 is between 2 and 10 ms (measured by ping). The ping time between node1 and the stand-alone PVE at my son's home (where the PBS as QDevice and backup target is planned) is between 20 and 30 ms.

Any comments will be appreciated.

FAB
 
Migration of my node1 VMs off-site is a "must" for me.
For that there is the new "qm remote-migrate ..." command for migrating VMs between different clusters (or stand-alone nodes).
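A hedged sketch of such a call (the VM ID, host, API token, fingerprint and bridge/storage names below are all placeholders; check the qm man page on your PVE version for the exact syntax):

```
# Illustrative only -- all IDs, the token and the fingerprint are placeholders.
qm remote-migrate 100 100 \
  'host=10.10.40.82,apitoken=PVEAPIToken=root@pam!mytoken=<secret>,fingerprint=<target-cert-fp>' \
  --target-bridge vmbr0 --target-storage local-lvm --online
```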

Currently the latency between node1 and node2 is between 2 and 10 ms (measured by ping).
Another problem might be that you use the same WiFi connection for corosync + migration traffic + backup traffic + guest traffic. See the manual:
The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.
The network should not be used heavily by other members, as while corosync does not use much bandwidth it is sensitive to latency jitters; ideally corosync runs on its own physically separated network. Especially do not use a shared network for corosync and storage (except as a potential low-priority fallback in a redundant configuration).
So usually you want a dedicated NIC + switch only for corosync, so the connection will never be saturated, driving up the latency, which could for example cause fencing to reboot your nodes. Cluster and WiFi is a bad combination.
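For reference, a redundant-link layout could look roughly like the following fragment (the addresses are illustrative assumptions; on PVE you would edit /etc/pve/corosync.conf and bump config_version):

```
# Fragment of /etc/pve/corosync.conf (illustrative addresses):
# ring0 on a dedicated NIC/switch for corosync only,
# ring1 on the shared LAN as fallback.
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.50.81
    ring1_addr: 10.10.40.81
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.50.82
    ring1_addr: 10.10.40.82
  }
}
```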
 
For that there is the new "qm remote-migrate ..." command for migrating VMs between different clusters (or stand-alone nodes).


Another problem might be that you use the same WiFi connection for corosync + migration traffic + backup traffic + guest traffic. See the manual:

So usually you want a dedicated NIC + switch only for corosync, so the connection will never be saturated, driving up the latency, which could cause fencing to reboot your nodes.
So you are suggesting to de-cluster the 2 nodes and renounce HA and incremental backups to a shared NFS share?

Or what else can you suggest for my needs?

Summary of my needs and musts:

- Migration of VMs between node1 and node2
- HA/failover of certain VMs on the two nodes (not mandatory but highly appreciated)
- Incremental backups to a shared NFS storage (separate from node1 and node2): a QNAP NAS placed at Site1

Again, suggestions are really appreciated.

FAB
 
- Incremental backups to a shared NFS storage (separate from node1 and node2): a QNAP NAS placed at Site1
For backups, set up a PBS at both locations and sync the backup snapshots via sync jobs. This works fine with higher latencies and without a cluster, while still being able to do fast local backups/restores, having an off-site backup and ransomware protection.
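A rough sketch of such a sync setup, run on the PBS that should pull the backups (the remote name, datastore names, credentials, fingerprint and schedule below are all placeholder assumptions):

```
# All names and credentials below are placeholders.
proxmox-backup-manager remote create offsite-pbs \
  --host 10.10.60.10 --auth-id 'sync@pbs' --password '<secret>' \
  --fingerprint '<remote-cert-fp>'
proxmox-backup-manager sync-job create pull-offsite \
  --remote offsite-pbs --remote-store datastore1 \
  --store local-ds --schedule daily
```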
- Migration of VMs from node1 and node2
This would work without a cluster on PVE 8.x.

- HA/failover of certain VMs on the two nodes (not mandatory but highly appreciated)
For that you need a cluster, but you are below the recommended minimum requirements to run one.
 
For backups, set up a PBS at both locations and sync the backup snapshots via sync jobs. This works fine with higher latencies and without a cluster, while still being able to do fast local backups/restores, having an off-site backup and ransomware protection.

This would work without a cluster on PVE 8.x.


For that you need a cluster, but you are below the recommended minimum requirements to run one.
Ok, Thanks
Following your recommendations, I will evaluate de-clustering my cluster or putting both node1 and node2 at my home.

For your information, I have now run a test removing all the HA jobs set on node1. After switching off node2, nothing happened on node1 (the node keeps its VMs running; no VMs are switched off and no fencing attempts are observed).
Is this correct?
Could this encourage keeping everything clustered, I mean node1 and node2 (renouncing HA)?
 
Ok, Thanks
Following your recommendations, I will evaluate de-clustering my cluster or putting both node1 and node2 at my home.

For your information, I have now run a test removing all the HA jobs set on node1. After switching off node2, nothing happened on node1 (the node keeps its VMs running; no VMs are switched off and no fencing attempts are observed).
Is this correct?
Could this encourage keeping everything clustered, I mean node1 and node2 (renouncing HA)?
I even tried to remove and reinsert node2's network cable several times. Nothing happened on node1.

This really means that with HA disabled, no side effects or fencing occur in a cluster with only 2 nodes.

Can this be true?
 
I even tried to remove and reinsert node2's network cable several times. Nothing happened on node1.

This really means that with HA disabled, no side effects or fencing occur in a cluster with only 2 nodes.

Can this be true?
According to the latest tests, just to share the behaviour of PVE in a cluster environment: if node1 (with no HA jobs set) stays ON with its VMs regularly running, after node2 is switched off all the VMs on node1 remain active until node1 reboots (while it stands alone). If node1 reboots alone, without node2 ON, none of its VMs start, since the node is missing the 2nd vote for quorum.
This was confirmed by several tests.
Summing up: two nodes + a QDevice (or three nodes) are absolutely necessary for a cluster!
As a first step, I will keep the 2 nodes in my home lab always running with a QDevice. As a second step, I will put another node off-site with PBS, backing up the cluster to a QNAP. I don't want to put the NAS in the same place as the cluster, so I will set up PBS, make the "first complete backup" of the cluster on-site, and then move its hardware off-site (to continue with the following incremental backups).
FAB
 
Hello,

I am writing regarding this topic. I am using 2 servers running Proxmox VE and a Pi as QDevice, so currently 3 devices. But they are not all on the same floor and not on a UPS (because of lack of space), and it happened that one floor lost power, the main server rebooted and the VMs stopped working. I have an idea to put a VM on each node running Debian as an extra QDevice, so 5 votes in total. Then if the power goes out on either floor, both nodes still have a minimum of 2 votes to stop the machines from rebooting. I would like to hear your comments on whether this is an OK idea or a dumb one.
Sorry for my bad English (Balkans and all xD).

Thank you all in advance!
 
I have an idea to put a VM on each node running Debian as an extra QDevice, so 5 votes in total. Then if the power goes out on either floor, both nodes still have a minimum of 2 votes to stop the machines from rebooting. I would like to hear your comments on whether this is an OK idea or a dumb one.
No, bad idea. A QDevice shouldn't run virtualized on the same devices that are already voting. That defeats the whole point of quorum.
And you really should think about getting two UPSes. Search for all the endless threads in this forum where a VM or even the whole PVE node won't boot again after a power outage because it corrupted the filesystems.
 
Rather than use a whole VM, I am voting from my workstation (which goes down all the time, no matter though), running Kubuntu etc.:

external_workstation# apt install corosync-qnetd
Then I will install this on every node of the cluster:

elite-rapper# apt install corosync-qdevice
hulk# apt install corosync-qdevice

and then just this on any one of the cluster nodes:
hulk# pvecm qdevice setup 10.0.0.123 etc

I wonder if I should also register it via my backup Ethernet/WiFi addresses on the same host?
hulk# pvecm qdevice setup 10.0.0.111 etc
hulk# pvecm qdevice setup 10.0.0.222 etc