Permanent workaround for "Cluster not ready no quorum (500)"?

vctgomes

Jul 4, 2023
I have two nodes in a cluster, and one of them also hosts my router. I created the cluster to make it easier to manage and migrate VMs and LXCs. I don't use features like HA or ZFS.

The problem is that when I restart the machine that hosts my router, its VMs and LXCs don't start and fail with the error "Cluster not ready no quorum (500)". Because my OpenWRT VM hasn't started, node 1 is disconnected from node 2 and has no quorum.

I can fix the problem easily by logging in to node 1 and running pvecm expected 1; then my OpenWRT VM finally starts and everything is back to normal.
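The manual fix described above could be wrapped in a small helper, shown here only as a sketch. It assumes that pvecm status prints a line starting with "Quorate:" followed by Yes or No; the function names are hypothetical. Note that lowering the expected votes removes the protection that quorum provides, and the setting is not persistent.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the "Quorate:" line
# format of `pvecm status` is an assumption about its output.

# Succeeds if the local node currently reports quorum.
is_quorate() {
    pvecm status 2>/dev/null | grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Lowers the expected vote count to 1 only when the node has no quorum.
lower_expected_if_no_quorum() {
    if is_quorate; then
        echo "cluster is quorate; nothing to do"
    else
        echo "no quorum; lowering expected votes to 1"
        pvecm expected 1
    fi
}
```

On node 1, after sourcing the file, calling lower_expected_if_no_quorum would run pvecm expected 1 only when it is actually needed.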

I'd like to know whether it is possible to create a permanent workaround so that node 1 expects only 1 vote, at least long enough to start the OpenWRT VM when Proxmox boots.
 
There is no good way to fix this besides getting a third machine, either as a third node or at least as a QDevice, so it can act as a third voter.
With fewer than 3 voters it is mathematically impossible to rule out a split-brain situation: with two voters a majority needs both votes, so a single node can never tell whether its peer is down or merely unreachable. That's why an odd number of 3 or more voters is a requirement.
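The arithmetic behind this can be sketched in a few lines of shell. This assumes every voter carries exactly one vote and that quorum means a strict majority of all votes; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch assuming one vote per voter and quorum = strict majority.

# Votes needed for quorum with $1 voters in total.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Number of voters that can fail while the rest still hold quorum.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5; do
    echo "voters=$n quorum=$(quorum_needed "$n") tolerated_failures=$(failures_tolerated "$n")"
done
# voters=2 quorum=2 tolerated_failures=0
# voters=3 quorum=2 tolerated_failures=1
# voters=4 quorum=3 tolerated_failures=1
# voters=5 quorum=3 tolerated_failures=2
```

With two voters, losing either one drops the survivor below quorum, which is exactly the situation in the first post. A fourth voter adds no tolerance over three, which is why odd counts are preferred.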
 
The problem is that even with 3 nodes, if node 1 (the one with the router) is turned off, everything would stop working.
 
Then why not run a highly available virtualized router on both nodes? For example, I have an OPNsense VM on each of two PVE nodes, and if one of those VMs (or its whole node) fails, connections won't drop and the VM on the other node takes over within a second or so. That runs totally independently of PVE's HA; you wouldn't even need a cluster at all. See for example here: https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration
 
I have the same problem. I currently have 3 nodes in different locations, connected via a VPN that runs on one of the VMs. When I have to restart a machine on one of the nodes, it should not remain blocked.
I previously had 3 nodes in a cluster in the same location and experienced various problems when I was not physically near the computers. I also noticed that pvecm expected 1 is not necessarily permanent. I admit that I quickly gave up on the tests and removed the cluster.
I have to use a cluster because I like being able to transfer guests between nodes. Thank you in advance.
 
In essence, with a cluster, you do NOT want to run whatever handles your routing on the cluster itself.
In my opinion: for the love of god, and for peace of mind, run your router/firewall on hardware outside your cluster.
In a non-clustered environment it's all fine; in a clustered one, you don't want the firewall on it.

- Glowsome
 
Are there any security risks? My Ethernet ports are passed through, so Proxmox doesn't even have direct access to the internet.

What are the risks of this implementation?
 
Reading your situation, I do not see a risk, as you are placing the firewall outside, in front of the whole cluster setup.
That means you are not running into an infinite loop where your pfSense (as a cluster resource) is not up but needs to be up for all nodes to reach quorum.
 
Thanks!
 
I had this error too, and this solution helped me a lot:
Use scp to copy these files from a node that is working fine in your cluster to the node that has the issue (proxmox no quorum 500), then restart pve-cluster:
scp -r /etc/corosync/* root@xx.xx.xx.xx:/etc/corosync/
scp /etc/pve/corosync.conf root@xx.xx.xx.xx:/etc/pve/
systemctl restart pve-cluster
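
Before overwriting anything, it may be worth checking whether the copies actually differ. The following is a minimal sketch, not part of the original fix: it assumes the files contain a config_version: line (as corosync.conf normally does in its totem section), and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the config_version value from a corosync.conf-style file.
config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Report whether two corosync.conf copies carry the same config_version.
compare_versions() {
    a=$(config_version "$1")
    b=$(config_version "$2")
    if [ "$a" = "$b" ]; then
        echo "same config_version ($a)"
    else
        echo "config_version differs: $1 has $a, $2 has $b"
    fi
}

# Example call on the affected node (paths taken from the commands above):
#   compare_versions /etc/corosync/corosync.conf /etc/pve/corosync.conf
```

If the versions differ, that would be consistent with the node having missed a configuration update, which is the case the copy above is meant to repair.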
 
