Understanding "Cluster not ready - no quorum? (500)" during VM start

Feb 17, 2020
I'm administering a Proxmox setup that has two machines in a cluster:
1) a mission-critical server, which runs on a long-lasting UPS (should run without power for hours)
2) a non-critical server, which runs on a small UPS (lasts just long enough to shut the server down properly)

The problem is that after a power outage, when I reboot server 1), I cannot start any VM because of the "cluster not ready - no quorum? (500)" error message. The VMs I want to start have nothing to do with server 2) at all: no shared resources, no shared storage. In my opinion I should be able to start these VMs.

Of course, a fair question: why am I running server 1) and server 2) in a cluster at all? I do that because I want to be able to migrate VMs between 1) and 2) as needed.

IMO when a VM is not running in HA mode, it should start fine even if another (unrelated) node of the cluster is down. Is my understanding wrong? Is nobody else having this problem? Thanks!
 
IMO when a VM is not running in HA mode, it should start fine even if another (unrelated) node of the cluster is down. Is my understanding wrong? Is nobody else having this problem? Thanks!

The problem is that PVE does not know (with 100% certainty) that the other server is offline, and to avoid split-brain it refuses to modify or start any VMs.
The good news is that you can simply set "expected votes" to one in that case, and then you can work again:

# pvecm expected 1

Note: But please only do that if you are 100% sure the other node is offline.
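For example, the sequence on the surviving node could look roughly like this (VM ID 100 is just a placeholder for one of your own VMs; `pvecm status` only shows that this node has lost quorum, it does not prove the other node is really off):

# pvecm status
# pvecm expected 1
# qm start 100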
 
Hi @dietmar - am I right that I can legitimately run `pvecm expected 1`, and then set it back with `pvecm expected 2` before I power on the second server? Is that a correct assumption?

Do you think that I should discard this cluster and go with two separate PVE machines instead, given that it is essential to my client that one machine remains 100% functional when the other node is offline (there are numerous possible reasons: hardware failure, software failure, power failure)?

My assumption is: the point of running VMs is that a failure of one system (VM) does not affect the other systems (VMs). On the other hand, if a failure of a single cluster node requires non-trivial manual intervention, then I think clustering is a good idea only if someone requires HA / heavy load balancing. Is my assumption correct?

Thanks!
 
Hi @dietmar - am I right that I can legitimately run `pvecm expected 1`, and then set it back with `pvecm expected 2` before I power on the second server? Is that a correct assumption?

There is no need to set expected votes to '2' - this is done automatically when the second node connects.

Do you think that I should discard this cluster and go with two separate PVE machines instead, given that it is essential to my client that one machine remains 100% functional when the other node is offline (there are numerous possible reasons: hardware failure, software failure, power failure)?

My assumption is: the point of running VMs is that a failure of one system (VM) does not affect the other systems (VMs). On the other hand, if a failure of a single cluster node requires non-trivial manual intervention, then I think clustering is a good idea only if someone requires HA / heavy load balancing. Is my assumption correct?

Yes (for 2-node clusters, and only if you really think the command "pvecm expected 1" is non-trivial manual intervention).
 
Hi @dietmar, thanks for your answer.

There is no need to set expected votes to '2' - this is done automatically when the second node connects.

Just to clarify: in your first comment you wrote that I should run the `pvecm expected 1` command only if I'm absolutely sure that the second host is not running. Do I understand correctly that even if I did run the `pvecm expected 1` command while the second host was running, it would not be a disaster, because I would just restart the second node and it would re-register itself? I.e., a restart solves this problem?

Yes (for 2-node clusters, and only if you really think the command "pvecm expected 1" is non-trivial manual intervention).

Well, yes, I think it's non-trivial in the sense that it's not trivial to automate :( First we need to count the nodes that are active, make sure that the other nodes are really down, then execute the `pvecm expected N` command based on that information, and everything needs to run as a startup script so that no manual intervention is necessary. Timing is also an issue, because we should give the other node enough time to boot before we declare it dead and decrease the 'expected' count. That is what I meant by non-trivial :)
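Just to illustrate the moving parts, something like this completely untested sketch (the peer address is made up, and per your earlier warning I realise that "assume it is really down" is exactly the dangerous part):

#!/bin/bash
# naive boot-time helper: wait for the other node, then force quorum if it never appears
PEER=192.168.1.2                           # address of the other node (made up)
for i in $(seq 1 30); do
    if ping -c1 -W1 "$PEER" >/dev/null 2>&1; then
        exit 0                             # peer is reachable, let corosync sort out quorum
    fi
    sleep 10
done
# peer did not answer for ~5 minutes -> assume it is really down (risky!)
pvecm expected 1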
 
Do I understand correctly that even if I did run the `pvecm expected 1` command while the second host was running,

As stated above, you should never do that, because it can lead to split brain.

Well, yes, I think it's non-trivial in the sense that it's not trivial to automate :(

Yes, that is why most people use at least 3 nodes for a cluster.
 
As stated above, you should never do that, because it can lead to split brain.

But is it still correct to 1) set `pvecm expected 1` and 2) then power the second node back on? (As per the suggestion in your second response.)

Yes, that is why most people use at least 3 nodes for a cluster.

Does it mean that if I have a cluster of 3 or more nodes, then one node failing does not prevent the other nodes from starting VMs, as long as at least 2 nodes are running?
 
But is it still correct to 1) set `pvecm expected 1` and 2) then power the second node back on? (As per the suggestion in your second response.)

Sorry, but I do not understand that question. It makes no sense to set expected votes manually if you power on the second node anyway.

Does it mean that if I have a cluster of 3 or more nodes, then one node failing does not prevent the other nodes from starting VMs, as long as at least 2 nodes are running?

You need "quorum", which means a majority of nodes must be online (more than half of your nodes).
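For example, with the default of one vote per node, the required quorum is floor(total votes / 2) + 1:

2 nodes -> 2 votes needed (any node down blocks the whole cluster)
3 nodes -> 2 votes needed (one node may be down)
5 nodes -> 3 votes needed (two nodes may be down)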
 
Sorry, but I do not understand that question. It makes no sense to set expected votes manually if you power on the second node anyway.



You need "quorum", which means a majority of nodes must be online (more than half of your nodes).
Jumping in a little late to the post, but why not stop the quorum service until he needs it?
 
I had this error too, and this solution helped me a lot:
Copy these files with scp from a node that is working fine in your cluster to the node that has the issue (proxmox no quorum 500), then restart pve-cluster:
scp -r /etc/corosync/* root@xx.xx.xx.xx:/etc/corosync/
scp /etc/pve/corosync.conf root@xx.xx.xx.xx:/etc/pve/
systemctl restart pve-cluster
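If that worked, the affected node should report quorum again; a quick way to check is:

# pvecm status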
 
