TASK ERROR: cluster not ready - no quorum? - Start all vms and containers

  • Thread starter: costexx

Hello.
I don't know if this is a bug or a feature...
Today I had a power failure on a 2-node cluster.
After power was restored the servers started, but no VM came up.

I connected to see where the problem was and found: Start all vms and containers -> TASK ERROR: cluster not ready - no quorum?
I started every machine manually and everything worked.

I imagine that one of the servers started first, and when it tried to start the machines there was no quorum yet.

Bug or feature?
 
You should protect your nodes with a UPS.

The cluster code waits some time for quorum. But if the other node does not come online within that time span, the VMs fail to start.
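
To illustrate why a single node cannot go ahead on its own, here is a rough sketch of the majority rule. It assumes one vote per node and a plain majority, which is an assumption about a default setup and not something stated in this thread; the helper names are made up for illustration and are not part of any PVE tool.

```shell
# Votes needed for a majority, assuming every node holds exactly one vote.
# (Illustrative helper, not a PVE command.)
quorum_needed() {
    local total=$1
    echo $(( total / 2 + 1 ))
}

# Succeeds (exit 0) if `present` votes out of `total` form a majority.
has_quorum() {
    local present=$1 total=$2
    (( present >= $(quorum_needed "$total") ))
}
```

Under these assumptions `quorum_needed 2` prints 2, so in a 2-node cluster `has_quorum 1 2` fails until the second node has joined. For four nodes, `quorum_needed 4` prints 3.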
 
The servers were on a UPS, but only for 2 h :(
And they were on the same UPS. That's why I was a little puzzled about why there was no quorum. They should have started at approximately the same time.
I will check the logs when I have a bit of time.
 

I have to disagree. While you should certainly have your nodes on UPS systems, I don't generally want my auto starting VMs to be dependent on the health of every node in the cluster.

For a while I had one node using an older, slower boot drive, and this caused it to miss the quorum every time we had a power failure that lasted longer than our UPS could handle. That meant I had to manually start each VM every time the cluster dropped. Admittedly this was not very often, but it did mean that we could never use PVE for anything critical unless I was available, locally or remotely, to manually boot VMs.

PVE really should try to start any VM that's set for auto-start and that does not depend on another node (i.e., shared storage on a node outside the quorum). Some more flexibility in this startup stuff would be really nice.

My workaround has always been to delay VM startup by 5 minutes or so, but that's not ideal of course.
 
I just ran into this issue with a four node cluster that was all powered on at once after a multi-hour outage.

When the system tells me it's not ready, I expect it to try again, so I wrote a kludgy little untested startup script to address the problem:

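#!/bin/bash
# Retry the guest autostart every 30 s during the first 5 minutes of uptime,
# in case the first attempt ran before the cluster had quorum.

# /proc/uptime holds "<seconds>.<fraction> <idle>"; keep the whole seconds.
while [ "$(cut -d. -f1 /proc/uptime)" -lt 300 ]
do
    sleep 30
    service pve-manager start
done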

It's a little noisy in the logs; if it weren't, I'd probably just have cron run the start every 3 minutes forever.
If somebody else has a more elegant solution I would be happy to use it.
I'm putting this on all nodes, since they all threw the error.
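
One possibly quieter variant is to poll for quorum first and only then trigger the start once. The sketch below is untested on a real cluster and rests on an assumption: that `pvecm status` prints a line of the form `Quorate: Yes` when the node has quorum (older releases may print something different). `STATUS_CMD`, `is_quorate` and `wait_for_quorum` are names invented for this example.

```shell
#!/bin/bash
# Sketch: wait for quorum before asking PVE to start guests.
# ASSUMPTION: the status command prints a line like "Quorate: Yes" when quorate.

# Command that prints cluster status; overridable so the logic can be tried
# without a cluster (hypothetical knob, not a PVE setting).
STATUS_CMD=${STATUS_CMD:-"pvecm status"}

# Succeeds if the status text on stdin reports quorum.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]*Yes'
}

# wait_for_quorum TIMEOUT INTERVAL
# Polls every INTERVAL seconds. Returns 0 as soon as quorum is reported,
# or 1 once at least TIMEOUT seconds have been spent waiting.
wait_for_quorum() {
    local timeout=$1 interval=$2 waited=0
    while ! $STATUS_CMD 2>/dev/null | is_quorate; do
        (( waited >= timeout )) && return 1
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

It could be used as `wait_for_quorum 300 10 && service pve-manager start`, which issues a single start attempt after quorum shows up and gives up after roughly five minutes otherwise.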
 
I have 4 servers in the cluster. I rebooted 2 last night.
One server came up and started all the VMs. The other server came up, started 2 (of 6) and then stalled with TASK ERROR: cluster not ready - no quorum? Why would it fail after successfully starting 2 of them? There is a 5-minute delay in starting each VM.
 
