Cluster Recovery

tannebil

Member
Jun 26, 2023
Through a series of unfortunate events and poor decisions, my cluster has lost quorum and I can't figure out how to get it back. It has three nodes and a QDevice (I had failed to read about the pitfall of not removing a QDevice before adding a third node). The primary node is operating but the other two nodes are red and, while the QDevice shows in the cluster configuration, it doesn't have a quorum vote. I tried setting expected votes to 1 on the working node, but expected votes stubbornly remains at 2. Unfortunately, all the advice I've found about removing failed nodes assumes that the remaining nodes have quorum and that the config files can be manually edited. All the VMs and containers are backed up on a Proxmox Backup Server, so I assume that, worst case, I can just rebuild the cluster from scratch.

But I wanted to check if I'm missing an easy way to restore quorum and remove the failed nodes rather than doing a complete rebuild. Any help?

Thanks, Bill
 
Hi,
It seems like you have a network communication issue. I assume two of the three nodes are not able to see what you called the "primary node" and the QDevice. Test your corosync network connectivity and check the systemd journal for errors on all three nodes.
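
As a side note on why a single node stays blocked while expected votes is still 2: a rough sketch of the majority rule, assuming the plain case where quorum is more than half of the expected votes and no special votequorum options are in play. The function names are made up for illustration and are not part of any Proxmox or corosync tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of expected_votes (illustrative helper)."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return present_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Only the working node is present, contributing one vote.
    present = 1
    # 4 = three nodes plus the QDevice, 2 = the value reported in the post,
    # 1 = the value the poster tried to set.
    for expected in (4, 2, 1):
        print(
            f"expected={expected} "
            f"needed={quorum_threshold(expected)} "
            f"quorate={is_quorate(present, expected)}"
        )
```

Under that assumption, one vote only becomes a majority once expected votes actually drops to 1, so it is worth finding out why the change is not taking effect.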