PVE cluster lost quorum

thomasctr

New Member
Dec 29, 2023
Hi all,

We have a 4-node cluster, and everything was fine until two of the nodes got disconnected. Now we seem to have lost quorum, with no way to get it back. In particular, we have some errors we cannot get rid of, so any help would be appreciated:

The information I can give (please ask for more if necessary):
Code:
# pveversion
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-14-pve)

Logs from pve-cluster services:
Code:
pmxcfs[3670817]: [quorum] crit: quorum_initialize returned wrong quorum_type: 0

Code:
# pvecm expected 1
You cannot change expected votes, corosync is not using votequorum

Code:
# pvecm status
Cluster information
-------------------
Name:             INFRA
Config Version:   6
Transport:        knet
Secure auth:      on

Unable to start votequorum status tracking: CS_ERR_BAD_HANDLE

This CS_ERR_BAD_HANDLE error comes back from every command that queries corosync quorum (such as corosync-quorumtool -s).

Do you have any idea?
 
So... for now it seems the cluster is back. What we did was add the following to /etc/corosync/corosync.conf on each node:

Code:
quorum {
  provider: corosync_votequorum
}

Then we restarted pve-cluster and corosync services.
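For anyone wanting to confirm on each node that the file really carries the provider line before restarting, here is a minimal sketch. The function name uses_votequorum is made up for illustration, and the check assumes the provider line sits on its own line as in the snippet above.

```shell
# Hypothetical helper (not part of Proxmox or corosync): succeeds if the
# given corosync.conf contains a "provider: corosync_votequorum" line.
uses_votequorum() {
    # $1: path to a corosync.conf file
    grep -Eq '^[[:space:]]*provider:[[:space:]]*corosync_votequorum[[:space:]]*$' "$1"
}

# Example use on a node (path taken from the post above):
#   uses_votequorum /etc/corosync/corosync.conf && echo "votequorum configured"
```

This only inspects the file on disk; it says nothing about what the running corosync process has loaded.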

pvecm status now seems to give us correct information, and all VMs have been restarted and backed up.
Code:
root@host:~# pvecm status
Cluster information
-------------------
Name:             INFRA
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jan  2 18:01:06 2024
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000004
Ring ID:          1.157
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.253.w
0x00000002          1 192.168.253.x
0x00000003          1 192.168.253.y
0x00000004          1 192.168.253.z (local)

Do you see anything wrong in this? Should I leave the config file as is, or do I need to remove what we added?
 
Note that you need to bump the config version (config_version) in order for the changes to take effect. Please always edit /etc/pve/corosync.conf rather than /etc/corosync/corosync.conf; the former is shared across the entire cluster. See [1] for more info on corosync.conf.
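As a rough sketch of the version bump: the helper below (bump_config_version is a made-up name, not a Proxmox tool) increments the config_version value in whatever file it is given. It assumes the usual "config_version: N" line in the totem section and is meant to be run on a copy of the file, which you review before moving it into place.

```shell
# Hypothetical helper: increment the number on the "config_version:" line
# of a corosync.conf-style file, leaving every other line untouched.
bump_config_version() {
    # $1: path to the file to modify (ideally a working copy)
    file="$1"
    tmp="$(mktemp)" || return 1
    awk '/^[ \t]*config_version:/ { sub(/[0-9]+/, $2 + 1) } { print }' "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Possible workflow, editing a copy first:
#   cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
#   (add the quorum section to the .new file)
#   bump_config_version /etc/pve/corosync.conf.new
#   cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
#   mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```

Working on a copy means a half-edited file never becomes the cluster-wide configuration.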

The output from pvecm status looks ok to me.
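The numbers in that output are consistent with the usual votequorum majority rule, quorum = expected_votes / 2 + 1 with integer division. A small sketch of the arithmetic follows; the function names are invented for illustration, and it ignores special votequorum options such as two_node or last_man_standing.

```shell
# Majority threshold for a given number of expected votes.
quorum_threshold() {
    # $1: expected votes
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if the votes currently present reach the threshold.
is_quorate() {
    # $1: total votes present, $2: expected votes
    [ "$1" -ge "$(quorum_threshold "$2")" ]
}

# With 4 expected votes the threshold is 3, matching "Quorum: 3" above:
#   quorum_threshold 4
#   is_quorate 4 4 && echo "quorate"
#   is_quorate 2 4 || echo "not quorate with only 2 of 4 votes"
```

So with one vote per node, a 4-node cluster tolerates the loss of one node but not two.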

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration
 
