Cluster not quorate - extending auth key lifetime!

Miro_I

Hello,
I am running a 2-node cluster. It was working well, but recently I saw this error in one node's syslog:
Code:
Sep 11 12:36:08 itho-ms pveproxy[3102243]: Cluster not quorate - extending auth key lifetime!
Sep 11 12:36:08 itho-ms pvedaemon[3101577]: Cluster not quorate - extending auth key lifetime!

All nodes are running PVE 8.2.

Code:
root@itho-ms:~# pvecm status
Cluster information
-------------------
Name:             ITTC-Ruse
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Sep 11 12:41:03 2024
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.2f6d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.1.1
0x00000002          1 10.10.1.2 (local)

Any suggestions?
 
I think the cluster is not quorate because of the "WaitForAll" flag. It looks like you are using a custom corosync.conf with the "two_node" flag enabled. Please post it in CODE tags, together with the output of corosync-cfgtool -n and corosync-cfgtool -s from both hosts.
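To illustrate why the flags matter, here is a deliberately simplified model of the votequorum rules, not corosync's actual implementation. The function names and the `seen_all_once` parameter are made up for this sketch. It assumes a plain majority rule, that "two_node" lowers the threshold to 1 when two votes are expected (consistent with "Quorum: 1" in the output above), and that "WaitForAll" withholds quorum until all nodes have been seen together at least once after startup.

```python
def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum in this simplified model.

    Normally a strict majority is required. With two_node set and exactly
    two expected votes, a single vote is enough.
    """
    if two_node and expected_votes == 2:
        return 1
    return expected_votes // 2 + 1


def is_quorate(total_votes: int,
               expected_votes: int,
               two_node: bool = False,
               wait_for_all: bool = False,
               seen_all_once: bool = True) -> bool:
    """Return True if the partition would be quorate under the model.

    seen_all_once is a hypothetical stand-in for "every node has been
    visible at the same time at least once since this node started".
    """
    if wait_for_all and not seen_all_once:
        # WaitForAll: no quorum until the full membership was seen once.
        return False
    return total_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # Values from the pvecm status output above: 2 expected, 2 total, 2Node.
    print(is_quorate(2, 2, two_node=True, wait_for_all=True))
    # A freshly started node that has not yet seen its peer.
    print(is_quorate(1, 2, two_node=True, wait_for_all=True,
                     seen_all_once=False))
```

Under this model a node that was restarted while its peer was unreachable would stay non-quorate until both nodes had joined once, which would match the "not quorate" messages even though pvecm status later reports "Quorate: Yes".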
 
Code:
root@ruse:~# corosync-cfgtool -n
Local node ID 1, transport knet
nodeid: 2 reachable
   LINK: 0 udp (10.10.1.1->10.10.1.2) enabled connected mtu: 1317

root@ruse:~# corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0 udp
        addr    = 10.10.1.1
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected

Code:
root@itho-ms:~# corosync-cfgtool -n
Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (10.10.1.2->10.10.1.1) enabled connected mtu: 1317

root@itho-ms:~# corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
        addr    = 10.10.1.2
        status:
                nodeid:          1:     connected
                nodeid:          2:     localhost
 
I tried some solutions from the web and restarted one of the nodes. I am not sure what changed, but I no longer see the "not quorate" error.
Now one of the nodes floods the log with this error:
Code:
Sep 11 16:34:12 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
Sep 11 16:34:12 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
Sep 11 16:34:13 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
Sep 11 16:34:13 itho-ms pvescheduler[3325509]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Sep 11 16:34:14 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
Sep 11 16:34:14 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
Sep 11 16:34:14 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11 5a 66 67 8a 98 a2 a3 a9 aa bc bf
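To put a number on the flooding, a small script along these lines could tally the retransmit messages per second and show which sequence numbers keep recurring. This is only a sketch: the regular expression assumes the exact syslog layout shown above, and `retransmit_stats` is a name chosen for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# Sep 11 16:34:12 itho-ms corosync[3322665]:   [TOTEM ] Retransmit List: 10 11
RETRANSMIT = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+corosync\[\d+\]:"
    r"\s+\[TOTEM\s*\]\s+Retransmit List:\s*(.*)$"
)


def retransmit_stats(log_text: str):
    """Count retransmit lines per timestamp and occurrences per sequence id."""
    per_second = Counter()
    sequence_ids = Counter()
    for line in log_text.splitlines():
        match = RETRANSMIT.match(line.strip())
        if match is None:
            continue  # not a corosync retransmit line
        per_second[match.group(1)] += 1
        sequence_ids.update(match.group(2).split())
    return per_second, sequence_ids


if __name__ == "__main__":
    sample = (
        "Sep 11 16:34:12 itho-ms corosync[3322665]:   [TOTEM ] "
        "Retransmit List: 10 11 5a\n"
        "Sep 11 16:34:13 itho-ms pvescheduler[3325509]: jobs: cfs-lock "
        "'file-jobs_cfg' error: got lock request timeout\n"
    )
    per_second, sequence_ids = retransmit_stats(sample)
    print(dict(per_second))
    print(dict(sequence_ids))
```

In the excerpt above the same list of sequence numbers repeats on every line, which suggests the same messages are being requested again and again rather than new ones being lost.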
 
