[SOLVED] Single Node Unavailable - Cluster Not Ready

alfonsomarzano

New Member
Nov 22, 2023
Italy
Hi,
we have a 3-node Proxmox VE 5.2-9 cluster (with Ceph) where one node is having trouble synchronizing with the rest of the cluster.

Connecting to its GUI, the node shows as online but all of its VMs are offline, and trying to start them fails with the message: "Cluster not ready - no quorum".

Code:
pvecm status (node 1)

Quorum information
------------------
Date:             Wed Nov 22 11:11:53 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/1262744
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.121 (local)

The pvesr logs show this:
Code:

Nov 20 12:24:00 pmox-ceph-01 systemd[1]: Starting Proxmox VE replication runner...
Nov 20 12:24:00 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:01 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:02 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:03 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:04 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:05 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:06 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:07 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:08 pmox-ceph-01 pvesr[4499]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 20 12:24:09 pmox-ceph-01 pvesr[4499]: error with cfs lock 'file-replication_cfg': no quorum!
Nov 20 12:24:09 pmox-ceph-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Nov 20 12:24:09 pmox-ceph-01 systemd[1]: Failed to start Proxmox VE replication runner.
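
For completeness, the lines above were pulled with something along these lines (a sketch; the exact time window is just an example):
Code:
journalctl -u pvesr.service -u corosync -u pve-cluster --since "2023-11-20 12:00" --until "2023-11-20 13:00"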

Tests attempted:
  1. restarted corosync
  2. restarted pve-cluster (both restarts done roughly as sketched below)
  3. restarted the affected node (we're not restarting the other nodes for fear of a complete lockout)
  4. changed the cluster/corosync switch and switch ports
  5. reconfigured multicast on the switch (Cisco SG550XG-8F8T)
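
For reference, the restarts in steps 1 and 2 were done roughly like this on the affected node (a minimal sketch; run as root):
Code:
# on the affected node (pmox-ceph-01)
systemctl restart corosync
systemctl restart pve-cluster

# check the result afterwards
systemctl status corosync pve-cluster
pvecm status
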
We ran an omping test with puzzling results:
Code:
Node 1:

192.168.0.122 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.066/0.133/0.317/0.026
192.168.0.122 : multicast, xmt/rcv/%loss = 10000/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
192.168.0.123 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.071/0.139/0.361/0.027
192.168.0.123 : multicast, xmt/rcv/%loss = 10000/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

Node 2:

192.168.0.121 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.069/0.129/0.268/0.024
192.168.0.121 : multicast, xmt/rcv/%loss = 10000/9998/0%, min/avg/max/std-dev = 0.073/0.140/0.312/0.030
192.168.0.123 :   unicast, xmt/rcv/%loss = 9847/9847/0%, min/avg/max/std-dev = 0.076/0.135/0.305/0.031
192.168.0.123 : multicast, xmt/rcv/%loss = 9847/9847/0%, min/avg/max/std-dev = 0.083/0.160/0.348/0.035

Node 3:

192.168.0.121 :   unicast, xmt/rcv/%loss = 9690/9690/0%, min/avg/max/std-dev = 0.074/0.134/0.285/0.027
192.168.0.121 : multicast, xmt/rcv/%loss = 9690/9681/0% (seq>=10 0%), min/avg/max/std-dev = 0.078/0.142/0.302/0.030
192.168.0.122 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.075/0.126/0.336/0.027
192.168.0.122 : multicast, xmt/rcv/%loss = 10000/9991/0% (seq>=10 0%), min/avg/max/std-dev = 0.087/0.146/0.339/0.031
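
For reference, those numbers came from the usual long-running multicast test, started in parallel on all three nodes, along the lines of (IPs are ours):
Code:
omping -c 10000 -i 0.001 -F -q 192.168.0.121 192.168.0.122 192.168.0.123

In other words, node 1's own multicast reaches the other two nodes fine, but node 1 never receives any multicast back from them, while unicast is clean in every direction.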

There have been no network or configuration changes since the issue was first reported on a Saturday afternoon.

Has anyone else experienced this issue?

Edit: while we still haven't found the source of the issue on node 1, the problem has been solved by switching the corosync transport protocol to udpu (unicast UDP).
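
For anyone hitting the same thing, this is roughly what the change looks like (a sketch, not our exact file; cluster name and version number are placeholders): edit /etc/pve/corosync.conf, set transport: udpu in the totem section, bump config_version, and restart corosync on every node.
Code:
totem {
  version: 2
  cluster_name: mycluster      # placeholder - use your cluster name
  config_version: 5            # must be higher than the previous value
  transport: udpu              # unicast UDP instead of multicast
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.0.0   # cluster network
  }
}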
 