Hello,
We have a 16-node cluster. While adding the last node there was a timeout; after that the cluster broke and the nodes can no longer see each other.
pvecm nodes lists only the local node on every server. corosync is using about 300% CPU.
pvecm status shows that quorum is lost (Quorate: No, activity blocked).
Code:
root@cluster-a1:~# pvecm status
Cluster information
-------------------
Name:             cluster-a
Config Version:   24
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Jun 26 00:32:18 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.665
Quorate:          No

Votequorum information
----------------------
Expected votes:   16
Highest expected: 16
Total votes:      1
Quorum:           9 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 XX.XXX.X.XXX (local)
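
As a sanity check on the numbers in that output: the quorum of 9 matches a strict majority of the 16 expected votes, and a node that only sees its own single vote is far below it. Below is a minimal sketch of that arithmetic, assuming the default majority rule (floor(expected / 2) + 1) with no special votequorum options; the function names are made up for illustration and are not part of any Proxmox or corosync tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Strict majority of the expected votes (assumed default rule)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes this node can see reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


expected_votes = 16  # "Expected votes" from the pvecm status output
total_votes = 1      # "Total votes" from the pvecm status output

print(quorum_threshold(expected_votes))         # 9, as in "Quorum: 9"
print(is_quorate(total_votes, expected_votes))  # False, as in "Quorate: No"
```

So each node would need to see at least 8 other nodes before activity is unblocked.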
The nodes can still reach each other over the network (ping works).
I have already restarted the pve-cluster and corosync services, but that did not help.
I can see the following entry flooding the syslog:
corosync[128496]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Can you help us out here?
Thanks,
Arne