Hi,
I'm running a cluster with 28 nodes. After a full cluster reboot (all nodes restarted) none of the nodes want to join up. This is what each of the nodes sees:
Code:
Cluster information
-------------------
Name:             pvenl02
Config Version:   57
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Sep 12 23:40:54 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.5498
Quorate:          No

Votequorum information
----------------------
Expected votes:   28
Highest expected: 28
Total votes:      1
Quorum:           15 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 10.40.6.3 (local)
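For context on the numbers above: with 28 expected votes, a simple majority is 15, which matches the "Quorum: 15" line, and a node that only counts its own vote stays blocked. A minimal sketch of that arithmetic, assuming one vote per node and no special votequorum options (the function name is only illustrative):

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


# Values taken from the status output above.
expected_votes = 28
total_votes = 1

needed = quorum_threshold(expected_votes)
print(f"votes needed for quorum: {needed}")
print(f"quorate: {total_votes >= needed}")
```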
Rebooting individual nodes doesn't help, even though corosync seems to be ok:
Code:
# corosync-cfgtool -s
Local node ID 3, transport knet
LINK ID 0 udp
addr = 10.40.6.3
status:
nodeid: 1: connected
nodeid: 2: connected
nodeid: 3: localhost
nodeid: 4: connected
nodeid: 5: connected
nodeid: 6: connected
nodeid: 7: connected
nodeid: 8: connected
nodeid: 9: connected
nodeid: 10: connected
nodeid: 11: connected
nodeid: 12: connected
nodeid: 13: connected
nodeid: 14: connected
nodeid: 15: connected
nodeid: 16: connected
nodeid: 17: connected
nodeid: 18: connected
nodeid: 19: connected
nodeid: 20: connected
nodeid: 21: connected
nodeid: 22: connected
nodeid: 23: connected
nodeid: 24: connected
nodeid: 25: connected
nodeid: 26: connected
nodeid: 27: connected
nodeid: 28: connected
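To compare link state with membership across many nodes, the output above can be tallied with a small script. This is only a sketch: the helper name is hypothetical, the sample is a shortened copy of the output above, and it assumes each status line has the form "nodeid: N: state".

```python
import re
from collections import Counter


def count_link_states(cfgtool_output: str) -> Counter:
    """Tally the state reported on each 'nodeid: N: state' line."""
    states = Counter()
    for line in cfgtool_output.splitlines():
        match = re.match(r"\s*nodeid:\s*(\d+):\s*(\S+)", line)
        if match:
            states[match.group(2)] += 1
    return states


# Shortened sample in the same format as the output above.
sample = """Local node ID 3, transport knet
LINK ID 0 udp
addr = 10.40.6.3
status:
nodeid: 1: connected
nodeid: 2: connected
nodeid: 3: localhost
nodeid: 4: connected
"""

states = count_link_states(sample)
print(dict(states))
```

Run against the full output, this would report 27 connected peers plus localhost, while the membership list still shows a single node.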
Only when I bring down all the hosts (physical power off) and then bring them back online one by one do the nodes cluster back up. If I leave one or two hosts powered on, it doesn't work. They really all need to be powered down.
Has anyone seen this before? Any ideas what is preventing the nodes from joining together despite "corosync-cfgtool -s" showing they can talk?