I have been using Proxmox for more than 10 years and have run into a number of issues, which I have always been able to solve, more or less. Today I want to share a problem for which I cannot find a solution, and which I think is a real problem worth investigating.
We have a 24-node PVE 6 cluster that was upgraded from PVE 5 a few months ago. Because of heating issues in our data center, we powered off 11 nodes, leaving 13 of them running. Quorum was never lost.
Yesterday I tried to power on some of the hypervisors that had been shut down, but they can no longer join the corosync cluster.
This is the message I find on the working corosync members:
```
Jul 27 09:34:48 hnode21 corosync[1993]: [TOTEM ] Message received from 192.168.145.119 has bad magic number (probably sent by unencrypted Kronosnet).. Ignoring
```
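The warning names the sending address, and 192.168.145.119 is the address hnode19 reports for itself in its status output further down. When several nodes are being brought back, a small filter over the corosync log (for example the output of `journalctl -u corosync`) can list which senders are being rejected. Below is a minimal Python sketch; the function name is hypothetical and the regular expression is an assumption built from this single log line, so other corosync versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted above (an assumption:
# other corosync/knet versions may phrase the warning differently).
BAD_MAGIC = re.compile(
    r"Message received from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) has bad magic number"
)


def bad_magic_senders(lines):
    """Count how often each sender IP appears in 'bad magic number' warnings."""
    counts = Counter()
    for line in lines:
        match = BAD_MAGIC.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Jul 27 09:34:48 hnode21 corosync[1993]: [TOTEM ] Message received from "
        "192.168.145.119 has bad magic number (probably sent by unencrypted "
        "Kronosnet).. Ignoring",
        "an unrelated log line",
    ]
    print(bad_magic_senders(log))  # Counter({'192.168.145.119': 1})
```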
This is the status of the "good" part of the cluster:
```
root@hnode21:~# pvecm status
Cluster information
-------------------
Name:             u-lite-v2
Config Version:   43
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 27 10:04:10 2021
Quorum provider:  corosync_votequorum
Nodes:            13
Node ID:          0x00000015
Ring ID:          1.a9e
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      13
Quorum:           13
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.145.102
0x00000002          1 192.168.145.103
0x00000003          1 192.168.145.101
0x00000004          1 192.168.145.100
0x00000006          1 192.168.145.106
0x00000008          1 192.168.145.120
0x00000009          1 192.168.145.118
0x0000000b          1 192.168.145.108
0x0000000d          1 192.168.145.110
0x0000000f          1 192.168.145.116
0x00000012          1 192.168.145.113
0x00000015          1 192.168.145.121 (local)
0x00000018          1 192.168.145.115
```
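The quorum figures can be checked by hand. Assuming default votequorum behaviour (no options such as `two_node` or `last_man_standing`, which change the calculation), the threshold is a strict majority of the expected votes: 24 // 2 + 1 = 13. The Python sketch below reproduces that arithmetic; the helper names are hypothetical.

```python
def votequorum_threshold(expected_votes: int) -> int:
    """Strict majority of expected votes (default votequorum rule, assumed)."""
    return expected_votes // 2 + 1


def quorum_margin(expected_votes: int, total_votes: int) -> int:
    """Votes a partition can still lose before it drops below the threshold.

    A negative value means the partition is already inquorate.
    """
    return total_votes - votequorum_threshold(expected_votes)


if __name__ == "__main__":
    # hnode21's partition: 24 expected votes, 13 present.
    print(votequorum_threshold(24))  # 13
    print(quorum_margin(24, 13))     # 0
    # hnode19 on its own: 24 expected votes, 1 present.
    print(quorum_margin(24, 1))      # -12
```

With these numbers the 13-node partition has a margin of 0: it sits exactly at the threshold, so losing one more node would cost it quorum, while hnode19 alone is 12 votes short, which matches its "Activity blocked" state.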
This is what a node I just powered on sees:
```
root@hnode19:~# pvecm status
Cluster information
-------------------
Name:             u-lite-v2
Config Version:   43
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 27 10:02:32 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x0000000a
Ring ID:          a.b0e
Quorate:          No

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      1
Quorum:           13 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x0000000a          1 192.168.145.119 (local)
```
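Comparing the two membership lists shows where hnode19 stands: the quorate partition lists 13 node IDs, and hnode19's own ID, 0x0000000a, is not among them. Assuming node IDs run contiguously from 1 to 24 (the corosync.conf is not reproduced here, so this is an assumption), a short Python sketch can list the absent IDs, which should then number 11, matching the nodes that were powered off. The helper names are hypothetical.

```python
import re

# Membership rows look like "0x00000015          1 192.168.145.121 (local)".
# Anchoring on a leading "0x" skips lines such as "Node ID: 0x00000015".
MEMBER_ROW = re.compile(r"^\s*0x([0-9a-fA-F]+)\s+(\d+)\s+(\S+)")


def member_ids(status_text: str) -> set[int]:
    """Collect node IDs from the membership section of `pvecm status` output."""
    ids = set()
    for line in status_text.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            ids.add(int(match.group(1), 16))
    return ids


def missing_ids(present: set[int], cluster_size: int) -> list[int]:
    """IDs absent from a partition, ASSUMING node IDs are 1..cluster_size."""
    return sorted(set(range(1, cluster_size + 1)) - present)


if __name__ == "__main__":
    # Node IDs listed by hnode21 (the quorate partition).
    good = {0x01, 0x02, 0x03, 0x04, 0x06, 0x08, 0x09,
            0x0B, 0x0D, 0x0F, 0x12, 0x15, 0x18}
    # Membership row printed by hnode19, the node that was just powered on.
    isolated = member_ids("0x0000000a          1 192.168.145.119 (local)")
    absent = missing_ids(good, 24)
    print(len(absent))              # 11
    print(isolated <= set(absent))  # True
```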
Note that the Ring ID is different. The corosync.conf (renamed to corosync.txt) is attached; it is identical on all nodes.
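Different Ring IDs mean the two sides belong to different corosync memberships. Assuming the corosync 3 display format `<representative node ID>.<sequence number>`, with both fields in hex (an assumption, not something the output above states), the Python sketch below splits a Ring ID into its two values; `RingId` and `parse_ring_id` are hypothetical names. If the assumption holds, hnode19's ring is represented by node 0xa, that is, by itself, which is consistent with it having formed a one-node membership.

```python
from typing import NamedTuple


class RingId(NamedTuple):
    rep_node: int  # representative node ID (assumed first field, hex)
    seq: int       # membership sequence number (assumed second field, hex)


def parse_ring_id(text: str) -> RingId:
    """Split a Ring ID such as '1.a9e' into two integers.

    Assumes the '<node id>.<sequence>' hex format described above.
    """
    node, seq = text.strip().split(".")
    return RingId(int(node, 16), int(seq, 16))


if __name__ == "__main__":
    good = parse_ring_id("1.a9e")      # as reported on hnode21
    isolated = parse_ring_id("a.b0e")  # as reported on hnode19
    print(good)                        # RingId(rep_node=1, seq=2718)
    print(isolated)                    # RingId(rep_node=10, seq=2830)
    print(good == isolated)            # False
```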