Cluster suddenly stops existing

AngryAdm

Member
Sep 5, 2020
I had a 3-node cluster set up. Then I added a 4th node, and suddenly the 3rd node was no longer part of the cluster, and the cluster itself seems to no longer exist.

I reinstalled ALL 4 nodes fresh, joined them to a new cluster, and now I have the same situation.

What is going on here? The "join information" is also greyed out...
[Screenshot: 1604269579216.png]

Attachments: 1604269547719.png
Did you reinstall from the ISO, including wiping the directory /etc/pve/?
 
Can you please post the output of
Code:
pvecm status
from the bad node and at least one other node?
 
The cluster seems to have started existing again magically while I was sleeping... Join information is no longer greyed out.

root@pve01:~# pvecm status
Cluster information
-------------------
Name: ALMA-MATER
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Nov 2 09:58:04 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1.b6
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.0.5 (local)
0x00000002 1 10.0.0.6
0x00000003 1 10.0.0.7
0x00000004 1 10.0.0.8

---------------------------------------------------------

root@pve04:~# pvecm status
Cluster information
-------------------
Name: ALMA-MATER
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Nov 2 10:03:57 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000004
Ring ID: 1.b6
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.0.5
0x00000002 1 10.0.0.6
0x00000003 1 10.0.0.7
0x00000004 1 10.0.0.8 (local)
root@pve04:~#
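
For anyone comparing two pastes like the ones above by eye, here is a minimal Python sketch that pulls the vote counts and member list out of `pvecm status` text and checks whether two nodes report the same view. The function names (`parse_pvecm_status`, `majority`, `views_agree`) and the shortened sample strings are made up for illustration. The majority rule `expected // 2 + 1` assumes the default votequorum behaviour with no special options, which matches the `Quorum: 3` shown for 4 expected votes.

```python
import re

def parse_pvecm_status(text):
    """Pull a few fields out of `pvecm status` output laid out as in the paste above."""
    fields = {}
    for key in ("Expected votes", "Total votes", "Quorum", "Quorate"):
        m = re.search(rf"^{re.escape(key)}:[ \t]*(\S+)", text, flags=re.MULTILINE)
        if m:
            fields[key] = m.group(1)
    # Membership rows look like: "0x00000001 1 10.0.0.5 (local)"
    rows = re.findall(r"^[ \t]*0x[0-9a-fA-F]+[ \t]+\d+[ \t]+(\S+)", text, flags=re.MULTILINE)
    fields["members"] = sorted(rows)
    return fields

def majority(expected_votes):
    """Votes needed for quorum, assuming the plain majority rule."""
    return expected_votes // 2 + 1

def views_agree(a, b):
    """True if two parsed outputs list the same members and the same quorate state."""
    return a["members"] == b["members"] and a.get("Quorate") == b.get("Quorate")

# Shortened copies of the two pastes (only the lines the parser looks at).
SAMPLE_PVE01 = """\
Quorate: Yes
Expected votes: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Nodeid Votes Name
0x00000001 1 10.0.0.5 (local)
0x00000002 1 10.0.0.6
0x00000003 1 10.0.0.7
0x00000004 1 10.0.0.8
"""
SAMPLE_PVE04 = SAMPLE_PVE01.replace(" (local)", "")

if __name__ == "__main__":
    a = parse_pvecm_status(SAMPLE_PVE01)
    b = parse_pvecm_status(SAMPLE_PVE04)
    print("members seen by pve01:", a["members"])
    print("quorum needed for", a["Expected votes"], "votes:", majority(int(a["Expected votes"])))
    print("views agree:", views_agree(a, b))
```

If the two views disagree (different member lists, or one side not quorate), that would point to the kind of split described in this thread, where one node sees the others as offline.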
 
What I noticed before reinstalling yesterday was that /etc/pve/nodes on nodes 1, 2, and 4 did not contain information for node 3, the node that stopped being part of the cluster after I added pve04. These files are now present on all nodes, however.
On node 3 itself, the files were present for all 4 nodes, but logging directly into node 3 showed nodes 1, 2, and 4 as offline.
Logging into pve01 showed nodes 1, 2, and 4 online and node 3 offline.
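
The /etc/pve/nodes observation can be written down as a small check. This is only a sketch: `missing_node_dirs` is a made-up helper, the listings are typed in by hand (for example from running `ls /etc/pve/nodes` on each node), and the hostnames pve02 and pve03 are assumed, since only pve01 and pve04 appear in the output above.

```python
from __future__ import annotations

def missing_node_dirs(listings: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each host, return the node directories it lacks compared to the union of all hosts."""
    all_nodes = set(listings)
    for seen in listings.values():
        all_nodes |= seen
    return {
        host: all_nodes - seen
        for host, seen in listings.items()
        if all_nodes - seen
    }

# The state described before the reinstall: only node 3 had entries for all four nodes.
before_reinstall = {
    "pve01": {"pve01", "pve02", "pve04"},
    "pve02": {"pve01", "pve02", "pve04"},
    "pve03": {"pve01", "pve02", "pve03", "pve04"},
    "pve04": {"pve01", "pve02", "pve04"},
}

if __name__ == "__main__":
    for host, missing in sorted(missing_node_dirs(before_reinstall).items()):
        print(host, "is missing", sorted(missing))
```

An empty result would mean every node lists the same set of node directories, which is the state reported now.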
 