Broken Cluster on New Install

godber

Apr 24, 2023
Hi All,

I was setting up a new Proxmox cluster with three hosts: A, B, and C. I made the mistake of turning B off while I added C (I was worried I was going to pop a circuit breaker). Now that all three are turned on, A and C show up as members of the cluster, but B is not joined; it appears under "Datacenter" with a red X. Can I do something to manually join B back into the cluster? Is this a case where something like the process shown in the following thread would work?

https://forum.proxmox.com/threads/how-to-totally-destroy-a-cluster-then-re-create-it.99123/
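
(For anyone hitting the same thing: the cluster view on each node can be compared with the standard commands below; nothing here is specific to my setup.)

Code:
# run on each node (A, B and C) and compare what each one reports
pvecm status
pvecm nodes

# the cluster config each node currently holds (note the config_version line)
cat /etc/pve/corosync.conf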

Thanks
 
UdoB

"I made the mistake of turning B off when I added C"
Been there, done that.

The nodes that stayed online now have a newer version of /etc/pve/corosync.conf than the node that was offline. My expectation was that the node turned on later would fetch the updated version automatically. It didn't.

My solution was to manually copy the newer content to the temporarily-down node. Be careful: that file is crucial, so always make a backup copy before tinkering with these files.

And first read up on how /etc/pve/corosync.conf relates to /etc/corosync/corosync.conf.
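
Something along these lines (the backup file names are just an example) saves a safety copy first and lets you compare config versions between nodes:

Code:
# on the out-of-sync node: keep copies before touching anything
cp /etc/pve/corosync.conf /root/corosync.conf.pve.bak
cp /etc/corosync/corosync.conf /root/corosync.conf.local.bak

# run on a healthy node and on the out-of-sync node, then compare
grep config_version /etc/pve/corosync.conf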


Good luck.
 
Reactions: godber
Thanks for the response, UdoB.

Inspecting the /etc/pve/corosync.conf and /etc/corosync/corosync.conf files, I can see they are the same, but the one in the /etc/pve filesystem is not writable. I imagine this means something; I'll try digging more later in the day. I haven't found how to make that file editable yet. I've read the following:

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration

But that only seems to discuss the case where things are working as they should. I've seen references to running pmxcfs -l to start the cluster filesystem in local mode. I'll dig more into this possibility and read more later today.
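
(From what I can tell from the admin guide, /etc/pve is mounted read-only whenever the node has no quorum. On a node that is cut off from the others, one documented way to make it quorate temporarily is to lower the expected vote count:)

Code:
# tell votequorum on this node to expect only one vote (not persistent across restarts)
pvecm expected 1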

Node B now reports the following status, after I set its expected votes to 1:

Code:
pvecm status
Cluster information
-------------------
Name:             pve-B
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May  3 15:44:08 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.10db
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 10.18.0.11 (local)

-Austin
 
I'm pretty sure this worked right. Here's what I did:

Code:
  # stop the cluster filesystem and corosync
  systemctl stop pve-cluster
  systemctl stop corosync
  # restart the cluster filesystem in local mode so /etc/pve becomes writable
  pmxcfs -l
  # overwrite the cluster copy with the newer corosync.conf (staged in /tmp beforehand)
  cp /tmp/corosync.conf /etc/pve/corosync.conf

The status now looks good on node B and /etc/pve/priv/known_hosts is back in sync. B shows up green in the UI too.
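
For anyone following along later: pmxcfs -l leaves the cluster filesystem running in local mode, so to get back to normal operation the documented procedure is (roughly) to stop the local instance and restart the regular services:

Code:
# leave local mode and bring the normal cluster services back up
killall pmxcfs
systemctl start corosync
systemctl start pve-cluster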
 
Reactions: UdoB