[SOLVED] Unable to join cluster with 2nd server

teddymanguy · Oct 16, 2024

I am struggling hard to figure out why my 2nd server cannot join my newly created cluster. Honestly I'm stuck.

1st Server
poxmox 192.168.50.20

2nd Server
PVE02 192.168.50.70

When I attempt to join all I get in the GUI is below on PVE02

Code:

detected the following error(s):
* corosync is already running, is this node already in a cluster?!
TASK ERROR: Check if node may join a cluster failed!

journalctl -f shows the same error

Not really sure where to go from here. I'll try dumping more info below:

poxmox pvecm status

Code:

Name:             teddycluster
Config Version:   1
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Oct 15 21:44:34 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.19
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.50.20 (local)

PVE02 pvecm status shows "Is this part of a cluster? no corosync.conf"

If I'm missing info or something else would help, let me know what I need to provide.

esi_y · Oct 16, 2024

Do you mind doing this via CLI? Just go to your standalone (about to become cluster member) node and:

Code:

pvecm add 192.168.50.20

(If it confuses you, do not worry: it basically means you want to join (not add) the current node to the cluster that the node at the given address is already part of.)

Share the output.

teddymanguy · Oct 16, 2024

below is output

Code:

root@PVE02:~# pvecm add 192.168.50.20
Please enter superuser (root) password for '192.168.50.20': ********
detected the following error(s):
* corosync is already running, is this node already in a cluster?!
Check if node may join a cluster failed!

esi_y · Oct 16, 2024

On the same node still (PVE02):

Code:

systemctl stop corosync
rm /etc/corosync/corosync.conf

# and now again

pvecm add 192.168.50.20
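
For anyone landing here with the same error, the two things touched above can be checked before retrying the join. This is only a minimal sketch: `join_blockers` is a hypothetical helper name (not part of PVE), and it assumes the service is called `corosync` and the config lives at `/etc/corosync/corosync.conf`, as in the commands above.

```shell
# Hypothetical helper, not a PVE tool: report the two leftovers dealt with above.
# $1 is the corosync config path (defaults to the path used in this thread).
join_blockers() {
    conf="${1:-/etc/corosync/corosync.conf}"

    # A leftover config file suggests the node was part of a cluster before.
    if [ -e "$conf" ]; then
        echo "leftover config: $conf"
    fi

    # A running corosync on a standalone node is what the join check complained about.
    if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet corosync; then
        echo "corosync is running"
    fi

    return 0
}

# Usage on the node that is about to join (PVE02 here):
#   join_blockers
```

No output means neither of the two was found; it does not prove the join will succeed.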

teddymanguy · Oct 16, 2024

output

Code:

root@PVE02:~# systemctl stop corosync
root@PVE02:~# rm /etc/corosync/corosync.conf
rm: cannot remove '/etc/corosync/corosync.conf': No such file or directory
root@PVE02:~# pvecm add 192.168.50.20
Please enter superuser (root) password for '192.168.50.20': 
Establishing API connection with host '192.168.50.20'
The authenticity of host '192.168.50.20' can't be established.
X509 SHA256 key fingerprint is FINGERPRINT
Are you sure you want to continue connecting (yes/no)? yes
500 Can't connect to 192.168.50.20:8006 (hostname verification failed)

This is very much the first time I've gotten something like this.

esi_y · Oct 16, 2024

Don't worry, so how about:

Code:

pvecm add 192.168.50.20 --use_ssh 1

esi_y · Oct 16, 2024

BTW I should probably have mentioned that this is best done directly, either on the console of the node in question (sitting there) or over an SSH connection of your own, not from the GUI. When this succeeds, it will kick you out of the GUI.

teddymanguy · Oct 16, 2024

That fixed it, if you don't mind explaining why it was failing that would be awesome. Otherwise I am good to go and thank you SO MUCH for your help.

teddymanguy · Oct 16, 2024

Also I did it through SSH, so I accidentally did it right.

esi_y · Oct 16, 2024

teddymanguy said:
That fixed it, if you don't mind explaining why it was failing that would be awesome. Otherwise I am good to go and thank you SO MUCH for your help.

Short version, because PVE tooling is ...ehm.

I really do not like blaming users with "you must have done something", because I cannot know. The old way of doing these operations was over SSH; later the default became the SSL-based API. Probably something in your certificates did not match (they have been regenerated by now), so it would not let you connect even to make that call. With that switch you forced the join to go over SSH, where you are the one who decides whether to accept the host identity (you should check your fingerprints, right?).

It then just issues the commands needed: makes some symlinks into /etc/pve, populates the local corosync.conf, starts corosync (it is not needed outside a cluster, so it is normally off), re-mounts the FUSE filesystem (/etc/pve is actually backed by /var/lib/pve-cluster/config.db), and so on. It also regenerates the certs. You can find the certificates in /etc/pve and examine them with openssl; you can also see what is in them in your own browser.

Code:

openssl x509 -noout -text -in /etc/pve/local/pve-ssl.pem
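
If you want to narrow down a "hostname verification failed", one thing worth looking at is whether the address you are joining to is listed in the certificate's Subject Alternative Name. That is only one possible cause, not a confirmed diagnosis for this thread. A rough sketch follows: `san_contains` is a hypothetical helper name, and it assumes an OpenSSL new enough to support `x509 -ext` (1.1.1 or later).

```shell
# Hypothetical helper: succeeds if the SAN text on stdin lists the given
# IP address or DNS name as an exact entry.
san_contains() {
    want="$1"
    # SAN entries are comma-separated; put one per line and trim whitespace
    # so that 192.168.50.2 does not accidentally match 192.168.50.20.
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq -e "IP Address:$want" -e "DNS:$want"
}

# Usage on the existing cluster node (the one you are joining to):
#   openssl x509 -noout -ext subjectAltName -in /etc/pve/local/pve-ssl.pem \
#       | san_contains 192.168.50.20 && echo "listed" || echo "not listed"
```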

But I just want to keep my sanity and use SSH.
