[SOLVED] Unable to join cluster with 2nd server

teddymanguy

Oct 16, 2024
I am struggling hard to figure out why my 2nd server cannot join my newly created cluster. Honestly I'm stuck.

1st Server
poxmox 192.168.50.20

2nd Server
PVE02 192.168.50.70

When I attempt to join, all I get in the GUI on PVE02 is the following:

Code:
detected the following error(s):
* corosync is already running, is this node already in a cluster?!
TASK ERROR: Check if node may join a cluster failed!

journalctl -f shows the same error

Not really sure where to go from here. I'll try dumping more info below:

poxmox pvecm status
Code:
Name:             teddycluster
Config Version:   1
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Oct 15 21:44:34 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.19
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.50.20 (local)

PVE02 pvecm status shows "Is this part of a cluster? no corosync.conf"

If I'm missing info or something else would help, let me know what I need to provide.
 
Do you mind doing this via CLI? Just go to your standalone (about to become cluster member) node and run:

Code:
pvecm add 192.168.50.20

(If it confuses you, do not worry: it basically means you want to join (not add) the current node to the cluster that the node at the given address is already part of.)

Share the output.
 
below is output

Code:
root@PVE02:~# pvecm add 192.168.50.20
Please enter superuser (root) password for '192.168.50.20': ********
detected the following error(s):
* corosync is already running, is this node already in a cluster?!
Check if node may join a cluster failed!
 
On the same node still (PVE02):

Code:
systemctl stop corosync
rm /etc/corosync/corosync.conf

# and now again

pvecm add 192.168.50.20
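Before removing anything, it can help to see what state the node is actually in, since the join check complains when corosync is running even without a config file. A rough sketch, assuming the default config path; the function name `check_join_state` and its optional argument are made up for this example:

```shell
#!/bin/sh
# Sketch: report leftover cluster state on a node before running `pvecm add`.
# check_join_state and its optional path argument are hypothetical helpers.
check_join_state() {
    conf="${1:-/etc/corosync/corosync.conf}"   # local corosync config path

    # `systemctl is-active --quiet` exits 0 only when the unit is running.
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl is-active --quiet corosync 2>/dev/null; then
        echo "corosync: running"
    else
        echo "corosync: not running"
    fi

    # A standalone node normally has no corosync.conf at all.
    if [ -e "$conf" ]; then
        echo "config: present ($conf)"
    else
        echo "config: absent ($conf)"
    fi
}

# Usage on the node that is about to join (PVE02 in this thread):
#   check_join_state
```

If it reports corosync as running but the config as absent, stopping the service as shown above should be enough for the join check to pass.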
 
output

Code:
root@PVE02:~# systemctl stop corosync
root@PVE02:~# rm /etc/corosync/corosync.conf
rm: cannot remove '/etc/corosync/corosync.conf': No such file or directory
root@PVE02:~# pvecm add 192.168.50.20
Please enter superuser (root) password for '192.168.50.20':
Establishing API connection with host '192.168.50.20'
The authenticity of host '192.168.50.20' can't be established.
X509 SHA256 key fingerprint is FINGERPRINT
Are you sure you want to continue connecting (yes/no)? yes
500 Can't connect to 192.168.50.20:8006 (hostname verification failed)



this is very much the first time i've gotten something like this
 
BTW, I should probably have mentioned that this is best done directly on the console of the node in question (sitting there) or over an SSH connection of your own, not through the GUI. When the join succeeds, it will kick you out of the GUI.
 
That fixed it, if you don't mind explaining why it was failing that would be awesome. Otherwise I am good to go and thank you SO MUCH for your help.
 
Short version, because PVE tooling is ...ehm. ;)

I really do not like blaming users with "you must have done something", because I cannot know. The old way of doing these operations was over SSH; later the join was changed to go through the SSL-protected API by default. Probably something in your certificates did not match (they have been regenerated by now), so it would not even let you connect to make that call. With that switch you forced the join to go over SSH, where you are the one who decides whether to accept the host identity (you should check your fingerprints, right? ;)). It then just issues the commands needed: makes some symlinks into /etc/pve, populates the local corosync.conf, starts up corosync (it is not needed outside a cluster, so it is off), re-mounts that FUSE filesystem (/etc/pve is actually backed by /var/lib/pve-cluster/config.db), etc. It also regenerates the certs.

You can find the certificates in /etc/pve and examine them with openssl; you can also see what's in them in your own browser.

Code:
openssl x509 -noout -text -in /etc/pve/local/pve-ssl.pem
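If the full text dump is too noisy, the fields that matter for an API join can be pulled out individually. This is only a sketch: `show_cert_identity` is a made-up helper name, and a mismatch between the address you connect to and the certificate's names is just one plausible cause of a "hostname verification failed" error. Run it on the node you are connecting to.

```shell
#!/bin/sh
# Sketch: print the identity-related fields of a node certificate.
# show_cert_identity is a hypothetical helper; the openssl options are standard.
show_cert_identity() {
    pem="${1:-/etc/pve/local/pve-ssl.pem}"

    # SHA256 fingerprint, to compare with the one `pvecm add` asks you to confirm
    openssl x509 -noout -fingerprint -sha256 -in "$pem"

    # Subject and expiry date
    openssl x509 -noout -subject -enddate -in "$pem"

    # Subject Alternative Names: the hostnames/IPs the certificate is valid for
    # (the -ext option needs OpenSSL 1.1.1 or newer)
    openssl x509 -noout -ext subjectAltName -in "$pem"
}

# Usage:
#   show_cert_identity                      # default PVE node certificate
#   show_cert_identity /path/to/other.pem
```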

But I just want to keep my sanity and use SSH. :D
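For anyone finding this later: the exact command is not quoted in the posts above, so treat the following as an assumption. `pvecm add` has a `--use_ssh` option that makes the join go over SSH instead of the API, which matches the behaviour described. The wrapper `join_over_ssh` and the `DRY_RUN` variable are made up for this sketch.

```shell
#!/bin/sh
# Sketch: join an existing cluster over SSH instead of the API.
# join_over_ssh and DRY_RUN are hypothetical; `pvecm add --use_ssh` is the real part.
join_over_ssh() {
    peer="${1:-}"    # address of a node that is already in the cluster
    if [ -z "$peer" ]; then
        echo "usage: join_over_ssh <cluster-node-address>" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be run
        echo "pvecm add $peer --use_ssh"
    else
        pvecm add "$peer" --use_ssh
    fi
}

# On the joining node (PVE02), from the console or your own SSH session:
#   join_over_ssh 192.168.50.20
```

Setting `DRY_RUN=1` prints the command without touching the node, which is handy for double-checking the address first.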
 