Issues joining new servers to clusters

Mar 29, 2020
Hi All, I appreciate any feedback I get here.

I have an existing 5-node cluster. It runs Ceph; I don't believe that is relevant to the situation, but thought I'd mention it.
I also have an existing 3-node cluster. No shared storage.

I've migrated everything away from the 3-node cluster and have rebuilt the 3 servers on 6.4-9. All 3 are currently standalone.
I've pushed updates through on all 5 servers in the 5-node cluster and have rebooted 2 of the 5 nodes to pick up the new kernel.

Both clusters exist on the same network, can ping each other, etc.

I tried to join each of the 3 rebuilt nodes to the 5-node cluster and not one of them completed successfully. They all wound up stopping the cluster service and then never starting it again. It would show "permission denied" in the background screen and then eventually in the task viewer as well.
msedge_sbH6PvW690.png

When I tried the web login, it would fail with a login error. If I went to the console, I could log in with the same credentials that had failed in the web login.
If I went to the 5-node cluster, it would show the additional node in the node list, but listed as standalone, and the join would never finish. In the image below, PVE04-PVE08 existed in the cluster prior (and still do) and PVE02 was the one being joined.
msedge_VApCaT6xsj.png
To get everything back to normal I would have to run
Code:
pvecm delnode nodename
and then it would error out saying that the node does not exist, generate a new cluster version, and things would resume as normal.

This is the output from pvecm status on one of the standalone nodes...all 3 are similar:
1625226631225.png

As I look at this more this morning with a fresh set of eyes, I see something weird (and unexpected). When I look at the status of the existing cluster, I see the output below from pvecm status:
1625226688461.png

The names in the membership information show IP addresses that I would not expect to see. They are the addresses assigned to the Ceph network. Since I am still waiting on my new SSDs for these 3 servers, I have not added the Ceph networking to them yet, so that network is unavailable on the new nodes.
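
In case it helps anyone checking the same thing, here is a minimal sketch of how the per-node cluster addresses could be listed from the corosync config. It assumes the standard location /etc/pve/corosync.conf and the usual nodelist layout with `name:` and `ring0_addr:` entries; the function name `list_corosync_addrs` is just something I made up for illustration.

```shell
#!/bin/sh
# List "name address" for every node entry in a corosync.conf.
# Assumption: the file uses the usual nodelist { node { name: ... ring0_addr: ... } } layout.
# With no argument it reads /etc/pve/corosync.conf; pass a path to inspect a copy instead.
list_corosync_addrs() {
    conf="${1:-/etc/pve/corosync.conf}"
    awk '
        $1 == "name:"       { name = $2 }   # node name inside a node block
        $1 == "ring0_addr:" { addr = $2 }   # address used for link 0
        # On a closing brace, print the pair if both were seen, then reset.
        $1 == "}" && name != "" && addr != "" {
            print name, addr
            name = ""; addr = ""
        }
    ' "$conf"
}
```

Running `list_corosync_addrs` on an existing cluster member should show which network each node's link 0 address is on, which can then be compared against what pvecm status reports.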

I am looking for a little guidance now, I guess. As I understand it, the installs on each server that failed to join need to be redone, because each server's fingerprint was registered with the cluster (even though the node isn't there now) and each server then created its own cluster. I'm OK with re-installing if I have to, or if it's the path of least resistance at this point. As an edit/addition: the console works, but the web page refuses to load.
My question is: even though the join information uses the 10.0.0.0/24 address as part of the join, will it fall back to the appropriate link, assuming that link is present? While I am OK with re-installing, I'd prefer not to have to do it a few times to figure this out.
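
To make the question concrete: as far as I can tell from the docs, `pvecm add` on 6.x accepts a `--link0` option that names the joining node's own address for link 0, so the link would not have to be picked automatically. Below is a small dry-run sketch that only prints such a command; the helper name `build_join_cmd` and the addresses in the usage line are placeholders, not values from my setup.

```shell
#!/bin/sh
# Print (do not run) a join command that pins link 0 to a chosen local address.
# $1: address of an existing cluster member (placeholder)
# $2: this node's own address on the network intended for link 0 (placeholder)
build_join_cmd() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_join_cmd <cluster-member-ip> <local-link0-ip>" >&2
        return 1
    fi
    printf 'pvecm add %s --link0 %s\n' "$1" "$2"
}
```

For example, `build_join_cmd 192.0.2.14 192.0.2.12` prints the command so it can be reviewed before being run on the node that is joining.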

Any help is appreciated.
 