Adding New Node to Cluster Fails - Out of Ideas

SCrisler

Member
Sep 1, 2014
I have an existing 2-node cluster on Proxmox VE 6.3 that runs wonderfully. I have attempted to add 2 additional 6.3 nodes. As soon as I add a node, the cluster shows gray question marks and you can no longer manage the cluster or access a console.

As soon as I run:
systemctl stop corosync
systemctl stop pve-cluster
on the new node, the new node shows a red X and the original cluster can once again be managed (no gray question marks).

I have pored over the internet and this forum trying to resolve the issue. The new node can SSH to the existing cluster nodes without a password prompt. When I add the new node via the command line, all the messages lead you to believe that everything worked flawlessly.
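For what it's worth, this is roughly how I checked the passwordless SSH from the new node (pve1 and pve2 are placeholder names, not my real hostnames):

# BatchMode makes ssh fail instead of prompting for a password,
# so this confirms key-based auth is actually working.
ssh -o BatchMode=yes root@pve1 hostname
ssh -o BatchMode=yes root@pve2 hostname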

All the node names are in the /etc/hosts file, so you can ping every node by name. I have tried removing the node from the cluster, reinstalling Proxmox, and rejoining the cluster, only to get the same gray question marks. I have also tried removing the node without reinstalling, following the guidelines, and rejoining. The join seems to go OK, but then the gray question marks come back.
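The /etc/hosts entries look something like this on every node (IPs and names are placeholders):

192.168.1.10 pve1.example.local pve1
192.168.1.11 pve2.example.local pve2
192.168.1.12 pve3.example.local pve3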

On the new node, I have tried running:
pvecm updatecerts -f
This eventually prints a timeout error but never returns to a shell prompt, so you have to just close the session.
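While it hangs, I have been watching the cluster services from a second shell on the new node, roughly like this (these are the standard Proxmox unit names, as far as I know):

# Check that the cluster filesystem and corosync are running
systemctl status pve-cluster corosync
# Follow both logs while pvecm updatecerts is hanging
journalctl -f -u pve-cluster -u corosync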

I have noticed that the /etc/pve/nodes folder seems to be wrong. It lists all 4 node names even though the cluster only has 3 nodes presently. I'm guessing corosync is syncing this around the cluster, because I cannot remove the 4th node that is no longer part of the cluster.
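To illustrate (node names are placeholders), the listing looks roughly like this even though only 3 nodes are in the cluster:

ls /etc/pve/nodes
# pve1  pve2  pve3  pve4   <- pve4 was removed but its folder is still here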

When I run pvecm status, it recognizes that I have a cluster of 3 machines with only 2 machines online (pve-cluster is stopped on the new node).

I'm out of ideas on what I should do to successfully add these additional machines to the cluster. Any suggestions would be greatly appreciated.
 
After snooping around, I have found that the /etc/pve/.members file is different on the bad node compared to the good nodes. On the bad node, the .members file is missing the IP address for the node itself, which perhaps accounts for the grayed-out information. Since .members is read-only, is there a way to fix this? The /etc/pve/corosync.conf file is correct on the bad node. I'm assuming the .members file gets generated behind the scenes from corosync.conf?
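For comparison, on a good node /etc/pve/.members looks roughly like this (names, IPs, and version counters are placeholders); on the bad node the "ip" field for the node itself is what's missing:

{
"nodename": "pve1",
"version": 19,
"cluster": { "name": "mycluster", "version": 4, "nodes": 3, "quorate": 1 },
"nodelist": {
  "pve1": { "id": 1, "online": 1, "ip": "192.168.1.10"},
  "pve2": { "id": 2, "online": 1, "ip": "192.168.1.11"},
  "pve3": { "id": 3, "online": 1, "ip": "192.168.1.12"}
  }
}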
 
