I have an existing two-node cluster on Proxmox VE 6.3 that runs wonderfully. I have attempted to add two additional 6.3 nodes. As soon as I add a node, the cluster shows gray question marks and I can no longer manage the cluster or access a console.
As soon as I run:
systemctl stop corosync
systemctl stop pve-cluster
on the new node, then the new node shows a red X and the original cluster can once again be managed (no gray question marks).
I have pored over the internet and this forum trying to resolve the issue. The new node can SSH to the existing cluster nodes without a password prompt. When I add the new node via the command line, all the messages lead you to believe that everything worked flawlessly.
All the node names are in the /etc/hosts file on every node, so I can ping all nodes by name. I have tried removing the node from the cluster, reinstalling Proxmox, and rejoining the cluster, only to get the same gray question marks. I have also tried removing the node without reinstalling, following the documented guidelines, and rejoining. The join seems to go OK, but then the gray question marks come back.
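For reference, the /etc/hosts entries are the same on every node and look roughly like this (the IPs and hostnames below are placeholders, not my actual values):

```
# /etc/hosts -- identical on all nodes
192.168.1.11  pve1.example.lan  pve1
192.168.1.12  pve2.example.lan  pve2
192.168.1.13  pve3.example.lan  pve3
```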
On the new node, I have tried running:
pvecm updatecerts -f
This eventually comes back with a timeout error and never returns to a shell prompt, so I have to close the session.
I have noticed that the /etc/pve/nodes directory seems to be wrong. It lists all 4 node names even though the cluster only has 3 nodes at present. I'm guessing corosync is replicating this around the cluster, because I cannot remove the 4th node that is no longer part of the cluster.
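In case it's relevant, this is the cleanup I would have expected to work, run on one of the original (quorate) nodes — `oldnode` here is a placeholder for the stale 4th node's name, and I'm not certain this is safe while the cluster is in its current state:

```shell
# On a quorate node of the original cluster:
pvecm delnode oldnode          # remove the node from cluster membership
rm -rf /etc/pve/nodes/oldnode  # drop its leftover directory from /etc/pve
```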
When I run pvecm status, it recognizes that I have a cluster of 3 machines with only 2 online (pve-cluster is stopped on the new node).
I'm out of ideas on what I should do to successfully add these additional machines to the cluster. Any suggestions would be greatly appreciated.