Re-adding host to cluster

mgformula

Oct 2, 2024
I did a clean install of Proxmox using the same name and IP, and I went through the documented steps for removing a node from a cluster, which looked OK. When I attempted to add the host back, I got the error "unable to verify thumbprint". I have also gone through the ssh_known_hosts file and removed the host that was reinstalled.
I have also verified that the reinstalled node is not showing up in the cluster status using the pvecm nodes CLI command.
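As an aside, the known_hosts cleanup mentioned above can be done with ssh-keygen -R or by filtering the file directly. Here is a small self-contained illustration with made-up entries and a throwaway file; on a PVE node the cluster-wide file lives at /etc/pve/priv/known_hosts (usually symlinked as /etc/ssh/ssh_known_hosts):

```shell
# Demo with a throwaway file and made-up key material; on a real PVE node
# the shared file is /etc/pve/priv/known_hosts.
kh=$(mktemp)
printf 'prox-vgpu-001 ssh-rsa AAAA001\nprox-vgpu-002 ssh-rsa AAAA002\nprox-vgpu-003 ssh-rsa AAAA003\n' > "$kh"

# Drop every entry for the reinstalled host
# (ssh-keygen -R prox-vgpu-002 -f "$kh" does the same and keeps a .old backup)
grep -v '^prox-vgpu-002 ' "$kh" > "$kh.tmp" && mv "$kh.tmp" "$kh"
cat "$kh"
```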
 
Hello mgformula!

Have you already found a solution to your problem? If not, can you still ssh into the reinstalled node from the other nodes? What is the output of pvecm on any cluster node? Could you post the full error message that you receive when re-adding the node to the cluster?
 
Hello, I haven't found a solution yet. The output looks good; the missing reinstalled host name is prox-vgpu-002:
root@prox-vgpu-003:/etc/pve/nodes# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 prox-vgpu-001
         2          1 prox-vgpu-003 (local)
         4          1 prox-vgpu-004
         5          1 prox-vgpu-005


 
Yes, I can SSH into host 002 from another host in the cluster.
 
It seems like the certificates are still set up for the old node. I'm assuming that you've already moved all your VM configs to other PVE nodes; is there still a directory for the old node (/etc/pve/nodes/prox-vgpu-002)? If so, you can delete it, remove the old node's entry in /etc/pve/priv/authorized_keys, and run pvecm updatecerts --force (reboot the PVE nodes if that doesn't work). Afterwards, you should be able to join the cluster with the same node name again.
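A sketch of those cleanup steps, demonstrated on a throwaway directory tree so it can be run anywhere; on a real cluster member the paths live under /etc/pve, and you would finish with pvecm updatecerts --force:

```shell
# Hypothetical stand-in for /etc/pve; node name taken from this thread.
root=$(mktemp -d)
mkdir -p "$root/nodes/prox-vgpu-002"
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA002 root@prox-vgpu-002\n' > "$root/priv_authorized_keys"

# 1. remove the stale node directory
rm -r "$root/nodes/prox-vgpu-002"

# 2. drop the old node's key line from authorized_keys
sed -i '/root@prox-vgpu-002$/d' "$root/priv_authorized_keys"

# 3. on the real cluster, regenerate certificates afterwards:
#    pvecm updatecerts --force
```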
 
The cluster master didn't have anything for host 002 in /etc/pve/nodes. The authorized_keys file has 6 entries, but I can't tell which one is 002, if any.

 
One way to find which SSH public key belongs to the old node is to check every node's id_rsa.pub and see which entry in authorized_keys does not belong to any current node.
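That comparison can be scripted. Below is a self-contained illustration with made-up key material; on a real cluster you would collect each node's /root/.ssh/id_rsa.pub (e.g. over SSH) and compare against /etc/pve/priv/authorized_keys:

```shell
# Made-up key material; only the second field (the base64 key) matters.
node_keys=$(mktemp)        # keys gathered from current cluster members
auth_keys=$(mktemp)        # the shared authorized_keys being audited
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA003 root@prox-vgpu-003\n' > "$node_keys"
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA002 root@prox-vgpu-002\nssh-rsa AAAA003 root@prox-vgpu-003\n' > "$auth_keys"

# Flag any authorized_keys entry whose key material no node claims
stale=$(while read -r ktype key comment; do
    grep -qF "$key" "$node_keys" || echo "stale: $comment"
done < "$auth_keys")
echo "$stale"
```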
 
Thanks! I was able to remove the old key for host 002; however, after rebooting the hosts it's still unable to re-join the cluster, with the same error about the fingerprint.
 
I was able to resolve this with the following:
apt install --reinstall corosync
pvecm updatecerts -F
systemctl restart pve-cluster
systemctl restart pvedaemon pveproxy
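For anyone hitting the same thumbprint error: as far as I understand, the join dialog checks the SHA-256 fingerprint of the cluster's pveproxy certificate (on a node, /etc/pve/local/pve-ssl.pem), which is what pvecm updatecerts regenerates. The fingerprint can be inspected manually with openssl, illustrated here with a throwaway self-signed certificate so the commands run anywhere:

```shell
# Throwaway cert standing in for /etc/pve/local/pve-ssl.pem
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj '/CN=prox-vgpu-001' \
    -keyout "$tmp/key.pem" -out "$tmp/cert.pem" 2>/dev/null

# SHA-256 fingerprint, the value shown in the cluster join dialog
fp=$(openssl x509 -in "$tmp/cert.pem" -noout -fingerprint -sha256)
echo "$fp"
```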
 
This key-hunting "fix" attempt is completely futile; it does not really matter whether there are additional keys, be it in known_hosts or authorized_keys.

Also, I am sorry, but this misunderstanding again relates to BZ 5804; the confusion exists among staff as well, yet @t.lamprecht instantly marked it as INVALID. I do not understand why the content of what I am trying to communicate is ignored because the style is not appreciated.

I have encountered a similar post to this one before, here: Unable to join cluster with 2nd server

The errors thrown are regarding SSL; this is clearly a NEW bug, and it is undocumented.

My suggested "fix" is basically just bypassing it (SSH works just fine). The problem has nothing to do with SSH; the poor OP was chasing red herrings here, and the bug is missed again.

EDIT: In case it was not clear, I have nothing against @dakralex trying to help, but it's completely impossible for new staff to know these things when they are undocumented.
 
