Re-adding host to cluster

mgformula

Oct 2, 2024
I did a clean install of Proxmox using the same name and IP, and I went through the documented steps for removing a node from a cluster, which looked OK. When I attempted to add the host back, I got the error "unable to verify thumbprint". I have also gone through the ssh_known_hosts file and removed the host that was reinstalled.
I have also verified that the reinstalled node is not showing up in the cluster status using the pvecm nodes CLI command.
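As an aside, the known_hosts cleanup mentioned above can be done with ssh-keygen -R or by filtering the file directly. Here is a small self-contained illustration with made-up entries and a throwaway file; on a PVE node the cluster-wide file lives at /etc/pve/priv/known_hosts (usually symlinked as /etc/ssh/ssh_known_hosts):

```shell
# Demo with a throwaway file and made-up key material; on a real PVE node
# the shared file is /etc/pve/priv/known_hosts.
kh=$(mktemp)
printf 'prox-vgpu-001 ssh-rsa AAAA001\nprox-vgpu-002 ssh-rsa AAAA002\nprox-vgpu-003 ssh-rsa AAAA003\n' > "$kh"

# Drop every entry for the reinstalled host
# (ssh-keygen -R prox-vgpu-002 -f "$kh" does the same and keeps a .old backup)
grep -v '^prox-vgpu-002 ' "$kh" > "$kh.tmp" && mv "$kh.tmp" "$kh"
cat "$kh"
```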
 
Hello mgformula!

Have you already found a solution to your problem? If not, can you still ssh into the reinstalled node from the other nodes? What is the output of pvecm on any cluster node? Could you post the full error message that you receive when re-adding the node to the cluster?
 
Hello, I haven't found a solution yet. The output looks good; the missing reinstalled host name is prox-vgpu-002:
root@prox-vgpu-003:/etc/pve/nodes# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 prox-vgpu-001
         2          1 prox-vgpu-003 (local)
         4          1 prox-vgpu-004
         5          1 prox-vgpu-005


 
Yes, I can SSH into host 002 from another host in the cluster.
 
It seems like the certificates are still set up for the old node. I'm assuming that you've already moved all your VM configs to other PVE nodes; is there still a directory for the old node (/etc/pve/nodes/prox-vgpu-002)? If so, you can delete it, remove the old node's entry in /etc/pve/priv/authorized_keys, and run pvecm updatecerts --force (reboot the PVE nodes if that doesn't work). Afterwards, you should be able to join the cluster with the same node name again.
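A sketch of those cleanup steps, demonstrated on a throwaway directory tree so it can be run anywhere; on a real cluster member the paths live under /etc/pve, and you would finish with pvecm updatecerts --force:

```shell
# Hypothetical stand-in for /etc/pve; node name taken from this thread.
root=$(mktemp -d)
mkdir -p "$root/nodes/prox-vgpu-002"
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA002 root@prox-vgpu-002\n' > "$root/priv_authorized_keys"

# 1. remove the stale node directory
rm -r "$root/nodes/prox-vgpu-002"

# 2. drop the old node's key line from authorized_keys
sed -i '/root@prox-vgpu-002$/d' "$root/priv_authorized_keys"

# 3. on the real cluster, regenerate certificates afterwards:
#    pvecm updatecerts --force
```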
 
The cluster master didn't have anything for host 002 in /etc/pve/nodes. The authorized_keys file has 6 entries, but I can't tell which one is 002, if any.

 
One way to find which SSH public key belongs to the old node is to check every node's id_rsa.pub and see which entry in authorized_keys does not belong to any current node.
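That comparison can be scripted. Below is a self-contained illustration with made-up key material; on a real cluster you would collect each node's /root/.ssh/id_rsa.pub (e.g. over SSH) and compare against /etc/pve/priv/authorized_keys:

```shell
# Made-up key material; only the second field (the base64 key) matters.
node_keys=$(mktemp)        # keys gathered from current cluster members
auth_keys=$(mktemp)        # the shared authorized_keys being audited
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA003 root@prox-vgpu-003\n' > "$node_keys"
printf 'ssh-rsa AAAA001 root@prox-vgpu-001\nssh-rsa AAAA002 root@prox-vgpu-002\nssh-rsa AAAA003 root@prox-vgpu-003\n' > "$auth_keys"

# Flag any authorized_keys entry whose key material no node claims
stale=$(while read -r ktype key comment; do
    grep -qF "$key" "$node_keys" || echo "stale: $comment"
done < "$auth_keys")
echo "$stale"
```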
 
Thanks! I was able to remove the old key for host 002; however, after rebooting the hosts it's still unable to re-join the cluster, with the same error about the fingerprint.
 
I was able to resolve this with the following:
apt install --reinstall corosync
pvecm updatecerts -F
systemctl restart pve-cluster
systemctl restart pvedaemon pveproxy
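For anyone hitting the same thumbprint error: as far as I understand, the join dialog checks the SHA-256 fingerprint of the cluster's pveproxy certificate (on a node, /etc/pve/local/pve-ssl.pem), which is what pvecm updatecerts regenerates. The fingerprint can be inspected manually with openssl, illustrated here with a throwaway self-signed certificate so the commands run anywhere:

```shell
# Throwaway cert standing in for /etc/pve/local/pve-ssl.pem
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj '/CN=prox-vgpu-001' \
    -keyout "$tmp/key.pem" -out "$tmp/cert.pem" 2>/dev/null

# SHA-256 fingerprint, the value shown in the cluster join dialog
fp=$(openssl x509 -in "$tmp/cert.pem" -noout -fingerprint -sha256)
echo "$fp"
```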
 
This key-hunting "fix" attempt is completely futile; it does not really matter whether there are additional keys, be it in known_hosts or authorized_keys.

Also, I am sorry, but this misunderstanding again relates to BZ 5804; the confusion exists among staff as well, yet @t.lamprecht instantly marked it as INVALID. I do not understand why the content of what I am trying to communicate is ignored because the style is not appreciated.

I have encountered a similar post to this one before, here: Unable to join cluster with 2nd server

The errors thrown are regarding SSL; this is clearly a NEW bug, and it is undocumented.

My suggested "fix" is basically just bypassing it (SSH works just fine). The problem has nothing to do with SSH; the poor OP was chasing red herrings here, and the bug is missed again.

EDIT: In case it was not clear, I have nothing against @dakralex trying to help, but it's completely impossible for new staff to know these things when they are undocumented.
 
