[SOLVED] After upgrading Nodes in a cluster, I got ssh / migration problems

Chris&Patte

Renowned Member
Sep 3, 2013
Hello,
I have a cluster with 3 Nodes, each Node now on the latest PVE version.
One was newly installed and integrated,
one was updated from PVE 7.x,
one had been reinstalled from an old version with the same name and IP.

Now, when doing a migration, it mostly breaks with an ssh error. Migration to the newly installed Node works from the other Nodes, but not the other way around, and not between the other two Nodes.


The migration ssh error is:
Bash:
2024-09-14 10:33:59 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node1' -o 'UserKnownHostsFile=/etc/pve/nodes/node1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.66.5.16 /bin/true
2024-09-14 10:33:59 ssh: connect to host 10.66.5.16 port 22: Connection timed out
2024-09-14 10:33:59 ERROR: migration aborted (duration 00:02:15): Can't connect to destination address using public key
TASK ERROR: migration aborted
 
Do I need to restart a service or something else?
I think I have already tried that and have done it again now, but that does not solve the problem.
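For reference, a minimal sketch of what restarting the relevant services could look like on one node (service names assume a default PVE install; as it turned out later in this thread, restarting was not the fix):
Bash:
# restart the ssh daemon and the main Proxmox services on this node
# (service names assume a default PVE install; adjust if yours differ)
systemctl restart ssh
systemctl restart pvedaemon pveproxy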
 
I have checked all the key files in /etc/pve/priv/ and found that the ssh-rsa keys in /etc/pve/priv/authorized_keys do NOT have the same length.
Two of the three keys in that file are longer, but the last one, from the oldest system which was reinstalled under the same name, is much shorter.

Also, all keys in /etc/pve/priv/known_hosts seem to be the same?
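For anyone checking the same thing, key lengths and fingerprints can be listed with ssh-keygen instead of eyeballing the raw lines (a minimal sketch; the last command uses the destination IP from the migration log above):
Bash:
# print key length, fingerprint, comment and type for every line in the cluster-wide files
ssh-keygen -lf /etc/pve/priv/authorized_keys
ssh-keygen -lf /etc/pve/priv/known_hosts

# compare against the host keys a node actually serves on port 22 (IP taken from the log above)
ssh-keyscan -t rsa,ed25519 10.66.5.16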
 
OK, it seems there was a wrong IP in the file /etc/pve/priv/known_hosts (it seems the newly installed Node also changed its IP).
After changing the IP in /etc/pve/priv/known_hosts and adding all Node IPs to the /etc/hosts file, I can now log on to and from every Node via the ssh command line without password or problems.
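For reference, a sketch of how the /etc/hosts entries could be added on each node (node1 and its IP are taken from the migration log above; node2/node3 and their addresses are placeholders, use your real management IPs):
Bash:
# append the cluster nodes to /etc/hosts on every node
# node1/10.66.5.16 is from the migration log above; the other names/IPs are placeholders
cat >> /etc/hosts <<'EOF'
10.66.5.16  node1
10.66.5.17  node2
10.66.5.18  node3
EOF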

But now I'm absolutely stuck. I think something is messed up with the ssh keys, as I still get the error when doing a migration.
 
OK, it seems I got it.
If someone runs into the same problem with old node data, here is basically what I did:
- enabled firewall rules for ssh on ALL networks that are attached to the Node (I use two, one for migration data, one for migration info & management)
- check your IPs and names in all ssh config files
- add your Nodes to the /etc/hosts file on every node. This is important if you use two or more networks and name resolution via a (Windows) DNS, as the DNS could give you an IP from the wrong network.
- connect from every Node to every Node via every IP on every Node and delete all entries from known_hosts that the ssh client complains about (see the sketch after this list)
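A minimal sketch of that last step, assuming three nodes and placeholder IPs (the per-user file /root/.ssh/known_hosts is keyed by IP or name; the cluster-wide /etc/pve/priv/known_hosts is keyed by node name, since migration connects with HostKeyAlias as in the log above):
Bash:
# run this on every node; the IPs are placeholders, loop over all IPs of all your Nodes
for ip in 10.66.5.16 10.66.5.17 10.66.5.18; do
    # connect once; if the client complains about a changed host key,
    # drop the stale entry and connect again to accept the current key
    ssh root@"$ip" /bin/true || {
        ssh-keygen -R "$ip" -f /root/.ssh/known_hosts
        ssh -o StrictHostKeyChecking=accept-new root@"$ip" /bin/true
    }
done

# stale entries in the cluster-wide file are removed by node name instead, e.g.:
# ssh-keygen -R node1 -f /etc/pve/priv/known_hosts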

Seems that does the trick. Even though I only have a cloudy idea of what exactly was wrong, it's clear that wrong IPs and changed ssh keys were the problem.
 
Do I need to restart a service or something else?
Not AFAIK.

Anyway I was going to tell you to clear all the keys & start again. But you got it working anyway (similar idea).

Maybe tag-mark the thread with [SOLVED] (upper right-corner under thread edit).
 
