[SOLVED] Creating cluster, migration fails due to ssh errors

NewDude

Well-Known Member
Basically what the title says. When I try to migrate a test VM from the original node to the new one, I get a status box with no content, and when I choose Stop I get this:

2025-06-24 16:40:26 ERROR: migration aborted (duration 00:00:22): Can't connect to destination address using public key
TASK ERROR: migration aborted

Is it a reasonable approach to simply unjoin the new node from the cluster, and re-join? Or does it make more sense to try and resolve this manually?
 
I performed some of the steps in this thread on a similar topic.

Basically, this:

Code:
# on pve1
cd ~/.ssh
mv id_rsa id_rsa.old
mv id_rsa.pub id_rsa.pub.old
mv config config.old

# on pve2
cd ~/.ssh
mv id_rsa id_rsa.old
mv id_rsa.pub id_rsa.pub.old
mv config config.old
pvecm updatecerts

# back on pve1
pvecm updatecerts

Afterward, if I change the migration network from my 10G link to my 1G link, migrations work.
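For reference, the migration network is selected in /etc/pve/datacenter.cfg (or Datacenter → Options → Migration Settings in the GUI); it looks roughly like this, with the subnet below just a placeholder for my 1G network:

Code:
migration: secure,network=192.168.1.0/24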

When I checked for updates on the original node, I noticed what might have been the problem:

[screenshot: pending package updates on the original node]

So those files were updated after the initial tweaks to certs. I'll retry the same process of creating ssh keys and running pvecm updatecerts on each node and see if that works.
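In case it helps anyone following along, the recreation step is roughly this on each node; the key type and size are just what I picked, nothing Proxmox requires:

Code:
# regenerate root's ssh key pair, then refresh the cluster's keys/certs
ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa -N ''
pvecm updatecerts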
 
No joy. I went through recreating the certs again and ran pvecm updatecerts on both machines, and it is still in a state where it will migrate over the slower network, but not the faster one.

When I try, I get:

> 2025-06-24 19:47:38 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve' -o 'UserKnownHostsFile=/etc/pve/nodes/pve/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10Gnet /bin/true
> 2025-06-24 19:47:38 ERROR: migration aborted (duration 00:00:03): Can't connect to destination address using public key

So migration still works over the slower link, just not the way I'd like.
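If anyone wants to poke at this, the same check from the log can be run by hand from the source node with verbose output added (the address below is just a placeholder for the target node's 10G IP); wherever it stalls should point at the layer that's failing:

Code:
/usr/bin/ssh -vvv -e none -o BatchMode=yes -o HostKeyAlias=pve \
  -o UserKnownHostsFile=/etc/pve/nodes/pve/ssh_known_hosts \
  -o GlobalKnownHostsFile=none root@10.10.10.12 /bin/true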

Additional information:

Both interfaces are set up redundantly, so there's a bond, and the bridge sits on top of that bond.

And when I configured the cluster, instead of just using the management bond as the cluster link, I went ahead and added the 10G interface as a backup link as well.

Maybe that additional bit of complexity screwed something up.
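For context, the join was done roughly like this (addresses are placeholders; link0 is the management bond, link1 the 10G bond):

Code:
# run on the new node, pointing at an existing cluster member
pvecm add 192.168.1.11 --link0 192.168.1.12 --link1 10.10.10.12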
 
Resolved this - the MTU setting of 9000 on the 10G link was causing ssh to time out.
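For anyone who hits the same thing: a quick way to check whether jumbo frames actually make it end to end is a ping with fragmentation disabled, sized for a 9000 MTU (8972 bytes of payload plus 28 bytes of IP/ICMP headers). The address is a placeholder for the other node's 10G IP; if this fails while smaller sizes work, the MTU isn't consistent somewhere along the path (NIC, bond, bridge, or switch).

Code:
ping -M do -s 8972 10.10.10.12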