[SOLVED] migration fails on different migration network

Stefano Giunchi
Renowned Member
Jan 17, 2016
Forlì, Italy
www.soasi.com
I have an old cluster that grew up from PVE 4, with various servers replaced along the way.
Yesterday I completed the last transformation, and it is now a two-node PVE 8.1 cluster with ZFS replication.

I also renamed the servers and resolved the typical issues many people hit when doing that. I mention it because it could still be part of the problem.

I made an LACP bond between the two nodes and want to use it for ZFS sync and migration.

Main network is 10.73.73.0/24
LACP network is 169.254.0.0/16
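(For reference, the GUI "Migration Settings" are stored in /etc/pve/datacenter.cfg; with the LACP network selected I'd expect the entry to look roughly like this, the CIDR being the one above:)

```
migration: secure,network=169.254.0.0/16
```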

If I use main network for migration, everything works.
If I set "Migration Settings" to use the LACP network, when I try to start a migration I get this error:
Code:
2024-02-18 22:19:51 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-v1' root@169.254.0.15 /bin/true
2024-02-18 22:19:51 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-02-18 22:19:51 @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
2024-02-18 22:19:51 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-02-18 22:19:51 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2024-02-18 22:19:51 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2024-02-18 22:19:51 It is also possible that a host key has just been changed.
2024-02-18 22:19:51 The fingerprint for the ED25519 key sent by the remote host is
2024-02-18 22:19:51 SHA256:hjqNBo70gjdUpjawnhrNbKttfDFdUpZM4YvCIOHO7+4.
2024-02-18 22:19:51 Please contact your system administrator.
2024-02-18 22:19:51 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2024-02-18 22:19:51 Offending RSA key in /etc/ssh/ssh_known_hosts:12
2024-02-18 22:19:51   remove with:
2024-02-18 22:19:51   ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve-v1"
2024-02-18 22:19:51 Host key for pve-v1 has changed and you have requested strict checking.
2024-02-18 22:19:51 Host key verification failed.
2024-02-18 22:19:51 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

If I remove that entry from ssh_known_hosts, I get the same error but with a different "offending" file:

Code:
2024-02-18 22:21:27 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-v1' root@169.254.0.15 /bin/true
2024-02-18 22:21:27 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-02-18 22:21:27 @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
2024-02-18 22:21:27 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-02-18 22:21:27 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2024-02-18 22:21:27 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2024-02-18 22:21:27 It is also possible that a host key has just been changed.
2024-02-18 22:21:27 The fingerprint for the ED25519 key sent by the remote host is
2024-02-18 22:21:27 SHA256:hjqNBo70gjdUpjawnhrNbKttfDFdUpZM4YvCIOHO7+4.
2024-02-18 22:21:27 Please contact your system administrator.
2024-02-18 22:21:27 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2024-02-18 22:21:27 Offending ED25519 key in /root/.ssh/known_hosts:4
2024-02-18 22:21:27   remove with:
2024-02-18 22:21:27   ssh-keygen -f "/root/.ssh/known_hosts" -R "pve-soasi-v1"
2024-02-18 22:21:27 Host key for pve-v1 has changed and you have requested strict checking.
2024-02-18 22:21:27 Host key verification failed.
2024-02-18 22:21:27 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
I tried to use the ssh-keygen command, but then I received this error:
Code:
Host key verification failed.
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-v1' root@10.73.73.15 pvecm mtunnel -migration_network 169.254.0.15/16 -get_migration_ip' failed: exit code 255

I tried to set the migration network to "secure", but I still get the SSH error.
I have two customer clusters with ZFS replication on a dedicated network, and they work without problems, but those were set up this way from the start.

Thanks for any help.
 
Hi,

Did you try to use `ssh-keygen -f /etc/ssh/ssh_known_hosts -R <hostname>` to remove the old key for the hostname and/or IP address of the other node?

Did you try to run `pvecm updatecerts --force` command?

BTW, /etc/ssh/ssh_known_hosts is a symlink to /etc/pve/priv/known_hosts, so I would make sure that the old keys are removed from both files.
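A rough sketch of that cleanup (a hypothetical helper, not a Proxmox tool; it assumes plain, unhashed known_hosts entries). It follows the symlink and edits the file the path really points to, so /etc/ssh/ssh_known_hosts and /etc/pve/priv/known_hosts stay consistent:

```python
import os

def purge_host(path: str, names: set[str]) -> int:
    """Drop known_hosts lines whose host field matches any of `names`.

    Returns the number of lines removed. Follows symlinks so the real
    target file is edited, not just the link.
    """
    real = os.path.realpath(path)  # resolve the symlink, edit the target
    with open(real) as fh:
        lines = fh.readlines()
    kept, removed = [], 0
    for line in lines:
        fields = line.split(None, 1)
        # first field may be a comma-separated list: "pve-v1,10.73.73.15"
        hosts = fields[0].split(",") if fields else []
        if any(h in names for h in hosts):
            removed += 1
        else:
            kept.append(line)
    with open(real, "w") as fh:
        fh.writelines(kept)
    return removed
```

Usage would be something like `purge_host("/etc/ssh/ssh_known_hosts", {"pve-v1", "169.254.0.15"})` on both nodes, followed by `pvecm updatecerts` to let the cluster repopulate the keys.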
 

These are both, sadly, wrong pieces of advice, given the state of unresolved bugs that pile onto each other. In the OP's case the issue was not corrupt keys, and running ssh-keygen -R on the symlink makes matters worse; see the bugs linked in this thread:

https://forum.proxmox.com/threads/s...ass-ssh-known_hosts-bug-s.137809/#post-640203
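The failure mode can be shown without touching SSH at all: any tool that rewrites a file by renaming a temp copy over the path (which is what the linked thread reports ssh-keygen -R effectively doing) silently turns a symlink into a regular file. A minimal stand-alone sketch with stand-in paths, not the real cluster files:

```python
import os
import tempfile

def rewrite_by_rename(path: str, data: str) -> None:
    """Common 'safe rewrite' idiom: write a temp file, rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as fh:
        fh.write(data)
    os.replace(tmp, path)  # rename(2) replaces the symlink itself, not its target

d = tempfile.mkdtemp()
target = os.path.join(d, "priv_known_hosts")  # stand-in for /etc/pve/priv/known_hosts
with open(target, "w") as fh:
    fh.write("pve-v1 ssh-ed25519 AAAAexample\n")
link = os.path.join(d, "ssh_known_hosts")     # stand-in for /etc/ssh/ssh_known_hosts
os.symlink(target, link)

rewrite_by_rename(link, "")   # "remove" a key by rewriting through the symlink path
print(os.path.islink(link))   # → False: the link was replaced by a plain file
```

After such a rewrite, the path SSH reads is no longer the cluster-wide file behind the symlink, which matches the symptom of the "offending" entry hopping between files.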
 
