"Host key verification failed." when trying to migrate a VM

DanielRouleau

Member
Nov 4, 2020
Good day,
I've run into a new issue and I'm not sure how to handle it, and I don't want to break something that works.

I have a cluster with 7 nodes on Proxmox VE 8.1.10 with Ceph storage.
On the first 5 nodes I can migrate VMs from node to node with no problem,
but I can't migrate VMs to the last 2 nodes I added to the cluster.

What I did notice is that I installed all the latest packages on the last 2 nodes, while the other ones are a bit older (a few weeks), so they still have a few packages left to install.

The problem is that if I try to migrate a VM (I have tried different VMs from different source nodes) to one of the 2 new nodes, the migration does not start and immediately pops the error "Host key verification failed."

Thank you for any idea
 
Have those 2 new nodes been rebooted yet?

You could try:

Code:
pvecm updatecerts
systemctl restart pveproxy
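
If that alone does not help, the cluster-wide known_hosts file that PVE uses for node-to-node SSH may also be worth a look (a rough sketch, assuming the default PVE layout; <nodename> is a placeholder for the problem node's name):

Code:
# cluster-wide known_hosts used for node-to-node SSH
cat /etc/pve/priv/known_hosts
# drop a stale entry for a node name, if one exists
ssh-keygen -f /etc/pve/priv/known_hosts -R <nodename>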
 
Thank you for the idea. I did try updating the certs, and after that I rebooted the node,
but it's still the same issue.
Here is the complete error message:
Code:
Host key verification failed.
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hote806' root@10.25.0.106 pvecm mtunnel -migration_network 10.10.10.102/24 -get_migration_ip' failed: exit code 255

What I find bizarre:
the VM I tried to migrate is located on node 10.25.0.105 and I'm trying to migrate it to 10.25.0.106.

IP 10.25.0.102 is my 1st node.

I have also tried to migrate another VM from the node with IP 10.25.0.103, and I get the exact same error message:
Code:
Host key verification failed.
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hote806' root@10.25.0.106 pvecm mtunnel -migration_network 10.10.10.102/24 -get_migration_ip' failed: exit code 255
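
For reference, the failing step can be reproduced by hand from the source node, based on the command in the error (a sketch; BatchMode is left out so ssh can show the interactive host-key prompt, and /bin/true is just a harmless test command):

Code:
# same tunnel connection as in the error above, run manually so ssh can prompt
/usr/bin/ssh -e none -o 'HostKeyAlias=hote806' root@10.25.0.106 /bin/true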

But if I take a VM on any other node, I can migrate it to any of the other nodes (except .106 and .107).

Thanks in advance for any idea
 
Here is the pvecm status result:

Code:
Cluster information
-------------------
Name:             ncluster-2024
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu May 2 14:10:51 2024
Quorum provider:  corosync_votequorum
Nodes:            7
Node ID:          0x00000002
Ring ID:          1.e6a
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      7
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.10.104
0x00000002          1 10.10.10.102 (local)
0x00000003          1 10.10.10.103
0x00000004          1 10.10.10.110
0x00000005          1 10.10.10.105
0x00000006          1 10.10.10.106
0x00000007          1 10.10.10.107
 
I don't know your network config, but it could be that you have a dedicated migration network of 10.10.10.102/24.
See the docs here.
Are the two problem nodes also connected to this network/subnet?
What does this show on the 1st node:
Code:
cat /etc/pve/datacenter.cfg
 
Code:
migration: network=10.10.10.102/24,type=secure

And yes, all nodes can talk on the 10.10.10.x network.

I actually have 2 network cards per host:
1 on 10.25.x.x for the local VLANs / internet VLAN
1 on 10.10.10.x for Ceph + backup (this is a 10G network)

And both new nodes do have Ceph configured, so they do talk on 10.10.10.x,
and from the 1st node I can ping both 10.10.10.x hosts.
Also, this network does not have any gateway, so it is limited to internal traffic between the hosts.
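
A quick way to double-check this from the 1st node (just a sketch; the IPs are the two new nodes from the membership list above):

Code:
# reachability of the two new nodes over the 10.10.10.x network
ping -c 3 10.10.10.106
ping -c 3 10.10.10.107
# node-to-node SSH should also work without any interactive prompt
ssh -o BatchMode=yes root@10.10.10.106 /bin/true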
 
Good day, so this is what I have found out:
the problem seems to be between Proxmox VE 8.2.2 and 8.1.10.
When trying to migrate from 8.1.10 to 8.2.2, under certain circumstances (not clear to me) the migration does not work, but if I upgrade the node to 8.2.2 then I am able to migrate from that node to the other 8.2.2 nodes.
Not sure what is causing the issue. I will work on upgrading all my nodes to 8.2.2; hopefully I will not have the same issue as on my first upgrade, when the network went down because the NIC names changed and I had to redo all my VLAN configurations.
 
This happened to me after I added a new Proxmox node. I had to go into the new node and run 'ssh root@existingnodename', answering 'y' to adding the SSH key, for each of the other nodes. I also had to go to every existing Proxmox node and run 'ssh root@newnodename', again answering 'y'. Then I could do migrations.

```
pvecm updatecerts
systemctl restart pveproxy
```
Didn't work for me.
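
Roughly, the manual step looks like this (a sketch; the node names are placeholders for your actual node names):

```
# from the new node, SSH once to every existing node and answer 'yes' to the host key prompt
for n in node1 node2 node3 node4 node5; do
    ssh root@"$n" exit
done
# then repeat the same from each existing node towards the new node
```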
 
Wow, great, thanks very much! I tested with 2 nodes that were not able to migrate, and from each node's shell I just did an 'ssh root@TheOtherNodeName', said yes to the key, and voilà, everything is working.

Thanks again for the answer. This was very bizarre; I even created a spreadsheet with sources and destinations, and with 9 nodes that makes a lot of possible combinations.
 
