[SOLVED] migrate - Host key verification failed.

Falk

Hi everyone,
I have a strange (to me) problem. I have a test cluster with two nodes. I copied the SSH keys, and SSH from one node to the other works fine.

Migrating a VM from node2 to node1 works without errors.
But when I migrate a VM from node1 to node2, I get the following error:

Task viewer: VM 103 - Migrate

task started by HA resource agent
2017-10-13 16:07:39 # /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=prox2' root@89.27.xxx.xxx /bin/true
2017-10-13 16:07:39 Host key verification failed.
2017-10-13 16:07:39 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

I hope someone knows how to fix this problem.
Thanks
Falk
 
Thanks for your reply. Hostnames are set correctly and yes, I am using public IPs, but both machines are in the same network and the firewall routes the traffic locally.

I also added the hostname of node2 to the hosts file of node1, and node1 to the hosts file of node2.
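Roughly like this (the address is masked as elsewhere in this thread; the short name is the HostKeyAlias from the task log, adjust it to the real hostname; node2 gets the matching entry for node1):

# /etc/hosts on node1
89.27.xxx.117   prox2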
 
I created the cluster using my own notes; I have done it a few times with different guides on test setups, and migration has never been a problem. I will check your guide and see if I missed something. But the behaviour that SSH works in both directions while migration only works in one direction is still confusing.
 
What does 'pvecm status' show? You are migrating a VM under HA, so you need to check the different options, like fallback, and that all needed resources are available on both nodes.
 
node1:
root@proxmox:~# pvecm status
Quorum information
------------------
Date: Mon Oct 16 15:19:20 2017
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/96
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 89.27.xxx.114 (local)
0x00000002 1 89.27.xxx.117


node2:
root@proxmox2:~# pvecm status
Quorum information
------------------
Date: Mon Oct 16 15:20:05 2017
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 1/96
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 89.27.xxx.114
0x00000002 1 89.27.xxx.117 (local)

Resources are available. I created a VM on node2 and migrated it successfully to node1, but I am not able to migrate it back to node2 (same when I create it on node1 and try to migrate it to node2).
 
First, you need to change your expected votes to one, as with two nodes the remaining node will otherwise go into read-only mode once it loses quorum. Especially if you want to use HA, you need a third node or a QDevice, otherwise no majority vote can be established. https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration
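A sketch of how to lower the expected votes on the node that is still up, should it already have lost quorum (run as root):

root@proxmox:~# pvecm expected 1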

Did you test the ssh connection from both sides, by name and IP?
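The quickest way is to run exactly what the migration task runs; a sketch using the node2 address from your pvecm output and the alias from your task log:

root@proxmox:~# /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=prox2' root@89.27.xxx.117 /bin/true
root@proxmox:~# echo $?

An exit code of 0 means the host key check passes; 'Host key verification failed' points at a missing or stale entry in the known_hosts file.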
 
Thanks Alwin, I know that HA is not made to run on two servers. I will add a third node before the system goes live.

After your question I connected via SSH by IP, name, and FQDN in both directions, and it worked. And now the migration works too.
I have no idea what changed, but I am happy that it is working now.
Thanks for your replies.
 
I have this problem too and I cannot fix it.

Task viewer: VM 100 - Migrate

2017-10-20 02:34:41 starting migration of VM 100 to node 'pve2' (192.168.64.102)
2017-10-20 02:34:41 copying disk images
2017-10-20 02:34:41 starting VM 100 on remote node 'pve2'
2017-10-20 02:34:42 volume 'local:100/vm-100-disk-1.qcow2' does not exist
2017-10-20 02:34:42 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' root@192.168.64.102 qm start 100 --skiplock --migratedfrom pve1 --migration_type secure --stateuri unix --machine pc-i440fx-2.9' failed: exit code 255
2017-10-20 02:34:42 aborting phase 2 - cleanup resources
2017-10-20 02:34:42 migrate_cancel
2017-10-20 02:34:43 ERROR: migration finished with problems (duration 00:00:02)
TASK ERROR: migration problems
 
root@pve4:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
2 1 pve2
3 1 pve3
4 1 pve4 (local)
1 1 pve1


root@pve4:~# pvecm status

Quorum information
------------------
Date: Fri Oct 20 02:56:21 2017
Quorum provider: corosync_votequorum

Nodes: 4
Node ID: 0x00000004
Ring ID: 2/16
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.64.102
0x00000003 1 192.168.64.103
0x00000004 1 192.168.64.104 (local)
0x00000001 1 192.168.64.105
 
2017-10-20 02:34:42 volume 'local:100/vm-100-disk-1.qcow2' does not exist

That’s your problem. You need to move every hard disk and CD image to a shared storage, or mark the local storage as shared.
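A quick way to check which volumes the VM still has on local storage before retrying (a sketch, run on the source node; 'local' is the storage name from your log):

root@pve1:~# qm config 100 | grep -E '(ide|sata|scsi|virtio|efidisk)[0-9]'
root@pve1:~# pvesm list local

Anything that still shows up as 'local:...' in the VM config has to be moved (or the storage made available on both nodes) before the online migration can succeed.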
 
Not sure if anyone still needs to fix this issue, but I found the RSA SSH public key on the server and manually added the ssh-rsa key that was missing for each host in the cluster:

The public key can be found under /etc/ssh/ssh_host_rsa_key.pub

e.g. copy ssh-rsa AAAAQD3....... from /etc/ssh/ssh_host_rsa_key.pub
and add it to /etc/ssh/ssh_known_hosts

It should look like this:
servername ssh-rsa AAAAQD309k89.......
serverIp ssh-rsa AAAAQD3.......

This is how I resolved my issue. Usually this happens when you reinstall a server and reuse the same IP and hostname.
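As an alternative to editing the file by hand, roughly the following achieves the same result on a Proxmox node (a sketch; the prompt, the 'prox2' alias and the address are placeholders taken from earlier in this thread, replace them with your own):

root@node1:~# pvecm updatecerts
root@node1:~# ssh -o 'BatchMode=yes' -o 'HostKeyAlias=prox2' root@89.27.xxx.117 /bin/true && echo OK

pvecm updatecerts regenerates the node certificates and the cluster-wide SSH key files, and the second command repeats the exact check the migration task performs.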