Cannot migrate VM to another server

itvietnam

Hi,

We have been having a migration problem recently: Proxmox does not migrate VMs to another server when one server dies.

We tried a manual migration from hv103 to hv101 and got this error:

Code:
Task viewer: VM 150 - Migrate
task started by HA resource agent
2017-12-18 19:13:12 # /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=hv101' root@10.10.30.151 /bin/true
2017-12-18 19:13:12 Host key verification failed.
2017-12-18 19:13:12 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

From hv103 we can SSH to hv101 without error:

Code:
root@hv103:~#
root@hv103:~# ping hv101
PING vhost-02-hv101 (10.10.30.151) 56(84) bytes of data.
64 bytes from vhost-02-hv101 (10.10.30.151): icmp_seq=1 ttl=64 time=0.119 ms
^C
--- vhost-02-hv101 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.119/0.119/0.119/0.000 ms
root@hv103:~# ssh hv101
Linux hv101 4.10.17-2-pve #1 SMP PVE 4.10.17-19 (Fri, 4 Aug 2017 13:34:37 +0200) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Dec 18 19:20:05 2017 from 10.10.30.153
root@hv101:~#

And our pveversion:
Code:
root@hv107:~#  for i in `seq 7`;do ssh hv10$i "pveversion";done
pve-manager/5.0-30/5ab26bc (running kernel: 4.10.17-2-pve)
pve-manager/5.0-30/5ab26bc (running kernel: 4.10.17-2-pve)
pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve)
pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve)
pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve)
pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve)
pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve)
root@hv107:~#

May I know how to fix this?

I have searched all the related topics and have not been able to get past this problem for several days.
 
What happens when you try
Code:
/usr/bin/ssh -o 'HostKeyAlias=hv101' root@10.10.30.151 /bin/true
?
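
The migration task runs this check with BatchMode=yes, which disables interactive prompts, so a host key that ssh would otherwise ask about becomes a hard "Host key verification failed". Dropping the option, as in the command above, lets ssh show what it objects to. A minimal sketch for running the same non-interactive check against several nodes ahead of time could look like the following. The function names, the SSH_BIN override, and the alias/address pairs are illustrative assumptions, not part of Proxmox.

```shell
#!/bin/sh
# Hypothetical helper: repeat the migration's non-interactive SSH check.
# SSH_BIN is an assumed override so the function can be dry-run without a cluster.
check_migration_ssh() {
    alias_name=$1
    addr=$2
    # Same options the task log shows: no prompts, host key looked up by node name.
    "${SSH_BIN:-/usr/bin/ssh}" -o 'BatchMode=yes' \
        -o "HostKeyAlias=$alias_name" "root@$addr" /bin/true
}

# Read "alias address" pairs from stdin and report each as ok or FAILED.
# Returns non-zero if any check failed.
check_nodes() {
    rc=0
    while read -r alias_name addr; do
        [ -n "$alias_name" ] || continue
        # </dev/null keeps ssh from swallowing the remaining input lines.
        if check_migration_ssh "$alias_name" "$addr" </dev/null 2>/dev/null; then
            echo "$alias_name ok"
        else
            echo "$alias_name FAILED"
            rc=1
        fi
    done
    return $rc
}

# Example (pair taken from the log above):
# printf '%s\n' 'hv101 10.10.30.151' | check_nodes
```

A node reported as FAILED can then be examined by hand with the command above, without BatchMode.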
 
Hi dcsapak,

It returns the following:

Code:
root@hv103:~# /usr/bin/ssh -o 'HostKeyAlias=hv101' root@10.10.30.151 /bin/true
Warning: the RSA host key for 'hv101' differs from the key for the IP address '[10.10.30.151]:4848'
Offending key for IP in /root/.ssh/known_hosts:2
Matching host key in /etc/ssh/ssh_known_hosts:3
Are you sure you want to continue connecting (yes/no)?

After moving .ssh/known_hosts to ~, I tried again and it works now.
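
Moving the whole file discards every key recorded in it. A narrower sketch, assuming the stale entry is the one the warning names ('[10.10.30.151]:4848' in /root/.ssh/known_hosts), removes only that host with ssh-keygen -R, which keeps the previous contents in a .old file next to the original. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop one host's entries from a known_hosts file.
# ssh-keygen -R keeps the previous contents in FILE.old.
forget_host() {
    host=$1
    file=${2:-/root/.ssh/known_hosts}
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    ssh-keygen -R "$host" -f "$file"
}

# Example, using the entry named in the warning above:
# forget_host '[10.10.30.151]:4848'
```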

May I know the root cause of this issue so I can avoid it in the future?

Thanks,
 
This can happen in a number of cases:

You regenerated the SSH host keys of hv101 (by reinstalling, running dpkg-reconfigure openssh-server, etc.)
You connected to a different host with the same hostname
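
Both cases show the same symptom: the key recorded for the node no longer matches the key the node presents. A sketch for confirming that compares the fingerprint of the node's current public host key with what a known_hosts file records under its name. The function name is hypothetical, and the example paths assume the key file is read on the node in question (or from a copy of its .pub file).

```shell
#!/bin/sh
# Hypothetical helper: does KNOWN_HOSTS still record PUBKEY under NAME?
# Succeeds when a recorded fingerprint agrees; fails when none does.
host_key_matches() {
    name=$1
    pubkey=$2        # e.g. /etc/ssh/ssh_host_rsa_key.pub
    known_hosts=$3   # e.g. /etc/ssh/ssh_known_hosts
    # -l prints "bits fingerprint comment (type)"; field 2 is the fingerprint.
    current=$(ssh-keygen -l -f "$pubkey" | awk '{print $2}')
    [ -n "$current" ] || return 2
    # With -F, -l lists the fingerprints of the entries recorded for NAME.
    ssh-keygen -l -F "$name" -f "$known_hosts" 2>/dev/null |
        grep -qF -- "$current"
}

# Example:
# host_key_matches hv101 /etc/ssh/ssh_host_rsa_key.pub /etc/ssh/ssh_known_hosts \
#     || echo "recorded key for hv101 is stale or missing"
```

If the recorded key turns out to be stale, the cluster-wide file needs the node's new key. `pvecm updatecerts` is often suggested for that; treat it as an assumption here, since this thread does not confirm it, and check it against the documentation for your version.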
 
Hi,

I have the same problem, and tried the steps above.

Cannot migrate:
Code:
root@vdg-pve01-par6:~# qm migrate 104 vdg-pve02-par6 --online
2018-02-08 10:36:24 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vdg-pve02-par6' root@10.1.246.2 /bin/true
2018-02-08 10:36:24 Host key verification failed.
2018-02-08 10:36:24 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
migration aborted

Tried to SSH directly, which works:
Code:
root@vdg-pve01-par6:~# ssh root@10.1.246.2
Linux vdg-pve02-par6 4.13.13-5-pve #1 SMP PVE 4.13.13-38 (Fri, 26 Jan 2018 10:47:09 +0100) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb  8 10:27:18 2018 from 10.1.246.1


Tried the command suggested by dcsapak:
Code:
root@vdg-pve02-par6:~#  /usr/bin/ssh -o 'HostKeyAlias=vdg-pve02-par6' root@10.1.246.2 /bin/true
root@vdg-pve02-par6:~#

Where should I look now for a bad host key? (I reinstalled the second node from scratch, and beforehand I tried to remove any references to it in /root/.ssh/ and /etc/ssh/.)
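
One way to approach the "where to look" question is to scan every known_hosts file ssh may consult for the node's name and address. Note that the prompt in the test above is vdg-pve02-par6, the destination itself, while the failing check runs from vdg-pve01-par6, so the scan probably belongs on the source node. The sketch below assumes plain (unhashed) entries. The function name is hypothetical, and the candidate paths are the two named in the warning earlier in this thread; add any others you suspect.

```shell
#!/bin/sh
# Hypothetical helper: list known_hosts lines that mention any of the given names.
# Assumes unhashed entries; hashed ones need `ssh-keygen -F NAME -f FILE` instead.
# Matching is by substring, so review the output (10.1.246.2 also hits 10.1.246.20).
# Usage: scan_known_hosts "NAME..." FILE...
scan_known_hosts() {
    names=$1
    shift
    for file in "$@"; do
        [ -r "$file" ] || continue
        for n in $names; do
            # The extra /dev/null argument makes grep prefix each hit with the
            # file name; -n adds the line number, like ssh's "file:line" hints.
            grep -nF -- "$n" "$file" /dev/null
        done
    done
    return 0
}

# Example for this report:
# scan_known_hosts 'vdg-pve02-par6 10.1.246.2' \
#     /root/.ssh/known_hosts /etc/ssh/ssh_known_hosts
```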

Here is my pveversion -v:
Code:
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.4-pve2~bpo9