Unable to start VMs

Hi all!

I recently had a problem starting up VMs on an upgraded cluster (4.4 to 5.1). I did everything according to the guides, first upgrading and rebalancing the Ceph pool and then doing a dist-upgrade one node at a time (roughly the steps sketched below). Everything is documented very well and the process is straightforward. Unfortunately, I ended up with 2 nodes unable to run any VMs. I spent time isolating the problem and even reinstalled one node cleanly from the ISO image twice (kicking it out of the cluster first and adding it back fresh with a new hostname and IP), with no luck.
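For context, the per-node part went roughly like this (paraphrased from memory of the upgrade guide, so treat it as a sketch rather than the exact commands I typed):

Code:
# Ceph was already upgraded and healthy before touching the nodes
ceph osd set noout            # keep Ceph from rebalancing while a node is down

# then, on each node, one at a time:
sed -i 's/jessie/stretch/g' /etc/apt/sources.list   # Debian 8 -> 9 (adjust the PVE repo line as well)
apt update
apt dist-upgrade
reboot

# wait until 'ceph -s' reports HEALTH_OK again before moving to the next node
ceph osd unset noout          # only after the last node is done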

The fix:
It seems that because my cluster is not fully homogeneous (I have Intel Xeon E5335s and E5345s), the CPU flags are not identical across nodes. The older models are VT-x enabled, but they lack the vnmi flag (Intel Virtual NMI, interrupt handling). It is a legacy flag that is not used much anymore, but its absence was preventing QEMU/KVM from working on the older hardware. An issue was raised on the Linux kernel lists, and a fix was proposed so that the kernel no longer hard-requires this flag. Fortunately, a newer kernel for proxmox-ve had already landed in the pvetest repository, and updating my 2 faulty nodes from pvetest (apt install proxmox-ve) brought in a kernel that supports the older processor models. I updated only proxmox-ve and then switched back to pve-no-subscription, to avoid accidentally pulling anything else from the test repository.
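To illustrate, this is roughly what the flag check and the temporary repository switch look like (the repository line is the standard pvetest one for PVE 5.x on stretch as I understand it; double-check it against the wiki before using):

Code:
# see whether the CPU advertises Virtual NMI support at all
grep -cw vnmi /proc/cpuinfo       # 0 means the flag is missing

# temporarily enable pvetest, pull in the newer kernel, then drop the repo again
echo "deb http://download.proxmox.com/debian/pve stretch pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update
apt install proxmox-ve            # brings in the newer pve-kernel
rm /etc/apt/sources.list.d/pvetest.list   # back to pve-no-subscription only
apt update
reboot                            # boot into the new kernel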

Now I have run into another problem. My freshly installed node is not accepting migrations from the other cluster members. I can create and fire up a VM on it, but when I try to migrate one to it, I get:

Code:
2017-12-07 12:09:13 # /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@10.10.10.51 /bin/true
2017-12-07 12:09:13 Host key verification failed.
2017-12-07 12:09:13 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
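
For reference, the failing check can be reproduced by hand with the same command the migration task runs (copied from the log above); adding -v should show which known_hosts file and entry ssh trips over, since HostKeyAlias makes it look the key up under the name "pve01" rather than under the IP:

Code:
root@pve2:~# /usr/bin/ssh -v -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@10.10.10.51 /bin/true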

It seems that even though I installed the node from scratch, with a new hostname and all, I have missed something with the SSH keys. The cluster itself is healthy:

Code:
root@pve01:~# pvecm status
Quorum information
------------------
Date:             Thu Dec  7 12:16:04 2017
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          2/776
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.2.2
0x00000003          1 192.168.2.3
0x00000004          1 192.168.2.4
0x00000001          1 192.168.2.51 (local)

and I can also use Ceph on all nodes no problem.
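A quick way to double-check that from each node (standard Ceph commands, nothing exotic):

Code:
root@pve01:~# ceph -s        # cluster-wide status summary
root@pve01:~# ceph health    # short health verdict only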

Any help on this?
 
It's possible that the nodes are not in the .ssh/known_hosts files. Try making SSH connections from node to node; if you get something like

Code:
The authenticity of host '192.168.16.15 (192.168.16.15)' can't be established.
ECDSA key fingerprint is SHA256:57jnp1WYqBPM4jqqkibuwBWzJptFmUE4K6yn1qoX4I4.
Are you sure you want to continue connecting (yes/no)?

answer "yes" and your problem is solved.
 
I'm afraid that's not the case. I am able to ssh from every node to every node, but the problem still persists.
 

Let's have a look at the contents of the following files (from both the source and the destination node of the migration):
/etc/hosts
/etc/pve/.members
/etc/pve/storage.cfg
/etc/network/interfaces
 
Source host first:
Code:
root@pve2:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost

# New nodes

10.10.10.51 pve01.toastpost.com pve01
10.10.10.52 pve02.toastpost.com pve02
10.10.10.53 pve03.toastpost.com pve03
10.10.10.54 pve04.toastpost.com pve04

192.168.2.51 pve01-corosync-r0.com pve01r0
192.168.2.52 pve02-corosync-r0.com pve02r0
192.168.2.53 pve03-corosync-r0.com pve03r0
192.168.2.54 pve04-corosync-r0.com pve04r0

192.168.3.51 pve01-corosync-r1.com pve01r1
192.168.3.52 pve02-corosync-r1.com pve02r1
192.168.3.53 pve03-corosync-r1.com pve03r1
192.168.3.54 pve04-corosync-r1.com pve04r1

# Old nodes

10.10.10.42 pve2.toastpost.com pve2 pvelocalhost
10.10.10.43 pve3.toastpost.com pve3
10.10.10.44 pve4.toastpost.com pve4

192.168.2.2 pve2-corosync-r0.com pve2r0
192.168.2.3 pve3-corosync-r0.com pve3r0
192.168.2.4 pve4-corosync-r0.com pve4r0

192.168.3.2 pve2-corosync-r1.com pve2r1
192.168.3.3 pve3-corosync-r1.com pve3r1
192.168.3.4 pve4-corosync-r1.com pve4r1

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Code:
root@pve2:~# cat /etc/pve/.members
{
"nodename": "pve2",
"version": 22,
"cluster": { "name": "tpcluster", "version": 16, "nodes": 4, "quorate": 1 },
"nodelist": {
  "pve01": { "id": 1, "online": 1, "ip": "10.10.10.51"},
  "pve2": { "id": 2, "online": 1, "ip": "10.10.10.42"},
  "pve3": { "id": 3, "online": 1, "ip": "10.10.10.43"},
  "pve4": { "id": 4, "online": 1, "ip": "10.10.10.44"}
  }
}
Code:
root@pve2:~# cat /etc/pve/storage.cfg
nfs: ISOS
        export /arkisto/Arkisto/!install/ISO
        path /mnt/pve/ISOS
        server 10.10.10.30
        content iso
        maxfiles 1
        options vers=3

dir: local
        path /var/lib/vz
        content rootdir,vztmpl,iso,images
        maxfiles 0

rbd: rbd
        content images,rootdir
        monhost 192.168.0.51 192.168.0.2 192.168.0.3 192.168.0.4
        nodes pve3,pve01,pve2,pve4
        pool rbd
        username admin

nfs: Super01
        export /INT/proxmox
        path /mnt/pve/Super01
        server 10.10.10.32
        content backup
        maxfiles 4
        options vers=3
Code:
root@pve2:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth2
iface eth2 inet static
        address  192.168.2.2
        netmask  255.255.255.0
#cluster ring 0

iface eth3 inet manual

auto eth1
iface eth1 inet static
        address  192.168.3.2
        netmask  255.255.255.0
#cluster ring 1

auto eth4
iface eth4 inet static
        address  192.168.0.2
        netmask  255.255.255.0
#ceph traffic

auto vmbr0
iface vmbr0 inet static
        address  10.10.10.42
        netmask  255.255.255.0
        gateway  10.10.10.254
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

And here is the node that does not accept migrations:

Code:
root@pve01:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost

# New nodes

10.10.10.51 pve01.toastpost.com pve01 pvelocalhost
10.10.10.52 pve02.toastpost.com pve02
10.10.10.53 pve03.toastpost.com pve03
10.10.10.54 pve04.toastpost.com pve04

192.168.2.51 pve01-corosync-r0.com pve01r0
192.168.2.52 pve02-corosync-r0.com pve02r0
192.168.2.53 pve03-corosync-r0.com pve03r0
192.168.2.54 pve04-corosync-r0.com pve04r0

192.168.3.51 pve01-corosync-r1.com pve01r1
192.168.3.52 pve02-corosync-r1.com pve02r1
192.168.3.53 pve03-corosync-r1.com pve03r1
192.168.3.54 pve04-corosync-r1.com pve04r1

# Old nodes

10.10.10.42 pve2.toastpost.com pve2
10.10.10.43 pve3.toastpost.com pve3
10.10.10.44 pve4.toastpost.com pve4

192.168.2.2 pve2-corosync-r0.com pve2r0
192.168.2.3 pve3-corosync-r0.com pve3r0
192.168.2.4 pve4-corosync-r0.com pve4r0

192.168.3.2 pve2-corosync-r1.com pve2r1
192.168.3.3 pve3-corosync-r1.com pve3r1
192.168.3.4 pve4-corosync-r1.com pve4r1

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Code:
root@pve01:~# cat /etc/pve/.members
{
"nodename": "pve01",
"version": 6,
"cluster": { "name": "tpcluster", "version": 16, "nodes": 4, "quorate": 1 },
"nodelist": {
  "pve01": { "id": 1, "online": 1, "ip": "10.10.10.51"},
  "pve2": { "id": 2, "online": 1, "ip": "10.10.10.42"},
  "pve3": { "id": 3, "online": 1, "ip": "10.10.10.43"},
  "pve4": { "id": 4, "online": 1, "ip": "10.10.10.44"}
  }
}
Code:
root@pve01:~# cat /etc/pve/storage.cfg
nfs: ISOS
        export /arkisto/Arkisto/!install/ISO
        path /mnt/pve/ISOS
        server 10.10.10.30
        content iso
        maxfiles 1
        options vers=3

dir: local
        path /var/lib/vz
        content rootdir,vztmpl,iso,images
        maxfiles 0

rbd: rbd
        content images,rootdir
        monhost 192.168.0.51 192.168.0.2 192.168.0.3 192.168.0.4
        nodes pve3,pve01,pve2,pve4
        pool rbd
        username admin

nfs: Super01
        export /INT/proxmox
        path /mnt/pve/Super01
        server 10.10.10.32
        content backup
        maxfiles 4
        options vers=3
Code:
root@pve01:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp6s0f0 inet manual

iface enp6s0f1 inet manual

auto enp8s0f0
iface enp8s0f0 inet static
        address  192.168.2.51
        netmask  255.255.255.0
#cluster ring 0

auto enp8s0f1
iface enp8s0f1 inet static
        address  192.168.3.51
        netmask  255.255.255.0
#cluster ring 1

auto enp9s0
iface enp9s0 inet static
        address  192.168.0.51
        netmask  255.255.255.0
#ceph traffic

auto vmbr0
iface vmbr0 inet static
        address  10.10.10.51
        netmask  255.255.255.0
        gateway  10.10.10.254
        bridge_ports enp6s0f0
        bridge_stp off
        bridge_fd 0

Hope this clarifies.
 
You use several different IP subnets, including dedicated ones for corosync (with RRP activated); maybe not all keys were updated after reinstalling one or more nodes.

However, it may be that one or more of the RSA host keys have changed. Delete all /root/.ssh/known_hosts files and rebuild them by establishing SSH sessions from every node to every other node, using (at least) both the 10.10.10.0/24 and 192.168.2.0/24 networks.
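
Something along these lines, run on every node, should do it (node addresses taken from your configs above; adjust as needed):

Code:
# clear the stale per-node cache first
rm -f /root/.ssh/known_hosts

# then connect once to every node on both networks to re-record the host keys
for ip in 10.10.10.51 10.10.10.42 10.10.10.43 10.10.10.44 \
          192.168.2.51 192.168.2.2 192.168.2.3 192.168.2.4; do
    ssh -o StrictHostKeyChecking=no root@$ip /bin/true
done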
 
Hello!
I did all of that. Still no success:
Code:
2017-12-13 16:37:58 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@10.10.10.51 /bin/true
2017-12-13 16:37:58 Host key verification failed.
2017-12-13 16:37:58 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

Should I have copied some private key from the old nodes during the installation of the new node, or is this handled automatically when joining the cluster?
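
Or, if the shared key files are the problem, would something like this be the right way to refresh them on the new node? (Just my guess from the pvecm man page, not something the guides spell out for this case.)

Code:
# my understanding: the cluster-wide key files live on the pmxcfs and are symlinked locally
ls -l /etc/ssh/ssh_known_hosts /root/.ssh/authorized_keys   # should point into /etc/pve/priv/

# and this is supposed to regenerate the node certificates and merge its keys back in
pvecm updatecerts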
 
