Hi all!
I recently had a problem starting VMs on an updated cluster (4.4 to 5.1). I did everything according to the guides: first updating and rebalancing the Ceph pool, then doing a dist-upgrade one node at a time. Everything is documented very well and the process is straightforward. Unfortunately, I ended up with 2 nodes that were not able to run any VMs. I spent a while isolating the problem, and even re-installed one node cleanly from the ISO image twice (kicked it out of the cluster first and added it back fresh with a new hostname and IP), with no luck.
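For reference, the per-node loop I followed looked roughly like this (just a sketch, assuming the repositories were already switched to the PVE 5.x / Debian stretch lists and that Ceph was healthy before each node was touched):
Code:
# Rough outline of what I ran on each node, one at a time
# (assumes sources.list already points at the PVE 5.x / stretch repositories)
ceph -s              # wait for HEALTH_OK before touching the next node
apt update
apt dist-upgrade     # upgrade this node to PVE 5.1
reboot               # boot the new kernel, then move on to the next node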
The fix:
It seems that because my cluster is not fully homogeneous (I have Intel Xeon E5335s and E5345s), the CPU flags are not identical. The older models are VT-x enabled, but they lack the vnmi flag (Intel Virtual NMI, used for interrupt handling). It is a legacy flag that is not used much anymore, but its absence was preventing QEMU from working on the older hardware. There was an issue raised on the Linux kernel lists, and a fix was suggested so that the kernel no longer requires this flag. Fortunately, searching the pvetest repository, there was a newer kernel version introduced to proxmox-ve, and updating my 2 faulty nodes from the pvetest repo (apt install proxmox-ve) brought in a kernel that supports the older processor models. I updated only proxmox-ve and then switched back to pve-no-subscription, to avoid accidental use of the test repository.
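In case it helps anyone with similarly mixed CPUs, this is roughly how I checked for the flag and pulled in the newer kernel. The pvetest repository line below is my assumption of the standard layout, so double-check it against the Proxmox wiki before copying it:
Code:
# Check whether the CPU advertises the vnmi flag (no output = flag missing)
grep -wo vnmi /proc/cpuinfo | sort -u

# Temporarily enable pvetest (repository line is an assumption -- verify first)
echo "deb http://download.proxmox.com/debian/pve stretch pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update
apt install proxmox-ve                     # pulls in the newer kernel
rm /etc/apt/sources.list.d/pvetest.list    # back to pve-no-subscription only
apt update
reboot                                     # boot into the new kernel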
Now I have ended up with another problem: my freshly installed node is not accepting migrations from the other cluster members. I can create and fire up a VM on it, but when trying to migrate to it, I get:
Code:
2017-12-07 12:09:13 # /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@10.10.10.51 /bin/true
2017-12-07 12:09:13 Host key verification failed.
2017-12-07 12:09:13 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
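The failing check can also be run by hand, outside of the migration task, using the same command that appears in the log above (alias and address taken from my log; adjust to your own nodes):
Code:
# The same connectivity test the migration task runs (copied from the log above).
# On a healthy cluster this prints nothing and exits 0; here it fails with
# "Host key verification failed."
/usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@10.10.10.51 /bin/true
echo $?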
It seems that even though I installed the node as new, with a new hostname and all, I have missed something with the SSH host keys. The cluster is healthy
Code:
root@pve01:~# pvecm status
Quorum information
------------------
Date:             Thu Dec  7 12:16:04 2017
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          2/776
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.2.2
0x00000003          1 192.168.2.3
0x00000004          1 192.168.2.4
0x00000001          1 192.168.2.51 (local)
and I can also use Ceph on all nodes without problems.
Any help on this?