Hello all,
I've spent several days reading everything I could find online and trying everything I could think of, and I'm finally at the end of my rope.
When I first built my PVE cluster way back when, I could live migrate here, there and everywhere with no issues. I somewhat recently discovered that this is no longer the case, and more recently decided to finally dig into it.
All nodes are the same PVE version. 3x NUC10i3, 1x NUC7P and 1x PN50 w/ Ryzen 5.
Thinking that perhaps there was some weird CPU mismatch thing going on, I made a custom CPU entry with only the features common to the three processor types in my cluster. I had one or two successful migrations immediately following that, but every VM migrated since then has been frozen when it arrives at the remote node.
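For reference, the common feature set can be derived mechanically rather than by eye. The sketch below assumes the flags line from /proc/cpuinfo on each node has been saved to its own file (the file names in the usage line are hypothetical) and prints only the flags that appear in every file. It only illustrates the intersection step and is not necessarily how the custom CPU entry above was built.

```shell
#!/bin/sh
# common_flags FILE...: print the CPU flags that are present in every FILE.
# Assumption: each FILE holds the output of
#   grep -m1 '^flags' /proc/cpuinfo
# collected on one node.
common_flags() {
    n=$#
    for f in "$@"; do
        # Drop the "flags :" prefix, put one flag per line,
        # and de-duplicate within this node.
        sed 's/^flags[[:space:]]*:[[:space:]]*//' "$f" \
            | tr -s ' \t' '\n' \
            | grep -v '^$' \
            | sort -u
    done \
        | sort \
        | uniq -c \
        | awk -v n="$n" '$1 == n { print $2 }'   # keep flags seen on all n nodes
}
```

Usage would look like `common_flags nuc10.flags nuc7.flags pn50.flags`, one file per processor type.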
Previously, things would sometimes be frozen, and sometimes have "State: Internal-error" which I could never pull any more details from.
ONCE out of all this testing, when one of the VMs was set to KVM64 CPU, it got an actual crash and some stacktrace data on the console, but that was once out of well over 25 failed migrations.
I realize there is not a lot to go on here yet. I need to set up a proper test VM and test plan, to and from the various nodes - I just wanted to reach out first to get any suggestions on things to try, and more importantly, find out what kind of information I should be collecting about the failures and how, so when I come back with some real data, it has what is actually needed to get to the bottom of this.
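A minimal sketch of the test plan I have in mind, assuming a dedicated test VM: enumerate every ordered source/target pair so that each direction gets tested at least once. The VM ID and node names passed in are placeholders, and the script only prints the commands instead of running them, since qm migrate has to be run on whichever node currently holds the VM.

```shell
#!/bin/sh
# migration_pairs NODE...: print every ordered "source target" pair.
migration_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] || printf '%s %s\n' "$src" "$dst"
        done
    done
}

# print_plan VMID NODE...: print one live-migration command per pair.
# Nothing is executed; the output is a checklist to work through by hand.
print_plan() {
    vmid=$1
    shift
    migration_pairs "$@" | while read -r src dst; do
        echo "on $src: qm migrate $vmid $dst --online"
    done
}
```

With five nodes this gives 20 ordered pairs, so recording the outcome of each one (ok, frozen, internal-error) should show whether the failures depend on direction or on a particular CPU type.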
pveversion -v is the same on all five nodes:
Code:
root@NUC10i3FNH-2:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.22-pve1
ceph-fuse: 14.2.22-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
root@NUC10i3FNH-2:~#
Thanks!