VMs not surviving migrations

Hyacin

Hello all,

I've spent several days reading everything I could find online and trying everything I could think of, and I'm finally at the end of my rope.

When I first built my PVE cluster way back when, I could live-migrate here, there, and everywhere with no issues. I somewhat recently discovered that is no longer the case, and more recently decided to finally dig into it.

All nodes run the same PVE version: 3x NUC10i3, 1x NUC7P, and 1x PN50 with a Ryzen 5.

Thinking that perhaps there was some CPU feature mismatch going on, I made a custom CPU model with only the features common to the three processor models in my cluster. I had one or two successful migrations immediately after that, but every VM since has arrived frozen at the remote node.
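For reference, the custom model lives in /etc/pve/virtual-guest/cpu-models.conf; mine looks roughly like the sketch below (the model name and flag list here are placeholders, not my exact set):

Code:
# /etc/pve/virtual-guest/cpu-models.conf -- placeholder example
cpu-model: common-baseline
    flags +aes;+avx;-pdpe1gb
    reported-model kvm64

The VMs then reference it as "custom-common-baseline" in their CPU setting.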

Previously, migrated VMs would sometimes arrive frozen, and sometimes show "State: Internal-error", from which I could never pull any further details.

Once in all this testing, with one of the VMs set to the kvm64 CPU model, I got an actual crash with some stack-trace data on the console, but that was one occurrence out of well over 25 failed migrations.

I realize there is not a lot to go on here yet. I need to set up a proper test VM and a test plan covering migrations to and from the various nodes. I just wanted to reach out first to get suggestions on things to try and, more importantly, to find out what kind of information I should be collecting about the failures (and how), so that when I come back with real data it includes what is actually needed to get to the bottom of this.
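In case it helps frame suggestions, my rough plan is to grab something like the following from both the source and target node after each failure - this is only a guess at what is useful, and VMID 100 is a placeholder for whatever test VM I end up using:

Code:
qm config 100             # CPU model and hardware settings of the test VM
qm status 100 --verbose   # QEMU run state on the target after migration
journalctl -e             # recent syslog entries, on both source and target
pveversion -v             # confirm package versions still match across nodes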

pveversion -v output is the same on all five nodes:

Code:
root@NUC10i3FNH-2:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.22-pve1
ceph-fuse: 14.2.22-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
root@NUC10i3FNH-2:~#

Thanks!
 
Migration between CPU vendors (Intel <-> AMD) is not supported in general. To exclude CPU issues, I would try migrating a VM using only the base 'kvm64' CPU model.

Also, after a VM crashes, check the syslog (journalctl -e, on both the source and the target PVE node) for any errors.
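For example, with VMID 100 as a placeholder:

Code:
# switch the test VM to the plain kvm64 model before migrating
qm set 100 --cpu kvm64

# after a failed migration, inspect recent log entries on both nodes
journalctl -e

# or narrow the output to the time window of the migration attempt
journalctl --since "15 minutes ago"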
 
