Error When Live Migrating VM

Feb 10, 2025
I have a 3-node cluster and have migrated some VMs between its nodes. Migrating from node 1 to node 2 and vice versa works fine. However, when migrating from node 2 to node 3 (I haven't needed to migrate from node 1, so I'm not sure if that's affected too) I get the error below:

qmp command failed - VM 219 not running

The VM migrates successfully but arrives on node 3 powered off, and I have to start it manually. Once I have done this the VM is fine, so it hasn't caused any problems (apart from the downtime, which is less than ideal given it's meant to be a live migration), but it is alarming since migrating between the other two nodes works fine. I tried migrating to node 3 twice and the same thing happened both times.

I can confirm that neither node is under heavy CPU load; they reach maybe 6% usage at most.
 
Which CPU type is configured for the VM? The host type has issues when the servers use different CPUs. Which CPUs do you have in the 3 nodes? (AMD / Intel and the generation are important.)
 
Please also check the system logs, and post the VM config, pveversion -v from the source and target node, and the full migration and VM start logs!
 
Which CPU type is configured for the VM? The host type has issues when the servers use different CPUs. Which CPUs do you have in the 3 nodes? (AMD / Intel and the generation are important.)
The VM is using host, so I imagine this is likely the issue since the servers use different CPUs. Node 2 is AMD and node 3 is Intel. Is this going to cause any issues with the VM? Is there any way I can stop this happening in future?

Please also check the system logs, and post the VM config, pveversion -v from the source and target node, and the full migration and VM start logs!
The logs I took are from journalctl; which logs specifically would you need? Also, is there an easy way of getting an output of the VM config?
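(For reference, assuming the standard Proxmox CLI tools are available, something like the following should pull those details; 219 is the VMID from the error above.)

Code:
# Dump the VM configuration (run on the node currently hosting the VM)
qm config 219

# Installed Proxmox package versions on this node
pveversion -v

# Recent journal entries mentioning the VM or QMP errors
journalctl --since "1 hour ago" | grep -Ei 'qmp|219'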

pveversion from node 2:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-6-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-6
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.3-0~deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

pveversion from node 3:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-4-pve)
pve-manager: 8.3.0 (running version: 8.3.0/c1689ccb1065a83b)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-4
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
frr-pythontools: 10.3-0~deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.0
libpve-storage-perl: 8.2.9
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.2.9-1
proxmox-backup-file-restore: 3.2.9-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.1
pve-cluster: 8.0.10
pve-container: 5.2.2
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-1
pve-ha-manager: 4.0.6
pve-i18n: 3.3.1
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.0
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

The VM start and migration logs have nothing of interest other than the error posted originally.
 
You need to find the lowest common CPU type between both CPUs. I also had an AMD / Intel mix and got it to run with x86-64-v1 or v2. The lower you go, the fewer CPU features are available to the guest. It could also be necessary to add some flags (like aes).

An overview of the CPU types:

https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html
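For example, something along these lines could work (just a sketch, not a recommendation for your exact hardware; 219 is the VMID from this thread, and the model and flags have to match what both CPUs actually support):

Code:
# Switch the VM to a generic baseline model that both vendors can provide
qm set 219 --cpu x86-64-v2-AES

# Alternatively, pick a model and add individual flags on top, e.g. AES
qm set 219 --cpu x86-64-v2,flags=+aes
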
Thank you for this. The thing is, though, the VM does seem to run OK after starting it up manually. So is this likely to cause any issues with the VM going forward? Or is it just going to cause issues when migrating?
 
Thank you for this. The thing is, though, the VM does seem to run OK after starting it up manually. So is this likely to cause any issues with the VM going forward? Or is it just going to cause issues when migrating?
It's only an issue during live migration; if you migrate it offline, you can keep using the host CPU type.
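A minimal sketch of the offline path, assuming VMID 219 and a target node hypothetically named node3:

Code:
# Clean shutdown, then migrate without --online (i.e. offline migration)
qm shutdown 219
qm migrate 219 node3

# Start it again afterwards, on node3, once the migration has finished
qm start 219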
 
Note that live migration between Intel and AMD is not guaranteed to work, even when using non-physical CPU types:

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cpu_type
Is it just the live migration that would have issues, or could we see future issues with the general operation of this VM? As I mentioned, after booting the VM there are no errors in journalctl and it appears to be running as normal. All of our VMs are using 'host' as their CPU type.
 
This just affects the live state of a VM that is moved in any fashion, which includes live migration as well as hibernation or a snapshot with RAM that is saved on one node and resumed/rolled back on another node with a different CPU type.
 
This just affects the live state of a VM that is moved in any fashion, which includes live migration as well as hibernation or a snapshot with RAM that is saved on one node and resumed/rolled back on another node with a different CPU type.
Thank you for clarifying. So regardless of whether I do an online or offline migration, is the only way to ensure no interruption to set the CPU type as close as possible to the common ground between both manufacturers? Could I then change it back to host once the VM is on the new node?

This isn't such an issue with VMs that aren't mission-critical, as a small amount of downtime while the VM reboots isn't as much of a problem. It's just the VMs that are sensitive to downtime that I'm nervous about.
 
The CPU type cannot be changed while the VM is running, so you need some amount of downtime anyway, either for an offline migration or for changing the CPU type ;)
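As far as I know you can already queue the change while the VM is running; it is then recorded as a pending change and only takes effect after the next full stop/start (a sketch, again with VMID 219 and assuming the x86-64-v2-AES model fits both nodes):

Code:
# On a running VM this becomes a pending change rather than an immediate one
qm set 219 --cpu x86-64-v2-AES

# Show current vs. pending values
qm pending 219

# The new CPU type is applied on the next cold stop/start (or qm reboot)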
 
That is a very good point. Suppose I'll just have to give it a go, thanks for the help! I just needed to make sure my VMs hadn't been "tainted" in any way post migration.
 
One more thing: given that I'm going to have downtime either way, would you recommend online or offline migration? I'm not sure it matters, given that you said it shouldn't affect the VM going forward and should only affect the VM during migration.
 
The issue can also manifest itself as a crash a while after the migration, so if you know it's a problematic combination of node + CPU type, I would not recommend live migrating in the first place.
 
Yes, just reboot them to get a new QEMU process.
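Note that, as far as I know, a reboot triggered from inside the guest OS reuses the same QEMU process; to get a fresh one, the reboot has to go through Proxmox (a sketch with VMID 219):

Code:
# Clean shutdown + start in one step, applying any pending changes
qm reboot 219

# Or explicitly:
qm shutdown 219
qm start 219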
 
OK, so just to make sure I have this explicitly confirmed (as I'm not sure anymore): when I migrate a VM to node 3 it arrives in a shut-down state since node 3 isn't able to start it up, and I then have to start it manually in the Proxmox interface. Is this enough to ensure that there aren't going to be any issues related to this down the line? Or is there still a risk of future issues because of this migration problem?