Error When Live Migrating VM

Ok, just to make sure I have this explicitly confirmed, since I'm not sure anymore: when I migrate a VM to node 3, it arrives in a shut-down state because node 3 isn't able to resume it, and I then have to start it manually in the Proxmox interface. Is that enough to ensure there won't be any issues related to this down the line, or is there still a risk of future problems because of this migration failure?

if the VM crashes during the migration, then there is no state anymore, so yes, the next start of the VM is fresh and should be fine. but in that case I would recommend not live migrating at all and instead shutting the VM down before the migration.
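
for reference, a minimal sketch of that offline workflow, using the VMID and node names from this thread (adjust to your setup):

# clean shutdown via ACPI / the guest agent
qm shutdown 1001
# offline migration (note: no --online flag)
qm migrate 1001 pve-04
# then start it on the target, or via the web UI
ssh pve-04 qm start 1001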
 
2025-11-26 14:42:44 migration status: completed
2025-11-26 14:42:44 ERROR: tunnel replied 'ERR: resume failed - VM 1001 qmp command 'query-status' failed - client closed connection' to command 'resume 1001'
2025-11-26 14:42:44 stopping migration dbus-vmstate helpers
2025-11-26 14:42:44 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 1001 not running locally on node 'pve-04'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-11-26 14:42:45 failed to stop dbus-vmstate on pve-04: command 'pvesh create /nodes/pve-04/qemu/1001/dbus-vmstate --action stop' failed: exit code 2
2025-11-26 14:42:45 flushing conntrack state for guest on source node
2025-11-26 14:42:47 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems

I'm thinking this is possibly a bug. I'm disabling DRS and will open a ticket with Proxmox.


Update: ProxLB caught it too:

Nov 26 20:42:55 drs-pve1 ProxLB[800]: 2025-11-26 20:42:55,283 - ProxLB - CRITICAL - Balancing: Job ID UPID:pve-03:00002041:0000E1F9:6927663B:qmigrate:1001:proxlb@pve: (guest: vault.xxxxx.xxx) went into an error! Please check manuall>
that sounds like a different issue with similar symptoms. check the start log of the VM on the target node, and the system log of the target node - likely you will find an error about the VM crashing there.
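
e.g. on the target node, something along these lines (the time window is just taken from the task log above, adjust as needed):

# system log around the failed resume
journalctl --since "2025-11-26 14:42" --until "2025-11-26 14:45"
# messages from the QEMU process itself
journalctl -t QEMU --since "2025-11-26 14:42"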
 
@JD2002 please also share the output of pveversion -v from both source and target of the migration as well as the VM configuration qm config 1001
 
@JD2002 please also share the output of pveversion -v from both source and target of the migration as well as the VM configuration qm config 1001
root@pve-03:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.14.11-4-pve)
pve-manager: 9.1.1 (running version: 9.1.1/42db4a6cf33dac83)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.0.15
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.20-1
proxmox-backup-file-restore: 4.0.20-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.2
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.0
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1



root@pve-04:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.14.8-2-bpo12-pve)
pve-manager: 9.1.1 (running version: 9.1.1/42db4a6cf33dac83)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14.8-2-bpo12-pve-signed: 6.14.8-2~bpo12+1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.0.15
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.20-1
proxmox-backup-file-restore: 4.0.20-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.2
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.0
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1


root@pve-04:~# qm config 1001
agent: 1
bios: ovmf
boot: order=scsi0
cipassword: **********
ciuser: davis
cores: 2
cpu: host
description: * Ubuntu 24.04 Server Host%0A* Hashicorp Vault PAM Server
efidisk0: nfs-pve1-1:1001/vm-1001-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: nfs-pve1-1:1001/vm-1001-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=dhcp
machine: q35
memory: 4096
meta: creation-qemu=9.2.0,ctime=1752194602
name: vault.<redacted>.<redacted>
net0: virtio=BC:24:11:2B:F8:44,bridge=VLAN100
numa: 0
ostype: l26
scsi0: nfs-pve1-1:1001/vm-1001-disk-0.raw,discard=on,iothread=1,size=130560M,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=631c106f-5eee-4786-8456-9ca5edf7f55e
sockets: 1
tags: backup;zabbix
vga: serial0
vmgenid: 9f395ed4-e60b-4ceb-92ca-55566391acc3
 
Another data point: until I disabled it earlier this week, this cluster had been under DRS management since it was originally built. And while rebalancing migrations don't occur frequently (I balance only on memory, with fairly wide hysteresis), they do happen several times a day. I also maintain close monitoring (Zabbix) of all VMs. If this had been a problem prior to 9.1.1, it would have surfaced pretty quickly.
 
check the start log of the VM on the target node, and the system log of the target node - likely you will find an error about the VM crashing there.

please also provide those (from a failing migration of course ;))
 
please also provide those (from a failing migration of course ;))
Nov 30 18:23:21 pve-03 QEMU[2592330]: kvm: Unknown savevm section or instance 'dbus-vmstate/dbus-vmstate' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices

Nov 30 18:23:21 pve-03 QEMU[2592330]: kvm: load of migration failed: Invalid argument
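
In case anyone else wants to check for the same signature, a journal grep along these lines should surface it on the node where the VM was being started (just a sketch; the QEMU tag matches the lines above):

journalctl -t QEMU --since "2025-11-30" | grep -E "dbus-vmstate|load of migration failed"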

There are no hotplugged devices. The hosts are mirror images with respect to their hardware and software (Proxmox) configuration. And, as noted before, no changes were made prior to or after the 9.1.1 upgrade. No hardware changes have been made since these systems were deployed back on Jun 10th. This also affects VMs seemingly at random, without regard for OS: so far Linux, FreeBSD, and Solaris. I haven't seen a Windows VM impacted yet, but my Microsoft VM footprint is as tiny as I can make it. :cool: It's fairly (if somewhat disruptively) easy to reproduce; I triggered this latest failure simply by migrating the VMs off a single host.

I should also mention that it doesn't appear to matter which host is the source and which is the target.
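
For reference, the reproduction really is just a bulk evacuation; roughly the following, though the loop is a sketch rather than exactly what I ran (target node is an example):

# live-migrate every running VM on this node to pve-04
for vmid in $(qm list | awk 'NR>1 && $3=="running" {print $1}'); do
    qm migrate "$vmid" pve-04 --online
done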
 
@JD2002 please share the full excerpt from the system logs around the time of the issue, from both the source and target node of the migration.
 
Apologies for the delay. For unrelated reasons I needed to power-cycle one of the nodes, and noticed that when it came back up, I experienced zero failures on that node when receiving migrated VMs. As I had planned downtime today to replace a UPS, I decided to test again once the cluster came back up from the replacement. After several all-node VM migrations from node to node, I can confirm that the cluster restart appears to have cleared the migration issue. This raises an interesting question, though. As a matter of practice, I don't usually restart nodes after an update unless /var/run/reboot-required is triggered, which I check and also flag through monitoring. Given that I have an apt-mark hold set on the kernel to remain at 6.14, the flag was never triggered. And since QA testing probably included the kernel update and the flagged reboot, the circumstances I was operating under may not have been seen in QA. As such, I should tag this as somewhat "operator error" on my part and will alter procedures to include a mandatory cluster node restart under such circumstances -- especially for hosts under a kernel hold.
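
For anyone adopting the same procedure change, the checks involved are easy to script; a rough sketch using the standard Debian paths and tools:

# packages that want a reboot set this flag...
test -f /var/run/reboot-required && cat /var/run/reboot-required
# ...but a held kernel is never upgraded, so it never sets it
apt-mark showhold
# comparing the running kernel against the newest installed one is the safer tell
uname -r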
 
thanks for your report! we also hit this internally now, so it seems there is some edge case not yet handled properly!
 
Okay folks. I looked at the patch, said "Okay, what the h***", and applied it to my own environment. YOLO. It seems a fruitful place to gather data on how often this issue fires and under what circumstances, given that the LB is running here.
 
Please note that an updated version of the patch got applied with qemu-server = 9.1.2 which is currently available in the pve-test repository, so it would be great if you could test with that instead! While the early version of the patch should also work, the applied one is a slightly nicer approach and also adds a bit of hardening.
 
I was trying to stay out of the test repos for what is, nominally, a production environment. Things have been stable for the last full day. An interesting side effect is that this change seems to have made ProxLB "migration-happy": the level of host migrations is several times normal. Nothing that is really detrimental to the environment, but since the LB uses metrics from the API, something seems to be skewing a bit. Perhaps the difference with the "nicer patch"?
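
In case it helps narrow that down, the raw values the balancer should be seeing can be pulled straight from the API; a quick sketch (the jq filter is just an example and assumes jq is installed):

pvesh get /nodes --output-format json | jq '.[] | {node, mem, maxmem, cpu}'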