We have 8+1 nodes in our Proxmox VE 9 cluster and use ProxLB to balance VM guests across these hosts. Storage is Ceph. This morning a live migration failed and the guest was found stopped with this error message:
Code:
stopped previously running dbus-vmstate helper for VM 120
2025-09-24 08:07:26 starting migration of VM 120 to node 'node2' (192.168.0.12)
2025-09-24 08:07:26 starting VM 120 on remote node 'node2'
2025-09-24 08:07:27 [node2] trying to acquire lock...
2025-09-24 08:07:27 [node2] OK
2025-09-24 08:07:27 start remote tunnel
2025-09-24 08:07:27 ssh tunnel ver 1
2025-09-24 08:07:27 starting online/live migration on unix:/run/qemu-server/120.migrate
2025-09-24 08:07:27 set migration capabilities
2025-09-24 08:07:27 migration downtime limit: 100 ms
2025-09-24 08:07:27 migration cachesize: 2.0 GiB
2025-09-24 08:07:27 set migration parameters
2025-09-24 08:07:27 start migrate command to unix:/run/qemu-server/120.migrate
2025-09-24 08:07:28 migration active, transferred 968.8 MiB of 16.0 GiB VM-state, 1.4 GiB/s
2025-09-24 08:07:29 migration active, transferred 2.3 GiB of 16.0 GiB VM-state, 1.3 GiB/s
2025-09-24 08:07:30 migration active, transferred 3.8 GiB of 16.0 GiB VM-state, 1.7 GiB/s
2025-09-24 08:07:31 migration active, transferred 5.5 GiB of 16.0 GiB VM-state, 1.8 GiB/s
2025-09-24 08:07:32 migration active, transferred 7.3 GiB of 16.0 GiB VM-state, 1.5 GiB/s
2025-09-24 08:07:33 average migration speed: 2.7 GiB/s - downtime 74 ms
2025-09-24 08:07:33 migration completed, transferred 8.9 GiB VM-state
2025-09-24 08:07:33 migration status: completed
2025-09-24 08:07:33 ERROR: tunnel replied 'ERR: resume failed - VM 120 qmp command 'query-status' failed - client closed connection' to command 'resume 120'
2025-09-24 08:07:33 stopping migration dbus-vmstate helpers
2025-09-24 08:07:33 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 120 not running locally on node 'node2'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-09-24 08:07:34 failed to stop dbus-vmstate on node2: command 'pvesh create /nodes/node2/qemu/120/dbus-vmstate --action stop' failed: exit code 2
2025-09-24 08:07:34 flushing conntrack state for guest on source node
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
2025-09-24 08:07:52 ERROR: migration finished with problems (duration 00:00:27)
TASK ERROR: migration problems
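To cut node2's log down to the same window as the task log above (08:07:26 to 08:07:52), here is a minimal Python sketch. It assumes the node2 journal was exported with `journalctl -o short-iso`, so each line starts with a `YYYY-MM-DDTHH:MM:SS` timestamp. It also assumes node2's clock and timezone match the source node's task log, since the UTC offset is ignored. The helper names `parse_ts` and `lines_in_window` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional

# Failure window copied from the migration task log above:
# migration started 08:07:26, task ended with errors at 08:07:52.
WINDOW_START = datetime(2025, 9, 24, 8, 7, 26)
WINDOW_END = datetime(2025, 9, 24, 8, 7, 52)


def parse_ts(line: str) -> Optional[datetime]:
    """Parse the leading timestamp of a `journalctl -o short-iso` line.

    Only the first 19 characters (YYYY-MM-DDTHH:MM:SS) are used. The UTC
    offset is ignored (assumption: node2 uses the same local time as the
    source node). Returns None for lines without a leading timestamp.
    """
    try:
        return datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None


def lines_in_window(
    lines: Iterable[str],
    start: datetime = WINDOW_START,
    end: datetime = WINDOW_END,
    pad_s: int = 5,
) -> Iterator[str]:
    """Yield lines timestamped within [start - pad_s, end + pad_s]."""
    lo = start - timedelta(seconds=pad_s)
    hi = end + timedelta(seconds=pad_s)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and lo <= ts <= hi:
            yield line.rstrip("\n")
```

Usage, with a hypothetical export file name: `with open("node2-journal.txt") as f: print("\n".join(lines_in_window(f)))`. The small padding keeps messages logged a few seconds before the migration start or after the final error.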
The pveversion output is the same on node2 and node3:
Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.6 (running version: 9.0.6/49c767b70aeb6648)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14: 6.14.11-2
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.10
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.7
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.14-1
proxmox-backup-file-restore: 4.0.14-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.1
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.11
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-4
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.21
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.4-pve1
Balancing runs every 6 hours, and this is the first time a migration has failed and left the guest VM stopped. Any idea what causes this? (Attached: the node2 log around the failed migration.)
Regards.