live migration issue on one VM

We have 8+1 nodes in our Proxmox 9 cluster, with ProxLB balancing the VM guests across these hosts. The storage is Ceph. This morning a live migration failed and the guest was found stopped, with this error message:

Code:
stopped previously running dbus-vmstate helper for VM 120
2025-09-24 08:07:26 starting migration of VM 120 to node 'node2' (192.168.0.12)
2025-09-24 08:07:26 starting VM 120 on remote node 'node2'
2025-09-24 08:07:27 [node2] trying to acquire lock...
2025-09-24 08:07:27 [node2]  OK
2025-09-24 08:07:27 start remote tunnel
2025-09-24 08:07:27 ssh tunnel ver 1
2025-09-24 08:07:27 starting online/live migration on unix:/run/qemu-server/120.migrate
2025-09-24 08:07:27 set migration capabilities
2025-09-24 08:07:27 migration downtime limit: 100 ms
2025-09-24 08:07:27 migration cachesize: 2.0 GiB
2025-09-24 08:07:27 set migration parameters
2025-09-24 08:07:27 start migrate command to unix:/run/qemu-server/120.migrate
2025-09-24 08:07:28 migration active, transferred 968.8 MiB of 16.0 GiB VM-state, 1.4 GiB/s
2025-09-24 08:07:29 migration active, transferred 2.3 GiB of 16.0 GiB VM-state, 1.3 GiB/s
2025-09-24 08:07:30 migration active, transferred 3.8 GiB of 16.0 GiB VM-state, 1.7 GiB/s
2025-09-24 08:07:31 migration active, transferred 5.5 GiB of 16.0 GiB VM-state, 1.8 GiB/s
2025-09-24 08:07:32 migration active, transferred 7.3 GiB of 16.0 GiB VM-state, 1.5 GiB/s
2025-09-24 08:07:33 average migration speed: 2.7 GiB/s - downtime 74 ms
2025-09-24 08:07:33 migration completed, transferred 8.9 GiB VM-state
2025-09-24 08:07:33 migration status: completed
2025-09-24 08:07:33 ERROR: tunnel replied 'ERR: resume failed - VM 120 qmp command 'query-status' failed - client closed connection' to command 'resume 120'
2025-09-24 08:07:33 stopping migration dbus-vmstate helpers
2025-09-24 08:07:33 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 120 not running locally on node 'node2'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-09-24 08:07:34 failed to stop dbus-vmstate on node2: command 'pvesh create /nodes/node2/qemu/120/dbus-vmstate --action stop' failed: exit code 2
2025-09-24 08:07:34 flushing conntrack state for guest on source node
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
2025-09-24 08:07:52 ERROR: migration finished with problems (duration 00:00:27)
TASK ERROR: migration problems
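
For what it's worth, the failed run above was triggered by ProxLB; a single live migration of the same VM can also be retried by hand to rule the balancer out. This is only a rough sketch, with the VM ID and target node taken from the log above:

Code:
# check whether the guest is really stopped after the failed migration
qm status 120
# start it again if needed
qm start 120
# retry one live migration manually, without ProxLB
qm migrate 120 node2 --online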

pveversion is the same on node2 and node3:

Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.6 (running version: 9.0.6/49c767b70aeb6648)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14: 6.14.11-2
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.10
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.7
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.14-1
proxmox-backup-file-restore: 4.0.14-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.1
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.11
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-4
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.21
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.4-pve1

Balancing runs every 6 hours and this is the first time it has failed and left a guest VM stopped. Any idea what could cause this? (Attached: the log file from node2 around the failed migration.)

Regards.
 


I forgot to include the VM config:

Code:
affinity: 16,17,40,41
bios: seabios
boot: order=ide0;scsi0;scsi1;scsi2;scsi3;scsi4
cores: 1
cpu: x86-64-v2-AES
ide0: none,media=cdrom
memory: 16384
meta: creation-qemu=10.0.2,ctime=1758629387
name: cortexte2
net0: virtio=00:50:56:bd:00:0d,bridge=dc
numa: 0
ostype: l26
scsi0: prod_nvme:vm-120-disk-0,discard=on,iothread=1,size=15G
scsi1: prod_nvme:vm-120-disk-1,discard=on,iothread=1,size=100G
scsi2: prod_nvme:vm-120-disk-2,discard=on,iothread=1,size=100G
scsi3: prod_nvme:vm-120-disk-3,discard=on,iothread=1,size=100G
scsi4: prod_nvme:vm-120-disk-4,discard=on,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=423d7766-3000-2211-72b4-b54b4e4ef17c
sockets: 4
tags: plb_pin_node1;plb_pin_node2;plb_pin_node3;plb_pin_node4;prod
vmgenid: 7e1c76e3-9141-486d-88aa-a9fe70189f8d

And of course, our 8 nodes have the same hardware.
 
I am seeing this as well.
All hosts are upgraded to PVE 9, and sometimes when a VM migrates, I get:

2025-09-24 16:08:14 ERROR: tunnel replied 'ERR: resume failed - VM 102 qmp command 'query-status' failed - client closed connection' to command 'resume 102'
2025-09-24 16:08:18 ERROR: migration finished with problems (duration 00:00:13)

I am able to reproduce it by disabling the Conntrack state button in the migration panel.

My guess is that the balancer we run (ProxLB) is not using this new conntrack feature correctly.
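
For context, here is a rough sketch of how an online migration can be triggered through the Proxmox API from the shell. I assume ProxLB goes through the same endpoint, but I don't know which options (conntrack-related or otherwise) it actually passes, and the node names below are just the placeholders from my log:

Code:
# trigger a live (online) migration of VM 553 to DEST-NODE via the API
pvesh create /nodes/SOURCE-NODE/qemu/553/migrate --target DEST-NODE --online 1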
 
Hi,
@pmarasse @ManFriday please share the system journal from the target node around the time of the migration, as well as the configuration of the bridge used by the VM's virtual NIC(s).

@ManFriday please share the full log from a failed migration as well as an affected VM configuration.
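
For the journal, something like this is enough, with the time window adjusted to the failed migration and the output redirected to a file you can attach:

Code:
journalctl --since "2025-09-24 08:05" --until "2025-09-24 08:10" > journal-target-node.txt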
 
Bridge config:

Code:
auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
# VM Networks

Bond config:

Code:
auto bond1
iface bond1 inet manual
        bond-slaves ens2081f0np0 ens2086f0np0
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens2081f0np0
# VM Networks Bond

For each VLAN we use an SDN. Example:

Code:
auto vlan110
iface vlan110
        bridge_ports vmbr1.110
        bridge_stp off
        bridge_fd 0
        alias VLAN-110

VM config:

Code:
agent: 1
balloon: 6144
bios: seabios
boot: order=sata0;scsi0
cores: 2
cpu: Skylake-Server-noTSX-IBRS
memory: 8192
meta: creation-qemu=9.0.2,ctime=1743094877
name: vmname
net0: virtio=00:50:56:0B:69:B8,bridge=vlan70
net1: virtio=00:50:56:03:A9:3B,bridge=vlan107
ostype: l26
sata0: none,media=cdrom
scsi0: DS9:vm-553-disk-0,iothread=1,size=50G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=42114333-e583-5dfc-671f-e2ebcdd68957
sockets: 1
tags: bsg;redhat;webserver
vmgenid: bd5a837c-0fc5-4405-a13d-b301919f242a


Code:
root@SOURCE-NODE:/var/log/pve/tasks/4# cat 'UPID:SOURCE-NODE:0016528B:0007F636:68D41974:qmigrate:553:<token>:'
2025-09-24 11:16:53 use dedicated network address for sending migration traffic (172.17.200.51)
2025-09-24 11:16:53 starting migration of VM 553 to node 'DEST-NODE' (172.17.200.51)
2025-09-24 11:16:53 starting VM 553 on remote node 'DEST-NODE'
2025-09-24 11:16:58 start remote tunnel
2025-09-24 11:16:59 ssh tunnel ver 1
2025-09-24 11:16:59 starting online/live migration on unix:/run/qemu-server/553.migrate
2025-09-24 11:16:59 set migration capabilities
2025-09-24 11:16:59 migration downtime limit: 100 ms
2025-09-24 11:16:59 migration cachesize: 1.0 GiB
2025-09-24 11:16:59 set migration parameters
2025-09-24 11:16:59 start migrate command to unix:/run/qemu-server/553.migrate
2025-09-24 11:17:00 migration active, transferred 515.6 MiB of 8.0 GiB VM-state, 524.3 MiB/s
2025-09-24 11:17:01 migration active, transferred 1.0 GiB of 8.0 GiB VM-state, 534.1 MiB/s
2025-09-24 11:17:02 migration active, transferred 1.4 GiB of 8.0 GiB VM-state, 335.0 MiB/s
2025-09-24 11:17:03 migration active, transferred 1.7 GiB of 8.0 GiB VM-state, 264.1 MiB/s
2025-09-24 11:17:04 migration active, transferred 1.9 GiB of 8.0 GiB VM-state, 338.2 MiB/s
2025-09-24 11:17:05 migration active, transferred 2.2 GiB of 8.0 GiB VM-state, 264.4 MiB/s
2025-09-24 11:17:06 migration active, transferred 2.5 GiB of 8.0 GiB VM-state, 278.9 MiB/s
2025-09-24 11:17:07 migration active, transferred 2.7 GiB of 8.0 GiB VM-state, 740.5 MiB/s
2025-09-24 11:17:08 migration active, transferred 3.0 GiB of 8.0 GiB VM-state, 553.0 MiB/s
2025-09-24 11:17:09 migration active, transferred 3.3 GiB of 8.0 GiB VM-state, 526.0 MiB/s
2025-09-24 11:17:10 migration active, transferred 3.6 GiB of 8.0 GiB VM-state, 341.4 MiB/s
2025-09-24 11:17:11 migration active, transferred 3.9 GiB of 8.0 GiB VM-state, 383.6 MiB/s
2025-09-24 11:17:12 migration active, transferred 4.3 GiB of 8.0 GiB VM-state, 373.9 MiB/s
2025-09-24 11:17:13 migration active, transferred 4.7 GiB of 8.0 GiB VM-state, 563.2 MiB/s
2025-09-24 11:17:14 migration active, transferred 5.2 GiB of 8.0 GiB VM-state, 548.6 MiB/s
2025-09-24 11:17:15 migration active, transferred 5.7 GiB of 8.0 GiB VM-state, 516.9 MiB/s
2025-09-24 11:17:16 migration active, transferred 6.3 GiB of 8.0 GiB VM-state, 514.5 MiB/s
2025-09-24 11:17:17 migration active, transferred 6.8 GiB of 8.0 GiB VM-state, 536.5 MiB/s
2025-09-24 11:17:18 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 144.5 MiB/s
2025-09-24 11:17:18 xbzrle: send updates to 368 pages in 76.2 KiB encoded memory
2025-09-24 11:17:18 average migration speed: 432.1 MiB/s - downtime 60 ms
2025-09-24 11:17:18 migration completed, transferred 7.3 GiB VM-state
2025-09-24 11:17:18 migration status: completed
2025-09-24 11:17:19 ERROR: tunnel replied 'ERR: resume failed - VM 553 qmp command 'query-status' failed - client closed connection' to command 'resume 553'
2025-09-24 11:17:19 stopping migration dbus-vmstate helpers
2025-09-24 11:17:19 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 553 not running locally on node 'DEST-NODE'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-09-24 11:17:21 failed to stop dbus-vmstate on DEST-NODE: command 'pvesh create /nodes/DEST-NODE/qemu/553/dbus-vmstate --action stop' failed: exit code 2
2025-09-24 11:17:21 flushing conntrack state for guest on source node
2025-09-24 11:17:25 ERROR: migration finished with problems (duration 00:00:32)
TASK ERROR: migration problems

Logs at DEST-HOST:

Code:
Sep 24 11:16:01 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:16:02 DEST-HOST pvedaemon[550553]: <user@domain> update VM 381: -net0 virtio=XX:XX:XX:XX:XX:XX,bridge=VLAN107
Sep 24 11:16:06 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:16:06 DEST-HOST pvestatd[33327]: VM 660 qmp command failed - VM 660 not running
Sep 24 11:16:26 DEST-HOST pvedaemon[550553]: <root@pam> successful auth for user 'root@pam'
Sep 24 11:16:30 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:16:35 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:16:40 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
Sep 24 11:16:55 DEST-HOST qm[1243414]: <root@pam> starting task UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam:
Sep 24 11:16:55 DEST-HOST qm[1243421]: start VM 553: UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam:
Sep 24 11:16:56 DEST-HOST systemd[1]: Started 553.scope.
Sep 24 11:16:57 DEST-HOST kernel: tap553i0: entered promiscuous mode
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered blocking state
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered disabled state
Sep 24 11:16:57 DEST-HOST kernel: tap553i0: entered allmulticast mode
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered blocking state
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered forwarding state
Sep 24 11:16:58 DEST-HOST kernel: tap553i1: entered promiscuous mode
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered blocking state
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered disabled state
Sep 24 11:16:58 DEST-HOST kernel: tap553i1: entered allmulticast mode
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered blocking state
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered forwarding state
Sep 24 11:16:58 DEST-HOST qm[1243421]: VM 553 started with PID 1243503.
Sep 24 11:16:58 DEST-HOST systemd[1]: Started pve-dbus-vmstate@553.service - PVE DBus VMState Helper (VM 553).
Sep 24 11:16:58 DEST-HOST qm[1243414]: <root@pam> end task UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam: OK
Sep 24 11:16:58 DEST-HOST dbus-vmstate[1243684]: pve-vmstate-553 listening on :1.74331
Sep 24 11:17:06 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 51 retries
Sep 24 11:17:11 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:17:15 DEST-HOST systemd[1]: pve-dbus-vmstate@553.service: Deactivated successfully.
Sep 24 11:17:17 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:17:18 DEST-HOST QEMU[1243503]: kvm: Unknown savevm section or instance 'dbus-vmstate/dbus-vmstate' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
Sep 24 11:17:18 DEST-HOST QEMU[1243503]: kvm: load of migration failed: Invalid argument
Sep 24 11:17:19 DEST-HOST kernel: tap553i1: left allmulticast mode
Sep 24 11:17:19 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered disabled state
Sep 24 11:17:19 DEST-HOST kernel: tap553i0: left allmulticast mode
Sep 24 11:17:19 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered disabled state
Sep 24 11:17:19 DEST-HOST qm[1243729]: VM 553 qmp command failed - VM 553 qmp command 'query-status' failed - client closed connection
Sep 24 11:17:19 DEST-HOST systemd[1]: 553.scope: Deactivated successfully.
Sep 24 11:17:19 DEST-HOST systemd[1]: 553.scope: Consumed 10.466s CPU time, 7.6G memory peak.
Sep 24 11:17:20 DEST-HOST qmeventd[1245294]: Starting cleanup for 553
Sep 24 11:17:21 DEST-HOST qmeventd[1245294]: Finished cleanup for 553
Sep 24 11:17:43 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 51 retries
Sep 24 11:17:48 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:17:51 DEST-HOST sshd-session[1247553]: Received disconnect from <IP-REDACTED> port 42042:11: disconnected by user
Sep 24 11:17:51 DEST-HOST sshd-session[1247553]: Disconnected from user root <IP-REDACTED> port 42042
Sep 24 11:17:53 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:17:56 DEST-HOST pvedaemon[1086964]: <user@domain> update VM 381: -net1 virtio=XX:XX:XX:XX:XX:XX,bridge=VLAN70,link_down=1
Sep 24 11:18:19 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:18:24 DEST-HOST pvestatd[33327]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - got timeout
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2783.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2784.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2785.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2786.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2787.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2788.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2783.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2784.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2785.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2786.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2787.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2788.
Sep 24 11:18:29 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries

I tried to capture relevant info at the dest host. Grabbing everything from that time frame would have been a lot for a forum post.
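
In case it is useful, this is how I would pull just the dbus-vmstate helper's log for that window (unit name taken from the journal above):

Code:
journalctl -u pve-dbus-vmstate@553.service --since "2025-09-24 11:16" --until "2025-09-24 11:18"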