live migration issue on one VM

We have 8+1 nodes in our Proxmox 9 cluster, with ProxLB balancing the VM guests across these hosts. The storage is Ceph. This morning a live migration failed and the guest was found stopped, with this error message:

Code:
stopped previously running dbus-vmstate helper for VM 120
2025-09-24 08:07:26 starting migration of VM 120 to node 'node2' (192.168.0.12)
2025-09-24 08:07:26 starting VM 120 on remote node 'node2'
2025-09-24 08:07:27 [node2] trying to acquire lock...
2025-09-24 08:07:27 [node2]  OK
2025-09-24 08:07:27 start remote tunnel
2025-09-24 08:07:27 ssh tunnel ver 1
2025-09-24 08:07:27 starting online/live migration on unix:/run/qemu-server/120.migrate
2025-09-24 08:07:27 set migration capabilities
2025-09-24 08:07:27 migration downtime limit: 100 ms
2025-09-24 08:07:27 migration cachesize: 2.0 GiB
2025-09-24 08:07:27 set migration parameters
2025-09-24 08:07:27 start migrate command to unix:/run/qemu-server/120.migrate
2025-09-24 08:07:28 migration active, transferred 968.8 MiB of 16.0 GiB VM-state, 1.4 GiB/s
2025-09-24 08:07:29 migration active, transferred 2.3 GiB of 16.0 GiB VM-state, 1.3 GiB/s
2025-09-24 08:07:30 migration active, transferred 3.8 GiB of 16.0 GiB VM-state, 1.7 GiB/s
2025-09-24 08:07:31 migration active, transferred 5.5 GiB of 16.0 GiB VM-state, 1.8 GiB/s
2025-09-24 08:07:32 migration active, transferred 7.3 GiB of 16.0 GiB VM-state, 1.5 GiB/s
2025-09-24 08:07:33 average migration speed: 2.7 GiB/s - downtime 74 ms
2025-09-24 08:07:33 migration completed, transferred 8.9 GiB VM-state
2025-09-24 08:07:33 migration status: completed
2025-09-24 08:07:33 ERROR: tunnel replied 'ERR: resume failed - VM 120 qmp command 'query-status' failed - client closed connection' to command 'resume 120'
2025-09-24 08:07:33 stopping migration dbus-vmstate helpers
2025-09-24 08:07:33 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 120 not running locally on node 'node2'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-09-24 08:07:34 failed to stop dbus-vmstate on node2: command 'pvesh create /nodes/node2/qemu/120/dbus-vmstate --action stop' failed: exit code 2
2025-09-24 08:07:34 flushing conntrack state for guest on source node
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
2025-09-24 08:07:52 ERROR: migration finished with problems (duration 00:00:27)
TASK ERROR: migration problems
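
For what it's worth, the failed run above was triggered by ProxLB; a single live migration of the same VM can also be retried by hand to rule the balancer out. This is only a rough sketch, with the VM ID and target node taken from the log above:

Code:
# check whether the guest is really stopped after the failed migration
qm status 120
# start it again if needed
qm start 120
# retry one live migration manually, without ProxLB
qm migrate 120 node2 --online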

pveversion is the same on node2 and node3:

Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.6 (running version: 9.0.6/49c767b70aeb6648)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14: 6.14.11-2
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.10
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.7
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.14-1
proxmox-backup-file-restore: 4.0.14-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.1
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.11
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-4
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.21
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.4-pve1

Balancing runs every 6 hours and this is the first time it has failed and left a guest VM stopped. Any idea what could cause this? (Attached: the log file from node2 around the failed migration.)

Regards.
 


I forgot to include the VM config:

Code:
affinity: 16,17,40,41
bios: seabios
boot: order=ide0;scsi0;scsi1;scsi2;scsi3;scsi4
cores: 1
cpu: x86-64-v2-AES
ide0: none,media=cdrom
memory: 16384
meta: creation-qemu=10.0.2,ctime=1758629387
name: cortexte2
net0: virtio=00:50:56:bd:00:0d,bridge=dc
numa: 0
ostype: l26
scsi0: prod_nvme:vm-120-disk-0,discard=on,iothread=1,size=15G
scsi1: prod_nvme:vm-120-disk-1,discard=on,iothread=1,size=100G
scsi2: prod_nvme:vm-120-disk-2,discard=on,iothread=1,size=100G
scsi3: prod_nvme:vm-120-disk-3,discard=on,iothread=1,size=100G
scsi4: prod_nvme:vm-120-disk-4,discard=on,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=423d7766-3000-2211-72b4-b54b4e4ef17c
sockets: 4
tags: plb_pin_node1;plb_pin_node2;plb_pin_node3;plb_pin_node4;prod
vmgenid: 7e1c76e3-9141-486d-88aa-a9fe70189f8d

And of course, our 8 nodes have the same hardware.
 
I am seeing this as well.
All hosts are upgraded to PVE 9, and sometimes when a VM migrates, I get:

2025-09-24 16:08:14 ERROR: tunnel replied 'ERR: resume failed - VM 102 qmp command 'query-status' failed - client closed connection' to command 'resume 102'
2025-09-24 16:08:18 ERROR: migration finished with problems (duration 00:00:13)

I am able to reproduce it by disabling the Conntrack state button in the migration panel.

My guess is that the balancer we run (ProxLB) is not using this new conntrack feature correctly.
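
For context, here is a rough sketch of how an online migration can be triggered through the Proxmox API from the shell. I assume ProxLB goes through the same endpoint, but I don't know which options (conntrack-related or otherwise) it actually passes, and the node names below are just the placeholders from my log:

Code:
# trigger a live (online) migration of VM 553 to DEST-NODE via the API
pvesh create /nodes/SOURCE-NODE/qemu/553/migrate --target DEST-NODE --online 1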
 
Hi,
@pmarasse @ManFriday please share the system journal from the target node around the time of the migration, as well as the configuration of the bridge used by the VM's virtual NIC(s).

@ManFriday please share the full log from a failed migration as well as an affected VM configuration.
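
For the journal, something like this is enough, with the time window adjusted to the failed migration and the output redirected to a file you can attach:

Code:
journalctl --since "2025-09-24 08:05" --until "2025-09-24 08:10" > journal-target-node.txt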
 
Bridge config:

Code:
auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
# VM Networks

Bond config:

Code:
auto bond1
iface bond1 inet manual
        bond-slaves ens2081f0np0 ens2086f0np0
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens2081f0np0
# VM Networks Bond

For each VLAN we use an SDN. Example:

Code:
auto vlan110
iface vlan110
        bridge_ports vmbr1.110
        bridge_stp off
        bridge_fd 0
        alias VLAN-110

VM config:

Code:
agent: 1
balloon: 6144
bios: seabios
boot: order=sata0;scsi0
cores: 2
cpu: Skylake-Server-noTSX-IBRS
memory: 8192
meta: creation-qemu=9.0.2,ctime=1743094877
name: vmname
net0: virtio=00:50:56:0B:69:B8,bridge=vlan70
net1: virtio=00:50:56:03:A9:3B,bridge=vlan107
ostype: l26
sata0: none,media=cdrom
scsi0: DS9:vm-553-disk-0,iothread=1,size=50G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=42114333-e583-5dfc-671f-e2ebcdd68957
sockets: 1
tags: bsg;redhat;webserver
vmgenid: bd5a837c-0fc5-4405-a13d-b301919f242a


Code:
root@SOURCE-NODE:/var/log/pve/tasks/4# cat 'UPID:SOURCE-NODE:0016528B:0007F636:68D41974:qmigrate:553:<token>:'
2025-09-24 11:16:53 use dedicated network address for sending migration traffic (172.17.200.51)
2025-09-24 11:16:53 starting migration of VM 553 to node 'DEST-NODE' (172.17.200.51)
2025-09-24 11:16:53 starting VM 553 on remote node 'DEST-NODE'
2025-09-24 11:16:58 start remote tunnel
2025-09-24 11:16:59 ssh tunnel ver 1
2025-09-24 11:16:59 starting online/live migration on unix:/run/qemu-server/553.migrate
2025-09-24 11:16:59 set migration capabilities
2025-09-24 11:16:59 migration downtime limit: 100 ms
2025-09-24 11:16:59 migration cachesize: 1.0 GiB
2025-09-24 11:16:59 set migration parameters
2025-09-24 11:16:59 start migrate command to unix:/run/qemu-server/553.migrate
2025-09-24 11:17:00 migration active, transferred 515.6 MiB of 8.0 GiB VM-state, 524.3 MiB/s
2025-09-24 11:17:01 migration active, transferred 1.0 GiB of 8.0 GiB VM-state, 534.1 MiB/s
2025-09-24 11:17:02 migration active, transferred 1.4 GiB of 8.0 GiB VM-state, 335.0 MiB/s
2025-09-24 11:17:03 migration active, transferred 1.7 GiB of 8.0 GiB VM-state, 264.1 MiB/s
2025-09-24 11:17:04 migration active, transferred 1.9 GiB of 8.0 GiB VM-state, 338.2 MiB/s
2025-09-24 11:17:05 migration active, transferred 2.2 GiB of 8.0 GiB VM-state, 264.4 MiB/s
2025-09-24 11:17:06 migration active, transferred 2.5 GiB of 8.0 GiB VM-state, 278.9 MiB/s
2025-09-24 11:17:07 migration active, transferred 2.7 GiB of 8.0 GiB VM-state, 740.5 MiB/s
2025-09-24 11:17:08 migration active, transferred 3.0 GiB of 8.0 GiB VM-state, 553.0 MiB/s
2025-09-24 11:17:09 migration active, transferred 3.3 GiB of 8.0 GiB VM-state, 526.0 MiB/s
2025-09-24 11:17:10 migration active, transferred 3.6 GiB of 8.0 GiB VM-state, 341.4 MiB/s
2025-09-24 11:17:11 migration active, transferred 3.9 GiB of 8.0 GiB VM-state, 383.6 MiB/s
2025-09-24 11:17:12 migration active, transferred 4.3 GiB of 8.0 GiB VM-state, 373.9 MiB/s
2025-09-24 11:17:13 migration active, transferred 4.7 GiB of 8.0 GiB VM-state, 563.2 MiB/s
2025-09-24 11:17:14 migration active, transferred 5.2 GiB of 8.0 GiB VM-state, 548.6 MiB/s
2025-09-24 11:17:15 migration active, transferred 5.7 GiB of 8.0 GiB VM-state, 516.9 MiB/s
2025-09-24 11:17:16 migration active, transferred 6.3 GiB of 8.0 GiB VM-state, 514.5 MiB/s
2025-09-24 11:17:17 migration active, transferred 6.8 GiB of 8.0 GiB VM-state, 536.5 MiB/s
2025-09-24 11:17:18 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 144.5 MiB/s
2025-09-24 11:17:18 xbzrle: send updates to 368 pages in 76.2 KiB encoded memory
2025-09-24 11:17:18 average migration speed: 432.1 MiB/s - downtime 60 ms
2025-09-24 11:17:18 migration completed, transferred 7.3 GiB VM-state
2025-09-24 11:17:18 migration status: completed
2025-09-24 11:17:19 ERROR: tunnel replied 'ERR: resume failed - VM 553 qmp command 'query-status' failed - client closed connection' to command 'resume 553'
2025-09-24 11:17:19 stopping migration dbus-vmstate helpers
2025-09-24 11:17:19 migrated 0 conntrack state entries
400 Parameter verification failed.
node: VM 553 not running locally on node 'DEST-NODE'
proxy handler failed: pvesh create <api_path> --action <string> [OPTIONS] [FORMAT_OPTIONS]
2025-09-24 11:17:21 failed to stop dbus-vmstate on DEST-NODE: command 'pvesh create /nodes/DEST-NODE/qemu/553/dbus-vmstate --action stop' failed: exit code 2
2025-09-24 11:17:21 flushing conntrack state for guest on source node
2025-09-24 11:17:25 ERROR: migration finished with problems (duration 00:00:32)
TASK ERROR: migration problems

Logs at DEST-HOST:

Code:
Sep 24 11:16:01 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:16:02 DEST-HOST pvedaemon[550553]: <user@domain> update VM 381: -net0 virtio=XX:XX:XX:XX:XX:XX,bridge=VLAN107
Sep 24 11:16:06 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:16:06 DEST-HOST pvestatd[33327]: VM 660 qmp command failed - VM 660 not running
Sep 24 11:16:26 DEST-HOST pvedaemon[550553]: <root@pam> successful auth for user 'root@pam'
Sep 24 11:16:30 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:16:35 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:16:40 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
Sep 24 11:16:55 DEST-HOST qm[1243414]: <root@pam> starting task UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam:
Sep 24 11:16:55 DEST-HOST qm[1243421]: start VM 553: UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam:
Sep 24 11:16:56 DEST-HOST systemd[1]: Started 553.scope.
Sep 24 11:16:57 DEST-HOST kernel: tap553i0: entered promiscuous mode
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered blocking state
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered disabled state
Sep 24 11:16:57 DEST-HOST kernel: tap553i0: entered allmulticast mode
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered blocking state
Sep 24 11:16:57 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered forwarding state
Sep 24 11:16:58 DEST-HOST kernel: tap553i1: entered promiscuous mode
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered blocking state
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered disabled state
Sep 24 11:16:58 DEST-HOST kernel: tap553i1: entered allmulticast mode
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered blocking state
Sep 24 11:16:58 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered forwarding state
Sep 24 11:16:58 DEST-HOST qm[1243421]: VM 553 started with PID 1243503.
Sep 24 11:16:58 DEST-HOST systemd[1]: Started pve-dbus-vmstate@553.service - PVE DBus VMState Helper (VM 553).
Sep 24 11:16:58 DEST-HOST qm[1243414]: <root@pam> end task UPID:DEST-HOST:0012F91D:041B46BA:68D41977:qmstart:553:root@pam: OK
Sep 24 11:16:58 DEST-HOST dbus-vmstate[1243684]: pve-vmstate-553 listening on :1.74331
Sep 24 11:17:06 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 51 retries
Sep 24 11:17:11 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:17:15 DEST-HOST systemd[1]: pve-dbus-vmstate@553.service: Deactivated successfully.
Sep 24 11:17:17 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:17:18 DEST-HOST QEMU[1243503]: kvm: Unknown savevm section or instance 'dbus-vmstate/dbus-vmstate' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
Sep 24 11:17:18 DEST-HOST QEMU[1243503]: kvm: load of migration failed: Invalid argument
Sep 24 11:17:19 DEST-HOST kernel: tap553i1: left allmulticast mode
Sep 24 11:17:19 DEST-HOST kernel: VLAN107: port 4(tap553i1) entered disabled state
Sep 24 11:17:19 DEST-HOST kernel: tap553i0: left allmulticast mode
Sep 24 11:17:19 DEST-HOST kernel: VLAN70: port 10(tap553i0) entered disabled state
Sep 24 11:17:19 DEST-HOST qm[1243729]: VM 553 qmp command failed - VM 553 qmp command 'query-status' failed - client closed connection
Sep 24 11:17:19 DEST-HOST systemd[1]: 553.scope: Deactivated successfully.
Sep 24 11:17:19 DEST-HOST systemd[1]: 553.scope: Consumed 10.466s CPU time, 7.6G memory peak.
Sep 24 11:17:20 DEST-HOST qmeventd[1245294]: Starting cleanup for 553
Sep 24 11:17:21 DEST-HOST qmeventd[1245294]: Finished cleanup for 553
Sep 24 11:17:43 DEST-HOST pvestatd[33327]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 51 retries
Sep 24 11:17:48 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:17:51 DEST-HOST sshd-session[1247553]: Received disconnect from <IP-REDACTED> port 42042:11: disconnected by user
Sep 24 11:17:51 DEST-HOST sshd-session[1247553]: Disconnected from user root <IP-REDACTED> port 42042
Sep 24 11:17:53 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries
Sep 24 11:17:56 DEST-HOST pvedaemon[1086964]: <user@domain> update VM 381: -net1 virtio=XX:XX:XX:XX:XX:XX,bridge=VLAN70,link_down=1
Sep 24 11:18:19 DEST-HOST pvestatd[33327]: VM 114 qmp command failed - VM 114 qmp command 'query-proxmox-support' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
Sep 24 11:18:24 DEST-HOST pvestatd[33327]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - got timeout
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2783.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2784.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2785.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2786.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2787.
Sep 24 11:18:24 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2788.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2783.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2784.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2785.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2786.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2787.
Sep 24 11:18:25 DEST-HOST pvedaemon[550553]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 2788.
Sep 24 11:18:29 DEST-HOST pvestatd[33327]: VM 322 qmp command failed - VM 322 qmp command 'query-proxmox-support' failed - unable to connect to VM 322 qmp socket - timeout after 51 retries

I tried to capture relevant info at the dest host. Grabbing everything from that time frame would have been a lot for a forum post.
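
In case it is useful, this is how I would pull just the dbus-vmstate helper's log for that window (unit name taken from the journal above):

Code:
journalctl -u pve-dbus-vmstate@553.service --since "2025-09-24 11:16" --until "2025-09-24 11:18"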