Live VM Migration fails

Aug 9, 2025
So after upgrading to 9.0 I am unable to live migrate any of my VMs. It copies the VM over, then errors out and kills it. I've tried on multiple hosts, and it doesn't matter which direction the migration goes. We haven't changed the migration network, and it doesn't matter whether it's a new or old VM. I tried disabling conntrack, but it doesn't make a difference.

2025-08-10 01:22:45 conntrack state migration not supported or disabled, active connections might get dropped
2025-08-10 01:22:45 use dedicated network address for sending migration traffic (10.100.2.6)
2025-08-10 01:22:45 starting migration of VM 105 to node 'pve1' (10.100.2.6)
2025-08-10 01:22:45 starting VM 105 on remote node 'pve1'
2025-08-10 01:22:49 start remote tunnel
2025-08-10 01:22:49 ssh tunnel ver 1
2025-08-10 01:22:49 starting online/live migration on unix:/run/qemu-server/105.migrate
2025-08-10 01:22:49 set migration capabilities
2025-08-10 01:22:50 migration downtime limit: 100 ms
2025-08-10 01:22:50 migration cachesize: 512.0 MiB
2025-08-10 01:22:50 set migration parameters
2025-08-10 01:22:50 start migrate command to unix:/run/qemu-server/105.migrate
2025-08-10 01:22:51 migration active, transferred 628.5 MiB of 4.0 GiB VM-state, 831.8 MiB/s
2025-08-10 01:22:52 average migration speed: 2.0 GiB/s - downtime 70 ms
2025-08-10 01:22:52 migration completed, transferred 1.3 GiB VM-state
2025-08-10 01:22:52 migration status: completed
2025-08-10 01:22:52 ERROR: tunnel replied 'ERR: resume failed - VM 105 qmp command 'query-status' failed - client closed connection' to command 'resume 105'
VM quit/powerdown failed - terminating now with SIGTERM
2025-08-10 01:23:05 ERROR: migration finished with problems (duration 00:00:20)
TASK ERROR: migration problems
 
please post the VM config and pveversion -v from both sides
 
Hi,
please also share the system logs/journal from the target node around the time the issue occurred.
 
please post the VM config and pveversion -v from both sides
I am no longer able to reproduce it. It happened whenever we moved a VM after upgrading to 9.0, but going back and forth with the same VMs seems to work now. Again, we haven't changed or updated anything... I'll keep an eye on it and grab more logs if it happens again.

For the sake of completeness I'm dropping some info below for the affected nodes/VMs.

root@pve1:~# pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
proxmox-kernel-helper: 9.0.3
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
dnsmasq: 2.91-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.3
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.7
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
openvswitch-switch: 3.5.0-1+b1
proxmox-backup-client: 4.0.9-1
proxmox-backup-file-restore: 4.0.9-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.0
proxmox-widget-toolkit: 5.0.4
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.7
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.16
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux:
root@pve1:~# qm config 105
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cipassword: **********
ciuser: skypulse
cores: 2
cpu: host
efidisk0: vm_pool:vm-105-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
ipconfig0: ip=10.80.0.10/24,gw=10.80.0.1,ip6=2607:8dc0:3000::10/64,gw6=2607:8dc0:3000::1
ipconfig1: ip=dhcp
machine: q35
memory: 4092
meta: creation-qemu=9.2.0,ctime=1754078032
name: skyp-kea-dhcp1
nameserver: 23.170.233.18 23.170.233.19
net0: virtio=BC:24:11:46:3D:6F,bridge=VXL15001,firewall=1
net1: virtio=BC:24:11:31:D2:C7,bridge=VXL16001,firewall=1
numa: 0
ostype: l26
scsi0: vm_pool:vm-105-disk-1,size=32000M
scsi1: vm_pool:vm-105-cloudinit,media=cdrom,size=4M
scsihw: virtio-scsi-single
searchdomain: mdc.elp.skypulse.net
smbios1: uuid=ddaaa424-2238-4a04-b3f0-ba46930ada47
sockets: 1
sshkeys: ssh-rsa%20AAAAB3NzaC1yc2EAAAADAQABAAACAQDF1%2B%2FO4MhRlrxm1cclXMqOxxB%2FifEUoAEfz6BKUct91WnVK19MfDWYCev6XXI36EEnGZCNrXpUVUyKNszSrTwgBeLD9vH%2Blv%2F44C97e4F1%2F4LB5lwCvxtcZYwPiognDeGndNkVmA1mHGFzmoVY2OjN6XIfwbjYW1NfRm1Hge2XkAZ0klHbWD8iznZ97ubyooxOMinb2vrWBfASeStkJ%2BELN%2F65Hb8GTOzD4xZHOWLhFx3DrjaD9MWRDN9zDe0TqgvdNctbaoRkMHBlk19uNHRkRCY%2B%2BDDcu1%2B7b4HkSBmACaV%2ByjBtL%2Fuu5q6buOFrLuu8Iiu1J%2FS60KxYHBhy7KQmo39v7u9%2FkeRFk%2Fs2W%2BP7AufRdcpys9vChD5NEDEtVKAi1BU7u%2FAC2CnX7v%2BY6tDqJYwqJT8SHHBookvlKBb4zn3nzlleiKpgURpazgPbonNhdpPXkIXYZyvc7SEJDkGvYnzN3C%2BbJBsEnXyQFKEfrrIpwipLls%2B8WS%2Fe5RI5E5CI5nr%2BCdD%2FswrXO4t%2BUcDSmT6GbvGjLQyAo2no1LsaNqemzAGscsG6yGPX5sL5vtoBqyFJosP1g8iHz9dJMAnzjg%2FGDYQhCyUslSB%2FE6c9H083Al6WcFoVWOXxp%2BH8J0h%2FkDHOXIiSMTPtQMiXG58cidT0Z96A3a67rzLjWnuBxQ%3D%3D%0Assh-rsa%20AAAAB3NzaC1yc2EAAAADAQABAAABgQDDpgnd7cn5p%2BzDcHOueVj7PQScMeqTg7joF5Q%2FPq4A6fuWYd7TmQZyQSR9bFuK65Ik3hZbb%2FN%2BlTyqAaeE3K1ERkVguezVkVFBpTWMCpqKmvHac0B14QOFbnMZQUJ9nS3WNj9YJ2vBzV%2BSyTZ49hEVeEcubnVmRyqBZ02xCxtx3Mq8mPUPZ%2B%2Bp23m9Juus17JD9SMDGTPnQWBln4KFgozrhlhXosj0VIvLfxYU%2FiVDi2r5%2BJtxbFHuegBpdJyla5rzq5t3ByL3fnyXxsIbCteLnOtfhmKzK%2FJfPmOQbj4is8%2BbnMLR4OPkHGCPgHKBklcfHHnL4jT35iBZOhyDCJFfc%2B9Q9wzhjnIljIqDj9vujYol1Z5du3e3OBxWsOZTyKKRCnMVFwqFr6Y2Mba0zSFpd7%2FSIrM92hiIb2%2B%2BXQkFaZovbKyMb7tb%2Bo7ajYVYrx%2B14dXcLL5b8SIFWz%2FjxZ9lKroir2jIgFIJlKkOwqNz232pqciVmrleXVn8tkg779E%3D%0A
tpmstate0: vm_pool:vm-105-disk-2,size=4M,version=v2.0
vmgenid: 9076ac18-6fc0-489e-867e-f5ad3cc573c4

root@pve2:~# qm config 107
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cipassword: **********
ciuser: skypulse
cores: 2
cpu: host
efidisk0: vm_pool:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
ipconfig0: ip=10.80.0.14/24,gw=10.80.0.1,ip6=2607:8dc0:2000::14/64,gw6=2607:8dc0:2000::1
ipconfig1: ip=dhcp
machine: q35
memory: 4092
meta: creation-qemu=9.2.0,ctime=1754078032
name: skyp-pdns-postgresql1
nameserver: 23.170.233.18 23.170.233.19
net0: virtio=BC:24:11:FC:E6:AD,bridge=VXL15001,firewall=1
net1: virtio=BC:24:11:97:DC:E2,bridge=VXL16001,firewall=1,link_down=1
numa: 0
ostype: l26
scsi0: vm_pool:vm-107-disk-1,size=32000M
scsi1: vm_pool:vm-107-cloudinit,media=cdrom,size=4M
scsi2: vm_pool:vm-107-disk-3,cache=writeback,iothread=1,size=100G
scsihw: virtio-scsi-single
searchdomain: mdc.elp.skypulse.net
smbios1: uuid=227443d4-ee1f-411e-abf4-0743ce14a5d8
sockets: 1
sshkeys: ssh-rsa%20AAAAB3NzaC1yc2EAAAADAQABAAACAQDF1%2B%2FO4MhRlrxm1cclXMqOxxB%2FifEUoAEfz6BKUct91WnVK19MfDWYCev6XXI36EEnGZCNrXpUVUyKNszSrTwgBeLD9vH%2Blv%2F44C97e4F1%2F4LB5lwCvxtcZYwPiognDeGndNkVmA1mHGFzmoVY2OjN6XIfwbjYW1NfRm1Hge2XkAZ0klHbWD8iznZ97ubyooxOMinb2vrWBfASeStkJ%2BELN%2F65Hb8GTOzD4xZHOWLhFx3DrjaD9MWRDN9zDe0TqgvdNctbaoRkMHBlk19uNHRkRCY%2B%2BDDcu1%2B7b4HkSBmACaV%2ByjBtL%2Fuu5q6buOFrLuu8Iiu1J%2FS60KxYHBhy7KQmo39v7u9%2FkeRFk%2Fs2W%2BP7AufRdcpys9vChD5NEDEtVKAi1BU7u%2FAC2CnX7v%2BY6tDqJYwqJT8SHHBookvlKBb4zn3nzlleiKpgURpazgPbonNhdpPXkIXYZyvc7SEJDkGvYnzN3C%2BbJBsEnXyQFKEfrrIpwipLls%2B8WS%2Fe5RI5E5CI5nr%2BCdD%2FswrXO4t%2BUcDSmT6GbvGjLQyAo2no1LsaNqemzAGscsG6yGPX5sL5vtoBqyFJosP1g8iHz9dJMAnzjg%2FGDYQhCyUslSB%2FE6c9H083Al6WcFoVWOXxp%2BH8J0h%2FkDHOXIiSMTPtQMiXG58cidT0Z96A3a67rzLjWnuBxQ%3D%3D%0Assh-rsa%20AAAAB3NzaC1yc2EAAAADAQABAAACAQDDdO3XO29qcQmPzAs08aAUcJU89GURRCPnqaJMpyK%2FTF0XkKB88v5jX6TdQ1vty2j8K57mF58fhtM4lJm3Pz%2BJ2dvERxEoX39l2p5njlES7tACaxHXHWp5s59RZs4oVBHsl58ab3M2Nh7cVnLIm%2FernEknU814Exx0rLtWI01Cq%2FbIaYQs7Q9Ai%2F4%2Fiu%2FXU7ask%2FS2ZUacIXywc%2FaHyOMzkX7128IHDCpzAHikFhNPcgafMWbUuMuaxX%2BqXX6FpkucDLpjeW86swgjYG99UCG4jUZPeptq4BDsFdUV0aINpFz76bRB%2Bg7%2BMskDAdPL%2Fy7SkczSLnB1wR%2ByKU0%2Bxn85sLcj666IB88u9DZe5sppyuPwUPqobC7nrDL6cCYWTW50nqvDh1EbFVbyl8FhxX8Dusb9feYtVEShQaXAuYDdJTd1VjfSujgqL6nSYYrKETxPCQL4SNby5mFzp6D8D%2BkKK3lMF8GwX8lTgELZjTHxa%2FdD%2BzjNDK4oCv%2B%2F7bvOP%2F39C80NNB%2FJ6GECRxMyUbpFa7Tkr4m1Za1Db7NaX4rEAv7e8rbWvn%2F3Ijj0GB0K2IPVpJ87nHhyK0Y%2Bq5ag8MgTmlkQ2cnQx0rtcjbHlHIMtjGDrapIn1wf32PqdnAN6JJdpLNW6t4Yn2CGeVouVPPX%2FYuYU1lft4Ut%2BnCQyfZDDQ%3D%3D%0A
tpmstate0: vm_pool:vm-107-disk-2,size=4M,version=v2.0
vmgenid: e72ff3c3-da04-42fd-8caf-879f8a1690

root@pve2:~# pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
proxmox-kernel-helper: 9.0.3
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
dnsmasq: 2.91-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.3
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.7
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
openvswitch-switch: 3.5.0-1+b1
proxmox-backup-client: 4.0.9-1
proxmox-backup-file-restore: 4.0.9-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.0
proxmox-widget-toolkit: 5.0.4
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.7
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.16
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.3-pve1
 
please also share the system logs/journal from the target node around the time the issue occurred.
The actual error message is likely there, because 'VM 105 qmp command 'query-status' failed - client closed connection' most often means that the QEMU instance on the target crashed. It would be great if you could provide that too!
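For reference, a minimal sketch of pulling that window from the target node's journal (timestamps taken from the task log above, adjust as needed):
Code:
# everything the target node logged around the failed resume
journalctl --since "2025-08-10 01:22:00" --until "2025-08-10 01:24:00"
# or just the QEMU scope unit for the VM
journalctl -u 105.scope --since "2025-08-10 01:22:00"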
 
Target:
Aug 10 01:22:45 pve1 pmxcfs[4027]: [status] notice: received log
Aug 10 01:22:45 pve1 unix_chkpwd[3177431]: account root has password changed in future
Aug 10 01:22:45 pve1 sshd-session[3177429]: Accepted publickey for root from 10.100.2.8 port 33008 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:45 pve1 sshd-session[3177429]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:45 pve1 systemd-logind[2254]: New session 285 of user root.
Aug 10 01:22:45 pve1 systemd[1]: Started session-285.scope - Session 285 of User root.
Aug 10 01:22:45 pve1 sshd-session[3177436]: Received disconnect from 10.100.2.8 port 33008:11: disconnected by user
Aug 10 01:22:45 pve1 sshd-session[3177436]: Disconnected from user root 10.100.2.8 port 33008
Aug 10 01:22:45 pve1 sshd-session[3177429]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:45 pve1 systemd-logind[2254]: Session 285 logged out. Waiting for processes to exit.
Aug 10 01:22:45 pve1 systemd[1]: session-285.scope: Deactivated successfully.
Aug 10 01:22:45 pve1 systemd-logind[2254]: Removed session 285.
Aug 10 01:22:45 pve1 unix_chkpwd[3177442]: account root has password changed in future
Aug 10 01:22:45 pve1 sshd-session[3177440]: Accepted publickey for root from 10.100.2.8 port 49078 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:45 pve1 sshd-session[3177440]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:45 pve1 systemd-logind[2254]: New session 286 of user root.
Aug 10 01:22:45 pve1 systemd[1]: Started session-286.scope - Session 286 of User root.
Aug 10 01:22:45 pve1 sshd-session[3177447]: Received disconnect from 10.100.2.8 port 49078:11: disconnected by user
Aug 10 01:22:45 pve1 sshd-session[3177447]: Disconnected from user root 10.100.2.8 port 49078
Aug 10 01:22:45 pve1 sshd-session[3177440]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:45 pve1 systemd[1]: session-286.scope: Deactivated successfully.
Aug 10 01:22:45 pve1 systemd-logind[2254]: Session 286 logged out. Waiting for processes to exit.
Aug 10 01:22:45 pve1 systemd-logind[2254]: Removed session 286.
Aug 10 01:22:45 pve1 unix_chkpwd[3177457]: account root has password changed in future
Aug 10 01:22:45 pve1 sshd-session[3177453]: Accepted publickey for root from 10.100.2.8 port 49092 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:45 pve1 sshd-session[3177453]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:45 pve1 systemd-logind[2254]: New session 287 of user root.
Aug 10 01:22:45 pve1 systemd[1]: Started session-287.scope - Session 287 of User root.
Aug 10 01:22:45 pve1 sshd-session[3177462]: Received disconnect from 10.100.2.8 port 49092:11: disconnected by user
Aug 10 01:22:45 pve1 sshd-session[3177462]: Disconnected from user root 10.100.2.8 port 49092
Aug 10 01:22:45 pve1 sshd-session[3177453]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:45 pve1 systemd-logind[2254]: Session 287 logged out. Waiting for processes to exit.
Aug 10 01:22:45 pve1 systemd[1]: session-287.scope: Deactivated successfully.
Aug 10 01:22:45 pve1 systemd-logind[2254]: Removed session 287.
Aug 10 01:22:46 pve1 unix_chkpwd[3177467]: account root has password changed in future
Aug 10 01:22:46 pve1 sshd-session[3177465]: Accepted publickey for root from 10.100.2.8 port 49094 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:46 pve1 sshd-session[3177465]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:46 pve1 systemd-logind[2254]: New session 288 of user root.
Aug 10 01:22:46 pve1 systemd[1]: Started session-288.scope - Session 288 of User root.
Aug 10 01:22:46 pve1 qm[3177473]: <root@pam> starting task UPID:pve1:00307C02:0228F144:689848C6:qmstart:105:root@pam:
Aug 10 01:22:46 pve1 qm[3177474]: start VM 105: UPID:pve1:00307C02:0228F144:689848C6:qmstart:105:root@pam:
Aug 10 01:22:47 pve1 systemd[1]: Started 105.scope.
Aug 10 01:22:47 pve1 kernel: rbd: rbd2: capacity 4194304 features 0x3d
Aug 10 01:22:47 pve1 kernel: audit: type=1400 audit(1754810567.206:226): apparmor="DENIED" operation="capable" class="cap" profile="swtpm" pid=3177545 comm="swtpm" capability=21 capname="sys_admin"
Aug 10 01:22:47 pve1 zebra[2965583]: libyang Invalid boolean value "". (/frr-vrf:lib/vrf/state/active)
Aug 10 01:22:47 pve1 zebra[2965583]: libyang Invalid type uint32 empty value. (/frr-vrf:lib/vrf/state/id)
Aug 10 01:22:47 pve1 kernel: tap105i0: entered promiscuous mode
Aug 10 01:22:47 pve1 zebra[2965583]: libyang Invalid type uint32 empty value. (/frr-interface:lib/interface/state/mtu)
Aug 10 01:22:48 pve1 ovs-vsctl[3177618]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Aug 10 01:22:48 pve1 ovs-vsctl[3177618]: ovs|00002|db_ctl_base|ERR|no port named tap105i0
Aug 10 01:22:48 pve1 ovs-vsctl[3177619]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Aug 10 01:22:48 pve1 ovs-vsctl[3177619]: ovs|00002|db_ctl_base|ERR|no port named fwln105i0
Aug 10 01:22:48 pve1 kernel: VXL15001: port 3(tap105i0) entered blocking state
Aug 10 01:22:48 pve1 kernel: VXL15001: port 3(tap105i0) entered disabled state
Aug 10 01:22:48 pve1 kernel: tap105i0: entered allmulticast mode
Aug 10 01:22:48 pve1 kernel: VXL15001: port 3(tap105i0) entered blocking state
Aug 10 01:22:48 pve1 kernel: VXL15001: port 3(tap105i0) entered forwarding state
Aug 10 01:22:48 pve1 zebra[2965583]: libyang Invalid boolean value "". (/frr-vrf:lib/vrf/state/active)
Aug 10 01:22:48 pve1 zebra[2965583]: libyang Invalid type uint32 empty value. (/frr-vrf:lib/vrf/state/id)
Aug 10 01:22:48 pve1 kernel: tap105i1: entered promiscuous mode
Aug 10 01:22:48 pve1 ovs-vsctl[3177658]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i1
Aug 10 01:22:48 pve1 ovs-vsctl[3177658]: ovs|00002|db_ctl_base|ERR|no port named tap105i1
Aug 10 01:22:48 pve1 ovs-vsctl[3177660]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i1
Aug 10 01:22:48 pve1 ovs-vsctl[3177660]: ovs|00002|db_ctl_base|ERR|no port named fwln105i1
Aug 10 01:22:48 pve1 kernel: VXL16001: port 7(tap105i1) entered blocking state
Aug 10 01:22:48 pve1 kernel: VXL16001: port 7(tap105i1) entered disabled state
Aug 10 01:22:48 pve1 kernel: tap105i1: entered allmulticast mode
Aug 10 01:22:48 pve1 kernel: VXL16001: port 7(tap105i1) entered blocking state
Aug 10 01:22:48 pve1 kernel: VXL16001: port 7(tap105i1) entered forwarding state
Aug 10 01:22:49 pve1 qm[3177474]: VM 105 started with PID 3177550.
Aug 10 01:22:49 pve1 qm[3177473]: <root@pam> end task UPID:pve1:00307C02:0228F144:689848C6:qmstart:105:root@pam: OK
Aug 10 01:22:49 pve1 sshd-session[3177472]: Received disconnect from 10.100.2.8 port 49094:11: disconnected by user
Aug 10 01:22:49 pve1 sshd-session[3177472]: Disconnected from user root 10.100.2.8 port 49094
Aug 10 01:22:49 pve1 sshd-session[3177465]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:49 pve1 systemd[1]: session-288.scope: Deactivated successfully.
Aug 10 01:22:49 pve1 systemd[1]: session-288.scope: Consumed 874ms CPU time, 132.7M memory peak.
Aug 10 01:22:49 pve1 systemd-logind[2254]: Session 288 logged out. Waiting for processes to exit.
Aug 10 01:22:49 pve1 systemd-logind[2254]: Removed session 288.
Aug 10 01:22:49 pve1 unix_chkpwd[3177692]: account root has password changed in future
Aug 10 01:22:49 pve1 sshd-session[3177686]: Accepted publickey for root from 10.100.2.8 port 49106 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:49 pve1 sshd-session[3177686]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:49 pve1 systemd-logind[2254]: New session 289 of user root.
Aug 10 01:22:49 pve1 systemd[1]: Started session-289.scope - Session 289 of User root.
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: Failed to load PCIDevice:config
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: Failed to load virtio-net:virtio
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: error while loading state for instance 0x0 of device '0000:00:1e.0:01.0:12.0/virtio-net'
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: load of migration failed: Invalid argument
Aug 10 01:22:52 pve1 kernel: tap105i1: left allmulticast mode
Aug 10 01:22:52 pve1 kernel: VXL16001: port 7(tap105i1) entered disabled state
Aug 10 01:22:52 pve1 ovs-vsctl[3177721]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i1
Aug 10 01:22:52 pve1 ovs-vsctl[3177721]: ovs|00002|db_ctl_base|ERR|no port named fwln105i1
Aug 10 01:22:52 pve1 ovs-vsctl[3177722]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i1
Aug 10 01:22:52 pve1 ovs-vsctl[3177722]: ovs|00002|db_ctl_base|ERR|no port named tap105i1
Aug 10 01:22:52 pve1 kernel: tap105i0: left allmulticast mode
Aug 10 01:22:52 pve1 kernel: VXL15001: port 3(tap105i0) entered disabled state
Aug 10 01:22:52 pve1 ovs-vsctl[3177725]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Aug 10 01:22:52 pve1 ovs-vsctl[3177725]: ovs|00002|db_ctl_base|ERR|no port named fwln105i0
Aug 10 01:22:52 pve1 ovs-vsctl[3177726]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Aug 10 01:22:52 pve1 ovs-vsctl[3177726]: ovs|00002|db_ctl_base|ERR|no port named tap105i0
Aug 10 01:22:52 pve1 qm[3177702]: VM 105 qmp command failed - VM 105 qmp command 'query-status' failed - client closed connection
Aug 10 01:22:52 pve1 sshd-session[3177697]: Received disconnect from 10.100.2.8 port 49106:11: disconnected by user
Aug 10 01:22:52 pve1 sshd-session[3177697]: Disconnected from user root 10.100.2.8 port 49106
Aug 10 01:22:52 pve1 sshd-session[3177686]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:52 pve1 systemd-logind[2254]: Session 289 logged out. Waiting for processes to exit.
Aug 10 01:22:52 pve1 systemd[1]: session-289.scope: Deactivated successfully.
Aug 10 01:22:52 pve1 systemd[1]: session-289.scope: Consumed 2.546s CPU time, 117M memory peak.
Aug 10 01:22:52 pve1 systemd-logind[2254]: Removed session 289.
Aug 10 01:22:52 pve1 systemd[1]: 105.scope: Deactivated successfully.
Aug 10 01:22:52 pve1 systemd[1]: 105.scope: Consumed 3.441s CPU time, 1.4G memory peak.
Aug 10 01:22:53 pve1 qmeventd[3177752]: Starting cleanup for 105
Aug 10 01:22:53 pve1 ovs-vsctl[3177755]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i1
Aug 10 01:22:53 pve1 ovs-vsctl[3177755]: ovs|00002|db_ctl_base|ERR|no port named fwln105i1
Aug 10 01:22:53 pve1 ovs-vsctl[3177756]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i1
Aug 10 01:22:53 pve1 ovs-vsctl[3177756]: ovs|00002|db_ctl_base|ERR|no port named tap105i1
Aug 10 01:22:53 pve1 ovs-vsctl[3177757]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Aug 10 01:22:53 pve1 ovs-vsctl[3177757]: ovs|00002|db_ctl_base|ERR|no port named fwln105i0
Aug 10 01:22:53 pve1 ovs-vsctl[3177758]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Aug 10 01:22:53 pve1 ovs-vsctl[3177758]: ovs|00002|db_ctl_base|ERR|no port named tap105i0
Aug 10 01:22:53 pve1 qmeventd[3177752]: Finished cleanup for 105
Aug 10 01:22:53 pve1 unix_chkpwd[3177782]: account root has password changed in future
Aug 10 01:22:53 pve1 sshd-session[3177780]: Accepted publickey for root from 10.100.2.8 port 49110 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:22:53 pve1 sshd-session[3177780]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:22:53 pve1 systemd-logind[2254]: New session 290 of user root.
Aug 10 01:22:53 pve1 systemd[1]: Started session-290.scope - Session 290 of User root.
Aug 10 01:22:53 pve1 sshd-session[3177787]: Received disconnect from 10.100.2.8 port 49110:11: disconnected by user
Aug 10 01:22:53 pve1 sshd-session[3177787]: Disconnected from user root 10.100.2.8 port 49110
Aug 10 01:22:53 pve1 sshd-session[3177780]: pam_unix(sshd:session): session closed for user root
Aug 10 01:22:53 pve1 systemd-logind[2254]: Session 290 logged out. Waiting for processes to exit.
Aug 10 01:22:53 pve1 systemd[1]: session-290.scope: Deactivated successfully.
Aug 10 01:22:53 pve1 systemd-logind[2254]: Removed session 290.
Aug 10 01:23:04 pve1 unix_chkpwd[3177890]: account root has password changed in future
Aug 10 01:23:04 pve1 sshd-session[3177888]: Accepted publickey for root from 10.100.2.8 port 54208 ssh2: RSA SHA256:VcY1kqbWUgJ/mEVHVa14AFJxT3qjTDTZYTwcvSACJq8
Aug 10 01:23:04 pve1 sshd-session[3177888]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 10 01:23:04 pve1 systemd-logind[2254]: New session 291 of user root.
Aug 10 01:23:04 pve1 systemd[1]: Started session-291.scope - Session 291 of User root.
Aug 10 01:23:05 pve1 sshd-session[3177895]: Received disconnect from 10.100.2.8 port 54208:11: disconnected by user
Aug 10 01:23:05 pve1 sshd-session[3177895]: Disconnected from user root 10.100.2.8 port 54208
Aug 10 01:23:05 pve1 sshd-session[3177888]: pam_unix(sshd:session): session closed for user root
Aug 10 01:23:05 pve1 systemd-logind[2254]: Session 291 logged out. Waiting for processes to exit.
Aug 10 01:23:05 pve1 systemd[1]: session-291.scope: Deactivated successfully.
Aug 10 01:23:05 pve1 systemd[1]: session-291.scope: Consumed 718ms CPU time, 114.8M memory peak.
Aug 10 01:23:05 pve1 systemd-logind[2254]: Removed session 291.
Aug 10 01:23:05 pve1 pmxcfs[4027]: [status] notice: received log
 
Source:
Aug 10 01:20:41 pve3 dbus-vmstate[3438456]: received 1 conntrack entries
Aug 10 01:20:41 pve3 dbus-vmstate[3438456]: transferring 205 bytes of conntrack state
Aug 10 01:20:42 pve3 pvedaemon[2469134]: <root@pam> end task UPID:pve3:0034772E:022C7FBA:6898483C:vncproxy:109:root@pam: OK
Aug 10 01:20:42 pve3 dbus-vmstate[3438456]: shutting down gracefully ..
Aug 10 01:20:42 pve3 systemd[1]: pve-dbus-vmstate@109.service: Deactivated successfully.
Aug 10 01:20:50 pve3 pvedaemon[3438453]: VM 109 qmp command failed - VM 109 qmp command 'quit' failed - got timeout
Aug 10 01:20:50 pve3 pvedaemon[3438453]: VM quit/powerdown failed - terminating now with SIGTERM
Aug 10 01:20:54 pve3 pvedaemon[2165261]: worker exit
Aug 10 01:20:54 pve3 pvedaemon[4707]: worker 2165261 finished
Aug 10 01:20:54 pve3 pvedaemon[4707]: starting 1 worker(s)
Aug 10 01:20:54 pve3 pvedaemon[4707]: worker 3438680 started
Aug 10 01:20:55 pve3 kernel: tap109i0: left allmulticast mode
Aug 10 01:20:55 pve3 kernel: VXL15001: port 4(tap109i0) entered disabled state
Aug 10 01:20:55 pve3 ovs-vsctl[3438706]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln109i0
Aug 10 01:20:55 pve3 ovs-vsctl[3438706]: ovs|00002|db_ctl_base|ERR|no port named fwln109i0
Aug 10 01:20:55 pve3 ovs-vsctl[3438707]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap109i0
Aug 10 01:20:55 pve3 ovs-vsctl[3438707]: ovs|00002|db_ctl_base|ERR|no port named tap109i0
Aug 10 01:20:56 pve3 kernel: tap109i1: left allmulticast mode
Aug 10 01:20:56 pve3 kernel: VXL16001: port 4(tap109i1) entered disabled state
Aug 10 01:20:56 pve3 ovs-vsctl[3438712]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln109i1
Aug 10 01:20:56 pve3 ovs-vsctl[3438712]: ovs|00002|db_ctl_base|ERR|no port named fwln109i1
Aug 10 01:20:56 pve3 ovs-vsctl[3438713]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap109i1
Aug 10 01:20:56 pve3 ovs-vsctl[3438713]: ovs|00002|db_ctl_base|ERR|no port named tap109i1
Aug 10 01:20:56 pve3 qmeventd[2256]: read: Connection reset by peer
Aug 10 01:20:56 pve3 systemd[1]: 109.scope: Deactivated successfully.
Aug 10 01:20:56 pve3 systemd[1]: 109.scope: Consumed 36min 11.927s CPU time, 1.5G memory peak.
Aug 10 01:20:57 pve3 pvedaemon[3438453]: migration problems
Aug 10 01:21:22 pve3 pmxcfs[4034]: [status] notice: received log
Aug 10 01:21:24 pve3 pmxcfs[4034]: [status] notice: received log
Aug 10 01:21:27 pve3 pmxcfs[4034]: [status] notice: received log
Aug 10 01:21:27 pve3 pvedaemon[3439007]: starting vnc proxy UPID:pve3:0034799F:022C969C:68984877:vncproxy:109:root@pam:
Aug 10 01:21:27 pve3 pvedaemon[2469134]: <root@pam> starting task UPID:pve3:0034799F:022C969C:68984877:vncproxy:109:root@pam:
Aug 10 01:21:27 pve3 pvedaemon[3439013]: starting vnc proxy UPID:pve3:003479A5:022C96C7:68984877:vncproxy:109:root@pam:
Aug 10 01:21:27 pve3 pvedaemon[2472134]: <root@pam> starting task UPID:pve3:003479A5:022C96C7:68984877:vncproxy:109:root@pam:
Aug 10 01:21:32 pve3 pvedaemon[2472134]: <root@pam> end task UPID:pve3:003479A5:022C96C7:68984877:vncproxy:109:root@pam: OK
Aug 10 01:21:32 pve3 pvedaemon[3439096]: starting vnc proxy UPID:pve3:003479F8:022C98C1:6898487C:vncproxy:107:root@pam:
Aug 10 01:21:32 pve3 pvedaemon[3438680]: <root@pam> starting task UPID:pve3:003479F8:022C98C1:6898487C:vncproxy:107:root@pam:
Aug 10 01:21:37 pve3 pvedaemon[3439007]: connection timed out
Aug 10 01:21:37 pve3 pvedaemon[2469134]: <root@pam> end task UPID:pve3:0034799F:022C969C:68984877:vncproxy:109:root@pam: connection timed out
Aug 10 01:22:05 pve3 pvedaemon[3438680]: <root@pam> end task UPID:pve3:003479F8:022C98C1:6898487C:vncproxy:107:root@pam: OK
Aug 10 01:22:06 pve3 pvedaemon[3439387]: starting vnc proxy UPID:pve3:00347B1B:022CA5C6:6898489E:vncproxy:109:root@pam:
Aug 10 01:22:06 pve3 pvedaemon[3438680]: <root@pam> starting task UPID:pve3:00347B1B:022CA5C6:6898489E:vncproxy:109:root@pam:
Aug 10 01:22:10 pve3 pvedaemon[3438680]: <root@pam> end task UPID:pve3:00347B1B:022CA5C6:6898489E:vncproxy:109:root@pam: OK
Aug 10 01:22:11 pve3 pvedaemon[3439455]: starting vnc proxy UPID:pve3:00347B5F:022CA7EB:689848A3:vncproxy:107:root@pam:
Aug 10 01:22:11 pve3 pvedaemon[2469134]: <root@pam> starting task UPID:pve3:00347B5F:022CA7EB:689848A3:vncproxy:107:root@pam:
Aug 10 01:22:19 pve3 pvedaemon[2469134]: <root@pam> end task UPID:pve3:00347B5F:022CA7EB:689848A3:vncproxy:107:root@pam: OK
Aug 10 01:22:45 pve3 pvedaemon[2469134]: <root@pam> starting task UPID:pve3:00347C73:022CB509:689848C5:qmigrate:105:root@pam:
Aug 10 01:22:46 pve3 pmxcfs[4034]: [status] notice: received log
Aug 10 01:22:49 pve3 pmxcfs[4034]: [status] notice: received log
Aug 10 01:22:58 pve3 pvedaemon[3439731]: VM 105 qmp command failed - VM 105 qmp command 'quit' failed - got timeout
Aug 10 01:22:58 pve3 pvedaemon[3439731]: VM quit/powerdown failed - terminating now with SIGTERM
Aug 10 01:23:04 pve3 kernel: tap105i0: left allmulticast mode
Aug 10 01:23:04 pve3 kernel: VXL15001: port 2(tap105i0) entered disabled state
Aug 10 01:23:04 pve3 ovs-vsctl[3439975]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Aug 10 01:23:04 pve3 ovs-vsctl[3439975]: ovs|00002|db_ctl_base|ERR|no port named fwln105i0
Aug 10 01:23:04 pve3 ovs-vsctl[3439976]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Aug 10 01:23:04 pve3 ovs-vsctl[3439976]: ovs|00002|db_ctl_base|ERR|no port named tap105i0
Aug 10 01:23:04 pve3 kernel: tap105i1: left allmulticast mode
Aug 10 01:23:04 pve3 kernel: VXL16001: port 2(tap105i1) entered disabled state
Aug 10 01:23:04 pve3 ovs-vsctl[3439982]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i1
Aug 10 01:23:04 pve3 ovs-vsctl[3439982]: ovs|00002|db_ctl_base|ERR|no port named fwln105i1
Aug 10 01:23:04 pve3 ovs-vsctl[3439983]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i1
Aug 10 01:23:04 pve3 ovs-vsctl[3439983]: ovs|00002|db_ctl_base|ERR|no port named tap105i1
Aug 10 01:23:04 pve3 qmeventd[2256]: read: Connection reset by peer
Aug 10 01:23:04 pve3 systemd[1]: 105.scope: Deactivated successfully.
Aug 10 01:23:04 pve3 systemd[1]: 105.scope: Consumed 34min 57.471s CPU time, 1.5G memory peak.
Aug 10 01:23:05 pve3 pvedaemon[3439731]: migration problems
Aug 10 01:23:05 pve3 pvedaemon[2469134]: <root@pam> end task UPID:pve3:00347C73:022CB509:689848C5:qmigrate:105:root@pam: migration problems
 
So the error occurred here:
Code:
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: Failed to load PCIDevice:config
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: Failed to load virtio-net:virtio
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: error while loading state for instance 0x0 of device '0000:00:1e.0:01.0:12.0/virtio-net'
Aug 10 01:22:52 pve1 QEMU[3177550]: kvm: load of migration failed: Invalid argument

Could you tell us what guest OS and kernel version was running in that guest?

EDIT: the configuration for the VXL15001 and VXL16001 bridges on the host would also be interesting, as would be the network configuration inside the guest.
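With the default SDN setup, the generated interface definitions land in /etc/network/interfaces.d/sdn, so something along these lines should capture the relevant bits (a sketch, assuming default paths):
Code:
cat /etc/network/interfaces.d/sdn
ip -d link show vxlan_VXL15001
ip -d link show vxlan_VXL16001
bridge link show | grep -E 'VXL1(5|6)001'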
 
Could you tell us what guest OS and kernel version was running in that guest?

EDIT: the configuration for the VXL15001 and VXL16001 bridges on the host would also be interesting.
 
Ubuntu Server 24.04 on 6.8.0-71-generic. We use cloud-init to provision the network info; there's nothing special about the VM itself. We use EVPN SDN, and these are VXLANs attached to an EVPN zone. The only difference is that VXL15001 is for general internet access and has an anycast gateway on the hosts, while VXL16001 is for internal management purposes and its gateway is on the firewall itself.

Attached below are the cloud-init netplan config, the SDN interface snippets, and frr.conf.

YAML:
skypulse@skyp-pdns-postgresql1:~$ sudo cat /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: "bc:24:11:fc:e6:ad"
      addresses:
      - "10.80.0.14/24"
      - "2607:8dc0:2000::14/64"
      nameservers:
        addresses:
        - 23.170.233.18
        - 23.170.233.19
        search:
        - mdc.elp.skypulse.net
      set-name: "eth0"
      routes:
      - to: "default"
        via: "10.80.0.1"
      - to: "default"
        via: "2607:8dc0:2000::1"
    eth1:
      match:
        macaddress: "bc:24:11:97:dc:e2"
      dhcp4: true
      set-name: "eth1"

Code:
auto vxlan_VXL15001
iface vxlan_VXL15001
vxlan-id 15001
vxlan-local-tunnelip 10.127.1.5
bridge-learning off
bridge-arp-nd-suppress on
mtu 9000

auto vxlan_VXL16001
iface vxlan_VXL16001
vxlan-id 16001
vxlan-local-tunnelip 10.127.1.5
bridge-learning off
bridge-arp-nd-suppress on
mtu 1500

root@pve1:~# cat /etc/frr/frr.conf
frr version 10.3.1
frr defaults datacenter
hostname pve1
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_access
vni 15000
exit-vrf
!
vrf vrf_dmz
vni 14000
exit-vrf
!
vrf vrf_mgmt
vni 16000
exit-vrf
!
interface vlan3000
# Fix BGP peering - RR clients only peer with RRs
ip ospf bfd
!
router bgp 65100
bgp router-id 10.127.1.5
no bgp hard-administrative-reset
no bgp default ipv4-unicast
coalesce-time 1000
no bgp graceful-restart notification
neighbor VTEP peer-group
neighbor VTEP remote-as 65100
neighbor VTEP bfd
neighbor VTEP update-source dummy_PVE
neighbor 10.127.1.6 peer-group VTEP
neighbor 10.127.1.7 peer-group VTEP
# Remove BFD from peer-group (avoid multihop BFD issues)
no neighbor VTEP bfd
# Remove incorrect SDN-generated mesh peers
no neighbor 10.127.1.6 peer-group VTEP
no neighbor 10.127.1.7 peer-group VTEP
# Add only RR peers
neighbor 10.127.1.3 peer-group VTEP
neighbor 10.127.1.4 peer-group VTEP
!
address-family l2vpn evpn
neighbor VTEP activate
neighbor VTEP route-map MAP_VTEP_IN in
neighbor VTEP route-map MAP_VTEP_OUT out
advertise-all-vni
exit-address-family
exit
!
router bgp 65100 vrf vrf_access
bgp router-id 10.127.1.5
no bgp hard-administrative-reset
no bgp graceful-restart notification
!
address-family ipv4 unicast
redistribute connected
exit-address-family
!
address-family ipv6 unicast
redistribute connected
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
route-target import 65100:15000
exit-address-family
exit
!
router bgp 65100 vrf vrf_dmz
bgp router-id 10.127.1.5
no bgp hard-administrative-reset
no bgp graceful-restart notification
!
address-family ipv4 unicast
redistribute connected
exit-address-family
!
address-family ipv6 unicast
redistribute connected
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
router bgp 65100 vrf vrf_mgmt
bgp router-id 10.127.1.5
no bgp hard-administrative-reset
no bgp graceful-restart notification
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
exit
router ospf
ospf router-id 10.127.1.5
exit
!
interface dummy_PVE
ip ospf area 0.0.0.0
ip ospf passive
exit
!
interface vlan3000
ip ospf area 0.0.0.0
exit
!
access-list pve_ospf_PVE_ips permit 10.127.1.0/24
!
route-map pve_ospf permit 100
match ip address pve_ospf_PVE_ips
set src 10.127.1.5
exit
!
ip protocol ospf route-map pve_ospf
!
!
line vty
!
 
I'll look at moving some VMs around tonight to see if I can reproduce it again. Btw, we do use queues on all our VMs, but the count is tied to the number of cores. These are 4 vCPU with 4 queues on each NIC.
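For context, a minimal sketch of how the queues option looks on the host side and how to verify it inside the guest (the queues=4 value is illustrative; the configs posted earlier in the thread don't include it):
Code:
# host: enable 4 virtio-net queues on net0 (illustrative, not taken from the posted configs)
qm set 105 --net0 virtio=BC:24:11:46:3D:6F,bridge=VXL15001,firewall=1,queues=4

# inside the guest: confirm driver and channel count
ethtool -i eth0   # driver: virtio_net
ethtool -l eth0   # 'Combined' should match the queues value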
 
I'm still not able to reproduce it. I had several VMs fail to migrate, but now it just works, even with VMs created pre-9.0. If someone else can reproduce it and post logs, that would help. I half expect to migrate a customer VM at some point and have it shut down on me randomly, which is not good.
 
Suddenly, today I can live migrate old and new VMs normally between two of my three nodes. Any live migration to or from the third node still gets stopped. I will reboot it later and try again.
 
Suddenly, today I can live migrate old and new VMs normally between two of my three nodes. Any live migration to or from the third node still gets stopped. I will reboot it later and try again.
That's sort of what happened to me, but I have not rebooted any nodes. I don't think it's fixed; we're just not triggering the bug.
 
I have 5 nodes. pve was recently installed on 2 of the nodes.

I can live migrate to the 3 older nodes.

The 2 newer nodes sometimes (but not always) get this warning: conntrack state migration not supported or disabled, active connections might get dropped

Live migration fails 100% of the time to the two new nodes.

let me know if debugging data is needed.

Here is an example of a failed migration:
Code:
task started by HA resource agent
2025-08-21 15:42:59 conntrack state migration not supported or disabled, active connections might get dropped
2025-08-21 15:42:59 starting migration of VM 7211 to node 'pve-15' (10.1.10.15)
2025-08-21 15:42:59 starting VM 7211 on remote node 'pve-15'
2025-08-21 15:43:00 start remote tunnel
2025-08-21 15:43:01 ssh tunnel ver 1
2025-08-21 15:43:01 starting online/live migration on tcp:10.1.10.15:60000
2025-08-21 15:43:01 set migration capabilities
2025-08-21 15:43:01 migration downtime limit: 100 ms
2025-08-21 15:43:01 migration cachesize: 512.0 MiB
2025-08-21 15:43:01 set migration parameters
2025-08-21 15:43:01 start migrate command to tcp:10.1.10.15:60000
2025-08-21 15:43:02 migration active, transferred 1.8 GiB of 4.0 GiB VM-state, 1.9 GiB/s
2025-08-21 15:43:03 average migration speed: 2.0 GiB/s - downtime 124 ms
2025-08-21 15:43:03 migration completed, transferred 3.7 GiB VM-state
2025-08-21 15:43:03 migration status: completed
2025-08-21 15:43:04 ERROR: tunnel replied 'ERR: resume failed - VM 7211 qmp command 'query-status' failed - client closed connection' to command 'resume 7211'
VM quit/powerdown failed - terminating now with SIGTERM
2025-08-21 15:43:11 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems
 
And here is a migration to an old node that worked:
Code:
task started by HA resource agent
2025-08-21 15:40:39 conntrack state migration not supported or disabled, active connections might get dropped
2025-08-21 15:40:39 starting migration of VM 107 to node 'pve2' (10.10.0.2)
2025-08-21 15:40:39 starting VM 107 on remote node 'pve2'
2025-08-21 15:40:41 start remote tunnel
2025-08-21 15:40:42 ssh tunnel ver 1
2025-08-21 15:40:42 starting online/live migration on tcp:10.10.0.2:60000
2025-08-21 15:40:42 set migration capabilities
2025-08-21 15:40:42 migration downtime limit: 100 ms
2025-08-21 15:40:42 migration cachesize: 2.0 GiB
2025-08-21 15:40:42 set migration parameters
2025-08-21 15:40:42 start migrate command to tcp:10.10.0.2:60000
2025-08-21 15:40:43 migration active, transferred 107.4 MiB of 16.2 GiB VM-state, 122.7 MiB/s
2025-08-21 15:40:44 migration active, transferred 220.0 MiB of 16.2 GiB VM-state, 113.1 MiB/s
2025-08-21 15:40:45 migration active, transferred 332.3 MiB of 16.2 GiB VM-state, 187.6 MiB/s
2025-08-21 15:40:46 migration active, transferred 444.9 MiB of 16.2 GiB VM-state, 113.1 MiB/s
2025-08-21 15:40:47 migration active, transferred 556.7 MiB of 16.2 GiB VM-state, 113.1 MiB/s
2025-08-21 15:40:48 migration active, transferred 669.5 MiB of 16.2 GiB VM-state, 145.7 MiB/s
2025-08-21 15:40:49 migration active, transferred 782.0 MiB of 16.2 GiB VM-state, 113.1 MiB/s
2025-08-21 15:40:50 migration active, transferred 894.7 MiB of 16.2 GiB VM-state, 342.9 MiB/s
2025-08-21 15:40:51 migration active, transferred 1006.8 MiB of 16.2 GiB VM-state, 155.4 MiB/s
2025-08-21 15:40:52 migration active, transferred 1.1 GiB of 16.2 GiB VM-state, 115.5 MiB/s
2025-08-21 15:40:53 migration active, transferred 1.2 GiB of 16.2 GiB VM-state, 160.3 MiB/s
..
2025-08-21 15:42:03 migration active, transferred 8.9 GiB of 16.2 GiB VM-state, 110.7 MiB/s
2025-08-21 15:42:04 migration active, transferred 9.0 GiB of 16.2 GiB VM-state, 208.4 MiB/s
2025-08-21 15:42:05 average migration speed: 200.2 MiB/s - downtime 7 ms
2025-08-21 15:42:05 migration completed, transferred 9.1 GiB VM-state
2025-08-21 15:42:05 migration status: completed
2025-08-21 15:42:07 migration finished successfully (duration 00:01:28)
TASK OK
 
I have 5 nodes. pve was recently installed on 2 of the nodes.
...
Live migration fails 100% of the time to the two new nodes.

This is the same issue I've experienced; I just have no idea what fixed it. Maybe toggling nftables off and on again on each node, or re-applying the SDN config.
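For anyone wanting to try the same workaround, a rough sketch (assuming the nftables option is set per node under the host firewall options, and that re-applying SDN regenerates the local interface config):
Code:
# toggle the nftables-based firewall per node (Host -> Firewall -> Options -> nftables),
# then restart the firewall
pve-firewall restart

# re-apply the SDN configuration cluster-wide (same as Datacenter -> SDN -> Apply)
pvesh set /cluster/sdn

# reload the generated network config on each node
ifreload -a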