Hi, this just happened here. Once. I tried to reproduce it, but I couldn't.

Here is the relevant information:

Node A (Origin)
Code:
root@tcn-05-lon-vh22:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
root@tcn-05-lon-vh22:~#


Node B (Destination)
Code:
root@tcn-05-lon-vh23:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
root@tcn-05-lon-vh23:~#

Task Log:
Code:
2020-07-07 12:36:15 use dedicated network address for sending migration traffic (192.168.254.23)
2020-07-07 12:36:15 starting migration of VM 115 to node 'tcn-05-lon-vh23' (192.168.254.23)
2020-07-07 12:36:16 starting VM 115 on remote node 'tcn-05-lon-vh23'
2020-07-07 12:36:18 start remote tunnel
2020-07-07 12:36:19 ssh tunnel ver 1
2020-07-07 12:36:19 starting online/live migration on tcp:192.168.254.23:60000
2020-07-07 12:36:19 set migration_caps
2020-07-07 12:36:19 migration speed limit: 8589934592 B/s
2020-07-07 12:36:19 migration downtime limit: 100 ms
2020-07-07 12:36:19 migration cachesize: 1073741824 B
2020-07-07 12:36:19 set migration parameters
2020-07-07 12:36:19 start migrate command to tcp:192.168.254.23:60000
2020-07-07 12:36:20 migration status: active (transferred 860171064, remaining 7604506624), total 8601477120)
2020-07-07 12:36:20 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:21 migration status: active (transferred 1490839321, remaining 5166247936), total 8601477120)
2020-07-07 12:36:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:22 migration status: active (transferred 2419432698, remaining 4145524736), total 8601477120)
2020-07-07 12:36:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:23 migration status: active (transferred 3341693116, remaining 3168620544), total 8601477120)
2020-07-07 12:36:23 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:24 migration status: active (transferred 4250583963, remaining 2085208064), total 8601477120)
2020-07-07 12:36:24 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:25 migration status: active (transferred 5133897493, remaining 1171369984), total 8601477120)
2020-07-07 12:36:25 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 5993867428, remaining 122421248), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6087660295, remaining 28807168), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6132640532, remaining 158900224), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 6066 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6144808989, remaining 146718720), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 9031 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6152910364, remaining 138604544), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 11005 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6159620448, remaining 131891200), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 12640 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6164504584, remaining 126853120), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 13830 overflow 0
2020-07-07 12:36:26 migration status: active (transferred 6167890491, remaining 123428864), total 8601477120)
2020-07-07 12:36:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 14655 overflow 0
2020-07-07 12:36:27 migration status: active (transferred 6172339540, remaining 118849536), total 8601477120)
2020-07-07 12:36:27 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 15740 overflow 0
2020-07-07 12:36:27 migration status: active (transferred 6177980550, remaining 112267264), total 8601477120)
2020-07-07 12:36:27 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 17113 overflow 0
2020-07-07 12:36:27 migration status: active (transferred 6216921470, remaining 72413184), total 8601477120)
2020-07-07 12:36:27 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 26603 overflow 0
2020-07-07 12:36:27 migration speed: 1024.00 MB/s - downtime 40 ms
2020-07-07 12:36:27 migration status: completed
2020-07-07 12:36:27 ERROR: tunnel replied 'ERR: resume failed - VM 115 qmp command 'query-status' failed - client closed connection' to command 'resume 115'
2020-07-07 12:36:30 ERROR: migration finished with problems (duration 00:00:15)
TASK ERROR: migration problems

VM config:
Code:
root@tcn-05-lon-vh23:~# cat /etc/pve/qemu-server/115.conf
#start_at_boot=1
balloon: 0
bios: ovmf
boot: dcn
bootdisk: virtio0
cores: 8
efidisk0: TN01:115/vm-115-disk-2.raw,size=128K
ide2: SN4-ISOS:iso/virtio-win-0.1.164.iso,media=cdrom,size=362130K
memory: 8192
name: ASN-05-LON-VPS03
net0: e1000=16:1A:60:D0:C2:AC,bridge=vmbr1,tag=1617
net1: e1000=DA:6A:E3:F9:39:A6,bridge=vmbr1,tag=1549
numa: 0
onboot: 1
ostype: win10
sata1: TN01:115/vm-115-disk-1.raw,size=2T
scsihw: virtio-scsi-pci
smbios1: uuid=2c68ba39-f37b-45c7-ade0-2f73724b32d0
sockets: 1
startup: up=5
vga: virtio
virtio0: TN01:115/vm-115-disk-0.raw,size=120G
vmgenid: dbb06674-47de-44ea-80d0-f1329ea57c45
root@tcn-05-lon-vh23:~#
 
I'm on Proxmox 6.2-6 now and my problem with migrations has gone away.
I'm not sure why I had a bandwidth limit set on migrations, but I removed it around the same time as the upgrade, so I can't say whether that is what helped.
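I don't recall exactly where my limit was configured, but for reference, a cluster-wide migration bandwidth limit usually lives in /etc/pve/datacenter.cfg, and if I remember right it can also be overridden for a single migration from the CLI. A rough sketch only; the value and the VMID/node names below are placeholders:

Code:
# /etc/pve/datacenter.cfg -- example only
# a line like this caps migration traffic (value in KiB/s):
#   bwlimit: migration=102400
# removing the migration=... part (or the whole bwlimit line) lifts the cap

# override for a single migration from the CLI (example value, in KiB/s):
qm migrate <vmid> <target-node> --online --bwlimit 8388608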
 
We also randomly experience similar issues:

Code:
2020-12-02 12:30:55 migration xbzrle cachesize: 1073741824 transferred 48252 pages 256 cachemiss 45394 overflow 12
2020-12-02 12:30:55 migration speed: 101.14 MB/s - downtime 59 ms
2020-12-02 12:30:55 migration status: completed
2020-12-02 12:30:56 ERROR: tunnel replied 'ERR: resume failed - VM 248 qmp command 'query-status' failed - client closed connection' to command 'resume 248'
2020-12-02 12:30:59 ERROR: migration finished with problems (duration 00:01:28)
TASK ERROR: migration problems

It's not really reproducible, and migration works most of the time. No nested virtualization here; storage is multipathed iSCSI.
 
I've experienced the same ugly issue today. The "workaround" of migrating the VM offline first and booting it on the other node works, and the VM can be migrated online again after that. But it's really sad. It affects all VMs on 1 of my 7 nodes. All nodes are fully upgraded.
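For anyone who wants the exact steps of that workaround, this is roughly it (VMID and node names are placeholders):

Code:
# on the source node: stop the guest, then migrate it offline
qm shutdown <vmid>
qm migrate <vmid> <target-node>

# on the target node: boot it again
qm start <vmid>

# after that, live migration works again:
qm migrate <vmid> <some-other-node> --online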
 
Hello,

Please check whether you have created a "Linux Bridge". In my case, this fixed my problem.

1. When trying to migrate a KVM/QEMU VM, I get this error:
Code:
2020-12-02 12:30:56 ERROR: tunnel replied 'ERR: resume failed - VM 248 qmp command 'query-status' failed - client closed connection' to command 'resume 248'
2020-12-02 12:30:59 ERROR: migration finished with problems ....

2. The LXC container migrates successfully, but it then fails to start. The error I am getting is:
Code:
run_buffer: 314 Script exited with status 2
lxc_create_network_priv: 3068 No such device - Failed to create a network device
lxc_spawn: 1786 Failed to create the network
__lxc_start: 1999 Failed to spawn container "103"
TASK ERROR: startup for container '103' failed

Obviously, it looks like a networking issue ...

The catch here, I think, is that we joined the node to the cluster without setting up a "Linux Bridge" on it first.
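To make that concrete: the bridge referenced by the guest's network devices (vmbr1 in the VM config above) has to exist on the destination node too, otherwise the NIC can't be recreated there. A rough sketch of a VLAN-aware Linux Bridge in /etc/network/interfaces, assuming the uplink NIC is called eno1 (adjust names to your hardware):

Code:
# /etc/network/interfaces on the destination node -- example only
auto vmbr1
iface vmbr1 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

A quick "ip link show vmbr1" on both nodes tells you whether the bridge is actually there.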
 
I have the same error with an Arch Linux KVM guest: resuming after live migration fails. I've narrowed it down: it works if the guest runs the LTS kernel. Which options do I have to set to get the standard kernel working on resume?
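In case it helps someone else: switching the Arch guest to the LTS kernel is just a package swap. A rough sketch, assuming a GRUB-based install (adjust for your bootloader):

Code:
# inside the Arch Linux guest
pacman -S linux-lts                       # install the LTS kernel next to the default one
grub-mkconfig -o /boot/grub/grub.cfg      # regenerate the boot menu so the LTS entry appears
reboot                                    # then pick the LTS kernel at boot (or make it the default)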
 
I migrated a VM from a node on 6.8.8-2 to one on 6.8.12-4, and the problem remains. Is there any information on how to solve it?
 
I've had the same experience. Proxmox 8 is already installed, and the problem seems to occur when using the "host" processor type. That type was chosen in order to use more than 512 GB of RAM in the VM; otherwise the VM would not start at all.
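For context, the processor type I mean is the per-VM cpu option; a sketch of how it looks (the VMID is a placeholder):

Code:
# in /etc/pve/qemu-server/<vmid>.conf
cpu: host

# or set it from the CLI:
qm set <vmid> --cpu host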
 
