[SOLVED] problem with the last update and a live migration

sidereus

Member
Jul 25, 2019
After tonight's update, many live migrations failed, which has never happened before. I rebooted all nodes after installing the updates, but it didn't help.
The installation log:
Code:
Preparing to unpack .../0-libgstreamer-plugins-base1.0-0_1.14.4-2+deb10u1_amd64.deb ...
Unpacking libgstreamer-plugins-base1.0-0:amd64 (1.14.4-2+deb10u1) over (1.14.4-2) ...
Preparing to unpack .../1-proxmox-backup-client_1.1.3-1_amd64.deb ...
Unpacking proxmox-backup-client (1.1.3-1) over (1.1.1-1) ...
Preparing to unpack .../2-proxmox-widget-toolkit_2.5-2_all.deb ...
Unpacking proxmox-widget-toolkit (2.5-2) over (2.5-1) ...
Preparing to unpack .../3-pve-container_3.3-5_all.deb ...
Unpacking pve-container (3.3-5) over (3.3-4) ...
Preparing to unpack .../4-pve-manager_6.3-7_amd64.deb ...
Unpacking pve-manager (6.3-7) over (6.3-6) ...
Preparing to unpack .../5-pve-qemu-kvm_5.2.0-6_amd64.deb ...
Unpacking pve-qemu-kvm (5.2.0-6) over (5.2.0-5) ...
Setting up pve-container (3.3-5) ...
Setting up proxmox-widget-toolkit (2.5-2) ...
Setting up pve-qemu-kvm (5.2.0-6) ...
Setting up libgstreamer-plugins-base1.0-0:amd64 (1.14.4-2+deb10u1) ...
Setting up pve-manager (6.3-7) ...
Setting up proxmox-backup-client (1.1.3-1) ...
Processing triggers for mime-support (3.62) ...
Processing triggers for libc-bin (2.28-10) ...
Processing triggers for systemd (241-7~deb10u7) ...
Processing triggers for man-db (2.8.5-2) ...
Processing triggers for pve-ha-manager (3.1-1) ...
The log from one of the failed live migrations:
Code:
2021-04-27 02:46:56 use dedicated network address for sending migration traffic (192.168.122.5)
2021-04-27 02:46:56 starting migration of VM 301 to node 'asr5' (192.168.122.5)
2021-04-27 02:46:57 starting VM 301 on remote node 'asr5'
2021-04-27 02:46:58 start remote tunnel
2021-04-27 02:46:59 ssh tunnel ver 1
2021-04-27 02:46:59 starting online/live migration on tcp:192.168.122.5:60000
2021-04-27 02:46:59 set migration_caps
2021-04-27 02:46:59 migration speed limit: 8589934592 B/s
2021-04-27 02:46:59 migration downtime limit: 100 ms
2021-04-27 02:46:59 migration cachesize: 2147483648 B
2021-04-27 02:46:59 set migration parameters
2021-04-27 02:46:59 start migrate command to tcp:192.168.122.5:60000
2021-04-27 02:47:00 migration status: active (transferred 663433381, remaining 13931319296), total 17197539328)
2021-04-27 02:47:00 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-27 02:47:01 migration status: active (transferred 672836408, remaining 9651982336), total 17197539328)
2021-04-27 02:47:01 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-27 02:47:02 migration status: active (transferred 1393323902, remaining 7556857856), total 17197539328)
2021-04-27 02:47:02 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-27 02:47:03 migration status: active (transferred 1924478457, remaining 4483584000), total 17197539328)
2021-04-27 02:47:03 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-27 02:47:04 migration status: active (transferred 2142633219, remaining 104054784), total 17197539328)
2021-04-27 02:47:04 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 9088 overflow 0
2021-04-27 02:47:04 migration speed: 3276.80 MB/s - downtime 38 ms
2021-04-27 02:47:04 migration status: completed
2021-04-27 02:47:04 ERROR: tunnel replied 'ERR: resume failed - VM 301 not running' to command 'resume 301'
2021-04-27 02:47:12 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems
 
Replying to myself. The problem was not related to the latest Proxmox update. I had added a new node to the cluster but forgot to turn on nested virtualization there, as described in the guide. Without this option, migrations of Windows Server 2012 and Ubuntu 21.04 guests to that node failed. After turning on nested virtualization on all nodes, the problem was gone and the VMs migrated successfully. Here is the config of one of these VMs:
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 1024
boot: order=scsi0
cores: 2
cpu: host
hotplug: disk,network,usb
machine: pc-q35-5.2
memory: 16384
name: i-1-win
net0: virtio=72:62:5E:5A:9A:58,bridge=vmbr0,firewall=1,tag=128
numa: 1
ostype: win8
scsi0: ceph_pool:vm-301-disk-0,cache=writeback,discard=on,size=300G
scsihw: virtio-scsi-pci
smbios1: uuid=201008dd-fb18-40c8-aeb4-4209f8dff003
sockets: 2
vmgenid: 273cf21d-2273-4582-ac16-3b163d19e273
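For reference, nested virtualization is enabled per node through a KVM module option. A minimal sketch of the option lines, assuming an Intel host (on AMD it is `kvm_amd` with `nested=1`); the `nested_opt` helper is just a hypothetical wrapper to show the exact lines:

```shell
#!/bin/sh
# Hypothetical helper: print the modprobe line that enables nested
# virtualization for a given CPU vendor ("intel" or "amd").
nested_opt() {
  case "$1" in
    intel) echo "options kvm_intel nested=Y" ;;
    amd)   echo "options kvm_amd nested=1" ;;
  esac
}

# On each node (as root), persist the option and reload the module, e.g.:
#   nested_opt intel > /etc/modprobe.d/kvm-nested.conf
#   modprobe -r kvm_intel && modprobe kvm_intel
# Then verify; this should print 'Y' (Intel) or '1' (AMD):
#   cat /sys/module/kvm_intel/parameters/nested
nested_opt intel
```

Note that reloading the `kvm_intel`/`kvm_amd` module only works while no VM is running on the node; otherwise a reboot picks up the new option.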
 
Thanks for taking the time to share the issue and its solution! This will certainly help others who run into the same issue while upgrading a cluster.

If possible, it would be great if you could edit such threads (via the 'Edit Thread' button at the top of your first post) and select the 'SOLVED' prefix next time. This time I'll mark it as 'SOLVED'.

Thanks again!