Testing a new PVE 9 cluster with high-speed (2x400G LACP) network connections has shown that live migration of large VMs occasionally fails with:
Code:
QEMU[949247]: kvm: ../util/bitmap.c:167: bitmap_set: Assertion `start >= 0 && nr >= 0' failed.
This was initially thought to be related to the PBS network problem with recent kernels, since the frequency of this failure does appear to depend on the kernel version: with 6.14.0-{1,2}-pve no migration succeeded, while with older or newer kernels roughly one attempt in a few failed. However, as @fiona indicated in https://forum.proxmox.com/threads/s...o-pve-9-1-1-and-pbs-4-0-20.176444/post-823442, this **QEMU** assertion failure should not be triggered by those kernel TCP receive buffer problems. Here is additional information from an example VM (1TB memory/2TB local storage) failure between hosts hov1 and hov2. Note that in all of the tests I have run so far there has never been a failure while migrating the 2TB local storage, only during the 1TB memory migration, so I believe the network is stable.
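To put numbers on a per-kernel failure rate like the one described above, a loop along these lines could bounce the VM between the two nodes and stop at the first failure. This is only a sketch under stated assumptions: it is run as root on a cluster node, root SSH works between hov1 and hov2 (as a PVE cluster normally sets up), the VM starts on hov1, and the function names `migrate_once` and `bounce` are hypothetical. It stops on the first failure because the state of the VM after a QEMU assertion should be checked by hand before retrying.

```shell
#!/usr/bin/env bash
# Hypothetical repro helper: live-migrate a VM back and forth between two
# nodes and stop at the first failed migration.
# Assumptions: run as root; root SSH works between the nodes; the VM is
# currently on the first node listed in NODES.
set -u

VMID=${VMID:-102}
NODES=(hov1 hov2)

# One online migration including local disks, started on the source node.
migrate_once() {
    local src=$1 dst=$2
    ssh "root@${src}" qm migrate "$VMID" "$dst" --online --with-local-disks
}

# Attempt up to $1 migrations, alternating direction. Prints one line per
# attempt and a final count of successful migrations before the first failure.
bounce() {
    local runs=$1 ok=0 i src dst
    for ((i = 0; i < runs; i++)); do
        src=${NODES[i % 2]}
        dst=${NODES[(i + 1) % 2]}
        if migrate_once "$src" "$dst"; then
            ok=$((ok + 1))
            echo "run $((i + 1)): ${src} -> ${dst} ok"
        else
            echo "run $((i + 1)): ${src} -> ${dst} FAILED"
            break
        fi
    done
    echo "successful migrations: ${ok}"
}
```

Usage, assuming the functions were saved to a file named (hypothetically) `bounce.sh`: `source bounce.sh && bounce 10`, repeated once per kernel version under test.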
VM configuration:
Code:
root@hov1:~# qm config 102
allow-ksm: 0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 96
cpu: host
hotplug: disk,network,usb,cpu
ide2: none,media=cdrom
memory: 1048576
meta: creation-qemu=10.0.2,ctime=1761354940
name: node2412.cluster.ldas.cit
net0: virtio=BC:24:11:D3:10:A8,bridge=vmbr0,queues=32
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: local-zfs:vm-102-disk-0,format=raw,iothread=1,size=2T
scsihw: virtio-scsi-single
smbios1: uuid=20721900-0449-43a2-aec7-41c44ce7a68d
sockets: 1
vcpus: 96
vmgenid: 6b1b9d99-097c-4b3f-a290-f7d79b89160e
Code:
root@hov1:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.11-2-test-pve)
pve-manager: 9.1.2 (running version: 9.1.2/9d436f37a0ac4172)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.11-2-test-pve: 6.17.11-2
proxmox-kernel-6.17.11-1-test-pve: 6.17.11-1
proxmox-kernel-6.17.4-1-pve-signed: 6.17.4-1
proxmox-kernel-6.17: 6.17.4-1
proxmox-kernel-6.17.2-2-pve-signed: 6.17.2-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.11-4-pve: 6.14.11-4
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: not correctly installed
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.0
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.0-1
proxmox-backup-file-restore: 4.1.0-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.5
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.1
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
Code:
root@hov2:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.11-2-test-pve)
pve-manager: 9.1.2 (running version: 9.1.2/9d436f37a0ac4172)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.11-2-test-pve: 6.17.11-2
proxmox-kernel-6.17.11-1-test-pve: 6.17.11-1
proxmox-kernel-6.17.4-1-pve-signed: 6.17.4-1
proxmox-kernel-6.17: 6.17.4-1
proxmox-kernel-6.17.2-2-pve-signed: 6.17.2-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: not correctly installed
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.0
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.0-1
proxmox-backup-file-restore: 4.1.0-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.5
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.1
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
The failed migration task log and the journalctl logs from both nodes around the time of the failure are attached.