Hello, after updating my Proxmox cluster to the latest version, live migration has become less stable and crashes randomly. Here are the logs of a crash that happened while migrating between two servers with identical Intel CPUs:
Proxmox log:
Code:
mirror-scsi0: transferred 56.0 GiB of 56.0 GiB (100.00%) in 6m 11s, ready
all 'mirror' jobs are ready
2025-12-13 10:24:43 switching mirror jobs to actively synced mode
mirror-scsi0: switching to actively synced mode
mirror-scsi0: successfully switched to actively synced mode
2025-12-13 10:24:44 starting online/live migration on unix:/run/qemu-server/21069.migrate
2025-12-13 10:24:44 set migration capabilities
2025-12-13 10:24:44 migration downtime limit: 100 ms
2025-12-13 10:24:44 migration cachesize: 512.0 MiB
2025-12-13 10:24:44 set migration parameters
2025-12-13 10:24:44 start migrate command to unix:/run/qemu-server/21069.migrate
2025-12-13 10:24:45 migration active, transferred 112.9 MiB of 4.0 GiB VM-state, 118.1 MiB/s
2025-12-13 10:24:46 migration active, transferred 225.2 MiB of 4.0 GiB VM-state, 137.5 MiB/s
2025-12-13 10:24:47 migration active, transferred 336.9 MiB of 4.0 GiB VM-state, 122.8 MiB/s
2025-12-13 10:24:48 migration active, transferred 437.7 MiB of 4.0 GiB VM-state, 120.4 MiB/s
2025-12-13 10:24:49 migration active, transferred 549.8 MiB of 4.0 GiB VM-state, 114.8 MiB/s
2025-12-13 10:24:50 migration active, transferred 662.3 MiB of 4.0 GiB VM-state, 113.7 MiB/s
2025-12-13 10:24:51 migration active, transferred 773.5 MiB of 4.0 GiB VM-state, 132.0 MiB/s
2025-12-13 10:24:52 migration active, transferred 884.8 MiB of 4.0 GiB VM-state, 111.2 MiB/s
2025-12-13 10:24:53 migration active, transferred 996.4 MiB of 4.0 GiB VM-state, 109.9 MiB/s
2025-12-13 10:24:54 migration active, transferred 1.1 GiB of 4.0 GiB VM-state, 97.3 MiB/s
2025-12-13 10:24:55 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 118.4 MiB/s
2025-12-13 10:24:56 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 117.0 MiB/s
2025-12-13 10:24:57 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 118.2 MiB/s
2025-12-13 10:24:58 migration active, transferred 1.5 GiB of 4.0 GiB VM-state, 122.7 MiB/s
2025-12-13 10:24:59 migration active, transferred 1.6 GiB of 4.0 GiB VM-state, 131.9 MiB/s
2025-12-13 10:25:00 migration active, transferred 1.7 GiB of 4.0 GiB VM-state, 130.6 MiB/s
2025-12-13 10:25:01 migration active, transferred 1.8 GiB of 4.0 GiB VM-state, 112.0 MiB/s
2025-12-13 10:25:02 migration active, transferred 1.9 GiB of 4.0 GiB VM-state, 120.6 MiB/s
query migrate failed: VM 21069 not running
2025-12-13 10:25:03 query migrate failed: VM 21069 not running
query migrate failed: VM 21069 not running
2025-12-13 10:25:05 query migrate failed: VM 21069 not running
query migrate failed: VM 21069 not running
2025-12-13 10:25:07 query migrate failed: VM 21069 not running
query migrate failed: VM 21069 not running
2025-12-13 10:25:09 query migrate failed: VM 21069 not running
query migrate failed: VM 21069 not running
2025-12-13 10:25:11 query migrate failed: VM 21069 not running
query migrate failed: VM 21069 not running
2025-12-13 10:25:13 query migrate failed: VM 21069 not running
2025-12-13 10:25:13 ERROR: online migrate failure - too many query migrate failures - aborting
2025-12-13 10:25:13 aborting phase 2 - cleanup resources
2025-12-13 10:25:13 migrate_cancel
2025-12-13 10:25:13 migrate_cancel error: VM 21069 not running
2025-12-13 10:25:13 ERROR: query-status error: VM 21069 not running
mirror-scsi0: Cancelling block job
2025-12-13 10:25:13 ERROR: VM 21069 not running
2025-12-13 10:25:17 ERROR: migration finished with problems (duration 00:06:52)
TASK ERROR: migration problems
Journal on source:
Code:
Dec 13 10:25:02 HOST QEMU[4706]: kvm: ../util/bitmap.c:167: bitmap_set: Assertion `start >= 0 && nr >= 0' failed.
Dec 13 10:25:03 HOST pvedaemon[1576635]: VM 21069 qmp command failed - VM 21069 not running
Dec 13 10:25:03 HOST pvedaemon[1576635]: query migrate failed: VM 21069 not running
Dec 13 10:25:03 HOST kernel: vmbr0: port 17(tap21069i0) entered disabled state
Dec 13 10:25:03 HOST kernel: tap21069i0 (unregistering): left allmulticast mode
Dec 13 10:25:03 HOST kernel: vmbr0: port 17(tap21069i0) entered disabled state
Dec 13 10:25:03 HOST kernel: zd48: p1
Dec 13 10:25:03 HOST kernel: vmbr1: port 17(tap21069i1) entered disabled state
Dec 13 10:25:03 HOST kernel: tap21069i1 (unregistering): left allmulticast mode
Dec 13 10:25:03 HOST kernel: vmbr1: port 17(tap21069i1) entered disabled state
Journal on target:
Code:
Dec 13 10:25:02 HOST2 QEMU[1595041]: kvm: error while loading state section id 1(ram)
Dec 13 10:25:02 HOST2 QEMU[1595041]: kvm: load of migration failed: Input/output error
The migrated virtual machine runs AlmaLinux 8 and has hotpluggable memory; the migration is initiated normally via the GUI (bulk action). When migrating a large number of VMs, the crash happens in about 10% of cases; the other VMs migrate without problems.
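For reference, this is roughly how I can reproduce a single migration from the CLI instead of via the bulk action (a minimal sketch: the VMID 21069 and target node HOST2 are taken from the logs above, and the --with-local-disks flag is my assumption based on the mirror-scsi0 jobs in the task log):
Code:
# check the VM's memory/hotplug settings (VMID 21069 from the log above)
qm config 21069 | grep -E '^(memory|hotplug|numa)'

# trigger a single online migration, mirroring local disks to the target
qm migrate 21069 HOST2 --online --with-local-disks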
Is this a known issue with this Proxmox version? Should I consider downgrading, or using an older kernel or QEMU? I can provide more logs if that helps resolve the issue.
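If downgrading is the recommended workaround, this is what I would try (a sketch, assuming the relevant packages are pve-qemu-kvm and the kernel; the version numbers below are placeholders, not the actual versions on my hosts):
Code:
# show installed component versions and available QEMU builds
pveversion -v
apt-cache policy pve-qemu-kvm

# install a specific older QEMU build and hold it (version is a placeholder)
apt install pve-qemu-kvm=9.0.2-4
apt-mark hold pve-qemu-kvm

# pin an older kernel for subsequent boots (version is a placeholder)
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-4-pve
As far as I understand, already-running VMs keep using the old QEMU binary until they are stopped and started again, so I would have to restart (or migrate) each VM for a QEMU downgrade to take effect.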