Hi all,
I am experiencing an online migration failure with a VM on Proxmox VE 9.1 (running qemu-server: 9.1.6 and pve-qemu-kvm: 10.1.2-3).
During the final phase of the live migration, the QEMU process crashes completely on the source node, causing the migration task to abort with "client closed connection" and "VM not running".
Here is the migration task log:
Looking at the system journal at the exact millisecond of the failure (07:50:29), I found that QEMU explicitly crashed due to a failed assertion inside job.c during an AioContext switch, likely related to the active drive mirror (mirror-virtio0):
qm congfig of this vm
Version Information:
This crash caused the production VPS (VM 2427) to power off abruptly, causing a sudden downtime and severely affecting our users.
I am experiencing an online migration failure with a VM on Proxmox VE 9.1 (running qemu-server: 9.1.6 and pve-qemu-kvm: 10.1.2-3).
During the final phase of the live migration, the QEMU process crashes completely on the source node, causing the migration task to abort with "client closed connection" and "VM not running".
Here is the migration task log:
Bash:
2026-06-29 07:50:28 migration active, transferred 15.9 GiB of 16.0 GiB VM-state, 383.6 MiB/s
2026-06-29 07:50:29 migration active, transferred 16.3 GiB of 16.0 GiB VM-state, 369.0 MiB/s
2026-06-29 07:50:29 xbzrle: send updates to 2149 pages in 1.4 MiB encoded memory, cache-miss 94.22%, overflow 18
query migrate failed: VM 2427 qmp command 'query-migrate' failed - client closed connection
2026-06-29 07:50:35 query migrate failed: VM 2427 qmp command 'query-migrate' failed - client closed connection
query migrate failed: VM 2427 not running
2026-06-29 07:50:36 query migrate failed: VM 2427 not running
...
2026-06-29 07:50:40 ERROR: online migrate failure - too many query migrate failures - aborting
2026-06-29 07:50:40 aborting phase 2 - cleanup resources
2026-06-29 07:50:40 migrate_cancel
2026-06-29 07:50:40 migrate_cancel error: VM 2427 not running
2026-06-29 07:50:40 ERROR: query-status error: VM 2427 not running
mirror-virtio0: Cancelling block job
2026-06-29 07:50:40 ERROR: VM 2427 not running
2026-06-29 07:50:44 ERROR: migration finished with problems (duration 00:05:22)
TASK ERROR: migration problems
Looking at the system journal at the exact millisecond of the failure (07:50:29), I found that QEMU explicitly crashed due to a failed assertion inside job.c during an AioContext switch, likely related to the active drive mirror (mirror-virtio0):
Bash:
Jun 29 07:50:29 vnpttt-nvme-7003-s106 QEMU[2766751]: kvm: ../job.c:401: job_set_aio_context: Assertion `job->paused || job_is_completed_locked(job)' failed.
Jun 29 07:50:34 vnpttt-nvme-7003-s106 kernel: fwbr2427i0: port 2(tap2427i0) entered disabled state
Jun 29 07:50:35 vnpttt-nvme-7003-s106 pvedaemon[2694735]: VM 2427 qmp command failed - VM 2427 qmp command 'query-migrate' failed - client closed connection
qm congfig of this vm
Bash:
cat /etc/pve/qemu-server/2427.conf
agent: 1
boot: c
bootdisk: virtio0
cipassword: $5$VzXtLX78$IuF2A..Yv6j.cCnqJhWutCF7X/
ciuser: root
cores: 4
cpu: host
cpulimit: 4
cpuunits: 1024
ide2: none,media=cdrom
ipconfig0: ip=13.9.227.148/24,gw=13.9.227.1
kvm: 1
memory: 16384
name: 13.9.227.148
net0: virtio=6A:78:B9:81:D2:C0,bridge=vmbr0,firewall=1,queues=4,rate=25,tag=205
numa: 1
onboot: 1
ostype: l26
searchdomain: vi.vn
serial0: socket
smbios1: uuid=a72a318f-7d3f-489b-a127-181a9d528d1c
sockets: 1
tags: Linux
vga: std
virtio0: vms:vm-2427-disk-0,discard=on,format=raw,iops_rd=95000,iops_wr=35000,iothread=1,size=100G
Version Information:
- qemu-server: 9.1.6
- pve-qemu-kvm: 10.1.2-3
This crash caused the production VPS (VM 2427) to power off abruptly, causing a sudden downtime and severely affecting our users.