VM migration crashes on Proxmox 9.1: QEMU assertion failed job_set_aio_context: Assertion job->paused || job_is_completed_locked(job) failed

DolPhinSon

New Member
Jun 13, 2025
10
0
1
Hi all,


I am experiencing an online migration failure with a VM on Proxmox VE 9.1 (running qemu-server: 9.1.6 and pve-qemu-kvm: 10.1.2-3).

During the final phase of the live migration, the QEMU process crashes completely on the source node, causing the migration task to abort with "client closed connection" and "VM not running".

Here is the migration task log:

Bash:
2026-06-29 07:50:28 migration active, transferred 15.9 GiB of 16.0 GiB VM-state, 383.6 MiB/s
2026-06-29 07:50:29 migration active, transferred 16.3 GiB of 16.0 GiB VM-state, 369.0 MiB/s
2026-06-29 07:50:29 xbzrle: send updates to 2149 pages in 1.4 MiB encoded memory, cache-miss 94.22%, overflow 18
query migrate failed: VM 2427 qmp command 'query-migrate' failed - client closed connection

2026-06-29 07:50:35 query migrate failed: VM 2427 qmp command 'query-migrate' failed - client closed connection
query migrate failed: VM 2427 not running

2026-06-29 07:50:36 query migrate failed: VM 2427 not running
...
2026-06-29 07:50:40 ERROR: online migrate failure - too many query migrate failures - aborting
2026-06-29 07:50:40 aborting phase 2 - cleanup resources
2026-06-29 07:50:40 migrate_cancel
2026-06-29 07:50:40 migrate_cancel error: VM 2427 not running
2026-06-29 07:50:40 ERROR: query-status error: VM 2427 not running
mirror-virtio0: Cancelling block job
2026-06-29 07:50:40 ERROR: VM 2427 not running
2026-06-29 07:50:44 ERROR: migration finished with problems (duration 00:05:22)
TASK ERROR: migration problems


Looking at the system journal at the exact millisecond of the failure (07:50:29), I found that QEMU explicitly crashed due to a failed assertion inside job.c during an AioContext switch, likely related to the active drive mirror (mirror-virtio0):

Bash:
Jun 29 07:50:29 vnpttt-nvme-7003-s106 QEMU[2766751]: kvm: ../job.c:401: job_set_aio_context: Assertion `job->paused || job_is_completed_locked(job)' failed.
Jun 29 07:50:34 vnpttt-nvme-7003-s106 kernel: fwbr2427i0: port 2(tap2427i0) entered disabled state
Jun 29 07:50:35 vnpttt-nvme-7003-s106 pvedaemon[2694735]: VM 2427 qmp command failed - VM 2427 qmp command 'query-migrate' failed - client closed connection


qm congfig of this vm

Bash:
cat /etc/pve/qemu-server/2427.conf

agent: 1

boot: c

bootdisk: virtio0

cipassword: $5$VzXtLX78$IuF2A..Yv6j.cCnqJhWutCF7X/

ciuser: root

cores: 4

cpu: host

cpulimit: 4

cpuunits: 1024

ide2: none,media=cdrom

ipconfig0: ip=13.9.227.148/24,gw=13.9.227.1

kvm: 1

memory: 16384

name: 13.9.227.148

net0: virtio=6A:78:B9:81:D2:C0,bridge=vmbr0,firewall=1,queues=4,rate=25,tag=205

numa: 1

onboot: 1

ostype: l26

searchdomain: vi.vn

serial0: socket

smbios1: uuid=a72a318f-7d3f-489b-a127-181a9d528d1c

sockets: 1

tags: Linux

vga: std

virtio0: vms:vm-2427-disk-0,discard=on,format=raw,iops_rd=95000,iops_wr=35000,iothread=1,size=100G


Version Information:

  • qemu-server: 9.1.6
  • pve-qemu-kvm: 10.1.2-3
It seems like a block job (drive mirror) state tracking bug during the migration handoff/completion phase when changing AIO contexts. Has anyone encountered this specific QEMU assertion failure on PVE 9.x/QEMU 9.x? Is there a known patch or workaround for this?

This crash caused the production VPS (VM 2427) to power off abruptly, causing a sudden downtime and severely affecting our users.
 

Attachments

Also, please share the part of /etc/pve/storage.cfg relating to the vms storage.