Critical iowait on VM disk during live migration

rontex · Jan 30, 2024

When starting a live migration, after the following message is displayed

Code:

2024-01-30 12:58:11 scsi0: start migration to nbd:unix:/run/qemu-server/535_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0

an iowait jump occurs that looks very much like the virtual machine's disk being locked. Stopping the migration does not help; only stopping and starting the VM does. The problem does not always reproduce: the more heavily loaded the disk is, the more likely it is to occur. For example, on an otherwise idle running VM it is enough to run

Code:

fio --randrepeat=1 --direct=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=2 --numjobs=16 --size=16G --readwrite=randrw --rwmixread=75

and the problem reproduces in 90% of cases.
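To put a number on the iowait jump inside the guest while fio is running, something like the following could be used. It is only a minimal sketch: it assumes a Linux guest with the usual /proc/stat layout (iowait being the fifth counter on the aggregate cpu line), and the function name iowait_pct is made up for illustration.

```shell
#!/bin/sh
# iowait_pct (hypothetical helper): share of CPU time spent in iowait
# between two aggregate "cpu" lines taken from /proc/stat, in percent.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk '
    # Fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
    # guest time is already counted in user, so the sum stops at steal.
    { total[NR] = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9; wait[NR] = $6 }
    END {
      dt = total[2] - total[1]
      if (dt <= 0) { print 0; exit }
      printf "%.1f\n", 100 * (wait[2] - wait[1]) / dt
    }'
}

# Sample twice, one second apart, and print the iowait share.
if [ -r /proc/stat ]; then
  first=$(grep '^cpu ' /proc/stat)
  sleep 1
  second=$(grep '^cpu ' /proc/stat)
  echo "iowait: $(iowait_pct "$first" "$second")%"
fi
```

Running it in a loop before and after the migration starts should make the jump visible without any extra packages.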

My Proxmox setup:
Storage: an LVM-thin volume created on top of an mdadm RAID 10 array. The array itself is assembled from 8 NVMe disks.

Code:

Linux myserver 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux

Proxmox Virtual Environment 8.1.4

Code:

qm config 535
agent: 1,freeze-fs-on-backup=0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 24
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 196608
meta: creation-qemu=8.0.2,ctime=1694503731
name: server535
net0: virtio=06:27:E4:FA:58:49,bridge=vmbr0,queues=24,tag=300
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: md-thinstorage:vm-535-disk-0,cache=none,discard=on,format=raw,iothread=1,size=4T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=aea34e10-bcf9-4bb3-8ce8-63d8a42daeed
sockets: 1
vmgenid: a4b5c843-2ceb-43c0-8615-bed53d824f03

What I tried that did not help:
1. Rolled back the kernel to 6.2.16-20-pve on both the source and the destination server.
2. Disabled qemu-guest-agent.
3. Removed the discard option from the VM disk.
4. Changed the CPU type from host to x86-64-v2-AES.
5. Disabled the CPU option "pcid".

sb-jw · Jan 30, 2024

There is currently a bug with iothread, have you tried deactivating it? Maybe it will help you.
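For VM 535 from the config above, deactivating iothread could look roughly like this. It is a minimal sketch, not a verified procedure: it only rewrites the scsi0 option string and prints the resulting `qm set` command instead of running it, and it assumes the option string from `qm config` is accepted unchanged. The helper name disable_iothread is made up for illustration.

```shell
#!/bin/sh
# Disk options copied from the `qm config 535` output in the first post.
old='md-thinstorage:vm-535-disk-0,cache=none,discard=on,format=raw,iothread=1,size=4T,ssd=1'

# disable_iothread (hypothetical helper): flip iothread=1 to iothread=0
# in a disk option string and leave everything else untouched.
disable_iothread() {
  printf '%s\n' "$1" | sed 's/iothread=1/iothread=0/'
}

new=$(disable_iothread "$old")

# Dry run: print the command for review instead of executing it.
echo "qm set 535 --scsi0 $new"
```

After applying the printed command, the VM presumably needs a full stop and start (not just a reboot from inside the guest) before the new disk setting takes effect.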

rontex · Jan 30, 2024

sb-jw said:
There is currently a bug with iothread, have you tried deactivating it? Maybe it will help you.

Thank you very much! It really helped. Could you tell me how and where I can track when the problem is fixed?

sb-jw · Jan 30, 2024

Great, nice to hear!

You can find information and updates about this in the thread: https://forum.proxmox.com/threads/vms-hung-after-backup.137286/post-627915
