When I start a live migration, right after the "drive mirror is starting" message from the migration log is displayed, an iowait jump occurs that looks very much like the virtual machine's disk being locked. Stopping the migration does not help; only stopping and starting the VM does. The problem does not always reproduce. The more heavily loaded the disk is, the more likely the problem is to occur. For example, on an otherwise idle running VM it is enough to run the fio command given in the Code section, and the problem reproduces in about 90% of cases.
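To put a number on the iowait jump, something like the following could be run while the migration starts. It is only a minimal sketch, not what was used to observe the problem: the function names `iowait_pct` and `watch_iowait` are made up for illustration, and it assumes a Linux system where the aggregate `cpu` line of `/proc/stat` has the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Proxmox or fio).

# iowait_pct BEFORE AFTER
# Takes two samples of the aggregate "cpu" line from /proc/stat and prints
# the percentage of CPU time spent in iowait between them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            t[NR] = total                          # total jiffies in this sample
            w[NR] = $6                             # iowait jiffies in this sample
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# watch_iowait [INTERVAL_SECONDS]
# Prints a timestamped iowait percentage once per interval until interrupted.
watch_iowait() {
    interval=${1:-1}
    prev=$(grep '^cpu ' /proc/stat)
    while sleep "$interval"; do
        cur=$(grep '^cpu ' /proc/stat)
        printf '%s iowait=%s%%\n' "$(date '+%F %T')" "$(iowait_pct "$prev" "$cur")"
        prev=$cur
    done
}
```

Running `watch_iowait 1` before starting the migration gives a timestamped series that can be lined up against the timestamps in the migration log.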
My Proxmox info:
Storage: an LVM-thin volume created on top of an mdadm RAID 10 array. The array itself is assembled from 8 NVMe disks.
Proxmox Virtual Environment 8.1.4
What I tried that did not help:
1. Rolled back the kernel to 6.2.16-20-pve on both the source and the destination server.
2. Disabled qemu-guest-agent.
3. Removed the discard option from the VM disk.
4. Changed the CPU type from host to x86-64-v2-AES.
5. Disabled the CPU option "pcid".
Code:
2024-01-30 12:58:11 scsi0: start migration to nbd:unix:/run/qemu-server/535_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
An
Code:
fio --randrepeat=1 --direct=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=2 --numjobs=16 --size=16G --readwrite=randrw --rwmixread=75
Code:
Linux myserver 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux
Code:
qm config 535
agent: 1,freeze-fs-on-backup=0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 24
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 196608
meta: creation-qemu=8.0.2,ctime=1694503731
name: server535
net0: virtio=06:27:E4:FA:58:49,bridge=vmbr0,queues=24,tag=300
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: md-thinstorage:vm-535-disk-0,cache=none,discard=on,format=raw,iothread=1,size=4T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=aea34e10-bcf9-4bb3-8ce8-63d8a42daeed
sockets: 1
vmgenid: a4b5c843-2ceb-43c0-8615-bed53d824f03