Thanks @RoCE-geek.

@benyamin - I'm sorry, but it's still buggy.
VM102 - the only VM on its PVE node
Drivers back to 0.1.240
SCSI Single
Both disks (scsi0 + scsi1) on aio=threads, iothread=1, ssd=1
EFI disk unchanged (as no option is there)
Hung in the first run, in the initialization phase of the Write test for Q32T16
Picture is from recent tests (SCSI Basic + Native), but the behavior is the same for the "threads/single/iothread" setup now: https://i.postimg.cc/JnR8yCQH/VM102-SCSI-Basic-Native.png
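For reference, a setup like the one described above corresponds to options along these lines in /etc/pve/qemu-server/102.conf (a sketch only; the storage and volume names are placeholders, not taken from the real config):

scsihw: virtio-scsi-single
scsi0: local-lvm:vm-102-disk-0,aio=threads,iothread=1,ssd=1
scsi1: local-lvm:vm-102-disk-1,aio=threads,iothread=1,ssd=1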
Hi @benyamin, I did all you've asked.

@RoCE-geek, you might recall in the other thread I mentioned this:
"It is worth mentioning that there was a demonstrable performance difference when using Machine and BIOS setting combinations. This was especially evident when using the OVMF UEFI BIOS. The Q35 Machine type was also better performing than the i440fx. So in the end I used the Q35 running SeaBIOS.
I also disabled memory ballooning (balloon=0) and set the CPU to host."
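In Proxmox config terms, that combination would look roughly like the following (a sketch using standard PVE options, not the poster's actual file):

machine: q35
bios: seabios
balloon: 0
cpu: host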
Also, just above that I discussed setting the PhysicalBreaks registry entry. This is needed because the driver is unable to determine what this value should be on its own. Setting HKLM\System\CurrentControlSet\Services\vioscsi\Parameters\Device : PhysicalBreaks = 0x20 (32) is a good catch-all for most environments.
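For example, from an elevated command prompt inside the guest, the value can be set like this (sketch only; 32 decimal = 0x20, and the guest typically needs a reboot for the driver to pick it up):

reg add "HKLM\System\CurrentControlSet\Services\vioscsi\Parameters\Device" /v PhysicalBreaks /t REG_DWORD /d 32 /f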
I note I'm still using the 0.1.248 driver package.
Thank you for your willingness to explore this further.
Thanks for giving it a go.
journalctl | grep "kvm: virtio: zero sized buffers are not allowed"
Workload type - CrystalDiskMark is used, not as a benchmark here, but as a disk stress-tool only.
For this use case it's quite sufficient, but some important settings are required:
- Use "NVMe SSD" mode (menu Settings) for higher random load (up to Q32T16)
- For all tests, use Disk D only (aka scsi1 = "data disk")
- Use default 5 repetitions (it’s usually sufficient)
- Use a sample file size of 8 - 32GB. Start with 8GB; a larger data size does not mean a higher probability of hitting the issue
Hi @_gabriel, exactly, NVMe SSD mode is crucial for the Q32T16 benchmark. Don't forget to set the CDM Settings to "NVMe SSD mode" + Profile "Peak Performance".
Here, the VM hangs, crashes or reboots. There is not always a Windows event, but during the hang this error is present in journalctl:
journalctl | grep "kvm: virtio: zero sized buffers are not allowed"
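To catch this while a test is still running, the same check can be done live by following the journal, e.g.:

journalctl -f | grep "kvm: virtio: zero sized buffers are not allowed"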
Well, the mentioned reproducer is the same as the one in CrystalDiskMark, as the underlying tool is the same: DiskSpd (DiskSpd64).

The reproducer on GitHub doesn't do any writes, so less wear to the NAND.
https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756#issuecomment-1649961285
Good to see more eyes on this issue.
As for my VM101, running off ZFS, I have the ZFS ARC min/max set to a 32GB limit on the corresponding PVE node.

@_gabriel and @RoCE-geek, I've run it with both, but so far no issues at all.
I do note I'm running monolithic RAW images on a Directory backend formatted with ext4 on LVM, created with qemu-img create/convert -S 0 and copied with cp --sparse=never.
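As a sketch of that preparation (image names and paths are placeholders), a fully allocated copy can be produced with something like:

qemu-img convert -S 0 -O raw source.qcow2 /tmp/vm-101-disk-1.raw
cp --sparse=never /tmp/vm-101-disk-1.raw /var/lib/vz/images/101/vm-101-disk-1.raw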
I noticed CrystalDiskMark just uses diskspd, and the oddity in the graph above when running the GitHub reproducer is due to the image file creation. If I don't delete the file it produces, I see I/O delay consistent with CrystalDiskMark runs. The lack of I/O on file creation indicates that zeroes are being discarded. As @davemcl just mentioned, the reproducer doesn't seem to perform any writes (even with -Sh / -Suw), but this might be due to underlying discards.
I hope to try some other local storage backends in a few hours. I'm also investigating some oddities with driver registry entries simultaneously.
I thought I should also ask: if using ZFS, have you set zfs_arc_min and zfs_arc_max to limit memory pressure? I don't use ZFS, so this might be a factor. I also set vm.swappiness to 0 to avoid memory pressure from excessive memory paging. I also added options kvm ignore_msrs=1 report_ignored_msrs=0 to /etc/modprobe.d/kvm.conf.
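For anyone wanting to mirror that host-side tuning, it amounts to something like the following (a sketch; the file locations are the usual ones, the 32GB ARC figure is the value mentioned above for ZFS users, and a reboot or module/sysctl reload is needed for the changes to take effect):

# /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

# /etc/modprobe.d/zfs.conf (only if using ZFS; 34359738368 bytes = 32GB)
options zfs zfs_arc_min=34359738368
options zfs zfs_arc_max=34359738368

# /etc/sysctl.d/99-swappiness.conf
vm.swappiness = 0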
I should mention I've settled on using the following:
diskspd.exe -b8K -d400 -Sh -o32 -t8 -r -w0 -c3g t:\random.dat
diskspd.exe -b64K -d400 -Sh -o32 -t8 -r -w0 -c3g t:\random.dat
Ok, my bad. I had thought -w0 was no warm-up, but that's -W0. The -w parameter is for percent writes, with -w and -w0 being equivalent. Trying some -w20 runs on spindles. Considering sacrificing a small amount of SSD too... So far I haven't been able to reproduce any faults.
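The -w20 variant referred to above would just be a one-flag change to the earlier command, e.g. (sketch only; the target path is a placeholder):

diskspd.exe -b8K -d400 -Sh -o32 -t8 -r -w20 -c3g t:\random.dat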
I have since managed to reproduce both the kvm: Desc next is 3 and the kvm: virtio: zero sized buffers are not allowed errors. Running

diskspd.exe -b4K -d30 -Sh -o32 -t16 -r -W0 -w0 -ag -c400m <target_disk>:\random_4k.dat

produces kvm: virtio: zero sized buffers are not allowed and hangs the VM. This one doesn't seem to hang other VMs, but then I'm not leaving it up for long, preferring to do a clean shutdown before doing a hard stop. I got the kvm: Desc next is 3 error earlier, and should mention it was produced when running DiskSpd with 64KiB blocks and 16 threads.