High virtio-scsi I/O in one VM crashes all VMs using virtio-scsi

benyamin

Member
Oct 3, 2022
Hi everyone,

I've been having a performance issue with virtio-scsi for some time after migrating from ESX last year: the affected VM becomes unresponsive and sometimes causes other VMs using virtio-scsi to hang too.

I previously only saw the issue with Windows VMs during Windows updates, usually the large monthly cumulative OS updates. I note many threads here on the forums related to this, e.g. this one, but those seemed to centre more on driver installation problems, which wasn't an issue for me on the VMs affected at the time.

Digging deeper, I began to suspect io_uring, so I tried aio=native, and also tested with and without iothreads, but to no avail. FWIW, I ended up settling on aio=io_uring and iothread=1.

Anyway, I still had to update one system at a time, and often shut down other systems using virtio-scsi so that they wouldn't become unresponsive. Fast forward to today: I had a VM crash during an update, and the disk check it ran on reboot could not finish before the VM crashed again. This caused several other virtio-scsi systems to hang too.

So, I set it upon myself to sort it out. I ended up upgrading all virtio drivers and clients to the latest/stable (virtio-win-0.1.229). Two legacy 2012r2 systems which previously only had netkvm drivers manually installed via device manager were also successfully upgraded to the full package including the guest agent after enabling TESTSIGNING and installing the RedHat cert (and deselecting the qemupciserial driver in the installer). This also effectively migrated them from VMware PVSCSI to virtio-scsi (to be clear, I explicitly migrated to virtio-scsi).
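For the two 2012r2 boxes, the prep inside the guest amounted to something like the following from an elevated prompt (the certificate path is just a placeholder for wherever you have the Red Hat cert; a reboot is needed after enabling test signing):
Code:
:: allow test-signed drivers to load (takes effect after a reboot)
bcdedit /set testsigning on
:: trust the Red Hat code-signing certificate (path is a placeholder)
certutil -addstore -f TrustedPublisher C:\Temp\redhat.cer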

This "addition" of two more VMs to use virtio-scsi seems to be a tipping point in reliability. Any significant I/O, e.g. an update, or a disk check results in all of the virtio-scsi VMs hanging. If the I/O is prolonged, all VMs, even those using IDE, SATA and PVSCSI hang as well. I also had "failed to convert unwritten extents to written extents -- potential data loss!" on the console / syslog and had to reset the server to recover.

During my earlier research, I came across the many threads here and abroad regarding detect-zeroes=unmap and I understand that a patch was released regarding the BDRV_REQ_REGISTERED_BUF flag per the posts by @fiona in those places. I had been waiting for the patched pve-qemu-kvm to be released to see if that helped at all, but it doesn't appear to have had any effect on my issue.

From running ps aux | grep kvm I see that all my VMs are using detect-zeroes=on.
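FWIW, qm showcmd shows the same generated command line a little more readably, e.g. for VM 300 below:
Code:
qm showcmd 300 --pretty | grep detect-zeroes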

I only use RAW devices. I would like to try detect-zeroes=off to see what happens. Is this possible by passing something via args:, and if so, what would it look like?

The relevant part of the running config is as follows:
Code:
-device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0
-drive file=/mnt/mirror/images/300/vm-300-disk0.raw,if=none,id=drive-scsi0,aio=io_uring,format=raw,cache=none,detect-zeroes=on
-device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100

All my disk images are in RAW format, sitting on an ext4-formatted LVM volume. Hardware is an LSI MegaRAID SAS 9260-8i controller with 512MB cache.

I'm running the latest PVE from the no-subscription repository.

At this point, any advice would be appreciated.

Thanks in advance,
Ben
 
Hi there, this has also caused me hours of frustration. I was doing drive-by-drive passthrough to a TrueNAS Core installation; following the update to 7.4, the VM would crash after about 10 seconds. I finally traced it to the VirtIO controller; switching to the VMware PVSCSI controller in the VM's hardware settings has solved the problem for the moment and given me a working VM.
 
I haven't had a chance to experiment further yet, but patch Tuesday approaches...

I have four VMs still using virtio-scsi (one Windows, one Debian and two HardenedBSD); two run PVSCSI, and the rest (8 to 15-odd at times) are all IDE for now. I'd like to sort this out and move what I can to virtio-scsi if possible.

I'm wondering if this has anything to do with using monolithic flat disks rather than sparse disks...

In my case, I imported from ESX (where I used monolithic flat disks), in that I:
  1. Converted the vmdk images to the raw image format using qemu-img convert with the -S 0 option (SPARSE_SIZE = 0); and
  2. Copied the raw images using cp --sparse=never.
Both of these ensure the image remains a fully allocated monolithic flat disk (commands sketched below).
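In case it helps anyone doing the same migration, the two steps looked roughly like this (paths and filenames are placeholders for my actual images):
Code:
# convert the ESX vmdk to raw with zero detection disabled (-S 0), so the output stays fully allocated
qemu-img convert -S 0 -f vmdk -O raw /mnt/esx/vm-300.vmdk /mnt/mirror/images/300/vm-300-disk0.raw
# copy without creating holes, so the copy also remains a monolithic flat (non-sparse) image
cp --sparse=never /mnt/mirror/images/300/vm-300-disk0.raw /mnt/mirror/images/300/vm-300-disk0-copy.raw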

This is one reason why I would like to try using a drive device with detect-zeroes=off.

@otymm, I'm curious if your passthrough device would be treated the same way, i.e. as fully allocated. Any chance you could revert to virtio and execute ps aux | grep kvm to confirm whether the relevant drive device has detect-zeroes=on? Maybe you already did...

I've noticed the Proxmox default appears to be a sparse format, e.g. if I grow an image it does not physically allocate the disk space; it effectively "converts" the disk to a monolithic sparse format.

Anyone else care to weigh in?

EDIT: mixed up my detect-zeroes=on. This post is good.
 
I’ll have a look when I get home.

As a quick update, it ended up crashing with PVSCSI too (though significantly later, and only once I started conducting heavy I/O). Since then, I went back to virtio and installed the 6.2 edge kernel; it's been okay since.
 
@otymm, just curious, what CPUs are you running? Still running well?
 
Ok, I finally got somewhere with this.

Firstly, the following had no effect on the problem:
  • Running with detect-zeroes=off
  • Trying each new iteration of the virtio drivers as they were released
  • Using various versions of the i440fx Machine type
  • New Proxmox releases
  • New kernels (although this did improve performance)
  • New microcode (although this did improve performance - see below)
  • Using x86-64-v2-AES as mentioned here, or any other CPU type
However, running Async IO in threads (aio=threads) with iothread=1, as presently advocated by @RolandK and others per https://bugzilla.kernel.org/show_bug.cgi?id=199727, really did resolve this for me. You still observe IO delays, but they don't cause the system or other VMs to hang, and I no longer need to shut down most VMs to conduct patching.

I did have an issue with one VM still crashing. It used VMware PVSCSI and so could not use iothread=1. Migrating it to VirtIO SCSI single and setting aio=threads and iothread=1 (see the qm set sketch below) immediately resolved the issue with that VM as well.
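For anyone wanting to replicate that change from the host shell, it can be done per VM with qm set (the VM ID and storage/volume names here are placeholders for your own setup):
Code:
# switch the VM to the VirtIO SCSI single controller
qm set 300 --scsihw virtio-scsi-single
# set aio=threads and enable the dedicated iothread on the disk
qm set 300 --scsi0 local-lvm:vm-300-disk-0,aio=threads,iothread=1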

As I mentioned above, you still see significant IO delays at certain points during the installation of cumulative updates. This is enough to cause delays with new network connections or spinning up new processes, etc., so performance tuning is still somewhat important to reduce these delays. A fast storage backing would also help, perhaps to the extent that this problem does not present at all.

To improve performance, I did the following in each Windows VM:
  • HKLM\System\CurrentControlSet\Services\Disk : TimeoutValue = 0x3c (60) - DEFAULT
  • HKLM\System\CurrentControlSet\Services\Disk : IoTimeoutValue = 0x3c (60) - DEFAULT
  • HKLM\System\CurrentControlSet\Services\vioscsi\Parameters\Device : PhysicalBreaks = 0x20 (32)
The PhysicalBreaks registry entry actually needs to be set to less than the max_segments size for each backing block device.
This can be determined by issuing the command grep "" /sys/block/*/queue/max_segments on the Proxmox host.
My block devices have very large stripes and as such have a max_segments value of 60.
It should therefore be set to no more than 59, but I considered 32 sufficient. It's also base-2 pretty - which is irrelevant... 8^d
More info: https://github.com/virtio-win/kvm-guest-drivers-windows/issues/827
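Roughly how I checked and applied this, in case it saves someone a search. On the Proxmox host:
Code:
grep "" /sys/block/*/queue/max_segments

Then in the Windows guest, from an elevated prompt (32 being the value I settled on), followed by a reboot:
Code:
reg add "HKLM\System\CurrentControlSet\Services\vioscsi\Parameters\Device" /v PhysicalBreaks /t REG_DWORD /d 32 /f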

It is worth mentioning that there was a demonstrable performance difference across different Machine and BIOS combinations. This was especially evident when using the OVMF UEFI BIOS. The Q35 Machine type was also better performing than the i440fx. So in the end I used Q35 running SeaBIOS.

I also disabled memory ballooning (balloon=0) and set the CPU to host.
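For completeness, both of those can also be set from the host shell (VM 300 again being just the example from my config above):
Code:
qm set 300 --balloon 0   # disable memory ballooning
qm set 300 --cpu host    # expose the host CPU type to the guest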

On the Proxmox host:

My mainboard manufacturer was too slow for my liking in releasing microcode updates via the BIOS, so I added:
deb http://ftp.au.debian.org/debian sid non-free-firmware to a new file /etc/apt/sources.list.d/firmware.list
...although /etc/modprobe.d/intel-microcode-blacklist.conf might have blocked these (it didn't seem to when I verified it, though).
It's probably important to note that at the time this issue was resolved, the microcode in my BIOS matched that in sid.

I also added options kvm ignore_msrs=1 report_ignored_msrs=0 to /etc/modprobe.d/kvm.conf.
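As a rough sketch, those two host-side changes amount to the following (the Australian Debian mirror and the Intel microcode package reflect my setup; adjust for AMD or another mirror):
Code:
# add the sid non-free-firmware repo and pull in the newer microcode
echo 'deb http://ftp.au.debian.org/debian sid non-free-firmware' > /etc/apt/sources.list.d/firmware.list
apt update && apt install intel-microcode

# have KVM ignore unhandled guest MSR accesses without flooding the log
echo 'options kvm ignore_msrs=1 report_ignored_msrs=0' > /etc/modprobe.d/kvm.conf
update-initramfs -u -k all   # then reboot for both changes to take effect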

I did not try resolving anything relating to LUN reset issues, or the "Desc next is 3" / indirect_desc=off/on issue, as described in the following, but I note the threads were insightful:
https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756
https://github.com/virtio-win/kvm-guest-drivers-windows/issues/623
...and many similar LUN reset issues.

Hopefully I didn't miss anything, and the above is helpful to someone.
 
Further reference (cross post): https://forum.proxmox.com/threads/r...device-system-unresponsive.139160/post-659494

Also, the title of this thread, i.e. "High virtio-scsi I/O in one VM crashes all VMs using virtio-scsi" is somewhat inaccurate.

It seems VMs were more likely to fail with each additional iothread; there appears to be a proportional, perhaps even linear, relationship between the number of iothreads and the probability of failure. The one VM using a PVSCSI adapter I mentioned above was even more likely to fail.
 
I did some further testing with the working vioscsi driver version 100.85.104.20800 from package virtio-win-0.1.208, with and without the PhysicalBreaks registry entry. When present, I set the value to 32 (0x20).

For a MegaRAID + CacheCade based R50 span, with a max_segments value of 60:

With the registry entry: [attachment: r50_nvme_peakperf_208_with_reg.jpg]
Without the registry entry: [attachment: r50_nvme_peakperf_208_no_reg.jpg]

For a SATA-3 HDD, with a max_segments value of 168:

With the registry entry: [attachment: sata_spindle_nvme_peakperf_208_with_reg.jpg]
Without the registry entry: [attachment: sata_spindle_nvme_peakperf_208_no_reg.jpg]

For an NVMe SSD, with a max_segments value of 33:

With the registry entry: [attachment: dedicated_ssd_nvme_peakperf_208_with_reg.jpg]
Without the registry entry: [attachment: dedicated_ssd_nvme_peakperf_208_no_reg.jpg]

Adding the PhysicalBreaks registry entry essentially reduced throughput at the cost of increased I/O on the backing storage. Combined with aio=threads, this resulted in increased stability by allowing that additional I/O to happen outside the QEMU global mutex.

The registry entry is probably beneficial in some networked storage scenarios, but it is probably best to avoid with local storage if you have a working driver.
 
Here is a screenshot showing the difference in host I/O delay when testing the RAID50 span; the Y-axis maximum is 60%.

[attachment: r50_nvme_peakperf_208_host_io_60pc.png]

The peak on the right is with the PhysicalBreaks registry entry set.
 
