Windows Server 2019 - vioscsi Warning (129) - locking up data drive

I've now managed to upgrade my cluster to PVE 8. I will update if that fixes the issue for me.
 
On PVE 8 I got the "kvm: Desc next is 3" error again.
The disk didn't fail entirely, but SQL was broken until I rebooted, so same result.

I have this occurring on 2 servers. The one it occurs on most is SATA SSD based RAID10; the other, where it has occurred only twice since April, is SAS SSD based RAID10. The SATA system also had its PERC RAID controller in write-back mode; I've now changed this to write-through.
 
Did anybody figure this out?

We started to see these warnings, and problems arose on VMs with any sensitive service like Exchange or standard SQL.

We are using PVE 8.03 with Ceph, in a 7-node cluster. I've tried all the suggestions in this and other threads, without any success.

It is causing big trouble for us.
 
It's also being discussed here:
https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756

RHEL is also tracking it, but I don't have access. There was also a PR from Nutanix recently to bugcheck on Windows when a reset occurs:
https://github.com/virtio-win/kvm-g...mmit/eff7b43073b95c563d2e5c6b0d7b6dd954f1232a
 
I just read through the GitHub link myself before I posted here :)

I am 100% sure that this error is occurring for others as well.

We started to think that maybe it is caused by a lack of bandwidth on the backend. We are using 2x10Gbit with 9000 MTU for the public and cluster networks. The switches are not overwhelmed and the error occurs randomly, but this, and the Proxmox/Ceph version update, are the only things that happened before the errors started.

@davemcl did you try these settings:
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\vioscsi\Parameters
    • IoTimeoutValue = 0x5a (90)
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\vioscsi\Parameters\Device
    • PhysicalBreaks = 0x3f (63)
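
For anyone who wants to try the two values above, here is a minimal Python sketch that writes them out as `.reg` file text. The key paths, value names and numbers are taken from the list above; the function name `build_reg_file` and the `SETTINGS` dictionary are made up for illustration, and storing both values as DWORDs is an assumption. I would also assume the guest needs a reboot before the driver picks up the change.

```python
# Sketch only: turns the settings listed above into .reg file text.
# build_reg_file and SETTINGS are hypothetical names, not part of any tool.
BASE = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\vioscsi\Parameters"

# Assumption: both values are stored as REG_DWORD.
SETTINGS = {
    BASE: {"IoTimeoutValue": 0x5A},              # 90
    BASE + r"\Device": {"PhysicalBreaks": 0x3F},  # 63
}


def build_reg_file(settings):
    """Return .reg file text for a {key_path: {value_name: int}} mapping."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, values in settings.items():
        lines.append(f"[{key_path}]")
        for name, value in values.items():
            # DWORDs are written as 8 hex digits in .reg files.
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)


print(build_reg_file(SETTINGS))
```

Save the printed text as a `.reg` file inside the Windows guest and review it before importing. Back up the existing key first so the change is easy to revert.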
 
It occurs on 2 SQL servers for me with local storage (SATA and SAS enterprise SSDs in RAID-10) when there is high IO during an ETL run or while processing cubes.
 
Wow, so Ceph can't be the issue, or most likely isn't. That is a surprise to me. Then it can only be virtio / QEMU? Awesome.
 
Same issue here with PVE 8.1.3.

I am happy to see this conversation. I discovered the same issue when I changed the motherboard and CPU of my system. For some reason the issue was not present in my old setup. But it helps me to know that this seems to be a general issue. In my case Exchange and Veeam B&R with PostgreSQL are affected.

My issue
https://forum.proxmox.com/threads/io-delay-how-to-find-out-reason-for.138990/
 
An update...
I was hitting this issue weekly on a certain server for nearly 12 months (in October it occurred 6 times), and it has been a massive pain in the arse.
It hasn't recurred since November 8, 2023, and workloads haven't changed.
I also evaluated XCP-ng for quite some time and the issue doesn't occur there, but XCP-ng is unacceptably slow when using local disks and backups take forever, so I didn't migrate.
I have found that better disk IOPS don't fix this either: the issue occurred on a Genoa server with Gen4 NVMe drives (RAID 10).
 
Same here on Windows Server 2019 with SQL Server 2019, but only after the upgrade to Proxmox 8.x. The problem didn't exist before. I have to mention that the Windows VM was also updated, including SQL Server updates. It does not always occur, only every few days. I will switch the CPU type to host and install the latest VirtIO drivers; maybe this will bring a positive change.
 
I have the latest drivers and the CPU is set to host. The error still occurs. However, I have the feeling that it occurs much more often on systems with ZFS as the storage for the VMs than on my Ceph cluster.
 
Here we have two old Dell servers with LVM-Thin.
 
I hadn't had this occur since November 23, then got this the other day.
So not resolved, but happening a lot less.

Feb 25 08:07:30.022692 QEMU[1189154]: kvm: Desc next is 17
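
To keep track of how often this shows up, here is a rough Python sketch that picks the "Desc next is N" messages out of saved host log lines. It assumes the lines look like the one quoted above; the function name `find_desc_errors` is made up for illustration, and whatever precedes `QEMU[...]` (timestamp, possibly a hostname) is kept as an opaque prefix rather than parsed.

```python
import re
from collections import Counter

# Matches the QEMU message format quoted above; anything before it is
# treated as an unparsed prefix (timestamp and, on some setups, hostname).
DESC_RE = re.compile(r"QEMU\[(?P<pid>\d+)\]: kvm: Desc next is (?P<next>\d+)")


def find_desc_errors(lines):
    """Return (prefix, qemu_pid, desc_next) for every matching log line."""
    hits = []
    for line in lines:
        match = DESC_RE.search(line)
        if match:
            prefix = line[:match.start()].strip()
            hits.append((prefix, int(match.group("pid")), int(match.group("next"))))
    return hits


sample = [
    "Feb 25 08:07:30.022692 QEMU[1189154]: kvm: Desc next is 17",
    "some unrelated log line",
]
hits = find_desc_errors(sample)
# Count occurrences per QEMU process, i.e. per running VM.
per_pid = Counter(pid for _, pid, _ in hits)
print(hits)
print(per_pid)
```

Grouping by QEMU PID makes it easier to see whether one VM is responsible for most of the hits. Mapping a PID back to a VM ID is left out here.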
 
Very old problem (since PVE 3.x).

The solution/fix is simple:
- Don't use any "virtio" based emulation for Windows-based VMs.

There were always problems with MS SQL Server in a VM (running backup jobs always triggered it: the virtio disk dropping/disappearing). I gave up virtio for Windows a long time ago and have had no problems since then.

For other VMs (Linux, FreeBSD) virtio is working correctly.
 
Dropping VirtIO drivers would tank performance; you may as well look at another hypervisor at that point.
 
