Redhat VirtIO developers would like to coordinate with Proxmox devs re: "[vioscsi] Reset to device ... system unresponsive"

I just updated my test server's MSSQL to v271 from v208 about 3 hours ago and I've gotten zero events. I'll keep an eye on it and report if anything pops up.
I updated over the weekend and tonight there was one error that stopped the nightly task. This task had been triggering the error pretty regularly, so I was hoping it would be resolved after it worked for three days without a hitch.
 
Hi,

I updated MSSQL to driver 271 and the 129 error is back; it is less frequent, but it is back. I am going back to 208 tonight.
Well, I also had to roll back from v266 / v271 to v208 to be stable again. :(

I have a workload that runs a SCSI stress test with different KVM tuning:
It uses O&O Defrag to run the "COMPLETE/Name" hard drive optimization 4 times consecutively (htop shows 100% disk I/O).
Code:
----------------------------------------------------------------------------------------------------------------
OK = STABLE
CRASH = kvm: ../block/block-backend.c:1780: blk_drain: Assertion `qemu_in_main_thread()' failed.
----------------------------------------------------------------------------------------------------------------
pve-manager/7.4-19/f98bf8d4 (running kernel: 5.15.158-2-pve)
QEMU emulator version 7.2.10
scsihw: virtio-scsi-single
----------------------------------------------------------------------------------------------------------------
v271 + cache=unsafe,discard=on,iothread=1 : CRASH  (FIO_R = 3794MBs_58,0k_0,31ms / FIO_W = 3807MBs_58,0k_0,31ms)
v271 + cache=unsafe,discard=on,iothread=0 : OK     (FIO_R = 3748MBs_70,2k_0,26ms / FIO_W = 3762MBs_70,1k_0,27ms)
v266 + cache=unsafe,discard=on,iothread=1 : CRASH  (FIO_R = 3817MBs_56,2k_0,32ms / FIO_W = 3830MBs_56,2k_0,32ms)
v266 + cache=unsafe,discard=on,iothread=0 : OK     (FIO_R = 3804MBs_71,9k_0,26ms / FIO_W = 3818MBs_71,8k_0,26ms)
v208 + cache=unsafe,discard=on,iothread=1 : OK     (FIO_R = 3922MBs_55,6k_0,32ms / FIO_W = 3937MBs_55,6k_0,32ms)
v208 + cache=unsafe,discard=on,iothread=0 : OK     (FIO_R = 3823MBs_68,6k_0,27ms / FIO_W = 3835MBs_68,5k_0,27ms)        **BEST**
v208 + cache=unsafe,discard=ignore,iothread=1 : OK (FIO_R = 3856MBs_55,7k_0,32ms / FIO_W = 3867MBs_55,6k_0,32ms)
v208 + cache=unsafe,discard=ignore,iothread=0 : OK (FIO_R = 3806MBs_68,0k_0,27ms / FIO_W = 3819MBs_68,0k_0,27ms)
v208 + discard=on,iothread=1 : OK                  (FIO_R =  234MBs_30,9k_0,95ms / FIO_W =  245MBs_30,8k_1,10ms)
v208 + discard=on,iothread=0 : OK                  (FIO_R =  239MBs_29,9k_0,85ms / FIO_W =  252MBs_29,9k_1,14ms)
----------------------------------------------------------------------------------------------------------------
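(Side note: the exact fio command line wasn't posted, so the following is only a sketch of how comparable sequential MB/s / IOPS / latency figures could be gathered; the target path, block size, queue depth and runtime are assumptions, not the original test.)

Code:
# WARNING: writing directly to a block device is destructive - point this at a scratch disk or file
fio --name=seqread  --filename=/dev/sdb --rw=read  --bs=64k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting
fio --name=seqwrite --filename=/dev/sdb --rw=write --bs=64k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting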
 
Do you use local storage?
What about PVE 8.4, shipped with QEMU 9.0 and kernel 6.8?
Those tests were done on a very simple station: Ryzen 7 5700X + Crucial MX500 SATA SSD + local thin LVM.
When I have time, I will also test PVE 8.4 and QEMU 9.2.
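(For anyone reproducing this: the relevant component versions can be read straight off the Proxmox host with standard tools; the grep pattern below just filters the packages already quoted in this thread.)

Code:
pveversion -v | grep -E 'pve-manager|pve-qemu-kvm|qemu-server'
qemu-system-x86_64 --version
uname -r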
 
No driver can improve this consumer SSD: outside of its internal DRAM cache, it writes slowly.
Moreover, cache=unsafe increases the slowness because double, if not triple, caching is involved.
I totally agree, this kind of hardware is not for production.
What is interesting here is that the v208 Windows driver doesn't kill the VM, contrary to v266 and v271 (blk_drain assertion in block-backend.c with iothread=1).
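(For reference, a minimal sketch of switching an existing disk back to the iothread=0 configuration that stayed stable above, using the Proxmox CLI; the VMID, storage and volume names are only placeholders taken from the example config posted below and need to be adapted.)

Code:
# hypothetical VMID/volume - adjust before running; re-specifying the existing volume keeps its data
qm set 106 --scsi0 cluster-storage:vm-106-disk-1,discard=on,iothread=0
# disk option changes on a running VM are typically applied on the next full stop/start
qm stop 106 && qm start 106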
 
Hey All,

Here are some of my test results that I hope will help.
I have upgraded the drivers to v271 on one of my production servers (non-critical) to do some testing; here's the verification of said drivers:

[screenshot: vioscsi driver version verification]

I ran the same CrystalDiskMark benchmark that was almost guaranteed to force a vioscsi crash last year, and this time the test completed successfully without ever locking up the VM or causing a vioscsi crash.

[screenshot: CrystalDiskMark results]

Here are the PVE version details:

ceph: 19.2.1-pve3
pve-qemu-kvm: 9.2.0-5
qemu-server: 8.3.12
proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)

And the configuration of the Virtual Machine:

Code:
cat /etc/pve/qemu-server/106.conf
agent: 1
bios: ovmf
boot: order=scsi0;net0;scsi2
cores: 4
cpu: host
efidisk0: cluster-storage:vm-106-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.5,ctime=1710814878
name: <name>
net0: virtio=BC:24:11:8A:D4:F1,bridge=vmbr1,firewall=1
numa: 1
onboot: 1
ostype: win10
scsi0: cluster-storage:vm-106-disk-1,discard=on,iothread=1,size=70G
scsi1: cluster-storage:vm-106-disk-2,discard=on,iothread=1,size=60G
scsihw: virtio-scsi-single

Happy to do more testing if required, but I essentially don't have that vioscsi error on either this machine or my test MSSQL servers.
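(If a VM does die again with iothread enabled, the blk_drain assertion quoted earlier in this thread is what to look for on the host; a quick way to check the Proxmox node's journal, using nothing vioscsi-specific:)

Code:
# search the current boot's journal for the QEMU assertion from block/block-backend.c
journalctl -b | grep -iE 'blk_drain|qemu_in_main_thread|Assertion'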
 