Red Hat VirtIO developers would like to coordinate with Proxmox devs re: "[vioscsi] Reset to device ... system unresponsive"

Hi! For now, I'm focusing on the VirtIO SCSI (and, apparently, also VirtIO Block) problems with virtio-win 0.1.285 reported here.

@RoCE-geek, thank you for your in-depth investigation of this bug, reporting this issue upstream, and providing a simple fio reproducer [1] as well. Your debugging efforts are much appreciated.
For me, your fio reproducer triggers the same verification failures on Windows Server 2025 with virtio-win 0.1.285. I still need to try older virtio-win versions as well as a direct revert of the commit you pointed out.
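Just for reference, below is a rough Python sketch of the kind of fio verify workload involved. This is NOT the actual reproducer from [1]; every parameter (test file path, size, block size, queue depth) is an illustrative assumption rather than a value taken from the linked issue.

```python
import subprocess
import sys

# Hedged sketch only -- NOT the actual reproducer from [1]; all parameters
# (test file, size, block size, queue depth) are illustrative assumptions.
FIO_CMD = [
    "fio",
    "--name=vioscsi-verify",
    "--filename=D\\:\\fio-verify.bin",  # test file on the virtio-scsi disk ("\:" escapes the drive colon for fio)
    "--size=4G",
    "--ioengine=windowsaio",            # native async I/O engine on Windows
    "--direct=1",
    "--rw=randwrite",
    "--bs=64k",
    "--iodepth=32",
    "--verify=crc32c",                  # write a checksummed pattern ...
    "--do_verify=1",                    # ... then read it back and verify it
]

# fio exits non-zero when verification fails, which is the symptom
# reported here with the 0.1.285 drivers.
result = subprocess.run(FIO_CMD)
if result.returncode != 0:
    print("fio reported errors (possible verification failure)", file=sys.stderr)
sys.exit(result.returncode)
```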

As you can imagine, our expertise in Windows driver development is more limited than on the Linux side. Since we could not quickly reproduce the issue with a similar fio workload a few weeks ago, and our main focus over the last few weeks has been on preparing the Proxmox VE 9.1 release, this issue did indeed receive less attention than it deserves.
We'll run some more tests and, if needed or useful, join the upstream discussion.

For everyone encountering this issue with virtio-win 0.1.285, downgrading to 0.1.271 seems to be a valid workaround for now, as already suggested by @RoCE-geek [2].
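If you are unsure which vioscsi driver package is currently installed in a guest before (or after) the downgrade, one quick way to check is to filter pnputil's driver list, for example with a small convenience sketch like the one below (pnputil ships with Windows; the script assumes Python is available in the guest and is not an official procedure).

```python
import subprocess

# Convenience sketch: list the installed driver packages that reference
# vioscsi, so the driver version can be confirmed before/after a downgrade.
# Run inside the Windows guest from an elevated prompt.
output = subprocess.run(
    ["pnputil", "/enum-drivers"],
    capture_output=True, text=True, check=True,
).stdout

# pnputil prints one block per driver package, separated by blank lines.
for block in output.split("\n\n"):
    if "vioscsi" in block.lower():
        print(block.strip())
        print("-" * 40)
```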

With regard to the other Windows Server 2025 issues you mention (1 and 2): I have not yet had time to look at the threads you referenced in detail, but will try to do that soon. As this thread is already quite large, I'd suggest we keep it dedicated to the VirtIO SCSI/Block issues to avoid it becoming too confusing, and continue the discussion of the other Windows Server 2025 issues in the threads you linked.

[1] https://github.com/virtio-win/kvm-guest-drivers-windows/issues/1453#issuecomment-3527322212
[2] https://forum.proxmox.com/threads/r...device-system-unresponsive.139160/post-812442
 
Hi @fweber, thanks for the response, but to be clear, no further cooperation is needed on the vioscsi bug.

I've proposed 3 patches, @benyamin will add his own, so we HAVE a solution and are just benchmarking the candidates. The final resolution and the virtio-win PR will follow soon.

And based on this, I've moved on to another problem, i.e. the crippled idle CPU state on WS2025/Win 24H2+, and will report my new findings here: High VM-EXIT and Host CPU usage on idle with Windows Server 2025. It seems I have already isolated the problem (as with the vioscsi one). At least on my side (all-EPYC infra), it all comes down to excessive Hyper-V activity, specifically accesses to these synthetic MSRs:
  • STIMER0_CONFIG (0x400000b0)
  • STIMER0_COUNT (0x400000b1)
  • HV_X64_MSR_EOI (0x40000070)
  • HV_X64_MSR_ICR (0x40000071)
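For anyone who wants to check whether their own hosts show the same pattern, here is a rough host-side sketch that counts kvm_msr trace events for a few seconds. It counts across all VMs (not per-VM), and the tracefs paths and trace line format are assumptions based on recent kernels, so treat it as a starting point rather than a tool.

```python
#!/usr/bin/env python3
"""Rough sketch, run as root on the PVE host: enable the kvm:kvm_msr
tracepoint for a few seconds and count which MSRs the guests access.
Paths assume tracefs at /sys/kernel/tracing; the trace line format can
differ between kernel versions."""

import re
import time
from collections import Counter
from pathlib import Path

TRACEFS = Path("/sys/kernel/tracing")
EVENT_ENABLE = TRACEFS / "events/kvm/kvm_msr/enable"
TRACE_PIPE = TRACEFS / "trace_pipe"

# Hyper-V synthetic MSRs mentioned above.
KNOWN = {
    0x400000B0: "STIMER0_CONFIG",
    0x400000B1: "STIMER0_COUNT",
    0x40000070: "HV_X64_MSR_EOI",
    0x40000071: "HV_X64_MSR_ICR",
}

# Matches e.g. "... kvm_msr: msr_write 40000070 = 0x0"
MSR_RE = re.compile(r"kvm_msr:\s+msr_(?:read|write)\s+([0-9a-fA-F]+)")

def sample(duration: float = 10.0) -> Counter:
    counts: Counter = Counter()
    EVENT_ENABLE.write_text("1\n")
    deadline = time.monotonic() + duration
    try:
        with TRACE_PIPE.open() as pipe:
            # readline() blocks until the next event, so the run can last
            # a bit longer than `duration` on a very quiet host.
            while time.monotonic() < deadline:
                match = MSR_RE.search(pipe.readline())
                if match:
                    counts[int(match.group(1), 16)] += 1
    finally:
        EVENT_ENABLE.write_text("0\n")
    return counts

if __name__ == "__main__":
    # Counts are host-wide (all running VMs), not per-VM.
    for msr, hits in sample().most_common(10):
        print(f"0x{msr:08x}  {KNOWN.get(msr, 'other'):16s} {hits:8d} events")
```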

Long story short: the current vioscsi problems are "almost resolved"; no more help is needed at the moment.
 
And @fweber, of course I understand the omnipresent dev buzz, so all is OK. Just, please, there should be some regular "bumps" from the Proxmox staff in the rising threads. We, as a community, are quite mighty and capable, but definitely not super-mighty. Everyone just needs to know that they are not alone in their suffering and that their problems are not being ignored.
 
Hi @fweber, thanks for the response, but to be clear, no further cooperation is needed on the vioscsi bug.

I've proposed 3 patches, @benyamin will add his own, so we HAVE a solution and are just benchmarking the candidates. The final resolution and the virtio-win PR will follow soon.
Yes, I saw that (and thanks to @benyamin for preparing a PR!). But I think that once a PR is ready, being able to contribute additional testing (with regard to the bug as well as to performance) can't hurt and might help get the PR merged faster.
And based on this, I've moved on to another problem, i.e. the crippled idle CPU state on WS2025/Win 24H2+, and will report my new findings here: High VM-EXIT and Host CPU usage on idle with Windows Server 2025. It seems I have already isolated the problem (as with the vioscsi one). At least on my side (all-EPYC infra), it all comes down to excessive Hyper-V activity, specifically accesses to these synthetic MSRs:
  • STIMER0_CONFIG (0x400000b0)
  • STIMER0_COUNT (0x400000b1)
  • HV_X64_MSR_EOI (0x40000070)
  • HV_X64_MSR_ICR (0x40000071)
I see. We'll try to reproduce the issue as well -- but let's please move further discussion of this to the dedicated thread [1].
And @fweber, of course I understand the omnipresent dev buzz, so all is OK. Just, please, there should be some regular "bumps" from the Proxmox staff in the rising threads. We, as a community, are quite mighty and capable, but definitely not super-mighty. Everyone just needs to know that they are not alone in their suffering and that their problems are not being ignored.
Initially, since I didn't manage to reproduce the issue, I decided to hold off on posting until I had something substantial to report or ask, but I do see how a quick post, even if light on substance, might have been beneficial here.

[1] https://forum.proxmox.com/threads/h...n-idle-with-windows-server-2025.163564/page-3