Redhat VirtIO developers would like to coordinate with Proxmox devs re: "[vioscsi] Reset to device ... system unresponsive"

zrbite · Oct 30, 2025

We’ve seen similar behavior but we only use cache=writeback in our environment for high performance. In our testing, only virtio-scsi versions 0.1.204 and earlier remain stable under high I/O. Anything from 0.1.208 onward becomes unstable during stress and we can easily reproduce instability when running multiple synthetic workloads with DiskSpd. Without cache=writeback, 0.1.208, 0.1.266 and 0.1.271 appear stable but we don't want to give up writeback caching.

Whatever · Oct 31, 2025

ncik said:
This has been my anecdotal experience as well. I installed 0.1.285 on a windows/sql host and it caused a lot of suspect virtio-related messages in event viewer. Rolling it to the version before that (0.1.271) seems to have cleared it up. The older established bug-free versions (0.1.204 and I think .208) also still work well, though like the rest of us I’m sure, I worry a little about not getting the other unrelated bugfixes etc in the newer versions.

Thanks for continuing your efforts here!

In our environment, we run dozens of Windows Server 2019/2022 VMs, and we don't see any issues with the 0.1.285 drivers. However, only one server runs the MSSQL database engine (2019, if I'm not mistaken), and I'm not entirely sure whether it has been updated to the 0.1.285 virtio drivers.

P.S. @RoCE-geek
Maybe you can write a T-SQL script to reproduce the problem? I'm sure this can greatly speed up the solution.

RoCE-geek · Oct 31, 2025

Whatever said:
P.S. @RoCE-geek
Maybe you can write a T-SQL script to reproduce the problem? I'm sure this can greatly speed up the solution.

This can be hard, because there's no clear initial condition. We have another two VMs with WS2025 and SQL2022, but the traffic/load there is low, so these errors are quite rare and there have been no service hangs so far.

But I'm quite sure that the majority of setups with WS2025 and SQL Server are affected. Even under low SQL load there are similar reports, at least a few dozen per week, but probably no one is aware of them. Only if you dig into the SQL logs or the Windows Application log can you find them. And still, they are all "just" informational: no warnings, no errors. Just a sign that there's something buggy inside the storage/VM stack.

Last but not least, given that it's not just about storage, as the network may be affected as well, there's probably something more general in the Windows Server 2025 stack differences, i.e. maybe some general abstraction layer is the root cause.

But as always, a deep and comprehensive diff between 0.1.271 and 0.1.285 should be the starting point for the initial analysis.

RoCE-geek · Oct 31, 2025

I did some new tests, this time with an unreleased, Pre-WHQL (i.e. non-certified) vioscsi driver 0.1.292, found here: attestation-virtio-win-prewhql-0.1-292.zip

And it was quite fast: the bug is still present, with the same high incidence, so even the new, not-yet-released driver is buggy. Those affected by the network issues can still try the newer NetKVM driver.

So for the WS2025 and SQL Server combo, stay with 0.1.271 only; it's the "safe" rollback solution for the problems described here.

RoCE-geek · Nov 5, 2025

OK, I'm quite confident that I've found and isolated the problem (and will drop it on github soon).

So let's start with the analysis. First, be aware that virtio release dates do not correspond to the commit dates.
I mean that if you see e.g. that version 0.1.271 was released on fedorapeople.org on 2025-04-07, it was definitely not committed on that date.
If you're really interested in what is inside, you have to download the *.src.rpm package from the same path (here e.g. virtio-win-0.1.271-1).
This can be done within Windows (a VM), no Linux needed. It's just a doubly packed archive that 7-Zip can open. So keep unpacking until you see the final folder (e.g. virtio-win-prewhql-0.1-271-sources -> internal-kvm-guest-drivers-windows). This is the real source folder.

Next, check the status.txt file right away. The first surprise is that these Fedora releases are not based on the public GitHub repo.
This is clearly visible, as this file refers to Red Hat's internal git://git.engineering.redhat.com/users/vrozenfe/internal-kvm-guest-drivers-windows/.git
"vrozenfe", aka Vadim Rozenfeld, is the key maintainer of the whole virtio project, focused especially on storage + balloon + serial + gpu.

So in this file you'll see the latest commit first, for 0.1.271 it's something like this:

Date 14 Jan 2025
repo git://git.engineering.redhat.com/users/vrozenfe/internal-kvm-guest-drivers-windows/.git
tag mm291

Fixed issues:

RHEL-69076: Formatting viomem driver with clang-format
RHEL-69073: broken style fix for viofs driver
RHEL-69079: clang-format for vioserial folder
...

Now you have to check the official repo for the corresponding commits around that date: virtio-commits-master

After checking the content, I decided to go with the Jan 13, 2025 commits, as they contain all the RHEL issues mentioned in status.txt.
So we can conclude that the 0.1.271 build commit is 0e263be. And FYI, Jan 13, 2025 is also the driver date visible in Windows Device Manager.
You can also see the tag "mm291". These seem to be incremental internal Red Hat tags, and as far as I can tell the virtio release number is the mm tag number minus 20, so here it is 0.1.271. I don't know why, but it's not that important. The same holds for mm312, which corresponds to 0.1.292, etc.
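
The tag arithmetic above can be written down as a tiny helper. This is only a sketch of the observed pattern: the offset of 20 is inferred from the two data points in this post (mm291 and mm312), and the function names are made up for illustration, not taken from any Red Hat tooling.

```c
#include <stdio.h>

/* Offset between the internal "mm" tag number and the public 0.1.x build
 * number. Inferred from two data points only (mm291 -> 0.1.271 and
 * mm312 -> 0.1.292); an observation, not a documented rule. */
enum { MM_TAG_OFFSET = 20 };

/* Returns the public build number (the "271" in 0.1.271) for an mm tag
 * number, or -1 if the tag is too small for the observed rule to apply. */
int build_from_mm_tag(int mm_tag_id)
{
    if (mm_tag_id <= MM_TAG_OFFSET) {
        return -1;
    }
    return mm_tag_id - MM_TAG_OFFSET;
}

/* Writes "0.1.<build>" into buf. Returns 0 on success, -1 on failure. */
int release_from_mm_tag(int mm_tag_id, char *buf, size_t len)
{
    int build = build_from_mm_tag(mm_tag_id);
    if (build < 0) {
        return -1;
    }
    int n = snprintf(buf, len, "0.1.%d", build);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Feeding it 291 gives the string 0.1.271 and 312 gives 0.1.292, matching the two releases discussed here.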

And here you can see another problem: release timing. It's not driven by community needs, but by RHEL releases / milestones.
And this is why I'm saying that Proxmox Server Solutions GmbH should fork the repo and maintain regular updates, i.e. based on user reports and/or bug-resolution importance. In the case of 0.1.271, there was a pointless delay of almost 3 months.

The same kind of delay applies to 0.1.285: the corresponding commit bd965ef is from Jul 2, 2025, the version was built 7 days later (2025-07-09), but it was released on fedorapeople on 2025-09-12, so more than 2 months later.
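
To put numbers on those delays, here is a small sketch that counts calendar days between two dates. It uses a well-known integer "days from civil date" calculation; the function names are my own, and the dates to feed in are the ones quoted in this post.

```c
/* Days since 1970-01-01 for a Gregorian calendar date (y, m, d).
 * Pure integer arithmetic, so no time zones or DST are involved. */
long days_from_civil(int y, int m, int d)
{
    y -= (m <= 2);                       /* treat Jan/Feb as months 11/12 of the previous year */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;            /* year of era, 0..399 */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* day of year, March-based */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* day of era */
    return era * 146097 + doe - 719468;
}

/* Number of days from the first date to the second date. */
long days_between(int y1, int m1, int d1, int y2, int m2, int d2)
{
    return days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1);
}
```

With the dates above, commit to release is 84 days for 0.1.271 (2025-01-13 to 2025-04-07) and 72 days for 0.1.285 (2025-07-02 to 2025-09-12).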

OK, back to the core problem. I was sure that in 0.1.271 everything works well (in terms of vioscsi with SQL Server and Windows Server 2025), and that in the next published version, 0.1.285, it was already broken.

So we have to check commits > 0e263be (Jan 13) and <= bd965ef (Jul 2), and hope that limiting to vioscsi changes would be enough.

Looking at those vioscsi commits, only one is suspicious with regard to the symptoms I've described earlier:

Date (UTC on GitHub): 2025-03-05
Commit: 1bbc422 - “Address possible memory management issues when receiving interrupts for already completed requests”
What it changed: Reworks how completions are matched: introduces an SRB ID counter, uses that ID (cast to pointer) as the virtqueue cookie, adds a free-list for SRB extensions, and guards against interrupts for “already completed” SRBs.
Why it’s suspicious for WS 2025: This is a hot path (ISR/DPC ↔ completion mapping). If any path leaves id uninitialized / reused, or a race slips in, you can get a one-off bad read that succeeds on retry, very similar to your “succeeded after failing once” SQL Server messages. WS 2025 uses newer Storport and can expose timing/locking bugs that WS 2022 never hits.

Sounds like rocket science? Maybe, but that's not the point. SRB is the omnipresent term around the whole vioscsi stack; it stands for "SCSI Request Block". Almost all the bugs in vioscsi are related to SRB calls and their processing. Last year, when we were all fighting the "Reset to device, \Device\RaidPort[X], was issued." bugs, it was all about SRBs as well. At that time, after a brief period of hesitation and doubt (regarding my firm conviction that everything was in the driver itself), @benyamin quickly became one of the greatest vioscsi contributors. Based on his deep analysis and the corresponding changes, that issue was successfully resolved, which is why 0.1.266 was a bug-free version (kudos to him, but it seems he's no longer active here).

In theory, I should have been hunting bugs between 0.1.271 and 0.1.285, but I finally decided to simply work on the latest master (checked out a few days ago), which corresponds to the latest, not-yet-released 0.1.292. Ideally, I wanted one simple revert that clearly mitigates the problem.

Spoiler: after almost 48 hours in production, I can say that reverting 1bbc422 is the only mitigation needed. So really, reverting this commit solves the problem entirely. And while it's probably not your use case, the same bug was introduced into viostor (aka virtio-block) as well, so for a universal mitigation a revert of c09af90 will be needed too.

OK, we can now revert and test the buggy commit, but what's the root cause? As I wrote in the first post: "some kind of race conditions (my expectation)". And really, it's more than valid here. While it's true that on Windows Server 2022 there are no such issues so far (according to the reports of others), Storport in Windows Server 2025 changed towards better performance. It's quite clear that it scales much better on multi-core systems, so while in WS2022 there was a low or zero risk that the driver requests would be parallelized, the opposite is true for WS2025.

In other words, on WS2025’s Storport, the timing mix can cause a rare mis-match or stale completion → one read returns the wrong bytes (page checksum/pageid “wrong”), SQL Server retries immediately, the next read hits the right buffer and “succeeds after failing 1 time”. This also explains why WS2022 looks fine (different Storport timing), and why it shows up only with newer drivers.

OK, but is there a clear problem visible in that commit? Simply said: yes!

At least for me, and even more for my virtual coworker (GPT-5 Thinking), these two lines (1bbc422, vioscsi/vioscsi.c) are highly suspicious:

Code:

        srbExt->id = adaptExt->last_srb_id;
        adaptExt->last_srb_id++;

It's clear that the assignment and the increment are not one atomic operation (no lock, no interlocked op), so if two requests occasionally interleave between these two steps, both get the same ID and a read attempt can fail.
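
To make the interleaving concrete, here is a minimal user-space sketch. It is not driver code: `adapter_t`, `simulate_interleaving` and `alloc_id_atomic` are hypothetical names, the unlucky schedule is replayed on a single thread so the result is deterministic, and the atomic variant uses C11 `atomic_fetch_add` merely as one possible way to make "read and increment" a single step (in a Windows kernel driver one of the Interlocked* routines, e.g. `InterlockedIncrement64`, would presumably play that role).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter extension; only the counters matter. */
typedef struct {
    uint64_t last_srb_id;          /* plain counter, as in the two quoted lines */
    atomic_uint_fast64_t next_id;  /* atomic counter used by the alternative */
} adapter_t;

void adapter_init(adapter_t *a, uint64_t start)
{
    a->last_srb_id = start;
    atomic_init(&a->next_id, start);
}

/* Replays one unlucky schedule of two CPUs running the quoted two lines.
 * Both CPUs read the counter before either one has incremented it. */
void simulate_interleaving(adapter_t *a, uint64_t *id_cpu0, uint64_t *id_cpu1)
{
    *id_cpu0 = a->last_srb_id;   /* CPU 0: srbExt->id = adaptExt->last_srb_id; */
    *id_cpu1 = a->last_srb_id;   /* CPU 1: same line, same value              */
    a->last_srb_id++;            /* CPU 0: adaptExt->last_srb_id++;           */
    a->last_srb_id++;            /* CPU 1: adaptExt->last_srb_id++;           */
}

/* Alternative: fetch and increment in one indivisible operation, so two
 * callers can never observe the same value. */
uint64_t alloc_id_atomic(adapter_t *a)
{
    return (uint64_t)atomic_fetch_add(&a->next_id, 1);
}
```

Starting from 5, the simulated schedule hands the ID 5 to both requests, while two calls to `alloc_id_atomic` return 5 and then 6.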

More details: On modern Storport (as in Windows Server 2025), StartIo can run on multiple CPUs/queues much more aggressively than older builds, so two threads can actually read the same last_srb_id and hand out duplicate IDs. When the host later completes those I/Os, the driver can mismatch which SRB it completes or fail to find one (“No SRB found for ID”), causing a transient bad read that SQL Server retries — exactly the “succeeded after failing 1 time” messages mentioned above.
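
Why a duplicate ID turns into a bad read can be sketched with a toy lookup table. Everything here is hypothetical (`pending_t`, `pending_submit`, `pending_complete`, the first-match search); it only models the general idea of matching completions to requests by ID and is not how vioscsi actually stores its SRBs.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical in-flight request; "tag" just lets us tell requests apart. */
typedef struct {
    uint64_t id;
    int tag;
    int in_use;
} pending_t;

typedef struct {
    pending_t slot[MAX_PENDING];
} pending_table_t;

/* Registers a request as in flight. Returns 0, or -1 if the table is full. */
int pending_submit(pending_table_t *t, uint64_t id, int tag)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].id = id;
            t->slot[i].tag = tag;
            t->slot[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* The host reports "ID x is done": complete the first in-flight request
 * with that ID. Returns its tag, or -1 when nothing matches (the toy
 * equivalent of "No SRB found for ID"). */
int pending_complete(pending_table_t *t, uint64_t id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slot[i].in_use && t->slot[i].id == id) {
            t->slot[i].in_use = 0;
            return t->slot[i].tag;
        }
    }
    return -1;
}
```

If requests 100 and 200 were both handed ID 5 and the host finishes 200 first, the lookup for ID 5 still returns 100, so request 100 is completed before its own data has arrived. A completion for an ID that is no longer in the table finds nothing at all.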

But please note: although it's not relevant to this specific bug, version 0.1.292 contains a fresh revert from Sep 26, 2025 (commit a6d690a - Revert "NO-SDV [vioscsi] Reduce spinlock management complexity") of @benyamin's initial commit 15e64ac from Nov 20, 2024, which is already included in 0.1.271. The corresponding PRs are 1175 and 1293. So this commit/revert is also generally suspicious (check the threads), but as the initial commit is already present in the very well working 0.1.271, I can safely rule out any negative impact on this read-retry problem. And although this commit/revert is mainly about potentially buggy driver initialization (without expected runtime influence), you always have to be cautious about similar commits/reverts.

Long story short: I've analyzed, isolated, solved, validated and described the root cause of this read-retry issue on Windows Server 2025.
And I hope this long post will serve as an educational or motivational kick for you, to show that bug hunting, iterating and driver building is definitely not rocket science, as I went from zero to hero myself out of necessity. My GPT-5 Thinking buddy also generated a probably viable code diff for the buggy commit 1bbc422, so my revert is just a temporary solution.

Bottom line: I had absolutely zero previous experience with the Windows driver build process. But with the help of GPT-5, and in addition this clear virtio-win driver guide, it was really easy. But even with an EV code-signing certificate, it's still not possible to sign Windows Server drivers this way. This is really not a simple process for an individual, but it's quite common for a medium-sized company like Proxmox.

Technically, you need to install and run the Windows Hardware Lab Kit (HLK), run the required driver tests, submit everything to MS Partner Central (this is where your EV cert is required, both for account authentication and for signing the submission CAB file), and then Microsoft returns a dashboard/WHQL signature if all is clear. As you can see, this is definitely a no-go for any bug hunting and testing.

Instead, there's a quite straightforward path: the shipped virtio source (I mean the *.src.rpm and the internal-kvm-guest-drivers-windows folder) already includes a test-signing certificate (and the corresponding signing batch files), so all you need to do is disable Secure Boot for the test Windows VM (in the UEFI/boot phase), then enable test-signing (via bcdedit /set testsigning on), and at this point you can install your fresh driver build.

In my case, I had the complete virtio driver set for 0.1.285 installed, and was only changing/testing the vioscsi driver (a separate build compiled for every single commit/revert). Here you should be careful to version the builds correctly (in the INF file). For instance, the baseline for 0.1.292 is 100.102.104.29200, so I created multiple subsequent driver versions 100.102.104.29201, 100.102.104.29202, 100.102.104.29203, etc. It's really important to keep the numbering correct, otherwise your driver list will be a complete mess (and you always need the correct binding to the upstream version).
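
The numbering scheme can be sketched as a small helper. Assumptions, stated loudly: the "100.102.104" prefix is simply copied from the 0.1.292 example above, the last field is taken to be the upstream build number times 100 plus a local revision (which fits 29200 to 29203 but is only my reading of it), and `inf_driver_version` is a made-up name.

```c
#include <stdio.h>

/* Formats a version string for a local test build:
 *   100.102.104.<upstream_build * 100 + local_rev>
 * upstream_build is the "292" in 0.1.292; local_rev (0..99) counts your own
 * rebuilds on top of it. Returns 0 on success, -1 on invalid input or if
 * buf is too small. */
int inf_driver_version(int upstream_build, int local_rev, char *buf, size_t len)
{
    if (upstream_build <= 0 || local_rev < 0 || local_rev > 99) {
        return -1;
    }
    long last = (long)upstream_build * 100 + local_rev;
    if (last > 65535) {          /* each version field must fit in 16 bits */
        return -1;
    }
    int n = snprintf(buf, len, "100.102.104.%ld", last);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

For example, build 292 with local revision 3 yields 100.102.104.29203, and revision 0 reproduces the upstream baseline 100.102.104.29200.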

stanthewizzard2025 · Nov 9, 2025

What a digging !

RoCE-geek · Nov 9, 2025

Github report is here: virtio issue 1453 - Read-retry errors on Windows Server 2025 with SQL Server (0.1.285+)

But please note that the network-related problems are still there (although I'm not sure if I'm affected), as they were out of my focus,
so another volunteer is wanted to dig into this (WS2025 virtio NIC -> connection dropouts).

I've just demonstrated that no previous experience is needed.

RoCE-geek · Nov 9, 2025

My quick GPT-5 generated kick-starter regarding mentioned network errors: What changed in NetKVM (Jan 13 → Jul 2, 2025)

dcuadrados · Nov 12, 2025

RoCE-geek said:
OK, I'm quite confident that I've found and isolated the problem (and will drop it on github soon).

I’m having I/O error problems. I had version 0.1.285 — if I uninstall that version and install virtio-win-0.1.271-1 instead, should that fix these random freezes?

Whatever · Nov 12, 2025

dcuadrados said:
I’m having I/O error problems. I had version 0.1.285 — if I uninstall that version and install virtio-win-0.1.271-1 instead, should that fix these random freezes?

Try, check and report

dcuadrados · Nov 12, 2025

Whatever said:
Try, check and report

I've already completed the full driver downgrade, so let's see what happens. I hope the yellow triangles and the damn IO error stop. I’ll keep you updated.

RoCE-geek · Nov 12, 2025

dcuadrados said:
I’m having I/O error problems. I had version 0.1.285 — if I uninstall that version and install virtio-win-0.1.271-1 instead, should that fix these random freezes?

No need for a full downgrade. I just simply upgrade/downgrade one specific driver if needed (like vioscsi in this case) - just "install" the INF file from ISO and in the driver update dialog, pick it from a list.

The bug I described is usually hidden, no warnings, AFAIK. So maybe it's something else in your case.

Edit: bug demonstrator is on my github post: Issue 1453 update

dcuadrados · Nov 12, 2025

RoCE-geek said:
No need for a full downgrade. I just simply upgrade/downgrade one specific driver if needed (like vioscsi in this case) - just "install" the INF file from ISO and in the driver update dialog, pick it from a list.

The bug I described is usually hidden, no warnings, AFAIK. So maybe it's something else in your case.

The simple demonstration of "my" bug is:

Code:

fio --name=prep --filename=vioscsi-test.bin --size=10G --rw=randwrite --bs=8k --iodepth=64 --numjobs=4 --ioengine=windowsaio --direct=1 --verify=crc32c --verify_state_save=1

The fio download is here. It's better to run it on a non-production system, as the IOPS are quite excessive.

Once finished, the result may look something like this (WS2025 + 0.1.285):

verify: bad header rand_seed 3781814534661607460, wanted 46386204153304124 at file vioscsi-test.bin offset 10490716160,length 8192 (requested block: offset=10490716160, length=8192)
verify: bad header rand_seed 4039341050776057232, wanted 9852480210356360750 at file vioscsi-test.bin offset 9297903616, length 8192 (requested block: offset=9297903616, length=8192)
verify: bad header rand_seed 2797352160983257816, wanted 4529082704144683152 at file vioscsi-test.bin offset 10498293760, length 8192 (requested block: offset=10498293760, length=8192)
verify: bad header rand_seed 6041879509267112382, wanted 7231768112772004165 at file vioscsi-test.bin offset 155074560,length 8192 (requested block: offset=155074560, length=8192)
verify: bad header rand_seed 17018352058924133772, wanted 4026730353332486637 at file vioscsi-test.bin offset 2428387328, length 8192 (requested block: offset=2428387328, length=8192)
verify: bad header rand_seed 4743695445502591656, wanted 1887339595702535041 at file vioscsi-test.bin offset 8976629760, length 8192 (requested block: offset=8976629760, length=8192)
verify: bad header rand_seed 6872855065889476167, wanted 456666279995977728 at file vioscsi-test.bin offset 3513737216,length 8192 (requested block: offset=3513737216, length=8192)
verify: bad header rand_seed 6006877349917101176, wanted 4726550845720924880 at file vioscsi-test.bin offset 7052009472, length 8192 (requested block: offset=7052009472, length=8192)
verify: bad header rand_seed 567683974876396272, wanted 1277040692297970740 at file vioscsi-test.bin offset 7239606272,length 8192 (requested block: offset=7239606272, length=8192)
verify: bad header rand_seed 2301109735957024629, wanted 3552800069219946112 at file vioscsi-test.bin offset 6071353344, length 8192 (requested block: offset=6071353344, length=8192)
verify: bad header rand_seed 4401395713825993800, wanted 9587931803982061130 at file vioscsi-test.bin offset 10211753984, length 8192 (requested block: offset=10211753984, length=8192)
verify: bad header rand_seed 10046267794706811499, wanted 7153490563776494 at file vioscsi-test.bin offset 4108222464, length 8192 (requested block: offset=4108222464, length=8192)

What I have done is uninstall driver version 0.1.285 and install version 0.1.271. The catch is that it's a production server. I've now run some stress tests, both with DiskSpd on the Windows servers and with fio from the Proxmox host; so far it has held up well, and I will keep monitoring. My problem is that only the Windows Server 2025 machines randomly end up with a yellow triangle and an IO error, and until you reset them you cannot start them again. I hope this has worked.

Thread 'Random IO Error Server 2025 Data Center'

Nov 12, 2025

Hello everyone,

I'm experiencing a random "IO Error" that causes my two Windows Server 2025 Data Center VMs to randomly halt (yellow triangle in Proxmox). A reset/reboot resolves the issue temporarily.

My environment details are below. I suspect a potential conflict with my configuration, possibly related to I/O or the high RAM usage.

Node and Storage

Node: Proxmox VE 9.0.11 on Linux 6.14 kernel.

CPU: Intel Xeon E-2288G (16C).

RAM Usage: High (approx. 89% of 31 GiB).

Storage: ZFS pool built on two 960 GB Samsung NVMe SSDs (S.M.A.R.T. OK, low wearout).

Repo Status: Non...

RoCE-geek · Nov 12, 2025

Sorry, the test I've sent here is not complete, it's just one part, will update it soon. Don't try it as is.

emunt6 · Nov 13, 2025

dcuadrados said:
I’m having I/O error problems. I had version 0.1.285 — if I uninstall that version and install virtio-win-0.1.271-1 instead, should that fix these random freezes?

If you want a permanent fix, it was already mentioned in this thread:
- Use SATA disk emulation with Default controller (LSI 53C895A),
- Do not use virtio emulation in Windows VMs,
- Other VMs (Linux) work fine with virtio,

This works as a fix for versions before Windows Server 2025;
for Windows Server 2025 there is no fix (there is no driver for the legacy SATA controller - LSI 53C895A).
Will it reduce performance? Yes, but it will be stable and error-free.

It is very sad to see that the upstream KVM/QEMU devs cannot solve this long-existing problem, while another virtualization platform (VMware) did.

fiona · Nov 13, 2025

emunt6 said:
- Use SATA disk emulation with Default controller (LSI 53C895A),

That's a SCSI controller, not SATA

RoCE-geek · Nov 13, 2025

OK, I just posted on my github thread a bug demonstrator and 2 possible patches: Issue 1453 update

benyamin · Nov 20, 2025

@RoCE-geek

Thank you very much for your efforts troubleshooting these recent problems and for reporting the storage issue at virtio-win.

Also: please accept my apologies for not responding to your mentions here (on the Proxmox forums).
I did not receive the usual email notifications - for the mentions nor for new posts in this watched topic...
Perhaps the same has happened for the RH devs too...

RoCE-geek · Nov 21, 2025

Just an intermediate update: while I'm quite sure that at least my latest Patch 3 is a more than competent and production-ready candidate, in the hunt for other WS2025-related problems on PVE I found a whole bunch of other threads about the same:

1) Completely saturated guest CPUs from VM start under Win11/WS2025
2) Significant "HOST-SIDE" increase of idle CPU load of Win11/WS2025 VMs (while the guest/VM load seems untouched, i.e. remains "zero")

The problem is that not everybody differentiates between these two main symptoms, so there are usually mixed reports within threads like these:
High VM-EXIT and Host CPU usage on idle with Windows Server 2025,
CPU type `host` is significantly SLOWER than `x86-64-v2-AES`,
Win Server 2k25 - QEMU Disk ultra slow, etc.

ad 1) This is expected to be about active nested virtualization / VBS (Virtualization-based Security); however, @JonKohler pointed out it's more specifically about memory integrity / HVCI (see his KVM Forum presentation: https://www.youtube.com/watch?v=MooRtyPkxXc).

We have two independent solutions/workarounds so far:

- if you really need a "HOST" CPU (i.e. non-masked), you have to disable VBS, e.g. in the way @benyamin described here.
- if you can, use any masked CPU type (like the "archaic" x86-64-v2-AES, or something modern like "x86-64-v3" or even "x86-64-v4" with AVX-512 support). In this case, the Windows OS instance should be aware it's running as a VM and will not activate the "bad stuff".
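
As a minimal sketch of the second workaround, the snippet below builds (but does not run) a `qm set` command that switches a VM to a masked CPU type. The VM ID 100 and the helper name `build_cpu_type_command` are hypothetical; the CPU type names are just the ones mentioned above.

```python
import shlex

# CPU models named in this post; illustrative, not an exhaustive list.
MASKED_CPU_TYPES = {"x86-64-v2-AES", "x86-64-v3", "x86-64-v4"}


def build_cpu_type_command(vmid: int, cpu_type: str) -> list[str]:
    """Return the argv for `qm set <vmid> --cpu <cpu_type>` (hypothetical helper)."""
    if cpu_type not in MASKED_CPU_TYPES:
        raise ValueError(f"not one of the masked CPU types discussed here: {cpu_type}")
    if vmid <= 0:
        raise ValueError("vmid must be a positive integer")
    return ["qm", "set", str(vmid), "--cpu", cpu_type]


if __name__ == "__main__":
    # VM ID 100 is a placeholder; the command is printed, not executed.
    print(shlex.join(build_cpu_type_command(100, "x86-64-v2-AES")))
```

Review the printed command before running it on the PVE host; as far as I know, the new CPU type only takes effect after the VM has been fully stopped and started again.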

ad 2) In the PVE/Debian land, we have no real solution for this yet.
Instead, we have dozens of desperate forum users, i.e. possibly hundreds or thousands of desperate users in the wild.
Anyone on Intel with WS2025 or Win11 (24H2 / 25H2) is affected: Even if you see "almost zero" idle load inside the VM, on the PVE host you may see non-marginal CPU utilization and when you multiply that by the number of VMs per host/machine, it becomes a significant parasitic load.
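
To make the "multiply by the number of VMs" point concrete, here is a back-of-the-envelope sketch. All numbers in it are hypothetical placeholders, not measurements from this thread; plug in what you actually observe on your own host (e.g. per-VM CPU usage of the kvm processes while the guests are idle).

```python
def host_idle_overhead(per_vm_core_pct: float, vm_count: int, host_threads: int) -> dict:
    """Estimate the aggregate host-side load caused by idle VMs.

    per_vm_core_pct: host CPU used by one idle VM, in percent of ONE core/thread.
    vm_count:        number of such VMs on the host.
    host_threads:    number of logical CPUs on the host.
    """
    if vm_count < 0 or host_threads <= 0 or per_vm_core_pct < 0:
        raise ValueError("inputs must be non-negative, host_threads must be > 0")
    cores_busy = per_vm_core_pct / 100.0 * vm_count       # full cores consumed
    host_share_pct = cores_busy / host_threads * 100.0     # share of the whole host
    return {"cores_busy": cores_busy, "host_share_pct": host_share_pct}


if __name__ == "__main__":
    # Hypothetical example: 20 idle VMs, each costing 5% of one core, on a 16-thread host.
    result = host_idle_overhead(per_vm_core_pct=5.0, vm_count=20, host_threads=16)
    print(f"cores kept busy: {result['cores_busy']:.2f}")
    print(f"share of host:   {result['host_share_pct']:.2f}%")
```

The estimate is deliberately linear: it ignores scheduling effects and frequency scaling, so treat it as a lower-bound illustration rather than a model.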

And it's not just about such added CPU load, there are other side-effects, like the crippled Turbo Boost, as @nodoame highlighted here.

This reminds me very much of my experience last year (aka "hey, it's all about the drivers"). In other words: it is widely known that something is not working, but it seems that no one has started to address the issue yet. And with all due respect, this seems to be easily applicable to the Proxmox staff members here, as the only response from qualified personnel I could find in these threads after a brief search was "This sounds really weird" (and only with regard to ad 1).

To be clear, this is not a witch-hunt, it's more of a kind reminder: HOUSTON, YOU/WE HAVE A PROBLEM, which has to be solved.
/And just for the record, some other commercial KVM-based systems like Nutanix have already resolved this, so it's definitely a widely known problem/.

And don't get me wrong, no one expects the Proxmox staff to solve it right from the very first report, but when so many undoubtedly experienced users and PVE administrators have been trying to solve it on their own for so long, and the "official channels" remain silent, something is not balanced. And I had exactly the same feeling last year. And exactly for this reason I suddenly became a virtio-win contributor, out of necessity and sheer desperation.

You know, sometimes it's just sufficient to write: "We're aware of the problems you're suffering, and we'll start addressing them in the near future."

FYI @fiona, @fweber, @t.lamprecht, @fabian, @aaron

benyamin · Nov 22, 2025

I seem to be getting notifications again - at least for watched topics - so that's good...
