Windows guest IO performance

Antrill

New Member
Sep 24, 2025
Hi,
my background is 20+ years of Linux and Windows experience, working for an SMB; virtualization was done with VMware for as long as it was sanely priced. Since that is no longer the case, we (like many others) are looking for alternatives, and I really like Proxmox for its openness and simplicity. I want this to be our migration target platform, but I need to make sure performance is mostly on par with the former solution on the hardware we bought just a year ago (3-node cluster, all-flash Alletra SAN storage over iSCSI with 25 GbE connections). I was very excited to see the new snapshot feature with thick-provisioned LVM in Proxmox 9, so I did some testing. Unfortunately I hit a wall, and it is not related to the new feature.
Since we have a majority of Windows VMs, some of which host ERP databases that need to be fast, I tested with the tool recommended by the vendor, the Atto benchmark, specifically with small block sizes from 4k to 64k and a max queue depth of 4. I used to get 20k IOPS in a Windows guest on ESXi, but on Proxmox it was just 8k. Then I retested on local SSD storage in a test machine, getting 12k IOPS, so it wasn't the SAN, switches, or NICs. (Yes, I did use VirtIO SCSI single and io_uring for the disks.)
To be able to see the virtualization overhead and any possible hardware limitation, I did a comparable test with fio directly on the host and got great numbers, 470k IOPS with 4k blocks. I did the same in a Debian Trixie VM, also getting just above 400k IOPS. I thought I had some error in my fio job definition, but then I ran the same fio test in a Windows VM again. I got almost the same values as with Atto, between 9k and 12k IOPS. Translated to throughput, this is 40 MB/s in Windows (at 4k random, QD=4), compared to 1500 MB/s in Linux with the same fio job! Again, I used VirtIO drivers in Windows, and the VM was on the same physical SSD as the Linux one. Storage was a simple LVM volume, the one the installer creates "behind" the Proxmox partition on the install M.2 SSD. No ZFS, no Ceph, just a huge difference between Windows and Linux guests that I have no explanation for.
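For reference, my fio job looked roughly like this (reconstructed, so the exact options and paths may have differed slightly; the Windows-guest variants are noted in the comments):

    [global]
    ; in the Windows guest use ioengine=windowsaio instead of libaio
    ioengine=libaio
    direct=1
    rw=randread
    bs=4k
    iodepth=4
    numjobs=1
    runtime=60
    time_based=1

    [randtest]
    size=16G
    ; in Windows something like filename=D\:\fio-test.dat
    filename=fio-test.dat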
To not just rely on synthetic benchmarks I also did a HammerDB run against a MS SQL database running on VMware vs. Proxmox and the results are not as divergent but still point in the same direction. (I guess due to this test being more CPU bound and not only storage.)
Please tell me I overlooked something. I do not really like the prospect of going the Hyper-V route...
 

Hi, check your Windows VM settings: change the CPU type to x86-64-v2 or x86-64-v3. Also, in the hard disk settings, tick Discard and set Cache to Write back.
We have a few Windows VMs with GPU passthrough; these are our settings for a max-performance desktop experience.
(Screenshot of the VM settings attached: Screenshot 2025-09-24 095208.png)
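On the CLI, that would be something like the following (VM ID 100 and the storage/volume names are just placeholders, adjust them to your setup):

    # change the vCPU model (the "CPU type" field in the GUI)
    qm set 100 --cpu x86-64-v3
    # re-declare the existing disk volume with discard and write-back cache enabled
    qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on,cache=writeback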
 
Hi,
thanks for the reply.
I had the CPU type set to host but tried these settings anyway. Unfortunately, there was no change in my IO benchmark numbers.
 
What caught my attention was the claim of reaching 400K IOPS inside a VM at queue depth 4. In practice, most NAND-based NVMe devices can’t sustain that level of performance. A partial explanation might be that your sequential I/O pattern, combined with the NVMe device’s read-ahead behavior, is essentially causing the controller’s cache to serve many of those reads. Without that, the numbers are unusual on today’s commercially available NAND.

To dig deeper, I'd suggest testing with a QD1 random write over a 64 GB dataset and comparing the bare-metal latency with the in-VM latency. This will help isolate the virtualization overhead.
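As a rough sketch (the file path is a placeholder; inside the Windows guest use windowsaio instead of libaio):

    fio --name=qd1-randwrite --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
        --ioengine=libaio --direct=1 --size=64G --runtime=300 --time_based \
        --filename=/path/to/testfile

Then compare the completion-latency (clat) averages and percentiles between the bare-metal and in-VM runs.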
And, if you want to cut to the chase, here is some practical guidance (see the example commands after this list):
  • Make sure you are using an iothread.
  • Try aio=native.
  • If your system has 2 physical CPUs, consider using CPU affinity to bind processing to a single CPU package, preferably the one with the closest NUMA proximity to the network device.
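For example (VM ID 100, the volume name, and the core list are placeholders):

    # per-disk: dedicated iothread and native AIO (iothread needs the VirtIO SCSI single controller)
    qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1,aio=native
    # note: aio=native needs a cache mode with O_DIRECT (the default "No cache" or directsync), not writeback
    # pin the VM's threads to the cores of one CPU package, e.g. cores 0-15
    qm set 100 --affinity 0-15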
If you’re interested, here’s a detailed performance study of Windows guests in Proxmox that covers controllers, AIO models, and iothread options:

A Comprehensive Evaluation for Peak Efficiency of Windows Guests on Shared Block Storage

And if your focus is on single-VM performance, you may find this more relevant:

Unloaded Performance Study of Windows Guests on Shared Block Storage


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
One more question about the LVM storage that you tested against: did you have "Allow Snapshots as Volume-Chain" enabled?

It's critically important to understand whether you are testing on a native LVM LV or a QCOW nested within an LV. The latter is known to introduce performance issues with or without active snapshots.
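One way to check (VM ID 100 and the default "pve" volume group are just example names; use whatever your VM config and storage definition actually show):

    # which volume does the VM's disk point to?
    qm config 100 | grep scsi
    # how is the storage defined? (look at the LVM entry and its options)
    cat /etc/pve/storage.cfg
    # what format actually sits on the logical volume: raw or qcow2?
    qemu-img info /dev/pve/vm-100-disk-0

If qemu-img reports qcow2, you are testing the nested QCOW path rather than a native LV.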


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox