[SOLVED] Poor disk speeds in Windows 11 guest

joeplaa

Member
Jul 1, 2021
I recently migrated my Windows 10 virtual desktop to Windows 11. However, the system has become very unresponsive and feels really sluggish. There is a lot of delay when clicking or typing.

I have scoured the Proxmox forums and Google, but so far I haven't found a solution. I ran some benchmarks in comparable Windows 10 and 11 guests and narrowed it down to the virtual disk speeds. The difference is huge: Windows 11 reaches roughly half the performance of Windows 10, and even less with random read/write. I double-checked the configs (pasted below) and can't see a difference that would explain this. Both virtual drives are stored on a 6-drive RAID-Z2 SSD array on ZFS with 8 GB of ARC.

Results Windows 10 (also attached as image)
Code:
------------------------------------------------------------------------------
CrystalDiskMark 8.0.4 x64 (C) 2007-2021 hiyohiyo
                                  Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
  SEQ    1MiB (Q=  8, T= 1):  9729.605 MB/s [   9278.9 IOPS] <   763.75 us>
  SEQ    1MiB (Q=  1, T= 1):  1129.767 MB/s [   1077.4 IOPS] <   922.67 us>
  RND    4KiB (Q= 32, T= 1):   129.146 MB/s [  31529.8 IOPS] <   981.60 us>
  RND    4KiB (Q=  1, T= 1):    11.247 MB/s [   2745.8 IOPS] <   360.15 us>

[Write]
  SEQ    1MiB (Q=  8, T= 1):  7922.494 MB/s [   7555.5 IOPS] <  1042.13 us>
  SEQ    1MiB (Q=  1, T= 1):   925.921 MB/s [    883.0 IOPS] <  1126.28 us>
  RND    4KiB (Q= 32, T= 1):    95.687 MB/s [  23361.1 IOPS] <  1360.97 us>
  RND    4KiB (Q=  1, T= 1):    11.334 MB/s [   2767.1 IOPS] <   357.52 us>

Profile: Default
   Test: 1 GiB (x5) [C: 59% (37/63GiB)]
   Mode: [Admin]
   Time: Measure 5 sec / Interval 5 sec
   Date: 2022/08/14 17:52:38
     OS: Windows 10 Professional [10.0 Build 19044] (x64)

Results Windows 11 (also attached as image)
Code:
------------------------------------------------------------------------------
CrystalDiskMark 8.0.4 x64 (C) 2007-2021 hiyohiyo
                                  Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
  SEQ    1MiB (Q=  8, T= 1):  4430.158 MB/s [   4224.9 IOPS] <  1626.83 us>
  SEQ    1MiB (Q=  1, T= 1):   556.173 MB/s [    530.4 IOPS] <  1819.13 us>
  RND    4KiB (Q= 32, T= 1):    24.589 MB/s [   6003.2 IOPS] <  5131.38 us>
  RND    4KiB (Q=  1, T= 1):     2.808 MB/s [    685.5 IOPS] <  1392.71 us>

[Write]
  SEQ    1MiB (Q=  8, T= 1):  3661.265 MB/s [   3491.7 IOPS] <  1820.54 us>
  SEQ    1MiB (Q=  1, T= 1):   438.058 MB/s [    417.8 IOPS] <  2326.13 us>
  RND    4KiB (Q= 32, T= 1):    22.055 MB/s [   5384.5 IOPS] <  5851.68 us>
  RND    4KiB (Q=  1, T= 1):     3.353 MB/s [    818.6 IOPS] <  1158.43 us>

Profile: Default
   Test: 1 GiB (x5) [C: 53% (67/127GiB)]
   Mode: [Admin]
   Time: Measure 5 sec / Interval 5 sec
   Date: 2022/08/14 17:46:40
     OS: Windows 11 Professional [10.0 Build 22000] (x64)

Here is what I have tried so far:
  • disabling ballooning: didn't help
  • changing the machine version: no change
  • changing the video card to VirtIO with more memory: no change
  • using 2 CPU sockets or 1: no change
I must be missing something, but I can't find it. I hope it is "just" a Proxmox setting; that would save me from reinstalling my Windows guest all over again. Although, looking at the configs, I fear it is a Windows 11 issue. I'm running Proxmox on a second-hand HP ML350p server with two 8-core Xeon E5-2650 v2 CPUs and 112 GB of RAM. The Windows 11 upgrade tool warned that these CPUs are too old, so I worked around the check by changing the CPU type to qemu64 in Proxmox and switching it back to host after the upgrade. However, the CPU benchmarks returned identical results on Windows 10 and 11, so I don't really suspect the CPU.
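For reference, that CPU-type switch can also be done from the Proxmox CLI; a sketch, using the VMID 192 and the flags from the Windows 11 config below:
Code:
# Present a generic CPU so the Windows 11 upgrade check passes
qm set 192 --cpu qemu64
# ...upgrade Windows, then restore the original CPU type and flags
qm set 192 --cpu 'host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+aes'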

Windows 10 guest
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 2048
boot: order=sata0;scsi0;net0
cores: 8
cpu: host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+aes
hotplug: disk,network,usb,memory,cpu
localtime: 1
machine: q35
memory: 8192
name: jodiWin10
net0: virtio=E6:98:6D:B5:9A:29,bridge=vmbr2,firewall=1,tag=30
numa: 1
onboot: 1
ostype: win10
sata0: none,media=cdrom
scsi0: jplsrv-zfs:vm-153-disk-0,cache=writeback,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=44cc25a3-cf5a-4f54-910d-b169c2779df4
sockets: 1
tablet: 1
vmgenid: 594df829-3de8-42be-a023-3f18aacf4c92

Windows 11 guest
Code:
agent: 1
balloon: 4096
bios: ovmf
boot: order=sata0;scsi0;net0
cores: 8
cpu: host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+aes
efidisk0: jplsrv-zfs:vm-192-disk-1,efitype=4m,pre-enrolled-keys=1,size=1M
hotplug: disk,network,usb,memory,cpu
localtime: 0
machine: pc-q35-5.2
memory: 8192
meta: creation-qemu=6.1.0,ctime=1644500696
name: PCJOEP-WIN11
net0: virtio=B2:21:00:20:63:E2,bridge=vmbr2,firewall=1,tag=20
numa: 1
onboot: 1
ostype: win11
sata0: none,media=cdrom
scsi0: jplsrv-zfs:vm-192-disk-2,cache=writeback,discard=on,iothread=1,size=128G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=5c5c3f84-afca-4512-8027-c7e0dc625205
sockets: 1
tablet: 1
tpmstate0: jplsrv-zfs:vm-192-disk-0,size=4M,version=v2.0
vmgenid: 573547b1-6091-4980-bd86-94fe768c1abc
 

Attachments

  • CrystalDiskMark_Win10.png (22.1 KB)
  • CrystalDiskMark_Win11.png (22.8 KB)
It's hard to benchmark storage, and CrystalDiskMark isn't telling you anything here: you are basically only benchmarking your RAM. First, Windows caches in RAM; then you enabled writeback, which caches in the host's RAM again; and then ZFS's ARC caches in RAM a third time. So you basically cache everything three times. By that logic, RAM performance should be the problem, not the drives themselves.

If you really want to benchmark your storage and not the RAM you could:
1.) disable caching in Windows as much as possible
2.) switch the cache mode of the virtual disk from "writeback" to "none"
3.) disable ARC caching with zfs set primarycache=none YourPoolName
4.) benchmark with fio using sync writes and very big workloads (a couple of hundred GB, so they can't fit in your RAM) instead of CrystalDiskMark
A sketch of steps 2 and 3 is shown below.
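Something along these lines on the Proxmox host; a sketch assuming the VMID (192) and disk options from the Windows 11 config above, and that the storage jplsrv-zfs sits on a ZFS pool of the same name:
Code:
# Step 2: switch the virtual disk's cache mode to "none" (other disk options re-specified unchanged)
qm set 192 --scsi0 jplsrv-zfs:vm-192-disk-2,cache=none,discard=on,iothread=1,size=128G,ssd=1
# Step 3: disable ARC caching on the pool while benchmarking
zfs set primarycache=none jplsrv-zfs
# ...benchmark, then restore caching
zfs set primarycache=all jplsrv-zfs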
 
Thanks for the fast reply. This is really helpful as I'm still learning all these things. It makes total sense that I'm actually benchmarking RAM as I couldn't really believe these speeds anyway.

So I set the cache mode to default (none) and disabled the ARC cache for both VMs. I first ran CrystalDiskMark again, just to see what would happen. The results came back far more realistic, but still with a difference. And writing was still using RAM for caching (attached for the curious).

Then I ran the fio benchmark... man, it took me a while to get it working correctly on Windows :confused:. There are a few questions I still have about fio:
  1. How do I determine what blocksize --bs to use? I now pretty randomly chose 16k, but I also saw 4k, 64k, 1m.
  2. What does iodepth do exactly? When do I use what value?
  3. What ioengine to choose? I now took windowsaio but should I have taken sync?
Anyway, I attached a 128GB disk to each VM and ran these commands in a script (hence the size of 16g as opposed to "a couple of hundred GB"; or should I simulate that in another way?):
Code:
REM Random write test for IOP/s
START /WAIT fio --name=random-write --ioengine=windowsaio --thread --rw=randwrite --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-rand-write.txt

REM Random read test for IOP/s
START /WAIT fio --name=random-read --ioengine=windowsaio --thread --rw=randread --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-rand-read.txt

REM Random read-write test for IOP/s
START /WAIT fio --name=random-readwrite --ioengine=windowsaio --thread --rw=randrw --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-rand-readwrite.txt

REM Sequential write test for throughput
START /WAIT fio --name=seq-write --ioengine=windowsaio --thread --rw=write --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-seq-write.txt

REM Sequential read test for throughput
START /WAIT fio --name=seq-read --ioengine=windowsaio --thread --rw=read --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-seq-read.txt

REM Sequential read-write test for throughput
START /WAIT fio --name=seq-readwrite --ioengine=windowsaio --thread --rw=readwrite --bs=16k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=fio-seq-readwrite.txt

Results:
All tests used bs=16k; "-" means the test has no result for that direction:
Code:
                    seq rw   seq r   seq w   rand rw   rand r   rand w
Read speed [MB/s]
  Win10               16.1     8.5       -       8.3     31.6        -
  Win11                6.1     3.6       -       2.9      1.5        -
Write speed [MB/s]
  Win10               16         -     111       8.3        -     32.6
  Win11                6.1       -      34.7     2.8        -      6.2
Read IOPS
  Win10                982     519       -       504     1930        -
  Win11                372     221       -       174       90        -
Write IOPS
  Win10                979       -    6758       504        -     1991
  Win11                370       -    2115       173        -      375
 

Attachments

  • CrystalDiskMark_Win10.txt (1.2 KB)
  • CrystalDiskMark_Win11.txt (1.2 KB)
and disabled the ARC cache for both VMs
Just don't forget to enable caching again later after benchmarking by changing primarycache from "none" to "all".
And writing was still using RAM for caching
Async writes will use write caching; sync writes will not. So you can either tell fio to do only sync writes (that's why CrystalDiskMark isn't useful here: it only does async writes, with no option for sync writes), or you can force ZFS to handle all async writes as sync writes using zfs set sync=always YourPool (and set the value back to "standard" after benchmarking).
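On the host that toggle looks like this (a sketch; YourPool stands in for the actual pool name):
Code:
# Handle every async write as a sync write while benchmarking
zfs set sync=always YourPool
# ...run the fio tests, then restore the default behaviour
zfs set sync=standard YourPool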
How do I determine what blocksize --bs to use? I now pretty randomly chose 16k, but I also saw 4k, 64k, 1m.
That depends on what you want to measure. If you want to benchmark IOPS (how many small random writes/reads the storage can handle, for example for workloads like DBs), use --bs=4k --rw=randwrite. If you want to benchmark throughput (for workloads like streaming a video, where you sequentially read/write big files), then use something like --bs=1M --rw=write.
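As a sketch, both cases in the batch style used earlier in this thread (names, size, and runtime are arbitrary; windowsaio because these run inside the Windows guest):
Code:
REM IOPS test: small random writes, as in database-like workloads
fio --name=iops-test --ioengine=windowsaio --thread --rw=randwrite --bs=4k --size=16g --iodepth=1 --runtime=60 --time_based --group_reporting --output=iops-test.txt
REM Throughput test: big sequential writes, as in streaming large files
fio --name=throughput-test --ioengine=windowsaio --thread --rw=write --bs=1M --size=16g --iodepth=1 --runtime=60 --time_based --group_reporting --output=throughput-test.txt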
What does iodepth do exactly? When do I use what value?
It tells fio how many I/O operations to keep in flight in parallel. For latency tests you want it to be 1 (that shows you the worst-case scenario). A higher iodepth will show better performance, especially when using SSDs, as SSDs aren't that fast at writing to a single NAND chip but can write to a lot of NAND in parallel.
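For example, the same random-read test at queue depth 1 and 32 (a sketch, names arbitrary):
Code:
REM iodepth=1: worst-case latency, one outstanding I/O at a time
fio --name=qd1-randread --ioengine=windowsaio --thread --rw=randread --bs=4k --size=16g --iodepth=1 --runtime=60 --time_based --group_reporting --output=qd1-randread.txt
REM iodepth=32: lets the SSDs serve 32 I/Os in parallel
fio --name=qd32-randread --ioengine=windowsaio --thread --rw=randread --bs=4k --size=16g --iodepth=32 --runtime=60 --time_based --group_reporting --output=qd32-randread.txt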
What ioengine to choose? I now took windowsaio but should I have taken sync?
aio is asynchronous I/O. If you don't want write caching you should use "sync". When using "--ioengine=sync", don't forget to also add "--direct=1".

So something like this should show you the worst case scenario that could hit your storage:
Code:
START /WAIT fio --name=sync-write-latency --ioengine=sync --thread --rw=randwrite --bs=4k --size=16g --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting --output=sync-write-latency.txt
START /WAIT fio --name=read-latency --ioengine=sync --thread --rw=randread --bs=4k --size=16g --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting --output=read-latency.txt
 
Thanks @Dunuin for the elaborate explanation, really appreciate it. I see I did a lot of things incorrectly in my past experiments with fio; lots of re-benchmarking to do. Caching is great for performance, but a pain when debugging. Hopefully I'll have some time this weekend to run the tests you propose.
 
Some unexpected progress today.

I was pondering the disk benchmarks and my suboptimal disk layout: the 6 disks in RAID-Z2 really should have been 3 mirror vdevs (I knew less when I set it up). So I was thinking about a way to reinstall Proxmox, and one of the steps is to upload the VirtIO driver ISO for the Windows guests.

Now, I had some issues installing driver version 0.1.217 in both Windows 10 and 11. That was a while ago, so I thought "let's see if they finally released an update". And yes, version 0.1.221 is available. After installing it, without the "usual" errors I got with v217, everything runs very smoothly so far! The problem seems to be solved!

I'll keep an eye on it today. If nothing strange happens, I'm going to mark this as solved.
 
I'm still not happy with the increased latency in Windows 11, so I tried some more fio benchmarking today. Obviously this didn't go smoothly:
Code:
fio: Windows does not support direct or non-buffered io with the synchronous ioengines. Use the 'windowsaio' ioengine with 'direct=1' and 'iodepth=1' instead.

So I ran these tests instead:
Code:
CALL fio --name=async-write-latency --ioengine=windowsaio --thread --rw=randwrite --bs=4k --size=16g --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting --output=async-write-latency.txt
CALL fio --name=async-read-latency --ioengine=windowsaio --thread --rw=randread --bs=4k --size=16g --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting --output=async-read-latency.txt
CALL fio --name=sync-write-latency --ioengine=sync --thread --rw=randwrite --bs=4k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=sync-write-latency.txt
CALL fio --name=sync-read-latency --ioengine=sync --thread --rw=randread --bs=4k --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --output=sync-read-latency.txt

The results:
Code:
            rand read   rand write   rand read   rand write
            (async)     (async)      (sync)      (sync)
Speed [MB/s]
  Win10        4.1          2.9          3.7         15
  Win11        1.5          0.3          1.5          6.2
IOPS
  Win10       1012          705          908       3665
  Win11        376           77          367       1501
 
Okay, I found the culprit. I had updated my Windows 10 VM with all the latest software we use for software development. One of the new tools is Docker (on Windows) using WSL. This made my Windows 10 machine as slow as the Windows 11 one.

As I only used it to verify that development on Windows is possible too (I myself work on a machine running Ubuntu), I uninstalled Docker and WSL from my Windows VMs and voilà... they are "fast" again (I reran the benchmarks from my previous post and they came back identical for both VMs).

So, running WSL on a physical machine seems to be workable, whereas on a VM it's (currently) not.

Thanks for the support, I'll close this thread.

Edit 2022-09-03: My girlfriend is using Docker on Windows with WSL 2 on a physical machine, and while I said it is workable, it is still painfully slow at times. I haven't solved that issue yet, but I found these possible solutions for anyone finding this topic and answer:
 
