[SOLVED] Proxmox VE 7.4-13 High I/O Delay Only With Windows 11 VMs

mhayhurst

Hello everyone! I'm experiencing high I/O delays (18%-30%, sometimes higher) when I start a Windows 11 Pro VM on Proxmox VE 7.4-13. I suspect it might be caused by running everything on ZFS RAID 1 storage, but then again I do not experience high I/O delays when I run a Linux VM... it's only when I start the Windows 11 VM. I'm only running one Linux VM and one Windows 11 VM on this hardware, so that should not be overloading anything.

My hardware:
Intel NUC 12 Intel Core i7-1260P 12-Core, 3.4 GHz–4.7 GHz Turbo - NUC12WSHi7

Crucial RAM 64GB Kit (2x32GB) DDR4 3200MHz Laptop Memory - CT2K32G4SFD832A

Crucial P2 2TB 3D NAND NVMe PCIe M.2 SSD Up to 2400MB/s - CT2000P2SSD8 (ZFS RAID 1)

Crucial BX500 2TB 3D NAND SATA 2.5-Inch Internal SSD, up to 540MB/s - CT2000BX500SSD1 (ZFS RAID 1)


Could this high I/O delay be caused by the speed mismatch between my two storage devices (NVMe vs. SATA SSD)? If so, is there anything I could change in the ZFS configuration to remedy this issue, or is there something else I should be looking at?
 
You can try disabling ZFS sync:

zfs set sync=disabled <zfspool>

But be careful: you can lose the last few seconds of writes.

ZFS really needs datacenter SSDs with power-loss protection for fast fsync. If you really want to use consumer SSDs with ZFS, you should at least use a small datacenter SSD as a ZFS SLOG device.

(fsync on a consumer SSD is something like 200 IOPS vs. 20,000 IOPS on a datacenter SSD.)
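If you test this, the setting is easy to inspect and revert later. A quick sketch, assuming the default Proxmox pool name rpool (substitute your own pool name):

Bash:
# show the current sync setting on the pool
zfs get sync rpool
# disable synchronous writes (risk: the last few seconds of writes can be lost on power failure)
zfs set sync=disabled rpool
# revert to the default behavior
zfs set sync=standard rpool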
 

Thank you for the reply and information! I'm not opposed to using enterprise/datacenter SSDs, but would I need to replace both my SATA SSD and my NVMe M.2 drive with enterprise/datacenter versions, or just the SATA SSD? I was assuming the SATA SSD is the bottleneck. I was thinking of the 2TB version of this, but I'm not sure.
 
Any consumer drive without a supercapacitor (PLP, power-loss protection) will be slow, because the supercapacitor works like a battery: writes/fsyncs can be acknowledged from the SSD's memory.

So both the M.2 and the SATA drive ;) (and I don't think there are any M.2-format drives with PLP, as the form factor is physically too small)

The problem with fsync is that when you write a small 4k block, the SSD has to write a full NAND cell (32/64 MB). That means it'll be slow and you'll burn the drive out fast (write amplification).

PLP keeps the writes in the memory cache, then writes a full NAND cell once.

The Kingston is OK (any DC-grade SSD or NVMe should be OK).

If you are on a low budget, you could use a small DC SSD/NVMe as a ZFS SLOG (it will absorb the writes/fsyncs, then flush the data to your consumer SSDs in the background with big writes); see the sketch below.
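Attaching a SLOG is a one-liner; a minimal sketch, assuming the pool is named rpool and using a placeholder device path (use your real /dev/disk/by-id/ path):

Bash:
# add a small datacenter SSD as a separate ZFS intent log (SLOG)
zpool add rpool log /dev/disk/by-id/nvme-EXAMPLE_DC_SSD
# verify the log device appears in the pool layout
zpool status rpool
# the SLOG can also be removed again later if needed
zpool remove rpool /dev/disk/by-id/nvme-EXAMPLE_DC_SSD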
 
(and I don't think there are any M.2-format drives with PLP, as the form factor is physically too small)
The Micron 7450 M.2 has PLP (and I do enjoy the tens of thousands of fsyncs/sec according to pveperf), as do some of their other models. Some Samsung M.2 drives also appear to have it (PM983/9A3?).
 

At this point I'm desperate to get this working correctly, so I'm willing to purchase new storage devices if that would eliminate the issue permanently. I assume this is a really bad fsync rate?

Bash:
root@proxmox2:~# pveperf
CPU BOGOMIPS:      79872.00
REGEX/SECOND:      7026403
HD SIZE:           1512.89 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     281.96
 
As you are running a simple NUC system with 2 drives, not 4-5, just install Proxmox on ext4 on one drive and put the VMs on the faster drive. All will be OK; there's no need for ZFS on a limited number of drives. On many drives, like those 24-bay setups in a datacenter, sure.
 
There's no data redundancy if I install Proxmox on one storage device and the VMs on another. My current installation, with 2 storage devices in ZFS RAID 1, at least tolerates one storage device failing without data loss. I'm running a VMS server for my security cameras and a media server (Plex), so I do not want to lose data.
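For what it's worth, the mirror's health is easy to verify at any time; a quick sketch, assuming the default pool name rpool:

Bash:
# show the mirror layout and the health of each member
zpool status rpool
# per-vdev capacity and usage
zpool list -v rpool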
 
Just an update, I ended up purchasing both:

SAMSUNG 870 EVO 2TB SATA III SSD (MZ-77E2T0B/AM)
Samsung 990 PRO Series - 2TB NVMe 2.0c - M.2 (MZ-V9P2T0B/AM)


I received the Samsung 870 EVO SSD yesterday, installed it, resilvered my ZFS pool, and it has already made a huge difference! Now I'm only getting I/O delays between 0% and 0.5%, sometimes 1% or 2%, even with my Windows 11 VM powered on and Blue Iris VMS running. Fsyncs are still not great, but maybe they will improve after I install the new M.2:

Bash:
root@proxmox2:~# pveperf
CPU BOGOMIPS:      79872.00
REGEX/SECOND:      6065192
HD SIZE:           1358.76 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     82.05
DNS EXT:           22.21 ms
DNS INT:           7.40 ms
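For anyone doing the same swap: replacing one member of a ZFS mirror and letting it resilver looks something like this (a sketch with placeholder device paths; use your real /dev/disk/by-id/ paths):

Bash:
# swap the old mirror member for the new drive; ZFS resilvers automatically
zpool replace rpool /dev/disk/by-id/ata-OLD_DRIVE /dev/disk/by-id/ata-NEW_DRIVE
# watch the resilver progress until it completes
zpool status rpool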

Just want to say thanks for all the help! It's unfortunate that I made this mistake and now need to replace all the storage devices in my NUCs, but sometimes the best lessons are the hardest learned.
 
See you in a year or two (if storage write usage is light) to check the wear level / data written.
1200 TBW for both of your drives; fingers crossed they don't fail at the same time. Also, in a mirror the NVMe will be limited to the SATA drive's speed, so don't expect write speeds over 500 MB/s...
Datacenter models are strongly recommended for ZFS because they have 3x-4x more endurance, like the Kingston DC500M 2TB with its 4500 TBW.
ZFS isn't cheap.
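Checking wear periodically is straightforward with smartctl; a sketch, assuming the SATA mirror member is /dev/sda and the NVMe is /dev/nvme0 (adjust for your system):

Bash:
# SATA SSD: wear-leveling and total-LBAs-written attributes
smartctl -A /dev/sda
# NVMe SSD: look at "Percentage Used" and "Data Units Written"
smartctl -a /dev/nvme0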
 
Guess we'll see. I have another NUC that has been using:

Crucial P2 2TB 3D NAND NVMe PCIe M.2 SSD Up to 2400MB/s - CT2000P2SSD8 (ZFS RAID 1)
Crucial BX500 2TB 3D NAND SATA 2.5-Inch Internal SSD, up to 540MB/s - CT2000BX500SSD1 (ZFS RAID 1)

for about 1.5 years with no issues.

Is there an equivalent M.2 drive for the Kingston DC500M?
 
Hi @mhayhurst!
What about your disks? Are they still running ZFS? Still speedy? How's the TBW?
 
Hello! ZFS has been running very well on the Samsung storage devices. I haven't burnt out any drives, and I/O latency is no longer an issue.
Hello, I want to ask: I use a Samsung 990 EVO with a heatsink, with ZFS.

The problem is that it sometimes shows FAILED in SMART status; do you experience this? When the server is restarted it shows PASSED again, but over time it fails again.

Did you upgrade the Samsung firmware? Is there a way to upgrade the firmware via Proxmox?

If you update the firmware, is data lost?

[Attached screenshots: the drive's SMART status]
 

Hello, I have not noticed any S.M.A.R.T. failures with my Samsung 990s. I also have not upgraded their firmware, but here is the firmware version that fwupdmgr get-devices reported:

[Attached screenshot: fwupdmgr get-devices output showing the firmware version]


You might be able to use fwupdmgr to update the firmware but I cannot say for certain.
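If you do try it, the usual fwupd workflow is below; a sketch, and whether firmware for your particular 990 EVO is actually published in the LVFS catalog is not guaranteed:

Bash:
# refresh the firmware metadata from LVFS
fwupdmgr refresh
# list detected devices and any updates available for them
fwupdmgr get-devices
fwupdmgr get-updates
# apply the available updates (firmware updates normally preserve data, but back up first)
fwupdmgr update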
 
