Unusual I/O Delay on Proxmox 8.4 with ZFS RAID-Z1

AllanA
Jul 29, 2025
Hello,
I'm experiencing high I/O delay, sometimes reaching 45%, which severely impacts the performance of my system. I've read through several threads on the subject, but none have provided a reliable solution so far.


The issue mainly occurs during:


  • Cloning virtual machines via the Proxmox web interface.
  • Restoring backups to the server.

During these operations, disk latency spikes significantly, while CPU usage remains low. Disk I/O activity appears to be the main contributing factor.

I've attached a screenshot showing a VM being cloned, where you can see the I/O delay spiking significantly.
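
If more detail than the screenshot is useful, I can capture it with something like the following (the pool name is a placeholder for my actual pool; iostat comes from the sysstat package):

  # per-vdev latency while the clone runs (<pool> is a placeholder)
  zpool iostat -vly <pool> 5

  # overall device utilisation and wait times
  iostat -x 5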

Here is my setup:

  • Motherboard: Supermicro X11SPI-TF
  • CPU: 2× Intel Xeon Gold 6246R @ 3.4 GHz (64 threads)
  • Proxmox VE: 8.4 (kernel 6.8.12-10-pve)
  • VM storage: 5× 1.8 TB SATA SSDs (WD Red SA500 2.5") in a ZFS RAID-Z1 pool
  • System disk: 280 GB LVM volume used for root and swap

I'm open to any suggestions or ideas that might help resolve, or at least mitigate, this issue.
I can provide more details if needed.


Thanks in advance!

Attachments

  • forum_proxmox.png (66.5 KB)

Hello everyone,


First of all, thank you for your feedback — it has really helped me better understand the issue I’m facing.


I’m currently considering deleting my existing RAID-Z1 ZFS pool and replacing it with a RAID-10 (striped mirrors) configuration using the same disks.


Since I’m not in a position to replace the drives at the moment, would switching to RAID-10 still help reduce the I/O delay?


Here’s the layout I’m planning, with a rough sketch of the create command below it:

mirror-0
  disk1
  disk2

mirror-1
  disk3
  disk4

mirror-2
  disk5
  disk6

Thanks in advance for your insights!
 
would switching to RAID-10 still help reduce the I/O delay?

If you have all drives in a single RaidZ vdev, you get the IOPS of a single device.

Three mirrors, on the other hand, deliver the IOPS of three devices.

So, yes, it triples the IOPS. But no, I do not know whether that is a sufficient solution for your problem and use case. If @LnxBil is right and those SSDs lack PLP (power-loss protection), everything is suboptimal, at least for use as VM storage.

The other approach is to add a "Special Device": a mirrored vdev of two small (below 1 percent of the pool's net capacity) enterprise-class SSDs/NVMe drives with PLP that holds the pool's metadata. Search for it; it has been discussed and described several times here in the forum.
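
A minimal sketch of what that could look like, assuming the pool is called <pool> and using placeholder device paths (use the real /dev/disk/by-id/ paths; the special vdev must be mirrored, because losing it loses the pool):

  # add a mirrored special vdev for metadata (paths are placeholders)
  zpool add <pool> special mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B

  # optional: also send small blocks (not just metadata) to the special vdev;
  # the 64K threshold is just an example value
  zfs set special_small_blocks=64K <pool>

Note that only metadata written after the vdev is added will land on it; existing metadata stays on the data disks until it is rewritten.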

----
Edit: the blocksize/recordsize/volblocksize may also be off. I cannot explain it on the fly, but "wrong" settings can badly hurt performance. @Dunuin posted some very good explanations on this some time ago...
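
For a quick look at what is currently set, something like this could be used (pool, dataset and storage names are placeholders; note that volblocksize is fixed when a zvol is created, so changing the storage default only affects newly created VM disks):

  # defaults on the pool and the volblocksize of one example VM disk
  zfs get recordsize <pool>
  zfs get volblocksize <pool>/vm-100-disk-0

  # change the block size Proxmox will use for new zvols on this ZFS storage
  pvesm set <zfs-storage-id> --blocksize 16k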