[SOLVED] Noticeable I/O Delay after upgrade to 9.1.14

ProxGamingMox · May 18, 2026

Good morning everyone.

As the title states, at midnight I did a full upgrade from version 9.0.11 to 9.1.14, including all packages.
Since then I have noticed a fairly steady I/O Delay of 7%. It is apparently caused by the 3 running VMs: the first is TrueNAS Scale, the second an Immich server, and the third hosts a k3s cluster for Matrix ESS Community.
As you can see in the CPU Usage graph in the screenshot below, I/O Delay was basically 0% before the update and only started showing up afterwards.
I also appear to have 75-80% IO Pressure Stall since upgrading.
I looked around, and the conclusion I came to was that it is probably just a new metric for disk IOPS, which seems quite odd to me. If those 3 VMs are turned off, IO Delay drops to 0%.
Any ideas?
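For anyone following along: a rough way to read the pressure numbers from the shell, assuming the IO Pressure Stall graph is fed by the kernel's PSI file /proc/pressure/io. That mapping is an assumption, not something confirmed in this thread. The helper name psi_avg10 is made up for this sketch.

```shell
#!/bin/sh
# Sketch: print the 10-second average from each line of PSI-formatted input.
# Expected input format (as exposed by the kernel in /proc/pressure/io):
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
psi_avg10() {
    awk '{
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "avg10") print $1, kv[2]
        }
    }'
}

# Only read the real file where the kernel exposes it.
if [ -r /proc/pressure/io ]; then
    psi_avg10 < /proc/pressure/io
fi
```

"some" is the share of time at least one task was stalled on IO, "full" the share of time all non-idle tasks were stalled at once.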

ProxGamingMox · May 18, 2026

Also worth noting: I am using kernel Linux 6.8.12-4-pve (2024-11-06T15:04Z), since my system kept freezing with anything pre-7.0.x. I have not tried a 7.0.x kernel yet. Not sure if it is related.

ProxGamingMox · May 18, 2026

Small update:

After unchecking Discard and IO Thread on 2 of the 3 VMs, the IO Delay dropped to 3%. Could that be the culprit?

ProxGamingMox · May 18, 2026

After turning Discard and IO Thread off on all 3 VMs, the issue appears to have gone away. IO Pressure Stall is also taking a nosedive.

leesteken · May 18, 2026

Maybe your VMs were trimming, but with Discard disabled those trim commands won't reach the underlying (thin?) storage, which might become a problem later on.
I don't expect IO Thread to make much of a difference, except on a VM with multiple virtual disks and VirtIO SCSI single. With IO Thread disabled, the VM can do less IO at the same time (with multiple disks), so the load might look lower while the work simply takes longer. Or maybe the load went down just because you restarted your VMs (which were doing some IO task).
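To see which virtual disks currently have these options set, something like the following sketch might help. It assumes the usual `qm config` line format (e.g. `scsi0: <storage>:<volume>,discard=on,iothread=1,size=32G`). The function name disk_flags and the VMID 100 are placeholders, not taken from this thread.

```shell
#!/bin/sh
# Sketch: summarize Discard / IO Thread per virtual disk from `qm config` output.
# Reads config text on stdin, skips non-disk keys (such as scsihw) and CD-ROM drives.
disk_flags() {
    awk -F': ' '/^(scsi|virtio|sata|ide)[0-9]+: / && $0 !~ /media=cdrom/ {
        d = ($0 ~ /discard=on/) ? "on" : "off"
        t = ($0 ~ /iothread=1/) ? "on" : "off"
        print $1, "discard=" d, "iothread=" t
    }'
}

# Usage on a PVE host (100 is a hypothetical VMID):
#   qm config 100 | disk_flags
```

This only reports what is configured. It does not tell you whether the guest is actually issuing trim commands.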

ProxGamingMox · May 18, 2026

leesteken said:
Maybe your VMs were trimming, but with Discard disabled those trim commands won't reach the underlying (thin?) storage, which might become a problem later on.
I don't expect IO Thread to make much of a difference, except on a VM with multiple virtual disks and VirtIO SCSI single. With IO Thread disabled, the VM can do less IO at the same time (with multiple disks), so the load might look lower while the work simply takes longer. Or maybe the load went down just because you restarted your VMs (which were doing some IO task).

I have been at it since midnight, with lots of restarts of both the host and the VMs. All 3 VMs are on ZFS storage, which could perhaps explain the discard race? The host was left on overnight, so for about 7-8 hours, and the delay stayed pinned at 7%.

leesteken · May 18, 2026

The IO delay is not so high that I would be worried. Maybe the VMs were actually doing some useful idle/background IO like trimming or scrubbing/fsck.

EDIT: Maybe find out what the VMs you maintain are doing before changing your Proxmox configuration away from best practices.

Impact · May 18, 2026

Can you share this from the node and every guest that uses ZFS?

Bash:

zpool status -vtP
zpool get autotrim

I have some notes here on how to investigate IO Delay that you might find useful.
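If there are several pools, the autotrim check can be scripted. This is only a sketch: it relies on the `-H` (no headers, tab-separated) and `-o name,value` options of `zpool get`. The helper name pools_without_autotrim is made up here.

```shell
#!/bin/sh
# Sketch: list pools whose autotrim property is "off".
# Reads the tab-separated output of `zpool get -H -o name,value autotrim` on stdin.
pools_without_autotrim() {
    awk -F'\t' '$2 == "off" { print $1 }'
}

# Usage on a host or guest that has ZFS pools:
#   zpool get -H -o name,value autotrim | pools_without_autotrim
```

A pool listed here is not necessarily a problem. It can still be trimmed by hand with `zpool trim <pool>` or switched over with `zpool set autotrim=on <pool>`, and whether trimming has anything to do with the IO Delay in this thread is not established.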

ProxGamingMox · May 18, 2026

Impact said:
Can you share this from the node and every guest that uses ZFS?

Bash:

zpool status -vtP
zpool get autotrim

I have some notes here on how to investigate IO Delay that you might find useful.

Host :
root@pve:~# zpool status -vtP
pool: share1ntelzfs
state: ONLINE
scan: scrub repaired 0B in 00:17:33 with 0 errors on Sun May 10 00:41:34 2026
config:

NAME                                                                STATE     READ WRITE CKSUM
share1ntelzfs                                                       ONLINE       0     0     0
  /dev/disk/by-id/ata-INTEL_SSDSC2KG960G8_BTYG90920CB0960CGN-part1  ONLINE       0     0     0  (untrimmed)

errors: No known data errors
root@pve:~# zpool get autotrim
NAME           PROPERTY  VALUE  SOURCE
share1ntelzfs  autotrim  off    default
root@pve:~#

Immich :
nick@immich-gaming:~$ zpool status -vtP
no pools available
nick@immich-gaming:~$ zpool get autotrim

K3s :
nick@esscert:~$ zpool status -vtP
no pools available
nick@esscert:~$ zpool get autotrim

Truenas:

ProxGamingMox · May 18, 2026

IO Delay is now sitting at 10-11% after turning Discard and IO Thread back on on the 3 VMs.
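For comparing against the GUI figure, here is a sketch that computes the iowait share between two samples of the aggregate "cpu" line in /proc/stat. It assumes the IO Delay graph roughly corresponds to CPU iowait, which is not confirmed in this thread. The function name iowait_pct is made up.

```shell
#!/bin/sh
# Sketch: iowait percentage between two samples of the "cpu" line from /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal [guest guest_nice]
# Guest time is already counted inside user/nice, so only the first 8 values are summed.
iowait_pct() {
    # $1, $2: two "cpu ..." lines taken a few seconds apart
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}

# Usage on the host:
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

A longer interval between the two samples smooths out short bursts, so the result reads more like the steady value the graph shows.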

ProxGamingMox · May 18, 2026

I ran iotop -oPa for about 3 minutes. It appears that the constant disk activity comes from the k3s cluster VM as well as zvol_tq2. As for TrueNAS, it seems to burst 1-2 MB of activity every once in a while.

ProxGamingMox · May 18, 2026

Switching to kernel Linux 7.0.2-4-pve (2026-05-15T07:32Z) seems to have fixed the problem.
Will report back if anything changes.
Thanks for your time.

leesteken · May 18, 2026

ProxGamingMox said:
Switching to kernel Linux 7.0.2-4-pve (2026-05-15T07:32Z) seems to have fixed the problem.

According to another thread, a recent version of some Proxmox package(s) reported values that were too high. Did you update more of your Proxmox besides just the kernel?

ProxGamingMox · May 18, 2026

leesteken said:
According to another thread, a recent version of some Proxmox package(s) reported values that were too high. Did you update more of your Proxmox besides just the kernel?

No, not really. Here is how it basically played out:

I was creating a new LXC when I was prompted to "update" the host for better compatibility/performance of the LXCs, and I answered Yes.
The update and upgrade took some time to complete, and afterwards all web elements had <span> in front of them.
I then did a dist-full-upgrade and --fix-broken installs and it went back to normal. That was when I noticed the IO Delay. Nothing I tried from that point on made a difference, apart from unchecking Discard and IO Thread. IO Thread seemed to be the main culprit, until I switched over to the newest kernel and everything went back to normal.
Apart from all that, I also changed the ARC size from the initial 8 GB to 32 GB, since my server has more RAM than it used to.
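For reference, the ARC limit is normally given in bytes through the zfs_arc_max module parameter. The sketch below only does the arithmetic. It treats "32 GB" as 32 GiB, which is an assumption, and it is not necessarily how the limit was changed on this system. The function names are made up.

```shell
#!/bin/sh
# Sketch: build a modprobe options line for a given ARC limit in GiB.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

arc_max_line() {
    printf 'options zfs zfs_arc_max=%s\n' "$(gib_to_bytes "$1")"
}

arc_max_line 32
# prints: options zfs zfs_arc_max=34359738368
```

Such a line usually lives in /etc/modprobe.d/zfs.conf. If the root filesystem is on ZFS, the initramfs typically has to be refreshed afterwards for the setting to apply at boot.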

ProxGamingMox · May 18, 2026

If it is of any help, here are my system specs:

CPU -> Intel Xeon E5-2680 v4
RAM -> 128 GB of DDR4 2400 MHz
Host storage -> C400-MTFDDAC128M
LXC/VM storage -> INTEL SSDSC2KG96

Running on an HP Z440.

EDIT: It is also possible that with newer packages the metrics/load changed and/or became heavier, so some consumer-grade SSDs might struggle more with the load.

_Dejan_ · May 27, 2026

I would like to confirm that there is some compatibility issue between a Proxmox component and the kernel...
I updated Proxmox and stayed on the same kernel (6.8.12-8-pve), because with newer ones I have issues with Intel network cards. This kernel had worked fine for a long time (around 8 months)... Immediately after updating Proxmox, rebooting the host and moving the VMs back to it, IO Delay spiked to around 50%. It happened on both PVE hosts after the update. If no VMs run on the host, IO Delay is OK.

I'm using enterprise SATA SSDs (Intel D3-S4610 960GB), so that is not the problem...

The hosts were updated on May 19, 2026 to the latest "No Subscription" (pve-no-subscription) package release, which is when the issue occurred. 2 days ago I updated them again, but nothing changed... Today I also updated the kernel to the latest version, 7.0.2-6-pve, and after the kernel update IO Delay fell back to a normal 0-0.2%.
