Can't upload ISOs via PVE (Proxmox 9 IO stall)

brainsoft

Hello,
I used to be able to upload ISOs from the web UI. This is a basic feature; of course it worked, and I never had a problem with PVE 8. Somewhere since PVE 9, maybe 9.1, I can no longer upload ISOs. The system IO stall (pressure) climbs to 90 and the upload stops dead. It's pretty consistent: the upload reaches maybe 2.4 GB, slows down for a couple of seconds, gets to 2.6 GB or so, and stops cold. It behaves the same on the 1 GbE and 10 GbE connections; it just reaches 2.6 GB faster before stopping.
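In case it helps narrow things down, the next cross-check I'm planning is to bypass pveproxy entirely and copy the same ISO over SSH straight into the storage directory. This is a sketch assuming the default 'local' storage path of /var/lib/vz/template/iso; the ISO filename is just a placeholder:

Code:
# copy the ISO directly to the node, bypassing the web UI upload path
scp whatever.iso root@tiberius:/var/lib/vz/template/iso/
# meanwhile, watch IO pressure on the node
watch -n1 cat /proc/pressure/io

If the scp stalls at the same ~2.6 GB mark, it's the pool; if it completes, it's something in the upload path itself.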

I've tried a few things: mounting the tmp directory on the NVMe, mounting it as tmpfs in RAM, adding an Optane SLOG, and disabling sync writes on rpool, but nothing has helped. I've also split the mirrors; nothing changed.
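If anyone wants to reproduce the staging behaviour: as far as I can tell the upload is buffered to a temp file before being moved into the storage (that's why I tried relocating tmp in the first place). Assuming it still lands under /var/tmp as pveupload-* on 9.x, you can watch the bytes accumulate with:

Code:
# watch the staged upload file grow while the web UI upload runs
watch -n1 'ls -lh /var/tmp/pveupload-* 2>/dev/null'
# and see accumulated writes per process (-a), active tasks only (-o)
iotop -ao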

Yes, these are cheap consumer-grade SATA SSDs. No, they didn't have a problem before. Other than being the ISO upload destination, all they handle is boot and a single-instance nightly guest backup (PBS handles the rest). Configurations have changed over time, but this problem isn't isolated to just this one machine. This machine, 'tiberius', is an old Ryzen 2600 gaming rig stuffed to the gills (PCIe expansion cards like a 10 GbE NIC, an 8-port HBA, a GPU, and additional drives carefully laid out to use 100% of the available IO). The other machine, 'regulus', is just a Dell OptiPlex 3040. Both machines have a ZFS mirrored rpool and an M.2 NVMe for guests, and both are running 9.1.1 or 9.1.2. There is also a third node, an old 4th-gen Intel NUC, also on 9.1.1. It isn't powerful enough to pull full gigabit line speed, but it will take a 300-400 Mbps upload and actually complete, even when it's writing afterwards over NFS back to tiberius.
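Since the slow NUC completes at 300-400 Mbps, one test I can run on tiberius is throttling the transfer to the same rate and seeing whether the stall disappears. A sketch using scp's built-in limiter (-l takes Kbit/s; the filename is again a placeholder):

Code:
# cap the transfer at roughly 350 Mbit/s
scp -l 350000 whatever.iso root@tiberius:/var/lib/vz/template/iso/

If that completes where the full-speed upload dies, it points at the ingest rate outrunning what the pool can flush.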

Since it affects both machines, and since uploading an ISO is a pretty basic function, any help is appreciated. I've gone as far as I can with LLMs; I'll post details below, and naturally ChatGPT looks at them and goes "AHA, this is the smoking gun! Back pressure from TXG flush activity, metadata, etc. etc." but I'm not a ZFS master. All I know is: why did it work before, and what changed in either ZFS or Proxmox 9 that means I can't upload ISOs anymore? It used to work, even with this crap hardware, before I upgraded.
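For what it's worth, if the TXG back-pressure theory holds water, the dirty-data ceiling should be getting hit while the upload runs. Assuming the standard OpenZFS kstats and module parameters are in their usual places, this is what I'd watch during a stall:

Code:
# per-TXG history for rpool: state and how long each sync took
cat /proc/spl/kstat/zfs/rpool/txgs
# how much dirty data ZFS buffers before throttling writers
cat /sys/module/zfs/parameters/zfs_dirty_data_max
cat /sys/module/zfs/parameters/zfs_dirty_data_max_percent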
All 3 nodes went through the PVE 8-to-9 upgrade, but that was weeks/months ago. In fairness, I haven't tried to upload any ISOs in a while (I've mostly been working with LXC templates), so I can't say exactly when it broke.
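Since pinning down "what changed" probably means comparing kernel and ZFS versions across the nodes, here's what I'd collect from all three (standard commands):

Code:
pveversion -v | head -n 5
zfs --version
uname -r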

Code:
Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:      38.84 M/s
    PID  PRIO  USER     DISK READ DISK WRITE>    COMMAND                                                                                                   
  62348 be/4 www-data      0.00 B      2.64 G pveproxy worker
   4295 be/4 root          0.00 B     23.51 M systemd-journald
   3327 be/4 root        124.00 K      9.77 M kvm -id 108 -name homeassistant-vm,debug-threads=on -n~0=pflash0,pflash1=drive-efidisk0,hpet=off,type=q35+pve0
  23593 be/4 root          0.00 B      9.50 M [kworker/u50:2-kvfree_rcu_reclaim]
     87 be/4 root          0.00 B      7.79 M [kworker/u49:0-events_unbound]
  17630 be/4 root          0.00 B      7.73 M [kworker/u50:1-kvfree_rcu_reclaim]
  39124 be/4 root          0.00 B      5.74 M [kworker/u50:0-kvfree_rcu_reclaim]
  32153 be/4 root          0.00 B      5.38 M [kworker/u50:4-kvfree_rcu_reclaim]
  34590 be/4 root          0.00 B      4.42 M [kworker/u49:3-events_unbound]
   2427 be/4 root        112.00 K      4.03 M rrdcached -g
    400 be/0 root        311.00 K      4.00 M [zvol_tq-1]
  25580 be/4 root          0.00 B      3.09 M [kworker/u49:1-kvfree_rcu_reclaim]
   2959 be/4 root          0.00 B      3.00 M pmxcfs
    401 be/0 root        349.00 K      2.28 M [zvol_tq-2]
   3860 ?dif 100999        6.84 M   1158.00 K pihole-FTL -f
    399 be/0 root         89.00 K   1024.00 K [zvol_tq-0]
  52327 be/4 root          0.00 B    992.00 K [kworker/u49:2-kvfree_rcu_reclaim]
   3628 be/4 100000        0.00 B    504.00 K systemd-journald
   4596 be/4 backup        0.00 B    324.00 K proxmox-backup-proxy
  49834 be/0 root         16.00 K    208.00 K [zvol_tq-1]
   3600 be/4 100000        0.00 B    178.00 K systemd-journald
    637 be/4 root         24.00 K    120.00 K [txg_sync]
   1825 be/4 root          0.00 B    120.00 K [txg_sync]
  61313 be/0 root          0.00 B    114.00 K [zvol_tq-1]
   3157 be/4 100000       20.00 K     22.00 K smbd: cleanupd
   2430 be/4 root          0.00 B     11.00 K smartd -n -q never
  61338 be/0 root          5.00 K      8.00 K [zvol_tq-2]
   3039 be/4 root          0.00 B   1024.00 B nmbd --foreground --no-process-group
   3741 be/4 100000        0.00 B   1024.00 B dhclient -4 -v -i -pf /run/dhclient.eth0.pid -lf /var/~.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0
  63356 be/4 postfix       0.00 B   1024.00 B bounce -z -n defer -t unix -u -c

CONFIG_TASK_DELAY_ACCT and kernel.task_delayacct sysctl not enabled in kernel, cannot determine SWAPIN and IO %
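The warning at the bottom of the iotop output is just delay accounting being off; enabling it at runtime should fill in the SWAPIN and IO% columns if those would help:

Code:
sysctl -w kernel.task_delayacct=1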


Code:
Every 1.0s: cat /proc/pressure/io                                                                                         tiberius: Tue Dec 16 09:48:32 2025

some avg10=87.72 avg60=56.26 avg300=18.42 total=283301742
full avg10=85.36 avg60=54.29 avg300=17.72 total=275486165


Code:
Every 1.0s: date; zpool iostat -v                                                                               tiberius: Tue Dec 16 09:48:51 2025

Tue Dec 16 09:48:51 AM EST 2025
                                                               capacity     operations     bandwidth
pool                                                         alloc   free   read  write   read  write
-----------------------------------------------------------  -----  -----  -----  -----  -----  -----
nvme_guests                                                  6.42G   226G     10     18   137K   211K
  nvme-Samsung_SSD_970_EVO_Plus_500GB_S58SNM0T907161H-part1  6.42G   226G     10     18   137K   211K
-----------------------------------------------------------  -----  -----  -----  -----  -----  -----
rpool                                                        23.7G  74.8G      2     50  75.8K  3.70M
  mirror-0                                                   23.7G  74.8G      2     50  75.8K  3.70M
    ata-Netac_SSD_120GB_YS581296399139783932-part3               -      -      1     25  34.8K  1.85M
    ata-Netac_SSD_120GB_YS581296399139784728-part3               -      -      1     25  41.0K  1.85M
-----------------------------------------------------------  -----  -----  -----  -----  -----  -----
vault                                                        18.7G  25.5T      0      0  1.18K  2.34K
  mirror-0                                                   9.24G  12.7T      0      0    220    458
    ata-WDC_WUH721414ALE6L4_81G25VVV                             -      -      0      0    110    229
    ata-WDC_WUH721414ALE604_9JGJ4LST                             -      -      0      0    109    229
  mirror-1                                                   9.18G  12.7T      0      0    219    458
    ata-WDC_WUH721414ALE6L4_Y5JMJSUC                             -      -      0      0    109    229
    ata-WDC_WUH721414ALE6L4_81G7NSRV                             -      -      0      0    109    229
special                                                          -      -      -      -      -      -
  mirror-2                                                    327M   111G      0      0    772  1.44K
    ata-INTEL_SSDSC2BB120G4_PHWL522301B2120LGN                   -      -      0      0    420    738
    ata-INTEL_SSDSC2BB120G4_PHWL522301LH120LGN                   -      -      0      0    352    738
-----------------------------------------------------------  -----  -----  -----  -----  -----  -----
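If per-vdev latency numbers during a stall would help, I can also grab those; something like this while an upload is dying (-l adds the latency columns, -y skips the since-boot summary):

Code:
# 1-second latency samples on rpool for 10 seconds
zpool iostat -vly rpool 1 10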