IO Pressure Stall issue after upgrade to 9.1

Gabriele_Lvi

New Member
Dec 12, 2025
Hi everyone,
I'm writing because I started working at a new company and I have to diagnose a few issues with their Proxmox server, mainly related to IO Pressure Stall. The server was installed more as a test than anything, even though it now hosts quite a few websites.

The current setup is two hosts in a cluster (pve1 and pve2). Pve2 and the cluster setup as a whole were a test; there's only a Windows 10 VM on pve2 and no HA functionality at the moment.
Pve1 has a Ryzen 9 7950X processor, 64 GB RAM, a Samsung MZVL21T0HCLR 1 TB NVMe boot drive, and two WD Red SA500 SATA 2 TB drives in a ZFS mirror for the VMs/LXCs. It hosts about 40 LXCs (4 CPU, 4 GB RAM, 10 GB drive each) and 2 Linux-based VMs (UISP: 2 CPU, 2 GB RAM, 50 GB drive; qcenter: 2 CPU, 4 GB RAM, 8 GB drive). The LXCs host websites; they are not directly managed by us but by a collaborator. I know they use an Ansible playbook to manage updates and HAProxy.
Pve2 has 2 Intel Xeon E5-2420 processors, 72 GB RAM, a Kingston SHSS37A240G SATA SSD boot drive, 2 Kingston SA400S37480G SATA SSDs not used at the moment, and a Crucial CT1000BX500SSD1 with ZFS for VMs/LXCs. It only hosts a Win10 VM (4 CPU, 4 GB RAM, 80 GB drive).

The issues became apparent after the upgrade from 8.4 to 9.1. The upgrade completed without any errors, but later in the day we saw that one website was offline. At that point we noticed that the IO Pressure Stall graph of pve1 was at about 90% before normalizing after a few hours. Now it's stable, and every 5 minutes, nearly on the dot, there's a short spike to about 70%. This metric doesn't seem to have been exposed before the upgrade, so I have no way to exclude the possibility that it was already happening, since in normal use it's not noticeable. CPU IO delay, on the other hand, was present before the upgrade, and I can see that it was roughly similar; it only looks more "spiky" now, again with a spike every 5 minutes.
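For reference, here is a minimal sketch of how the same PSI values can be sampled directly from /proc/pressure/io to timestamp the spikes (the threshold and interval below are arbitrary values I picked, not anything from our monitoring); the timestamps can then be compared against the output of `systemctl list-timers` to see whether a 5-minute job lines up:

```python
#!/usr/bin/env python3
"""Sample /proc/pressure/io and print timestamps when IO pressure spikes.

Minimal sketch: THRESHOLD and INTERVAL are arbitrary assumptions, adjust to taste.
"""
import time
from datetime import datetime

THRESHOLD = 40.0   # percent of "some" avg10 considered a spike (assumption)
INTERVAL = 5       # seconds between samples (assumption)

def read_psi(path="/proc/pressure/io"):
    """Return (some_avg10, full_avg10) from the kernel PSI file."""
    values = {}
    with open(path) as f:
        for line in f:
            kind, rest = line.split(maxsplit=1)
            fields = dict(kv.split("=") for kv in rest.split())
            values[kind] = float(fields["avg10"])
    return values.get("some", 0.0), values.get("full", 0.0)

if __name__ == "__main__":
    while True:
        some, full = read_psi()
        if some >= THRESHOLD:
            print(f"{datetime.now():%H:%M:%S}  some={some:5.1f}  full={full:5.1f}")
        time.sleep(INTERVAL)
```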

The server load is rarely over 5%, with around 27 GB of available RAM.

To diagnose the issue I first tried iostat and iotop. Iostat showed latency during the spikes as expected; iotop on the host showed journald as the most intensive process (though I still wouldn't call it intensive). Journalctl showed several AppArmor denials per second on requests by rsyslog coming from the HAProxy LXC, but temporarily disabling AppArmor for that container changed nothing. Honestly that doesn't really surprise me, because even if the logs were the culprit, disabling AppArmor only means that the same amount of logs that were blocked are now coming through. I haven't found a way to see these logs, though…
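In case it helps anyone reproducing this, a rough sketch of a per-process write tracker that diffs /proc/&lt;pid&gt;/io over a short window (run as root; the window length is an arbitrary choice on my part) could look like this, to complement what iotop shows at a single instant:

```python
#!/usr/bin/env python3
"""Rough per-process write tracker, sampling /proc/<pid>/io.

Sketch only: prints the processes whose write_bytes grew the most
over one sampling window. Run as root so all /proc/<pid>/io are readable.
"""
import os
import time

def snapshot():
    """Map pid -> (comm, write_bytes) for all readable processes."""
    procs = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{pid}/io") as f:
                io = dict(line.split(": ") for line in f.read().splitlines())
            procs[pid] = (comm, int(io["write_bytes"]))
        except (FileNotFoundError, PermissionError):
            continue  # process exited or is not readable
    return procs

if __name__ == "__main__":
    WINDOW = 10  # seconds; arbitrary assumption
    before = snapshot()
    time.sleep(WINDOW)
    after = snapshot()
    deltas = []
    for pid, (comm, wb) in after.items():
        if pid in before:
            deltas.append((wb - before[pid][1], comm, pid))
    for delta, comm, pid in sorted(deltas, reverse=True)[:10]:
        print(f"{delta / 1024:10.0f} KiB  {comm} (pid {pid})")
```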

Since iotop didn't show anything relevant, I assumed it was related to ZFS and possibly already present before the upgrade, but a colleague pointed out that even on pve2 (which shows no spikes and no IO Pressure unless there's some kind of load on the only VM), installing, for instance, a Win11 VM is extremely slow, and that cannot be normal. He also said that that host always had similar issues; they tried different drives but were not able to fix it. In any case, according to him it's worse than it was. I checked and he is right: during the installation the IO Pressure rises consistently over 80%. I tried setting up a new host with a processor similar to pve2's, installing 9.1 directly. The SATA SSD is a Samsung 870 EVO 250 GB. I tried setting up the drive with ZFS and installing the Win11 VM, then I switched the drive to LVM and tried again. I saw minimal IO Pressure in both cases, but the installation was still really slow.

I then tried setting up one of the unused Kingston drives on pve2 as LVM to see if it made a difference. I tried installing Win11 again and IO Pressure was considerably lower, but the installation was still slow. I then checked the recommended settings for Win11, and after fixing them I got these results:
If I install it on the LVM drive, the installation completes in about 4 minutes with a maximum IO Pressure of 17.64%.
If I install it on the ZFS drive, it takes 12 minutes to get to 63% (I stopped it after that…) with an IO Pressure of 88%.

I also tried tweaking the ZFS timers, but it did not make a difference.
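For anyone wanting to reproduce this: the ZFS timers are exposed under /sys/module/zfs/parameters, and a small sketch like the one below (the parameter names listed are just the usual write-batching suspects, an assumption on my part) makes it easy to record the values before and after a test:

```python
#!/usr/bin/env python3
"""Dump a few ZFS module parameters before/after a tuning test.

Sketch only: the parameter list is an assumption (txg timeout and the
dirty-data limits), not a statement of which timers actually matter here.
"""
from pathlib import Path

PARAMS = [
    "zfs_txg_timeout",             # seconds between transaction group commits
    "zfs_dirty_data_max",          # bytes of dirty data ZFS will buffer
    "zfs_dirty_data_max_percent",  # same limit expressed as a percentage of RAM
]

PARAM_DIR = Path("/sys/module/zfs/parameters")

for name in PARAMS:
    path = PARAM_DIR / name
    value = path.read_text().strip() if path.exists() else "n/a"
    print(f"{name:30s} {value}")
```

The same files can be written at runtime to change a value temporarily, which is handy for before/after comparisons without rebooting.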

At the end of the day I have two hosts that feel rather sluggish and where IO Pressure tends to rise dramatically whenever load is added, whether it's creating a VM or updating one. Since the issue disappears on pve2 when working on an LVM drive, I assume it's related to ZFS, and since I'm not aware of any tuning made or attempted on these hosts, I assume it's just the default behaviour that becomes an issue with this hardware configuration. The counterargument from my colleagues is that they used to run similar loads and setups with ESXi and never noticed any issues. Of course, they were not using ZFS at the time.

Since this is all a bunch of assumptions, I'd like to know if anyone more experienced has any input on this issue.


Thank you in advance.
 

Attachments

  • CPUIODelay.png (29.2 KB)
  • IOPressureStallAfterUpgrade.png (24.1 KB)
  • IOPressureStallStable.png (48.8 KB)
You shouldn't use consumer SSDs like the Samsung EVO with ZFS. ZFS does synchronous writes, and consumer SSDs don't have a supercapacitor to hold sync writes in the memory cache before writing the NAND cells. (It's really something like 200~400 IOPS on these drives versus 20,000 on datacenter-grade SSDs.)
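fio is the proper tool to measure this, but even a crude sketch like the one below shows the gap (the test path is hypothetical, point it at the pool you want to check): it writes small blocks and fsyncs each one, which is roughly what sync-heavy guest IO forces the drive to do.

```python
#!/usr/bin/env python3
"""Crude sync-write IOPS probe (fio is the better tool; this is just a sketch).

Writes small blocks and fsyncs after each one. TEST_FILE is a hypothetical
path, adjust it to the pool under test; the file is removed afterwards.
"""
import os
import time

TEST_FILE = "/tank/synctest.bin"   # hypothetical path (assumption)
BLOCK = b"\0" * 4096               # 4 KiB writes
COUNT = 500

fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
start = time.monotonic()
for _ in range(COUNT):
    os.write(fd, BLOCK)
    os.fsync(fd)                   # force each write to stable storage
elapsed = time.monotonic() - start
os.close(fd)
os.remove(TEST_FILE)

print(f"{COUNT / elapsed:.0f} sync IOPS ({COUNT} x 4 KiB fsync'd writes)")
```

On a consumer drive every fsync has to reach the NAND; on a drive with power-loss protection it can be acknowledged from the protected cache, which is where that 200~400 vs 20,000 IOPS difference comes from.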
 
Welcome, @Gabriele_Lvi.
I'm not saying that your issue was also present in PVE 8, but as far as I remember from the forum posts, the graphs in PVE 9 are more "spiky" than in PVE 8 because they are prepared differently than they used to be in PVE 8.
So maybe the issue was less "visible" in PVE 8. Or maybe it didn't actually exist; I don't know.

If I understand you right, you weren't working there (or only briefly) when PVE 8 was installed.
So you could install PVE 8 on the second server and repeat your test with the same Win11 installation to get more hints on whether the culprit is PVE 9 or ZFS.

Good luck!
 
I am also getting bad IO stalls on my machines. Other end of the spectrum, homelab stuff, but still consumer SSDs.

The issue for me is simple. I used to be able to upload an ISO. Now I can't. It pushes about 2.6 GB and then the upload just stalls, IO stall goes through the roof, and everything just locks right up. Cancel the upload and wait a few minutes and it settles back down. The system is waiting on IO somewhere and ZFS just starts backing up across the board.

One of the systems is my old gaming-hardware file-server test bench, so I could completely understand that I messed something up or am dumping to a shared data channel or something stupid, but the other machine is just a Dell Optiplex 3040 with a mirrored SATA boot and a single NVMe for guests.

Ironically, the third machine is an old Intel NUC 4th gen. I can upload there just fine; it has a single really old SATA SSD and the upload completes. It is not full speed, only 400 Mbps or so, but it goes through and finishes.

On the Optiplex, over a 1 GbE connection, it runs at full speed and locks up at 2.6 GB. On the old Ryzen machine it will run at 1 GbE or 10 GbE, and in either case it will get to 2.3 GB, then slow down until 2.6 GB and stall.

Sometimes it would throw error 0 on the GUI but most of the time it just stops transferring and sits there.

I did notice a log thing also, probably unrelated, but very interesting. A couple of weeks ago I set up log2ram on all 3 machines and rsyslog on the Ryzen machine, pointing at a little Optane drive. I had one guest on the NUC (a NUT monitor webpage) that was misconfigured and throwing logs every second. After a few weeks the journals on the NUC filled the 128 MB RAM drive and locked up the host. But looking at the logs on the log server, the excessive writes only started on Dec 12. There were only 3 days of massive logs. So even though I hadn't made a change to the guest in several weeks, something changed on Saturday and suddenly the daily logs went from small to something like 16 MB. I can check if anyone cares, but it was specific and probably related to a Friday night system update.

Remind me: I need to reduce the journald max log size on the other nodes.
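For what it's worth, a quick way to keep an eye on journal size versus a small log2ram tmpfs is a sketch like this (`journalctl --disk-usage` gives a one-line total as well); capping it is then a matter of SystemMaxUse in /etc/systemd/journald.conf:

```python
#!/usr/bin/env python3
"""Report journal size per machine directory under /var/log/journal.

Sketch only: sums *.journal files so the total can be compared against
a small log2ram tmpfs.
"""
from pathlib import Path

JOURNAL_DIR = Path("/var/log/journal")

total = 0
for machine_dir in sorted(JOURNAL_DIR.iterdir()):
    if not machine_dir.is_dir():
        continue
    size = sum(f.stat().st_size for f in machine_dir.glob("*.journal"))
    total += size
    print(f"{machine_dir.name}: {size / 2**20:.1f} MiB")
print(f"total: {total / 2**20:.1f} MiB")
```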
 
Pve1 has a Ryzen 9 7950X processor, 64 GB RAM, a Samsung MZVL21T0HCLR 1 TB NVMe boot drive, and two WD Red SA500 SATA 2 TB drives in a ZFS mirror for the VMs/LXCs. It hosts about 40 LXCs (4 CPU, 4 GB RAM, 10 GB drive each) and 2 Linux-based VMs (UISP: 2 CPU, 2 GB RAM, 50 GB drive; qcenter: 2 CPU, 4 GB RAM, 8 GB drive). The LXCs host websites; they are not directly managed by us but by a collaborator. I know they use an Ansible playbook to manage updates and HAProxy.
OMG!! 40 LXCs, each with 4 vCPUs and 4 GB RAM? Running on a 16-core/32-thread CPU combined with consumer NAS drives? Massive overcommit, no wonder you're running into I/O stalls. I don't believe the system ever ran normally. Otherwise I'd throw my big Epyc systems away and buy 50 of these…
 
Thank you everyone for the help.

You shouldn't use consumer SSDs like the Samsung EVO with ZFS. ZFS does synchronous writes, and consumer SSDs don't have a supercapacitor to hold sync writes in the memory cache before writing the NAND cells. (It's really something like 200~400 IOPS on these drives versus 20,000 on datacenter-grade SSDs.)

That was kind of my conclusion. The only doubt is that, from what I saw installing Windows on ZFS and on LVM on pve2, the difference is massive, while on a fresh 9.1 test machine it was far less noticeable, so I guess there might also be something else at play.

Welcome, @Gabriele_Lvi.
I'm not saying that your issue was also present in PVE 8, but as far as I remember from the forum posts, the graphs in PVE 9 are more "spiky" than in PVE 8 because they are prepared differently than they used to be in PVE 8.
So maybe the issue was less "visible" in PVE 8. Or maybe it didn't actually exist; I don't know.

If I understand you right, you weren't working there (or only briefly) when PVE 8 was installed.
So you could install PVE 8 on the second server and repeat your test with the same Win11 installation to get more hints on whether the culprit is PVE 9 or ZFS.

Good luck!

Yes, I haven't really had time to work on this setup before the upgrade, so it's entirely possible that the issue was already there or that it only changed slightly. As I said, as long as you don't look at the dashboard it's not that noticeable. I could try your suggestion on the second server; however, I know for sure that it had similar problems even with 8.4, but it could still be useful to see whether it gets better or worse.

OMG!! 40 LXCs, each with 4 vCPUs and 4 GB RAM? Running on a 16-core/32-thread CPU combined with consumer NAS drives? Massive overcommit, no wonder you're running into I/O stalls. I don't believe the system ever ran normally. Otherwise I'd throw my big Epyc systems away and buy 50 of these…

Well, I'm not ruling that out, but the only bottleneck seems to be the drives: the server load in the last year reached 50% only once, right after the upgrade, and it has always had at least 20 GB of available RAM...

I am also getting bad IO stalls on my machines. Other end of the spectrum, homelab stuff, but still consumer SSDs.

The issue for me is simple. I used to be able to upload an ISO. Now I can't. It pushes about 2.6 GB and then the upload just stalls, IO stall goes through the roof, and everything just locks right up. Cancel the upload and wait a few minutes and it settles back down. The system is waiting on IO somewhere and ZFS just starts backing up across the board.

One of the systems is my old gaming-hardware file-server test bench, so I could completely understand that I messed something up or am dumping to a shared data channel or something stupid, but the other machine is just a Dell Optiplex 3040 with a mirrored SATA boot and a single NVMe for guests.

Ironically, the third machine is an old Intel NUC 4th gen. I can upload there just fine; it has a single really old SATA SSD and the upload completes. It is not full speed, only 400 Mbps or so, but it goes through and finishes.

On the Optiplex, over a 1 GbE connection, it runs at full speed and locks up at 2.6 GB. On the old Ryzen machine it will run at 1 GbE or 10 GbE, and in either case it will get to 2.3 GB, then slow down until 2.6 GB and stall.

Sometimes it would throw error 0 on the GUI but most of the time it just stops transferring and sits there.

I did notice a log thing also, probably unrelated, but very interesting. A couple of weeks ago I set up log2ram on all 3 machines and rsyslog on the Ryzen machine, pointing at a little Optane drive. I had one guest on the NUC (a NUT monitor webpage) that was misconfigured and throwing logs every second. After a few weeks the journals on the NUC filled the 128 MB RAM drive and locked up the host. But looking at the logs on the log server, the excessive writes only started on Dec 12. There were only 3 days of massive logs. So even though I hadn't made a change to the guest in several weeks, something changed on Saturday and suddenly the daily logs went from small to something like 16 MB. I can check if anyone cares, but it was specific and probably related to a Friday night system update.

Remind me: I need to reduce the journald max log size on the other nodes.

The cyclical spikes we get are probably due to some periodic operation, and I do think that any load could produce a bad IO stall, so I guess the issue is very similar, and you have the advantage of knowing your system's performance from before the upgrade. It could be very telling if you tried rolling back to 8.4 on one of the affected machines, as long as it's not too much of a hassle, of course.