High IO Delay, Slow performance

MorpheusTrue

New Member
Apr 26, 2024
For about two weeks now I've been having trouble with my server. Every time I upload data or use a VM, the IO delay goes up to 30-80% and takes a few minutes to go back down. In the meantime I reinstalled Proxmox for a few other reasons; sadly, that changed nothing. So I think it could be a hardware problem.

Currently I have just one LXC container running, for Docker.

[Screenshot attached: Bildschirmfoto vom 2024-04-26 10-40-08.png]

At that peak I transferred 11.4 GB of music.
Before the reinstall I put in four new 3 TB HDDs. Only my 1 TB SSD is still in my home server.

[Screenshot attached: Bildschirmfoto vom 2024-04-26 10-45-04.png]

While writing this, I uploaded data to my ZFS volume and there were no problems: 110 GB of data, and it ran flawlessly. So it's definitely my SSD. But is there a way to "repair" it?
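There's generally no way to "repair" a worn or failing SSD, but you can check whether it is actually the culprit by reading its SMART data. A minimal sketch, assuming the SSD is /dev/sda (use lsblk to find yours):

```shell
# Install smartmontools on Proxmox/Debian if it's missing
apt install smartmontools

# Identify your disks first; /dev/sda below is an assumption
lsblk -d -o NAME,MODEL,SIZE

# Dump SMART health and attributes for the SSD
smartctl -a /dev/sda

# Optionally run a short self-test, then read the result a few minutes later
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda
```

Look at wear and error indicators (e.g. reallocated sectors, media wearout / "Percentage Used" on NVMe); rising values suggest the drive is on its way out.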

Edit: Also, this shows up in my system logs every minute:
[Screenshot attached: Bildschirmfoto vom 2024-04-26 11-14-35.png]
 
My Proxmox server (8.1.10) ran fine for over two weeks; yesterday afternoon it started showing high IO delay of 50-60% and more. After shutting down all VMs and rebooting, everything was normal, but this morning the high IO delay came back. I have now shut down all VMs again, but the high IO delay persists. I am trying an upgrade.
There are quite a lot of posts about the same problem and no real solution, at least not for my case, as I have no scheduled backups or snapshots etc.
And there is no obvious reason for the problem.
Proxmox should at least give some tips on how to avoid it.
 
The most common reason for sudden IO delay spikes is the use of consumer or prosumer SSDs.

While they might offer decent performance in desktops, they are not suited for hypervisors, since the general workload differs drastically. Enterprise SSDs are more expensive but are made for this kind of workload, which is why these kinds of issues mostly appear on homelab instances.

Consumer SSDs are only really fast as long as writes fit into their (comparatively small) SLC cache. Once that cache is full, performance tanks hard and you get the speed of the main cells, which are much slower TLC or QLC cells. QLC performance is especially bad, often dropping to HDD levels.
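You can observe this cache-exhaustion behaviour yourself with fio: write more data than the SLC cache can absorb and watch throughput collapse partway through. A sketch, assuming a scratch file on the SSD-backed filesystem (the path /mnt/ssd is an assumption; the test writes 50 GB, so only point it at space you can spare):

```shell
# Sequential write of 50 GB to a scratch file. Throughput typically starts
# high while the SLC cache absorbs writes, then drops sharply once it fills.
# --status-interval prints bandwidth every 5 s so you can see the cliff.
fio --name=slc-test --filename=/mnt/ssd/fio-testfile \
    --rw=write --bs=1M --size=50G \
    --ioengine=libaio --direct=1 \
    --status-interval=5

# Clean up the scratch file afterwards
rm /mnt/ssd/fio-testfile
```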

Additionally, consumer SSDs don't offer power-loss protection (PLP). While this sounds like 'just' a safety feature, it also allows the disk to safely acknowledge sync writes from its cache, which drastically increases performance in these kinds of environments.

The only real solution is to get enterprise SSDs. A common recommendation for home labs is to look out for second-hand enterprise SSDs.
 
My server is a Fujitsu PRIMERGY RX300 S6 with server-grade HDs (that's not a consumer environment), and it ran fine for the last three weeks in production, plus a month before that while testing Proxmox, without any abnormal IO delay until yesterday. I didn't change anything. After the reboot yesterday evening it ran normally all night, until suddenly this morning the IO delay rose over 50% again.
My filesystem is ZFS
 
Hello,

Could you please share the exact storage configuration? Is a hardware controller in use? Is it in HBA mode? How many disks? What kind of RAID setup, if any? And, more importantly, what is the exact model of the disks?

Could you please share the VM config of one of the affected VMs? They are at `/etc/pve/qemu-server/<VM-ID>.conf`.

Regarding the original post, I see four HDDs. Which ZFS raid mode are they using?
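For reference, both pieces of information can be read directly on the host. A sketch, assuming a pool named rpool and VM ID 100 (substitute your own):

```shell
# Show the pool topology (mirror, raidz1, raidz2, ...) and per-disk health
zpool status rpool

# Show the VM configuration; equivalent to reading
# /etc/pve/qemu-server/100.conf
qm config 100
```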
 
Hello Maximilano,

thank you for your help, but as I wrote, I figured it out. I am relatively new to Proxmox and to ZFS. My first assumption was that an HD had crashed or was failing, but zpool status said the pool was mirrored with no errors. That led me to zpool scrub, and I found a scrub was running. I stopped it, and the IO delay was gone.
My configuration is two 144 GB HDs mirrored with a RAID controller for the Proxmox system, and two separate 1 TB HDs configured as a ZFS mirror (RAID1).
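For anyone landing here with the same symptom, the checks described above look roughly like this (the pool name rpool is an assumption):

```shell
# Check pool health; a running scrub shows up in the "scan:" line
zpool status rpool

# Cancel a running scrub if it is hurting interactive IO
zpool scrub -s rpool

# Start it again later, during off-hours
zpool scrub rpool
```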

A second server (PBS) is configured similarly.

Regards

 
Could you check if your HDDs are SMR or CMR? It is known that SMR disks perform badly with scrubs.
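There is no universal flag that reports drive-managed SMR, so the usual approach is to read the exact model string and check it against the manufacturer's CMR/SMR listings. A sketch (/dev/sdb is an assumption):

```shell
# List model strings of all disks, then look the models up in the
# manufacturer's CMR/SMR documentation
lsblk -d -o NAME,MODEL,SIZE

# Or query a single disk's identity via SMART
smartctl -i /dev/sdb

# Host-aware/host-managed SMR drives announce themselves to the kernel;
# note that drive-managed SMR still reports "none" here, hence the
# model lookup above.
cat /sys/block/sdb/queue/zoned
```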

Regarding using a RAID setup with a hardware controller: this is not recommended. See the ZFS documentation [2].

Regarding the Backup server, we recommend using SSDs [1] for performance. The reason is that HDDs perform poorly when doing random writes/reads.

[1] https://pbs.proxmox.com/docs/installation.html#recommended-server-system-requirements
[2] https://openzfs.github.io/openzfs-docs/Performance and Tuning/Hardware.html#hardware-raid-controllers
 
The HDs are 1.0 TB WD RE3 SATA with 32 MB cache (WD1002FBYS).
The 1 TB HDs are not hardware-mirrored; they are two separate disks configured as RAID1 under ZFS.
I know my servers are not the youngest and my HDs are not the fastest, but I am an IT consultant and it's my home network, so performance is not really an issue. However, an IO delay of 50, 60, up to 90% is not acceptable. Now I know how to solve it and can start the scrub overnight or on the weekend.
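Scheduling the scrub for off-hours can be done with a plain cron entry. Note that Debian's zfsutils-linux package (which Proxmox uses) already ships a monthly scrub job in /etc/cron.d/zfsutils-linux, so check that first to avoid doubling up. A sketch, assuming a pool named tank and a hypothetical file name:

```shell
# /etc/cron.d/zfs-scrub  (file name is an assumption)
# Start a scrub every Saturday at 02:00; zpool scrub returns immediately
# and the scrub continues in the background.
0 2 * * 6  root  /usr/sbin/zpool scrub tank
```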

I know that for a real production network, enterprise-grade SSDs are recommended.

I migrated everything from ESXi last month. I am still learning, and last week I migrated a customer's network from ESXi to Proxmox, onto current servers with SSDs. I have another, larger installation at another customer that should be migrated in the coming months.

Regards
 
