Please help me solve high IO delay

dragon73

New Member
Nov 18, 2024
Hello everyone,

I’ve been facing a persistent issue for some time now, and I would greatly appreciate advice from more experienced members on how to resolve it. I’ve tried Googling and experimenting with different solutions, but so far, nothing has worked.

The problem is the high I/O delay I experience, which completely freezes all the VMs running on my server. This typically happens during moderately intensive disk operations. For example, if I use `yt-dlp` inside one of the Linux VMs to download a slightly longer YouTube video and then copy it elsewhere (e.g., to a remote share or server), the I/O delay often jumps to 100% and stays there for a couple of minutes, making all the VMs unusable. The spike sometimes hits immediately after the download and sometimes during the file transfer, and either way the entire system freezes for several minutes.
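
In case it's useful, this is roughly how I watch the disks while reproducing the freeze (standard ZFS and sysstat tools; nothing here is specific to my setup):

```
# Per-pool I/O statistics, refreshed every second
zpool iostat -v zfs-storage 1

# Per-device utilization and wait times (iostat comes from the sysstat package)
iostat -x 1
```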

Here’s my setup:
- Proxmox version: 8.1.4 (but it was the same with version 7)
- Hardware: Intel i7-2600 CPU, 32 GB RAM, a mix of SSDs and HDDs.
- Storage setup:
  - Proxmox is installed on a ZFS pool of two mirrored 128 GB consumer-grade SSDs.
  - VMs run on another ZFS pool (`zfs-storage`) with two mirrored 1 TB consumer-grade SSDs.
  - An additional HDD is formatted with ext4 and mounted inside one of the VMs.

At one point, I added an Intel SSDSC2KB48 SSD as a dedicated ZIL (SLOG) device for the `zfs-storage` pool, hoping to alleviate the I/O delay bottleneck. Unfortunately, this did not resolve the issue.
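
For completeness, I added it roughly like this (the device path below is illustrative, not my actual one):

```
# Attach a dedicated log (SLOG) device to the existing pool
zpool add zfs-storage log /dev/disk/by-id/<intel-ssd>
zpool status zfs-storage
```

From what I've read since, a dedicated ZIL/SLOG only accelerates synchronous writes, which may be why it made no difference for this workload.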

I understand that the CPU and memory aren’t particularly high-end. However, this is a home server currently running only:

1. A VM with Incredible PBX.
2. A VM with Ubuntu, which is used only occasionally.
3. A CT running a few Docker containers.

I chose ZFS for its redundancy benefits, knowing that it typically requires more RAM than my setup offers. I love Proxmox, but these freezes are discouraging me from using it as much as I would like. Would it make sense to switch to software RAID and then install Proxmox on top of it manually, instead of using the automated installer and opting for ZFS?

Any advice on resolving these issues would be highly appreciated. Thank you!
 

Attachments

  • io_delay.JPG (67.6 KB)
There's a good chance you're running into the "don't use consumer-grade SSDs for ZFS" problem. If your SSDs run out of cache (which happens very quickly), data has to be written directly. Besides faster wear-out, you'll experience delays and sluggish performance.

Which SSD models do you use?
 
They are definitely ordinary consumer-grade drives: the smaller ones used for the system are Patriot Burst, and the bigger ones for storage are Patriot P210.

I didn't think ZFS was so picky about this, especially since the Proxmox ISO doesn't offer any other type of software RAID, so I thought it was only natural to choose ZFS. :oops:
 
You may want to look into your ARC size. (It may be better to increase its max size if it is capped.)
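
If it helps, this is roughly how to check the current and maximum ARC size on the host (standard OpenZFS interfaces; `arc_summary` gives a more readable overview if it is installed):

```
# size = current ARC size, c_max = configured maximum (both in bytes)
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

# Readable summary, if available
arc_summary
```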

I use consumer-grade SSDs with ZFS for my server as well (since that is what my hoster put into the server).

When I write large amounts of data and the disks cannot keep up, the ARC kicks in and buffers the data until it can be written out to the SSDs.
 
if I use `yt-dlp` inside one of the Linux VMs to download a slightly longer YouTube video and then copy it elsewhere (e.g., to a remote share or server), the I/O delay often jumps to 100% and stays there for a couple of minutes, making all the VMs unusable. The spike sometimes hits immediately after the download and sometimes during the file transfer, and either way the entire system freezes for several minutes.
The download = writing, and then copying it elsewhere to a remote target = reading. So this is either the ZFS ARC problem on consumer-grade SSDs, or the bottleneck is the write to the remote target ...
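
To tell those two apart, one can time a pure local read of the downloaded file against the copy to the remote target (the path is illustrative; note the ARC may serve parts of a just-written file from RAM):

```
# Local read only: if this alone drives I/O delay to 100%, the pool is the bottleneck
dd if=/path/to/downloaded-video of=/dev/null bs=1M status=progress
```

If the local read is fine but the transfer still stalls everything, the remote target (or the network path) is the more likely culprit.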
 
How much RAM does your host have?

The ARC needs to be big enough to hold the data that cannot be written to the SSD immediately.
So if your total file size is 5 GB and the ARC is smaller than that, the file cannot be buffered in the ARC.
Also, when calculating the needed ARC size, allow for 2-3 GB of overhead, since the ARC also caches frequently used data to speed up access times. (There is a good reason why the default ARC size is 50% of total RAM.)

The rule of thumb is:
[base of 2 GB] + [1 GB per TB of storage] (as both the minimum and maximum size)
https://forum.proxmox.com/threads/rule-of-thumb-how-much-arc-for-zfs.34307/

And my rule of thumb is:
[base of 2 GB] + [1 GB per TB of storage], with a minimum size of 8 GB and a maximum of 16 GB.
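
Applied to the pools in this thread (assuming both mirrors count toward "storage"), the two rules work out roughly like this:

```
# Usable storage: ~0.128 TB (system mirror) + ~1 TB (zfs-storage mirror) ≈ 1.1 TB
# Forum rule:  2 GB + 1.1 GB ≈ 3 GB ARC (as both minimum and maximum)
# My rule:     same ~3 GB, clamped to the stated floor → 8 GB min / 16 GB max
```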
 
My rule of thumb allows the ARC to also be used as a file cache to reduce access times. Since I write 5 GB+ files somewhat frequently, I want it big enough that I don't notice when writes spill into the ARC and it never runs out of space, but also not so big that I'm just wasting RAM.

The ARC is sort of a plan B for writing files: if the data cannot be written directly to the SSD (because it is already busy writing/reading), it is held in the ARC cache until it can be written out.
 
Yes, but they also state in their own wiki that you need an ARC size of at least 8 GB.
For me it always sets the minimum at 8 GB and the maximum at 16 GB, since I run a server with 256 GB of RAM.
But I never found a reason to increase the ARC size, and reducing it can cause issues like @dragon73 described, as ZFS cannot function correctly with that little RAM/ARC.
https://pve.proxmox.com/wiki/ZFS_on_Linux
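
For reference, the wiki's way of pinning the ARC size persistently is a module option; the 8/16 GiB values below are just the numbers from this thread, not a recommendation. After editing, run `update-initramfs -u` and reboot:

```
# /etc/modprobe.d/zfs.conf  (values are in bytes: 8 GiB and 16 GiB)
options zfs zfs_arc_min=8589934592
options zfs zfs_arc_max=17179869184
```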
 
