[SOLVED] Another io-error with yellow triangle

girofle

New Member
Feb 17, 2026
Hi,

Sorry for the yet-another-io-error :(

Here are our host specs:
* OS: Proxmox VE (kernel 6.17.4-2-pve, x64)
* Intel Xeon E3-1230 v6 - 4c/8t - 3.5 GHz/3.9 GHz
* 32 GB ECC RAM
* 2×450 GB NVMe SSD, soft RAID

We currently have 4 VMs running on this host, and they fail randomly, showing an IO-error message with a yellow triangle.
The partition that contains the VMs is an SSD formatted with ZFS (on top of LVM+LUKS).

You will find attached the debug log, generated manually with the following command after the crash of one of these VMs:
```
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/<ID>.pid)
```

Could you help me better understand what is happening? I don't know where to go from there.

We did not find any trace of disk I/O errors in the host logs (smartmontools is installed), and the disk is currently only 18% used.
Let me know if I can provide more information.

Thanks!
 

Attachments

Hi,
anything unusual in the logs (dmesg -T or journalctl -xe) before or at the time of the crash?
You could connect to the QEMU monitor with qm monitor <vmid> (or via the GUI) and run commands like info status and info block to retrieve more details about why the VM is halted.
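A minimal example of such a session, assuming VM ID 100 (the exact output depends on the failure; info status and info block are standard QEMU monitor commands):
```
qm monitor 100       # opens the QEMU human monitor of VM 100
# at the qm> prompt:
info status          # e.g. "VM status: paused (io-error)" when a drive error halted the guest
info block           # lists the VM's drives with their backing storage and current settings
```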
 
This backtrace is not a crash-in-progress. It looks like you attached GDB while QEMU was still running, so you basically grabbed a snapshot of mostly idle threads.

Normal, boring threads
  • Thread 1 (kvm main). Sitting in the main event loop (ppoll). That is expected.
  • Thread 5 (vnc_worker). Waiting for work. Normal.
  • Thread 6 (CPU 0/KVM). vCPU thread blocked in qemu_wait_io_event. Normal.
  • Thread 8 (call_rcu). RCU helper thread waiting on a futex. Normal.
Threads showing 0x0 - not automatically “broken”
  • Thread 2 and 3 (iou-wrk-*). io_uring worker threads. These are kernel-managed, and GDB often can’t unwind them properly from userspace. That usually shows up as 0x0 frames.
  • Thread 4 (kvm-nx-lpage-re). A KVM kernel thread. Same limitation.
  • Thread 7 (vhost-*). A vhost kernel thread. Same thing.
This backtrace does not explain the failure, because it was taken after the important part. The I/O error happened down in the storage path before QEMU had much to say about it.

Given your stack (ZFS --> LVM --> LUKS --> NVMe soft RAID), the most likely culprit is somewhere in that layering, not an obvious QEMU userspace crash.

io_uring interacting badly with the stacked block devices
The iou-wrk threads are a strong hint that QEMU is using io_uring for async I/O. io_uring has had rough edges with complex stacked block devices in real deployments (especially encrypted layers plus other abstractions). You can get transient submit failures, timeouts, or weird latency spikes that QEMU reports as I/O errors.

Try first: force QEMU to use thread-based AIO instead of io_uring.
In /etc/pve/qemu-server/<ID>.conf, set on each disk line:
  • aio=threads
Or set it per-disk via the Proxmox UI (same effect).
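As a minimal sketch, a disk line with aio=threads could look like this (storage name, disk name and size are placeholders; keep your existing options and only add aio=threads). You can check which AIO backend a VM actually gets with qm showcmd:
```
# /etc/pve/qemu-server/<ID>.conf - example disk line, names and size are placeholders
scsi0: local-zfs:vm-100-disk-0,aio=threads,size=32G

# verify: the generated QEMU command line should now contain aio=threads for each disk
qm showcmd 100 | tr ',' '\n' | grep aio
```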

ZFS memory pressure causing stalls that look like I/O errors

With 32GB RAM, multiple VMs, and LUKS overhead, ZFS ARC can push the host into memory pressure. When that happens, ZFS can stall in ways that feel like storage timeouts.
  • arc_summary
  • cat /proc/spl/kstat/zfs/arcstats | grep -E "^(size|c_max)"
  • free -h
Mitigation: cap ARC, e.g. 8GB
  • echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
Permanent:
  • /etc/modprobe.d/zfs.conf:
    options zfs zfs_arc_max=8589934592

LUKS + soft RAID latency spikes under load

A deep stack can amplify tail latency. Encryption plus md RAID scheduling plus VM bursts can create short “everything pauses” moments. QEMU can interpret that as a disk I/O timeout depending on settings and workload.

Watch:
  • iostat -x 1 (look at await and %util)
  • dmesg | grep -iE "task.*blocked|hung_task|io.error|blk_update"
  • journalctl -k --since "1 hour ago" | grep -iE "error|timeout|reset"

Even if SMART looks “fine,” NVMe controller resets or md resync can cause brief stalls.

  • cat /proc/mdstat
  • nvme smart-log /dev/nvme0n1 (esp. media_errors, unsafe_shutdowns)
  • nvme smart-log /dev/nvme1n1
  • dmesg | grep -i nvme
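Combined into one quick pass (device names are examples, adjust to your hardware; the grep filters just pick out the fields mentioned above):
```
cat /proc/mdstat                                   # any md resync/recovery in progress?
nvme smart-log /dev/nvme0n1 | grep -iE "critical_warning|media_errors|unsafe_shutdowns"
nvme smart-log /dev/nvme1n1 | grep -iE "critical_warning|media_errors|unsafe_shutdowns"
dmesg -T | grep -iE "nvme.*(reset|timeout|abort)"  # controller resets / aborted commands
```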
  1. Switch all VM disks to aio=threads and see if the errors stop. This is the cleanest, lowest-effort test.
  2. While reproducing load, watch iostat -x 1 and kernel logs to catch the exact moment the stack hiccups.
  3. If memory looks tight, cap ARC to something sane for your workload.
  4. Longer term: consider flattening the storage stack. A ZFS mirror instead of mdraid + LUKS + LVM under ZFS is usually less “surprising” in practice; a rough sketch follows below.
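If you ever rebuild the host along those lines, the flattened layout could look roughly like this. It is only an illustration: pool name, device paths, and property choices are placeholders, ZFS native encryption stands in for the LUKS layer, and the command wipes the listed disks.
```
# WARNING: destroys all data on the listed devices - illustration only
zpool create -o ashift=12 \
    -O encryption=aes-256-gcm -O keyformat=passphrase -O keylocation=prompt \
    -O compression=lz4 \
    tank mirror /dev/disk/by-id/nvme-DISK1 /dev/disk/by-id/nvme-DISK2
```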
Use refreservation=none on the ZFS volumes if you overcommit storage (instead of using thin LVM).
 
Would you be so kind as to not post untested AI-generated answers?

Parts like this
Mitigation: cap ARC, e.g. 8GB
  • echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
Permanent:
  • /etc/modprobe.d/zfs.conf:
    options zfs zfs_arc_max=8589934592
are simply not correct. This would potentially increase the memory used by the ARC.
 
This backtrace is not a crash-in-progress. It looks like you attached GDB while QEMU was still running, so you basically grabbed a snapshot of mostly idle threads.

[...]
Use refreservation=none on the ZFS volumes if you overcommit storage (instead of using thin LVM).
Jesus AI Slop nonsense!
 
yes - it's just an example, adapt it to the memory requirements --> e.g. 8 GB (!!!)
For a default installation of PVE 9 with 32 GiB, the ARC is much lower. So the example is missing clear directions and could be phrased much better to be helpful.
 
For a default installation of PVE 9 with 32 GiB, the ARC is much lower. So the example is missing clear directions and could be phrased much better to be helpful.

Since PVE 8.1 (and therefore also on PVE 9), the installer sets the ARC to 10% of installed physical memory, clamped to a maximum of 16 GiB. This is written to /etc/modprobe.d/zfs.conf.

On a 32 GiB system, the default ARC limit is therefore about 3.2 GiB, not 8 GiB.
The 8589934592 (8 GiB) value in the official Proxmox docs is just a syntax example for the config line:
options zfs zfs_arc_max=8589934592
It is not a recommendation, and it's definitely not the default. The docs could be much clearer about this distinction.
What the right value actually is depends on storage pool size and workload.

The Proxmox docs give this rule of thumb:
2 GiB base + 1 GiB per TiB of storage
So for a 32 GiB system:
2 TiB --> 4 GiB
8 TiB --> 10 GiB
16 TiB --> 16 GiB (cap)

1. Edit /etc/modprobe.d/zfs.conf (example: 4 GiB = 4 * 1024^3):
options zfs zfs_arc_max=4294967296
2. If root is on ZFS, update the initramfs:
update-initramfs -u -k all
3. Reboot. (You can also apply the value at runtime by writing it to /sys/module/zfs/parameters/zfs_arc_max, without a reboot.)

If your desired zfs_arc_max is <= zfs_arc_min (which defaults to 1/32 of system memory = 1 GiB on 32 GiB), you must also set zfs_arc_min to at most zfs_arc_max - 1.
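For illustration, the complete /etc/modprobe.d/zfs.conf could then look like this (values are examples for the 4 GiB case above, not a recommendation; the zfs_arc_min line is only needed when your desired max is at or below the default zfs_arc_min):
```
# /etc/modprobe.d/zfs.conf - example values, adapt to pool size and workload
options zfs zfs_arc_max=4294967296
# only needed if the desired max is <= the default zfs_arc_min:
options zfs zfs_arc_min=4294967295
```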

The 3.2 GiB default (10% of 32 GiB) is quite conservative. Depending on your storage pool size and workload, you may want to increase or decrease it.

Your claim that one should not post untested help is a bit unreasonable - how on earth should you test all those hardware, memory, and workload variations?
And in my OP I put: "If memory looks tight, cap ARC to something sane for your workload."

The only thing you can hope for is to push people in the right direction, applying common sense - not a tailored recipe, which I would charge for.
 
Your claim that one should not post untested help is a bit unreasonable - how on earth should you test all those hardware, memory, and workload variations?
Of course, no one can test every variation. But there is a middle ground between exhaustive testing and echoing AI outputs without the necessary critical oversight.
This is my last OT post here as this discussion is not helping the OP.
 
Wow! Thank you for all those answers!
I did not expect to trigger such a thread haha :D

Running `zpool events -v` right after a failure shows multiple `ereport.fs.zfs.dio_verify_wr` events. I believe I have the same issue as https://forum.proxmox.com/threads/proxmox-9-io-error-zfs.179519/
I tested `zfs set direct=disabled`, and I have not seen any io-error since Wednesday. I believe the issue is solved, but I'll keep you informed if not, and I may eventually dig into ARC as well.
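For reference, the commands look roughly like this (rpool/data is a placeholder; point it at the dataset that holds the VM disks):
```
# look for Direct I/O write verification errors right after a failure
zpool events -v | grep dio_verify
# check and then disable Direct I/O on the dataset holding the VM disks
zfs get direct rpool/data            # "rpool/data" is a placeholder for your dataset
zfs set direct=disabled rpool/data
```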

Thank you again for the answers,
 
Maybe useful for people coming back to this thread one day:

Take a look at the kernel.org thread which describes this bug including examples from Proxmox and QEMU. There is also an excellent writeup here.
 