Hi,
I’m running into repeatable ZFS I/O stalls on a Proxmox host and I’d like some technical feedback before I start swapping hardware.
Hardware
- CPU: Ryzen 9 7900
- Motherboard: ASUS Pro WS B850M-ACE SE (AM5)
- RAM: 64GB DDR5 (non-ECC)
- Storage: 2x Crucial T705 2TB (CT2000T705SSD3)
- Firmware: PACR5111 (both drives)
- Both NVMe drives running at PCIe 5.0 x4 (32GT/s confirmed via lspci)
- Pool: ZFS mirror (rpool)
Software
- Proxmox VE (latest kernel 6.17.x)
- ZFS mirror on the two T705
- Guest: Ubuntu VM with LVM inside
Under heavy write load (e.g. vzdump backup, snapshot, large writes), the system eventually:
- Load average spikes (~10+)
- Multiple ZFS threads enter D state:
  - txg_sync
  - zvol_tq-*
  - flush-zfs
- Even unrelated processes end up blocked
- SSH eventually drops
- No NVMe reset or I/O error in dmesg
- zpool status still shows ONLINE, no errors
- Only recovery is full reboot (power cycle sometimes required)
Tasks stuck in D state during a stall:
D [txg_sync]
D [zvol_tq-0]
D [dbuf_evict]
D flush-zfs
D vzdump
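A minimal sketch for capturing more detail than the listing above the next time a stall starts, assuming a Linux /proc layout and root access. It lists every process in state D and, where the kernel exposes it, the kernel stack from /proc/<pid>/stack (that file is only present on kernels built with stack tracing, so treat it as optional). The function names are mine, not from any tool.

```python
#!/usr/bin/env python3
"""List processes in uninterruptible sleep (state D) and their kernel stacks."""
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the LAST closing parenthesis.
    """
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack of pid (usually needs root), or None."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(f"D {pid:>7} {comm}")
        stack = kernel_stack(pid)
        if stack:
            print(stack.rstrip())
```

Running it in a loop from a local console (not SSH, since SSH drops) and redirecting the output to a file on a non-ZFS filesystem would be the idea, so the write itself does not block on the pool.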
Nothing like the following appears in dmesg:
- nvme timeout
- controller reset
- blk_update_request error
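To make the "nothing in the log" check repeatable, here is a rough sketch that scans a saved kernel log for the kinds of messages listed above. The regular expressions are my guess at common wording and are deliberately loose, so they are assumptions to adjust, not exact kernel message formats. I also added a pattern for the kernel's hung-task report ("blocked for more than N seconds"), which is worth looking for in this situation.

```python
#!/usr/bin/env python3
"""Scan a saved kernel log for NVMe, block-layer and hung-task messages."""
import re
import sys

# Loose, illustrative patterns; exact wording varies between kernel versions.
PATTERNS = {
    "nvme_timeout": re.compile(r"nvme.*\btimeout\b", re.I),
    "controller_reset": re.compile(r"nvme.*\breset", re.I),
    "io_error": re.compile(r"I/O error", re.I),
    "hung_task": re.compile(r"blocked for more than \d+ seconds"),
}


def scan(lines):
    """Return {category: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_klog.py <kernel-log-file>")
    else:
        with open(sys.argv[1], errors="replace") as f:
            for name, matched in scan(f).items():
                print(f"{name}: {len(matched)} hit(s)")
                for line in matched:
                    print("   ", line)
```

Since the only recovery is a reboot, the log of the previous boot is the interesting one, for example `journalctl -k -b -1 > kernel.log` (this only works if the journal is persistent), then `python3 scan_klog.py kernel.log`.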
Other observations
- Both drives are PCIe Gen5 x4
- No ASPM enabled in BIOS
- No explicit NVMe power saving tuning
- Scrub completes fine when idle
- Issue appears only under sustained write / flush pressure
- Happens even when backup target is local (so not network-related)
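To put numbers on the write/flush pressure, a sketch that reads the per-txg history OpenZFS on Linux exposes under /proc/spl/kstat/zfs/<pool>/txgs and prints the slowest syncs. It assumes the file has a header row containing the columns `txg`, `ndirtied` and `stime`, with `stime` in nanoseconds and `ndirtied` in bytes. Columns are looked up by name, so a different layout should produce an empty result instead of wrong numbers.

```python
#!/usr/bin/env python3
"""Show the slowest recent transaction group syncs for one pool."""
import sys


def parse_txgs(text):
    """Parse the txgs kstat text into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue
        if header is None:
            if cols[0] == "txg":  # skip anything before the column header
                header = cols
            continue
        if len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
    return rows


def slowest_syncs(rows, n=5):
    """Return up to n (txg, sync_ms, dirtied_MiB) tuples, slowest first."""
    out = []
    for row in rows:
        try:
            out.append((
                int(row["txg"]),
                int(row["stime"]) / 1e6,       # assumed ns -> ms
                int(row["ndirtied"]) / 2**20,  # assumed bytes -> MiB
            ))
        except (KeyError, ValueError):
            continue  # unexpected layout or non-numeric field
    return sorted(out, key=lambda t: t[1], reverse=True)[:n]


if __name__ == "__main__":
    pool = sys.argv[1] if len(sys.argv) > 1 else "rpool"
    path = f"/proc/spl/kstat/zfs/{pool}/txgs"
    try:
        with open(path) as f:
            rows = parse_txgs(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        rows = []
    for txg, ms, mib in slowest_syncs(rows):
        print(f"txg {txg}: sync {ms:.1f} ms, dirtied {mib:.1f} MiB")
```

If sync times climb steadily in the minutes before a stall, that would point at the drives falling behind on writes or flushes. A sudden hang with normal sync times right up to it would look more like a lost completion.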
Questions
- Has anyone seen txg_sync hangs on the T705 (Phison E26 controller) under ZFS?
- Would forcing PCIe Gen4 instead of Gen5 be a reasonable stability test?
- Is this a known flush latency issue with consumer Gen5 NVMe?
- Any ZFS tunables worth testing (before replacing hardware)?
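Before changing any tunables, it seems sensible to record the current values so each test can be compared against a known baseline. A small sketch, assuming the ZFS module parameters are exposed under /sys/module/zfs/parameters. The parameter list is only my guess at which write-path settings might be relevant, not a recommendation.

```python
#!/usr/bin/env python3
"""Print the current values of a few ZFS write-path module parameters."""
from pathlib import Path

# Assumption: these are the write-related parameters worth recording first.
PARAMS = [
    "zfs_dirty_data_max",
    "zfs_txg_timeout",
    "zfs_vdev_async_write_max_active",
    "zfs_vdev_sync_write_max_active",
]


def read_params(names=PARAMS, root="/sys/module/zfs/parameters"):
    """Return {name: value string, or None if the parameter is unreadable}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_params().items():
        print(f"{name} = {value if value is not None else '(not available)'}")
```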
Planned tests
- Forcing both slots to PCIe Gen4
- Temporarily detaching one disk and testing single-device pool
- Updating firmware (if newer than PACR5111 exists)
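For the Gen4 test, a sketch to confirm what the link actually negotiated after changing the BIOS setting, without parsing lspci output. It assumes each NVMe controller appears under /sys/class/nvme/<name>/device with the standard PCI sysfs attributes `current_link_speed` and `current_link_width`. The GT/s-to-generation table follows the PCIe per-lane rates.

```python
#!/usr/bin/env python3
"""Report the negotiated PCIe generation and width of each NVMe controller."""
import re
from pathlib import Path

# Per-lane transfer rate (GT/s) -> PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def pcie_gen(speed_text):
    """Map a sysfs speed string such as '32.0 GT/s PCIe' to a generation."""
    match = re.search(r"([\d.]+)\s*GT/s", speed_text)
    if not match:
        return None
    return GEN_BY_GTS.get(float(match.group(1)))


def nvme_links(sys_root="/sys/class/nvme"):
    """Yield (controller, generation, raw_speed, width) per NVMe controller."""
    root = Path(sys_root)
    if not root.is_dir():
        return
    for ctrl in sorted(root.iterdir()):
        dev = ctrl / "device"
        try:
            speed = (dev / "current_link_speed").read_text().strip()
            width = (dev / "current_link_width").read_text().strip()
        except OSError:
            continue  # not a PCIe-attached controller
        yield ctrl.name, pcie_gen(speed), speed, width


if __name__ == "__main__":
    for name, gen, speed, width in nvme_links():
        print(f"{name}: Gen{gen} x{width} ({speed})")
```

With the slots forced to Gen4, both drives should report 16 GT/s here. If they still show 32 GT/s, the BIOS setting did not take effect and the test says nothing.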
Any technical input appreciated.