ZFS Storage: Bad NVMe Performance

ByteArchitect

Mar 11, 2025
Hello everyone,


I’m currently running a server with the following hardware:


System specs:
  • CPU: AMD EPYC 7303P
  • Motherboard: H12SSW-NTR
Storage:
  • OS: 2x 256GB SSD
  • Data: 3x Samsung OEM Datacenter NVMe PM9A3 3.84TB (PCIe 4.0 x4)

ZFS setup:
I created a RAIDZ1 pool and configured the following settings:
zfs set recordsize=8K rpool
zfs set compression=lz4 rpool
zfs set atime=off rpool
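
To confirm the properties actually took effect (assuming `rpool` is the pool name, as in the commands above), they can be read back with:

```shell
# Read back the tuned properties on the pool set above
zfs get recordsize,compression,atime rpool
```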

Expected performance (per NVMe):
  • Read: ~6900 MB/s
  • Write: ~4100 MB/s

Actual performance (inside VM):
Using fio, I only get around 1900–2000 MB/s write & read throughput, which seems quite low for this setup.


Write Test command:
fio --name=write-test --size=1G --filename=/tmp/fio-testfile --bs=128k --rw=write --direct=1 --numjobs=1 --time_based --runtime=30 --group_reporting

Read Test command:
fio --name=read-test --size=1G --filename=/tmp/fio-testfile --bs=128k --rw=read --direct=1 --numjobs=1 --time_based --runtime=30 --group_reporting

Result Write (shortened):
  • Write BW: ~1900 MiB/s (~2000 MB/s)
  • IOPS: ~15k
  • Latency avg: ~65µs
  • IO depth: 1

Result Read (shortened):
  • Read BW: ~2600 MiB/s (~2700 MB/s)
  • IOPS: ~21k
  • Avg latency: ~47µs
  • IO depth: 1

My concern:
Given that I’m using 3 high-performance PCIe 4.0 NVMe drives in RAIDZ1, I expected significantly higher write performance.

Questions:
  • Is this performance normal for RAIDZ1 with this kind of workload?
  • Could this be limited by ZFS configuration (e.g. recordsize, sync, etc.)?
  • Is the fio test itself the bottleneck (iodepth=1, single job)?
  • Would a different layout (e.g. mirrors instead of RAIDZ1) improve performance significantly?
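
For comparison, a deeper-queue, multi-job variant of the same read test (the `--iodepth`/`--numjobs` values are only a guess at what might saturate these drives, and the target file should sit on the ZFS dataset being measured) would look like:

```shell
# Hypothetical higher-parallelism variant of the read test above;
# iodepth=32 and numjobs=4 are starting points, not recommendations
fio --name=read-test --size=1G --filename=/tmp/fio-testfile \
    --bs=128k --rw=read --direct=1 \
    --ioengine=libaio --iodepth=32 --numjobs=4 \
    --time_based --runtime=30 --group_reporting
```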

Any insights or tuning recommendations would be greatly appreciated.
Thanks in advance!
 
Where does the expected performance come from? Is it the manufacturer's figure, measured in an unrealistic, perfectly sequential scenario?
RAIDZ1 write throughput is never more than a single drive's. RAIDZ1 is also not a good fit for VMs because of its low random IOPS; mirrors are indeed better: https://forum.proxmox.com/threads/fabu-can-i-use-zfs-raidz-for-my-vms.159923/
EDIT: ZFS also has read/write amplification and overhead (compared to LVM), which is the price of its many useful features such as data checksums and automatic healing. Unfortunately, ZVOLs are still known to be relatively slow.
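
As a rough sketch of a mirror layout (device paths and the pool name `tank` are placeholders, and with only three drives a fourth would be needed to build two mirror vdevs):

```shell
# Hypothetical striped-mirror pool; replace device paths with your own.
# Two mirror vdevs stripe reads/writes across both pairs.
zpool create tank \
    mirror /dev/nvme0n1 /dev/nvme1n1 \
    mirror /dev/nvme2n1 /dev/nvme3n1
```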
 