Why is a ZFS pool so slow vs. hardware RAID?

nsc

Renowned Member
Jul 21, 2010
Hello,

My server has an LSI 9361-8i (2 GB cache, BBU) with 8x 14 TB SAS disks.
The CPU is an AMD Ryzen 7 PRO 3700 (8 cores / 16 threads, 1 socket), and the machine has 128 GB of RAM.

I'm benchmarking Proxmox 8.4; boot is on an SSD.

My main storage was initialized as a RAID5 virtual disk, with LVM on top.

I ran some "load" fio benchmarks; here are the results:

iowait stays around 60%, almost no CPU usage, load around 10.

I've read a lot of things about ZFS (better / stronger), so I broke my RAID5 and exposed the disks as JBOD.

Then I created a raidz1 pool and ran the same test:
iowait goes up to 75%, CPU usage around 30%, load around 100!!!

Then I created a RAID10-style pool (striped mirrors) and ran the same test:
iowait goes up to 85%, CPU usage around 15%, load around 100!!! Yes, 100 again!!
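
For context, these values can be watched live during a run with the usual tools (generic commands, nothing specific to this setup):
Code:
# per-device utilization and latency, refreshed every 5 seconds
iostat -x 5
# run queue, blocked tasks and CPU breakdown including iowait
vmstat 5
# 1/5/15-minute load averages
uptime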

Did I miss something?

Does hardware RAID definitively outperform ZFS in terms of performance and overall impact on the server?

I'm frustrated because I've read a lot about ZFS, especially with Proxmox, but here it's the opposite...

Thanks

Nsc
 
Did I miss something?
Well, maybe. You are probably comparing apples with oranges.

And "iowait move up to 75%, cpu usage around 30%, load around 100 !!!" does tell us nothing as there is no "Megabytes per second" or "In-/Out operations per seconds" reported. Note that IOPS is much more important than MB/s - at least for my use cases.

You need to utilize a sane benchmark! Look for fio, for example:
Code:
~# apt show fio
Description: flexible I/O tester
 fio is a tool that will spawn a number of threads or processes doing a
 particular type of I/O action as specified by the user. fio takes a
 number of global parameters, each inherited by the thread unless
 otherwise parameters given to them overriding that setting is given.
 The typical use of fio is to write a job file matching the I/O load
 one wants to simulate.

Then prepare for a repeatable test and run the same command on your different setups.

A usual pitfall: the RAID controller has a BBU-backed write cache. While this is good for some situations, it simply lies to us: data declared to be written may still be sitting in that cache.
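
For a fairer comparison you can temporarily switch the controller's virtual drive to write-through, so the cache stops hiding the write latency. A rough sketch with LSI's storcli; the controller/VD numbers (/c0/v0) are assumptions for your setup:
Code:
# current cache policy of virtual drive 0 on controller 0
storcli /c0/v0 show all | grep -i cache
# force write-through for the benchmark ...
storcli /c0/v0 set wrcache=wt
# ... and back to write-back afterwards
storcli /c0/v0 set wrcache=wb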

And ZFS does a lot more than just write data: it manages and guarantees data integrity with a lot of checksum "magic". That needs some compute power, and of course it is slower than just writing data without any checks.
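
To make that concrete: the checksumming is visible and tunable as a dataset property, and a scrub re-reads every block and verifies it. A minimal sketch, assuming a pool named "tank":
Code:
# which checksum algorithm is active (the default "on" means fletcher4)
zfs get checksum tank
# re-read and verify everything on the pool
zpool scrub tank
# progress and any checksum errors found
zpool status -v tank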


One of my standard tests, just as an example, is this:
Code:
root@pven:/rpool/fio# fio --name=randrw --ioengine=libaio --direct=1 --sync=1 --rw=randrw --bs=4k --numjobs=1 --iodepth=1 --size=20G --runtime=60 --time_based --rwmixread=75
The result depends heavily on the options used. Here "--bs=", "--iodepth=" & "--sync" are intentionally set to force the lowest possible result.
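
For contrast, a relaxed variant with bigger blocks, a deeper queue and no forced sync gives numbers closer to a sequential/backup-style workload; values are purely illustrative:
Code:
fio --name=randrw-relaxed --ioengine=libaio --direct=1 --rw=randrw --bs=128k --numjobs=4 --iodepth=32 --size=20G --runtime=60 --time_based --rwmixread=75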

See man fio for a loooong man page; search for it on "the web" to find tutorials.
 
This is exactly what I did.

I created a fio benchmark with multiple jobs (read / write / random) in a bash script. The fio configuration:

Code:
[global]
ioengine=libaio
direct=1
time_based=1
runtime=${RUNTIME}
group_reporting=1
log_avg_msec=1000
write_bw_log=${MNT_BASE}/fio_bw
write_iops_log=${MNT_BASE}/fio_iops
write_lat_log=${MNT_BASE}/fio_lat

[veeam-read]
rw=read
bs=1M
iodepth=32
numjobs=2
filename=${MNT_BASE}/src/testfile
size=100G

[veeam-write]
rw=write
bs=1M
iodepth=32
numjobs=2
filename=${MNT_BASE}/dst/testfile
size=100G

# --- Random VM activity ---
[vm-noise]
rw=randrw
rwmixread=70
bs=4k
iodepth=64
numjobs=8
filename=${MNT_BASE}/vm/testfile
size=20G

Then I ran it multiple times on LVM over hardware RAID, then on ZFS pools at different RAID levels.
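
For reference, fio expands ${...} variables in job files from the environment (in my case the bash script sets them), so a standalone run looks roughly like this; the job file name and values are placeholders:
Code:
MNT_BASE=/mnt/bench RUNTIME=600 fio veeam-sim.fio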

Then I got these results (very high load with ZFS).

So no, I'm not comparing apples and oranges ;-)

The starting point for all this is that I have another server with eight disks as well, and I thought it would be time to move away from hardware RAID and start using ZFS. And it was on this server that I encountered these load issues when the server was subjected to heavy I/O (Veeam backup).

So I took back this old server, which still had an active RAID5 and which I could reinstall in Proxmox 8.4 for comparison purposes.

I know the limitations of BBU and cache, but clearly in this operating mode, i.e. having two or three VMs on this type of server, it is better to use hardware RAID in my case than ZFS at present.
 
it is better to use hardware RAID in my case than ZFS at present.
Okay, maybe...

I am not really convinced, but that is irrelevant :-)
 
The server has two SSDs; I took one and created a SLOG on it, with sync=always. That doesn't change the load during the fio benchmark: it is still very high (around 103).
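
For the record, the commands were roughly the following; the pool name and device path are placeholders:
Code:
# add the spare SSD as a separate log (SLOG) device
zpool add tank log /dev/sdX
# push every write through the ZIL / SLOG
zfs set sync=always tank
# check whether the log device actually absorbs writes during the benchmark
zpool iostat -v tank 5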
 
Does hardware RAID definitively outperform ZFS in terms of performance and overall impact on the server?
Yes. This isn't news or a mystery.

ZFS has many OTHER advantages over LVM on block storage. Performance is only one factor; if the ZFS subsystem provides SUFFICIENT performance for your application, it is by far preferable. You might want to consider whether the benchmarks you're chasing have any real relevance to your application.
 
RAID5 (with BBU) is almost, but not quite, entirely unlike RAIDz1 (with drives without PLP): https://forum.proxmox.com/threads/fabu-can-i-use-zfs-raidz-for-my-vms.159923/post-734643
Yes, and I learned that the hard way. In 2025, I thought that ZFS, which is being promoted everywhere, would be able to replace a hardware RAID card that is several years old. I also thought it was a good candidate to replace mdadm, which I had used in the same type of configuration.

ZFS may be more secure and certainly offers many interesting features, but I do not need them for my use case, and conversely, its performance is a real bottleneck.

It's not a big deal. This is also what we enjoy about working in IT: we learn something new every day and we challenge ourselves :)