Yet another "ZFS on HW-RAID" Thread (with benchmarks)

I forgot about this thread and that your question (and almost everything else I just said) was covered earlier in the thread :)


THIS.
Not completely my situation, but yes, I just want to make sure my thinking is correct and that it wouldn't make the filesystem freak out. Like I said, the plan is to only use one vdev, not to present multiple single-disk RAID 0 virtual disks to the host and then layer ZFS RAID levels on top, as some people do. That seems pointless to me if you already have a HW RAID. As I noted, I want compression and ARC cache with a SLOG. In my testing, nested TrueNAS was faster than a VM sitting directly on the datastore.
 
Well, the key is the fact that I already have production H710P and H720P cards
I hope you understand that the H710P is well over a decade old technology. I don't know what an H720P is (if you mean the H730, that's better, but still quite old). If you are relying on an H710 in 2025, your idea of "production" is seriously in need of re-evaluation: they haven't had firmware updates in over 10 years, and heat cycles plus solder mean they don't last forever. Never mind that they are meant for HDDs and are S.L.O.W.

edit: the H730P allows passthrough.

edit2: (and perhaps the most important point) such old equipment is FAR more likely to introduce data errors due to the age of the electronics. Since RAID is not content-aware, it will happily compute parity for mangled bits.
 
I hope you understand that the H710P is well over a decade old technology. I don't know what an H720P is (if you mean the H730, that's better, but still quite old). If you are relying on an H710 in 2025, your idea of "production" is seriously in need of re-evaluation: they haven't had firmware updates in over 10 years, and heat cycles plus solder mean they don't last forever. Never mind that they are meant for HDDs and are S.L.O.W.

edit: the H730P allows passthrough.
Yes, I understand this. Also, I have full backups that sync, because RAID is not a backup. :P My point is: will ZFS corrupt the filesystem if it sits on top of a HW RAID virtual disk in a single-device zpool? I do not see why this would be a problem; it would be the same as any other filesystem on a single device. Again, I want to leverage dedupe, ARC, SLOG, and thin provisioning. I do plan to do some performance tests between thin LVM and ZFS: I plan to split my drives into two VDs, one with ZFS and one with thin LVM. Again, I have been using a TrueNAS VM with a single-disk zpool and a SLOG VMDK for testing for well over a year.

If I lose the controller it is an easy fix: you just replace the controller (they are like $30) and then do a foreign disk import. It is rare to lose a controller; I work in support at Dell on a software SAN solution, PowerFlex. We used to support the H720P on 13G and the H730P on 14Gs, and it was very rare to have a controller failure. 99.8% of the time it was disks.
 
We used to support the H720P on 13G and the H730P on 14Gs, and it was very rare to have a controller failure. 99.8% of the time it was disks.

There's no such thing as an H720P. 14G Dells used the H730 (SAS3), and yes, these are excellent; the only thing I ever needed to replace on them was the BBU.

BUT bear in mind I never ran these past around 6 years old. They can probably last 12, but I wouldn't trust them afterwards; then again, I wouldn't run them that long in the first place, since I do have to pay for power and cooling. ;)

My point is: will ZFS corrupt the filesystem if it sits on top of a HW RAID virtual disk in a single-device zpool?
No, it won't corrupt the filesystem. It will just eat space and be slow because of block misalignment and write amplification.
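If you do run it that way, the usual mitigation is to set ashift and recordsize explicitly to match the virtual disk; a minimal sketch, assuming the VD shows up as /dev/sdb with a 64K stripe element size (substitute your actual device and VD settings):
Code:
# single-vdev pool on the PERC virtual disk; ashift=12 forces 4K alignment
zpool create -o ashift=12 tank /dev/sdb
# match the dataset recordsize to the VD stripe element size
zfs set recordsize=64K tank
zfs set compression=lz4 tank
# verify which ashift the pool actually got
zdb -C tank | grep ashift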
 
There's no such thing as an H720P. 14G Dells used the H730 (SAS3), and yes, these are excellent; the only thing I ever needed to replace on them was the BBU.

BUT bear in mind I never ran these past around 6 years old. They can probably last 12, but I wouldn't trust them afterwards; then again, I wouldn't run them that long in the first place, since I do have to pay for power and cooling. ;)


No, it won't corrupt the filesystem. It will just eat space and be slow because of block misalignment and write amplification.
Hah, you're right, never mind: my R730xd has an H730P; only my R720 has an H710P. Well, I can do passthrough with that H730P controller if I want, so I may do that. The R730xd will be production; the R720 is for test and hop migration. I am going from ESXi to Proxmox and need to move prod to the R720 first, then reload Proxmox and move it back to the R730xd. That's the plan. I don't like nuking the server and restoring from my other backup, as it's not as redundant. I am migrating off evil Broadcom now that they basically killed VMUG.

"No, it won't corrupt the filesystem. It will just eat space and be slow because of block misalignment and write amplification."

Sounds like something to test before I migrate. ;) I will not have the new disks to test with until August 6th, and then it will take me a few weeks to build. So by then I may just flash the H710P Mini controller to IT mode. The JBOD mode on the H730P should be fine for ZFS, from my understanding.


 
Example of a patrol read running on my PERC controller, from the TTY log: it cycles through every block, like ZFS does with its checks.
A patrol read checks every block of all disks, regardless of whether a disk is in a RAID set, is a spare, or is unconfigured.
A ZFS scrub will only check referenced blocks on disks in the pool (not even a defined ZFS spare disk), and it will not find blocks already broken by a head crash if they are not actually in use, so it's really not the same.
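To see the two mechanisms side by side, roughly (the controller index /c0 and the pool name tank are assumptions; perccli follows storcli syntax, which may vary by version):
Code:
# ZFS: walk and verify every referenced block in the pool
zpool scrub tank
zpool status -v tank
# PERC: show the patrol read state and schedule
perccli64 /c0 show patrolread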
 
A patrol read checks every block of all disks, regardless of whether a disk is in a RAID set, is a spare, or is unconfigured.
A ZFS scrub will only check referenced blocks on disks in the pool and will not find blocks already broken by a head crash if they are not actually in use, so it's really not the same.
Would there be negative effects from having both enabled? A virtual disk on the PERC with no read-ahead and write-through caching, with patrol read, and then a scrub on top? On my R730xd with the H730P I may change the controller to JBOD mode; that is going to be my main production box, with most of my prod VMs on it. So for the R730xd with the H730P controller it would be JBOD, with RAIDZ2, and with the SLOG on the Intel Optane PCIe NVMe SSD. The RAIDZ2 would be nine 3TB 7200 RPM Dell enterprise SAS disks.
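For reference, the layout described above would be created roughly like this (disk names sdb through sdj and nvme0n1 are placeholders; in practice use /dev/disk/by-id paths):
Code:
# 9-wide RAIDZ2 over the SAS disks, with the Optane as a dedicated log (SLOG) vdev
zpool create -o ashift=12 tank raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj log nvme0n1
zpool status tank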

Right now I am in POC mode on my R720 before migration, as I am waiting on a set of eight 3TB SATA Dell enterprise disks. Yeah, I know, EWWW. For now I am testing with eight 300GB 10k SAS disks for the POC until I get the new drives at the beginning of August.


I may also get 2TB enterprise SSDs; I may end up getting 24. I can put 14 SSDs in the XD, and then if I get a midplane I can fit 4 more, so 18. Then I'd migrate the SAS disks to the R720. I plan on getting 24 if possible and building an external 8 or 10 bay NAS with TrueNAS by hand. BTW, I have good friends in the enterprise space with their own HW also. :D
 
Would there be negative effects from having both enabled? A virtual disk on the PERC with no read-ahead and write-through caching, with patrol read, and then a scrub on top? On my R730xd with the H730P I may change the controller to JBOD mode; that is going to be my main production box, with most of my prod VMs on it. So for the R730xd with the H730P controller it would be JBOD, with RAIDZ2, and with the SLOG on the Intel Optane PCIe NVMe SSD. The RAIDZ2 would be nine 3TB 7200 RPM Dell enterprise SAS disks.

JBODs would not have a patrol read option. But more to the point: what is the use case for this storage? RAIDZ2 on HDDs would have TERRIBLE performance for virtualization workloads.

edit: SLOGs don't do what you think they do. https://www.truenas.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/
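In short: a SLOG only ever helps synchronous writes. A quick way to check whether a workload is even issuing them, assuming a pool named tank:
Code:
# sync policy on the pool/dataset (standard = honour sync requests, don't force them)
zfs get sync tank
# watch the log vdev: if it stays idle during the workload, the SLOG is not being used
zpool iostat -v tank 5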
 
JBODs would not have a patrol read option. But more to the point: what is the use case for this storage? RAIDZ2 on HDDs would have TERRIBLE performance for virtualization workloads.
I mean, right now I run the R730xd on RAID6 and it works well with nested TrueNAS with the SLOG on the NVMe PCIe SSD. However, this is not a random sync workload. Most of this is for home use: I host my web pages and services, plus some stuff for church. Access is from my Windows PC with a 10-gig card over Cat6a going to the basement, through a Cisco Nexus 3k 10-gig switch, then 40-gig to 10-gig breakout DAC cables to Intel X710 SFP+ 10-gig cards. Although most of this is hitting the ARC cache. This is why I want to leverage the ARC cache and SLOG of ZFS: if I go to my normal NFS server instead of the nested non-prod TrueNAS, I get only about 1/3 of this or less. https://maple-street.net/home-and-server-rack-10gbe-network-upgrade-and-deploy/

I do photography and photo editing directly on my NAS from my PC, so the 10-gig is nice.

 
Makes sense, and it is a good solution for your workload. ZFS edges out HW RAID due to features like integrated shadow copies, and the performance differences should be negligible.

Although most of this is hitting the ARC cache
Likely not. Your use case is large assets (photographs of some size). ARC works best with small files, so unless you have a very large ARC and are only accessing a small number of files constantly, it doesn't come into play. Besides, disk subsystems don't have trouble delivering large files at full speed, which is why large-block benchmarks deliver such big numbers; it's just the seek latency that's the problem, and that's not typically an issue for your use case.
 
Makes sense, and it is a good solution for your workload. ZFS edges out HW RAID due to features like integrated shadow copies, and the performance differences should be negligible.


Likely not. Your use case is large assets (photographs of some size). ARC works best with small files, so unless you have a very large ARC and are only accessing a small number of files constantly, it doesn't come into play. Besides, disk subsystems don't have trouble delivering large files at full speed, which is why large-block benchmarks deliver such big numbers; it's just the seek latency that's the problem, and that's not typically an issue for your use case.
Is 128G of ARC good? :D That's because I have a lot of memory on these servers that I don't touch. I have 384GB; the VMs maybe use 180-200GB max. However, like you say, it may not be hit often with larger files.
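If you do want to pin the ARC at that size on Proxmox/Linux, it is capped via the zfs_arc_max module parameter; a sketch using 128 GiB (adjust to whatever leaves enough RAM for the VMs):
Code:
# cap ARC at 128 GiB (137438953472 bytes), applied at module load
echo "options zfs zfs_arc_max=137438953472" >> /etc/modprobe.d/zfs.conf
update-initramfs -u   # then reboot
# or change it live without a reboot
echo 137438953472 > /sys/module/zfs/parameters/zfs_arc_max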

I think I am going to do some fio testing at varying block sizes: 4k, 8k, 32k, 256k, 512k, and if I'm nuts, 1M. Smaller sizes exercise the disks more; larger sizes are more about throughput. It will be interesting to note the latency. I do those types of tests on distributed software SAN storage.

This is an example of some of the stress tests I have used in the past; however, I may need to tweak these, as I don't think this is the best set I have used.

Code:
fio --name=stresstest --ioengine=libaio --direct=1 --size=75G --group_reporting --numjobs=6 --thread --iodepth=250 --filename=/dev/sdc --rw=randread --bs=4k
fio --name=stresstest --ioengine=libaio --direct=1 --size=75G --group_reporting --numjobs=6 --thread --iodepth=250 --filename=/dev/sdc --rw=randread --bs=128k
fio --name=stresstest --ioengine=libaio --direct=1 --size=75G --group_reporting --numjobs=6 --thread --iodepth=250 --filename=/dev/sdc --rw=randread --bs=256k
fio --name=stresstest --ioengine=libaio --direct=1 --size=75G --group_reporting --numjobs=6 --thread --iodepth=250 --filename=/dev/sdc --rw=randread --bs=512k
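Those runs hit the raw device and bypass ZFS entirely; to measure the pool itself (ARC, compression, RAIDZ overhead), the same test can be pointed at a file on a dataset instead. A sketch, assuming a dataset mounted at /tank/fio (note that --direct=1 only behaves as expected on OpenZFS versions with Direct I/O support):
Code:
fio --name=pooltest --directory=/tank/fio --size=20G --ioengine=libaio --rw=randread --bs=8k --iodepth=32 --numjobs=4 --group_reporting --runtime=120 --time_based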
 
Is 128G of ARC good?
Well... it depends! (Most systems have much, much less than that, of course.)

Don't ask this, check it: ~# arc_summary | less gives an impression of the vast number of live bean counters. Then:
Code:
~# arc_summary | grep hit
        Total hits:                                    97.1 %      12.9G
        Demand data hits:                              98.9 %       6.9G
        Demand metadata hits:                         100.0 %       5.8G
Those "near 100%" values will go down when new data is being read from disk. I cannot give a definitive recommendation, but personally I am fine as long as it stays above 90% (with drops down to 50% in specific situations, e.g. a backup reading a lot of data...).
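For a live view rather than lifetime totals, arcstat (shipped with OpenZFS) prints the hit rate and ARC size per interval; the exact column set varies a bit between versions:
Code:
# one sample line every 5 seconds: reads, hit %, demand/prefetch split, ARC size
arcstat 5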
 