Hi,
I would like to talk about a topic already discussed hundreds of times here on the forum, but in slightly more scientific and practical terms:
ZFS on top of Hardware RAID
Introduction
I already know all the warnings about this configuration, but since most of the references cited are experiments in small home labs, problems with cheap hardware and so on, I would like to cover this topic in the "enterprise servers" context.
I would start from an assumption: enterprise-class RAID controllers, as well as enterprise storage solutions with LUNs, have been around for decades and keep the majority of IT systems running without evidence of continuous catastrophes or problems, with any kind of FS on top.
So in this thread, on the "Hardware-RAID" side, I'm talking about professional RAID cards with battery-backed caching, redundant arrays with T10 data protection and so on, not RAID on consumer motherboard chipsets or similar cheap solutions.
Using ZFS directly on raw disks has its advantages of course (but also disadvantages, such as the impossibility of expanding by adding single disks).
If performance were equivalent, I think no one would have any doubts about the choice: letting ZFS manage the disks directly is certainly better!
But in reality, the choice between ZFS-managed disks and Hw-RAID seems to have a huge impact on performance, especially under database-type load profiles.
Testing configuration
Let me start by saying that I have been using ZFS since Solaris 10 and, based on what I have always read, I have always thought that it was undoubtedly better to let ZFS manage the disks directly.
However, after some rather shocking tests (very low performance) on a fully ZFS-managed pool, and very intrigued by this topic, I investigated further and went down the rabbit hole, running many kinds of tests on many configurations.
I've run many tests on two identical machines. I would like to show you the most relevant ones.
Common hardware configuration
- Dell PowerEdge R640 with 8x 2.5" SATA/SAS backplane
- 2x Intel(R) Xeon(R) Gold 5120
- 128GB of RAM
- Dell PERC H730 Mini with 1GB of cache and Battery PLP
- 2x Crucial BX500 as Boot drives
- 4x Kingston DC600M (3840GB) as Data drives
First server (ZFS managed disks):
- PERC in HBA mode, passing all the drives through to the OS
- One ZFS 2-drive mirror for boot/OS
- One ZFS pool of 4 drives in striped mirrors (RAID 10 equivalent) for data
Second server (ZFS on HW Raid):
- PERC in RAID mode, write-back caching and no read-ahead
- One virtual disk with 2 drives in RAID 1 for boot/OS
- One virtual disk with 4 drives in RAID 5 for data
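As a side note on the two data layouts above, the usable capacity differs as well as the redundancy scheme. A minimal sketch of the arithmetic, assuming four 3840 GB drives as listed and ignoring filesystem, metadata and controller overhead (the function names are just for illustration):

```python
# Usable capacity of the two data layouts, before any filesystem overhead.
DRIVE_GB = 3840  # data drive size from the hardware list
DRIVES = 4


def striped_mirrors_usable(drives: int, size_gb: int) -> int:
    """Two-way mirrors striped together: half of the raw capacity is usable."""
    return (drives // 2) * size_gb


def raid5_usable(drives: int, size_gb: int) -> int:
    """Single parity: one drive's worth of capacity is spent on parity."""
    return (drives - 1) * size_gb


mirror_gb = striped_mirrors_usable(DRIVES, DRIVE_GB)
raid5_gb = raid5_usable(DRIVES, DRIVE_GB)
print(f"ZFS striped mirrors: {mirror_gb} GB usable")
print(f"HW RAID 5:           {raid5_gb} GB usable")
```

So the RAID 5 virtual disk exposes one and a half times the space of the striped mirrors, at the cost of tolerating only a single drive failure.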
Testing environment setup
I've installed PVE on both nodes, configured everything (network, etc...), and created the ZFS data pools on both nodes with the same configs (comp LZ4, ashift 12).
Then, in order to test a typical workload from our existing infrastructure, I've created a Windows 2022 virtual machine on both nodes with this config:
- 32 vCPUs (2 sockets, 16 cores, NUMA enabled)
- 20GB of RAM (ballooning enabled, but not triggered since the hosts never reach the threshold)
- One 80GB vDisk for OS
- One 35GB vDisk for DB Data/Log files (Formatted with 64K allocation size, according to SQL Server guidelines)
- VirtIO SCSI single for disks
- Caching set to "Default (No cache)", Discard enabled
- VirtIO full package drivers installed on guest
No other ZFS options or optimizations are used on either setup.
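For readers less familiar with the block sizes mentioned above: ashift is the base-2 logarithm of the sector size ZFS assumes, so ashift 12 means 4096-byte sectors, and one 64K NTFS allocation unit spans several of them. A small sketch of that relationship (the zvol block size that sits between the two layers is not stated in this post, so it is left out):

```python
# Relationship between the pool's ashift and the guest's NTFS allocation size.
ASHIFT = 12                    # pool setting used on both nodes
NTFS_CLUSTER_BYTES = 64 * 1024  # 64K allocation size of the DB vDisk

sector_bytes = 1 << ASHIFT  # 2**12
sectors_per_cluster = NTFS_CLUSTER_BYTES // sector_bytes

print(f"ashift {ASHIFT} -> {sector_bytes}-byte sectors")
print(f"one NTFS cluster covers {sectors_per_cluster} such sectors")
```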
Testing methodology
I've used CrystalDiskMark for a rapid test with these parameters:
- Duration: 20 sec
- Interval: 10 sec
- Number of tests: 1
- File size: 1 GiB
SQL Server 2022 Developer edition and HammerDB were used for DB workload benchmarking,
with this configuration and methodology:
- Created an empty DB with 8x datafiles (20 GB total) and 1x logfile (5 GB)
- Populated the DB with HammerDB, using 160 Warehouses
- Backed up the DB (useful for running multiple tests from the same starting condition)
- All tests are run after a PVE host restart and a 2-minute wait after VM startup, with no other VMs/applications/backups running on the nodes during the tests
- No encryption (direct connection on local DB)
- Windows authentication
- Use all warehouses: ON
- Checkpoint when complete: ON
- Virtual users: 100
- User delay: 20ms
RESULTS
CrystalDiskMark rapid test
First server (ZFS managed disks)

Second server (ZFS on HW Raid)

No major difference here, except the sequential write.
HammerDB - Orders per minute
First server (ZFS managed disks): 28700
Second server (ZFS on HW Raid): 117000
(4 times faster on HW Raid)
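The "4 times" figure can be checked directly from the two orders-per-minute values above. A short sketch of the arithmetic (variable names are only illustrative):

```python
# Speedup computed from the two HammerDB orders-per-minute results.
zfs_raw_opm = 28_700      # first server (ZFS managed disks)
zfs_hwraid_opm = 117_000  # second server (ZFS on HW Raid)

speedup = zfs_hwraid_opm / zfs_raw_opm
increase_pct = (speedup - 1) * 100

print(f"speedup:  {speedup:.2f}x")
print(f"increase: {increase_pct:.0f}%")
```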
HammerDB - Transaction count graphs
First server (ZFS managed disks)

Second server (ZFS on HW Raid)

Conclusions
It seems that, on the performance side and with a database-type workload, having a RAID card with (battery-protected) caching gives a huge advantage.
ZFS on Hw Raid measured about 4 times the performance of ZFS on raw disks on the DB workload!
Also remember that we are comparing a hardware RAID 5 against a RAID 10 on ZFS, a comparison that clearly puts the hardware RAID at a disadvantage with a database-type workload... but despite this, we obtained enormously superior performance.
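To put a rough number on why RAID 5 would normally be expected to lose here: in the textbook model, a small random write on RAID 5 costs four backend I/Os (read old data, read old parity, write new data, write new parity), while a mirrored layout costs two. The sketch below applies that rule with a placeholder per-drive figure. PER_DRIVE_WRITE_IOPS is an assumption for illustration, not a measured value, and the model deliberately ignores controller cache, ZFS transaction groups and everything else that makes real systems deviate from it.

```python
# Textbook small-random-write penalties (backend I/Os per host write).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}

DRIVES = 4
PER_DRIVE_WRITE_IOPS = 10_000  # placeholder assumption, NOT a measured value


def effective_write_iops(layout: str, drives: int, per_drive_iops: int) -> float:
    """Aggregate backend IOPS divided by the layout's write penalty."""
    return drives * per_drive_iops / WRITE_PENALTY[layout]


raid10_iops = effective_write_iops("raid10", DRIVES, PER_DRIVE_WRITE_IOPS)
raid5_iops = effective_write_iops("raid5", DRIVES, PER_DRIVE_WRITE_IOPS)
print(f"RAID 10 / RAID 5 theoretical ratio: {raid10_iops / raid5_iops:.1f}x")
```

On paper the mirrored layout should therefore have roughly twice the random-write headroom whatever per-drive figure is plugged in, which is what makes the measured result in the opposite direction notable.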
Considering what was said in the introduction and in light of these results, I sincerely think that using a hardware RAID (again, on enterprise-grade platforms with battery-backed write caching) can bring great advantages, in addition to the beautiful features that ZFS has (snapshots, ARC, compression, etc.).
I also think that going through controller-managed drives can also give advantages in terms of write amplification on SSDs (not tested yet, just speculation).
Still considering what was said in the introduction about the use of enterprise-grade hardware, are there actually disadvantages serious enough to give up this performance boost of ZFS on HW Raid over raw disks?
I hope to get some opinions from you too.
In case of opinions against this setup (I imagine they will be mostly about data resilience), I would kindly ask you to bring sources and real-world examples from properly configured enterprise-hardware systems.
(Please also note that, as already said, I was totally in favour of ZFS on raw disks for years... until these tests, so this is absolutely not a provocative post)
Bye,
Edoardo
			