Poor disk performance with ZFS on Dell PERC H310 Mini

AndreyMHV

New Member
Aug 8, 2023
I ran tests with the fio utility to check performance (read/write speed and IOPS) and got extremely poor results for write speed, especially for parameters such as random read/write.
The testing methodology was as follows:
On the host system with PVE installed, I measured the read/write speed of the disk subsystem using fio.
I ran the measurements with the same block sizes and iodepth that the manufacturer specifies in the disk specifications, namely Random Read (4KB, QD32), (4KB, QD1) and Random Write (4KB, QD32), (4KB, QD1).
My test options:
Code:
fio --name TEST --eta-newline=15s --filename=temp.file --rw=<write|read|randwrite|randread> --size=20g --io_size=10g --blocksize=<4K|1M> --ioengine=libaio --fsync=1 --iodepth=<1|32> --direct=1 --numjobs=<1|32> --runtime=300 --group_reporting

Here are the details of the server:
Code:
1. Dell PowerEdge R620
2. Xeon(R) CPU E5-2680 v2 @ 2.80GHz (2 sockets)
3. 380 GB memory
4. RAID: PERC H310 Mini (Embedded), passthrough (non-RAID)
5. 2x SSD Samsung 870 EVO 250GB SATA-3, RAID1 (Linux mdadm) (Proxmox host)
6. 6x SSD Samsung 870 EVO 250GB SATA-3, RAID-Z2 (VMs hosted here)
7. OS: Debian 11, Linux 5.15.104-1-pve #1 SMP PVE 5.15.104-2; Proxmox 7.4-3
8. RAID controller: PERC H310 Mini (Embedded) (8-lane, PCI Express 2.0 compliant)
Code:
Controller properties, PERC H310 Mini (Embedded):
- Patrol Read Mode                - Auto
- Manual Patrol Mode Action       - Stopped
- Patrol Read Unconfigured Areas  - Enabled
- Check Consistency Mode          - Normal
- Copyback Mode                   - On
- Load Balance Mode               - Auto
- Check Consistency Rate (%)      - 30
- Rebuild Rate (%)                - 30
- BGI Rate (%)                    - 30
- Reconstruct Rate (%)            - 30


According to the disks' specs, their Sequential Read/Write speed is ~560/530 MB/s respectively,
Random Read/Write IOPS (4KB, QD32) = 98K/88K,
Random Read/Write IOPS (4KB, QD1) = 13K/36K.
Specification: SATA SSD 870 EVO 2TB
Specification: SSD 870 EVO 250GB
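As a back-of-the-envelope check (my own arithmetic, not from the datasheet, and the vendor presumably measures without an fsync after every write, unlike my fio options above), those IOPS figures translate into roughly the following throughput at 4 KiB:
Code:
# throughput = IOPS x 4 KiB
# Random Read  (QD32): 98,000 x 4 KiB ~ 383 MiB/s
# Random Write (QD32): 88,000 x 4 KiB ~ 344 MiB/s
# Random Read  (QD1) : 13,000 x 4 KiB ~  51 MiB/s
# Random Write (QD1) : 36,000 x 4 KiB ~ 141 MiB/s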

In some cases (tests such as randread/randwrite 4KB, iodepth 32) the random read/write results gave me around 10 MB/s and about ~1000 IOPS, or even less.
Fio test results are attached.

1) I would like to know why I get such low write results, since the disks are brand new and I bought them expecting a clear benefit.
2) What is the bottleneck in my setup regarding disk array bandwidth?
3) What would be the optimal RAID configuration for me? (I don't want to lose ZFS advantages such as incremental replication, snapshots and compression.)
 
Can you post the full output of the fio tests?

Generally speaking though, those SSDs are consumer-grade and your results are actually quite in line with what we found in our ZFS benchmarks [1]. Due to how ZFS works, it is strongly recommended to get enterprise-grade SSDs in order to have acceptable performance. Additionally, when using consumer-grade SSDs, ZFS will tear through them quite quickly; they will not last very long.

The specifications of vendors are usually greatly exaggerated when compared to real-world use cases. ZFS also adds considerable additional overhead.

Additionally, since you are using RAIDZ2: IOPS do not scale with the number of disks in RAIDZ setups (bandwidth should scale with the number of disks, though). That means you always get roughly the IOPS of a single disk when using RAIDZ. The optimal setup for performance would be RAID 10 (striped mirrors), although at the cost of storage efficiency.

[1] https://www.proxmox.com/images/download/pve/docs/Proxmox-VE_ZFS-Benchmark-202011.pdf
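
If you do go the striped-mirror route, here is a minimal sketch of creating such a pool (device names are placeholders, substitute your own /dev/disk/by-id/ paths):
Code:
# ZFS "RAID 10" = a stripe of mirror vdevs; IOPS scale with the number of mirror vdevs
zpool create -o ashift=12 tank \
  mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
  mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4 \
  mirror /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6
Snapshots, compression and incremental replication work the same on a mirror pool as on RAIDZ, so none of those features are lost.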
 
Consumer SSDs/NVMe drives don't have a supercapacitor to safely handle fsync in memory, so a single 4K write with fsync ends up writing a full NAND cell. That means very high write amplification (something like 40x) and low IOPS.
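A quick way to see this effect (just an illustrative comparison, file name and sizes are examples) is to run the same 4K random-write job once with an fsync after every write and once without:
Code:
# sync write after every I/O - this is what hurts consumer SSDs without power-loss protection
fio --name sync-test --filename=temp.file --rw=randwrite --blocksize=4k --size=2g \
    --ioengine=libaio --iodepth=1 --direct=1 --fsync=1 --runtime=60 --group_reporting
# same job without fsync - the drive can acknowledge writes from its volatile cache
fio --name nosync-test --filename=temp.file --rw=randwrite --blocksize=4k --size=2g \
    --ioengine=libaio --iodepth=1 --direct=1 --runtime=60 --group_reporting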
 
Can you post the full output of the fio tests?


I would like to clarify: are my fio test results not visible to you here, or should I attach more tests, not only random read/write but also sequential? Or what do you mean?
 
Sorry, seems like I overlooked that part. Skimming through the test results I'd say that those results look pretty decent for the disks you have, to be quite frank. The rest of my points still stand.
 
I am confused by the results when testing with 4K blocks, because with bs=1024k the results are not so sad. But again, we are saying that I should get performance scaling with the number of disks in the RAID, and in this case I have RAID-Z2 with 6x 2TB SSDs, yet in tests such as Random Write 1MB I effectively see the performance of 1-2 disks... and that is only talking about bs=1024k writes; with bs=4k the situation is terrible, to say the least.
I'll attach a screenshot as an example
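To narrow it down I could also benchmark a single member disk directly, outside of ZFS, to get a per-disk baseline (read-only so it does not touch the pool data; /dev/sdX is a placeholder):
Code:
# 4K random read against one raw member disk; --readonly refuses any write
fio --name raw-disk --filename=/dev/sdX --rw=randread --blocksize=4k --ioengine=libaio \
    --iodepth=32 --direct=1 --runtime=60 --readonly --group_reporting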
 

Attachments

  • Screenshot from 2023-04-27 16-18-48 (1).png (257.3 KB)
  • Screenshot from 2023-04-27 16-05-04 (1).png (251.7 KB)
I think I have the same problem: same system (R620xd), same CPU, but 198 GB RAM and an H710P flashed to IT mode.
I'm running tests on two identical systems; only the drives differ.

I also ran into the error shown in the attached picture, which led me to this site, but I cannot make sense of it:
http://www.osris.org/documentation/dell

ZFS RAID-Z1, 4 vdevs (24x 500GB SSD Samsung 860 EVO)

Code:
fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.33
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=2002MiB/s][r=2002 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=82448: Fri Sep  1 23:45:32 2023
  read: IOPS=1906, BW=1906MiB/s (1999MB/s)(10.0GiB/5372msec)
    slat (usec): min=373, max=1934, avg=521.21, stdev=112.28
    clat (usec): min=3, max=52587, avg=16086.89, stdev=3205.34
     lat (usec): min=481, max=54521, avg=16608.10, stdev=3303.06
    clat percentiles (usec):
     |  1.00th=[10159],  5.00th=[14877], 10.00th=[14877], 20.00th=[15008],
     | 30.00th=[15270], 40.00th=[15664], 50.00th=[15664], 60.00th=[15795],
     | 70.00th=[15795], 80.00th=[15926], 90.00th=[16057], 95.00th=[27132],
     | 99.00th=[28181], 99.50th=[28705], 99.90th=[44827], 99.95th=[48497],
     | 99.99th=[51643]
   bw (  MiB/s): min= 1002, max= 2016, per=99.50%, avg=1896.60, stdev=314.65, samples=10
   iops        : min= 1002, max= 2016, avg=1896.60, stdev=314.65, samples=10
  lat (usec)   : 4=0.05%, 500=0.02%, 750=0.03%, 1000=0.01%
  lat (msec)   : 2=0.09%, 4=0.20%, 10=0.59%, 20=93.78%, 50=5.21%
  lat (msec)   : 100=0.04%
  cpu          : usr=0.73%, sys=99.24%, ctx=8, majf=4, minf=8207
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=1906MiB/s (1999MB/s), 1906MiB/s-1906MiB/s (1999MB/s-1999MB/s), io=10.0GiB (10.7GB), run=5372-5372msec



MD RAID5: 9x 2TB SSD Samsung 870 EVO

Code:
/mnt/md127# fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.33
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=2202MiB/s][r=2202 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=489685: Sat Sep  2 16:53:24 2023
  read: IOPS=2010, BW=2010MiB/s (2108MB/s)(10.0GiB/5094msec)
    slat (usec): min=334, max=1780, avg=493.93, stdev=138.53
    clat (usec): min=3, max=51433, avg=15256.18, stdev=4167.27
     lat (usec): min=446, max=53213, avg=15750.10, stdev=4297.82
    clat percentiles (usec):
     |  1.00th=[ 9241],  5.00th=[13304], 10.00th=[13566], 20.00th=[13698],
     | 30.00th=[14091], 40.00th=[14091], 50.00th=[14222], 60.00th=[14222],
     | 70.00th=[14222], 80.00th=[14353], 90.00th=[15270], 95.00th=[26870],
     | 99.00th=[27395], 99.50th=[27657], 99.90th=[43779], 99.95th=[47449],
     | 99.99th=[50594]
   bw (  MiB/s): min= 1024, max= 2236, per=99.60%, avg=2002.20, stdev=435.62, samples=10
   iops        : min= 1024, max= 2236, avg=2002.20, stdev=435.62, samples=10
  lat (usec)   : 4=0.05%, 500=0.05%, 1000=0.05%
  lat (msec)   : 2=0.10%, 4=0.20%, 10=0.63%, 20=89.18%, 50=9.73%
  lat (msec)   : 100=0.02%
  cpu          : usr=1.16%, sys=98.76%, ctx=7, majf=0, minf=8207
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=2010MiB/s (2108MB/s), 2010MiB/s-2010MiB/s (2108MB/s-2108MB/s), io=10.0GiB (10.7GB), run=5094-5094msec
 

Attachments

  • Captura de ecrã 2023-09-02 003713.jpg (32.4 KB)
After a week of tests I came to a conclusion: it has nothing to do with the HBA cards; the cause is bad Proxmox drivers.

I installed Windows and Ubuntu directly on the machine and got 500 MB/s per drive, so I'm looking at what to do next...
 
Don't compare apples and oranges....

ZFS RAID-Z1 with 4 vdevs (24x 500GB SSD Samsung 860 EVO), or anything else without real Power Loss Protection, will NEVER perform well with ZFS....
When you use Windows or Ubuntu you have a normal filesystem like NTFS or LVM with ext4.... these know nothing about your disks and have no real data integrity at all..... so they can indeed write data without caring whether it ever really reaches the disk... or whether it is readable again...
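
If you want to confirm that sync handling is the culprit, a quick experiment (TESTING ONLY, never leave this on in production; the dataset name is just an example) is to temporarily disable sync on the dataset and re-run the fio job:
Code:
# TESTING ONLY - with sync=disabled a power loss can lose the last few seconds of "written" data
zfs set sync=disabled tank/testds    # dataset name is an example, use your own
# ...re-run the fio job here...
zfs set sync=standard tank/testds    # restore the default afterwards
If the numbers jump dramatically, the missing power-loss protection of the consumer drives is the bottleneck.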
 
I've tested fio on the boot disk (1x SSD) and I wasn't even getting 300 MB/s, while on Windows I get 500 MB/s.
I think I have my conclusions, but I don't know what to do next!
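One more check I plan to do before settling on that: measure the raw block device under Proxmox, bypassing ZFS and the filesystem, so the comparison with the Windows number is like-for-like (device name is a placeholder, the fio job is read-only):
Code:
# sequential read straight from the block device, no filesystem involved; --readonly refuses any write
fio --name raw-seq --filename=/dev/sdX --rw=read --blocksize=1M --ioengine=libaio \
    --iodepth=32 --direct=1 --runtime=60 --readonly --group_reporting
# also check whether the drive's volatile write cache is enabled
hdparm -W /dev/sdX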
 
