ZFS Performance Questions on HDDs

axlf

New Member
Sep 4, 2020
3
0
1
42
Hello,

I'm running a server with 2 x 8 TB HDDs and 1 x 240 GB SSD with the following config.

Code:
# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 22:10:56 with 0 errors on Sun Apr 11 22:34:58 2021
config:

    NAME                                         STATE     READ WRITE CKSUM
    rpool                                        ONLINE       0     0     0
      mirror-0                                   ONLINE       0     0     0
        ata-HGST_HUH721008ALE600_12345678-part3  ONLINE       0     0     0
        ata-HGST_HUH721008ALE600_87654321-part3  ONLINE       0     0     0
    cache
      sda                                        ONLINE       0     0     0

It's the first time I'm using ZFS, so there is a lot of uncertainty on my end. A handful of webserver VMs are running on this, most of them hosting personal projects with really low traffic. Because of the slow read/write speeds, I have the feeling that I have done something fundamentally wrong.

If I run a fio test on the rpool root, I get the following results:

Code:
# fio --filename=test --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=348KiB/s][w=87 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8532: Tue Apr 20 22:32:58 2021
  write: IOPS=95, BW=380KiB/s (389kB/s)(111MiB/300005msec); 0 zone resets
    clat (msec): min=2, max=139, avg=10.52, stdev= 7.13
     lat (msec): min=2, max=139, avg=10.52, stdev= 7.13
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    9], 40.00th=[    9], 50.00th=[    9], 60.00th=[   10],
     | 70.00th=[   11], 80.00th=[   13], 90.00th=[   17], 95.00th=[   21],
     | 99.00th=[   41], 99.50th=[   54], 99.90th=[   86], 99.95th=[  102],
     | 99.99th=[  136]
   bw (  KiB/s): min=   64, max=  688, per=100.00%, avg=380.08, stdev=101.94, samples=600
   iops        : min=   16, max=  172, avg=94.98, stdev=25.49, samples=600
  lat (msec)   : 4=3.74%, 10=62.58%, 20=28.41%, 50=4.69%, 100=0.53%
  lat (msec)   : 250=0.05%
  cpu          : usr=0.01%, sys=0.21%, ctx=57150, majf=8, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,28511,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=380KiB/s (389kB/s), 380KiB/s-380KiB/s (389kB/s-389kB/s), io=111MiB (117MB), run=300005-300005msec

What do you think would be the approximate read/write performance for this setup?

Thank you,
— axlf
 
HDDs are really poor as VM storage because you need a lot of IOPS, and HDDs are only fast for sequential async reads/writes. That's especially true if you are running webservers with MySQL DBs and a lot of web logs. 94 IOPS sounds normal for an HDD mirror, so it is expected that you only get write speeds below 1 MB/s when doing random or sync writes.
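You can see the difference yourself by running roughly the same test as a sequential async write instead (larger block size, no --sync=1); this is only a sketch, the file name and size just mirror your original command:

Code:
# fio --filename=test --rw=write --bs=1M --numjobs=1 --iodepth=4 --group_reporting --name=seqtest --filesize=10G --runtime=300 && rm test

That should get close to the sequential throughput of the disks, while your 4k sync test is limited by their IOPS.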
You should really consider using a mirror of enterprise-grade SSDs for your VMs if Proxmox is reporting high IO delay (look at your server's "CPU usage" graph).
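If you prefer checking that on the shell, iostat from the sysstat package (install it first if needed) shows %iowait and per-disk utilization; the 5-second interval below is just an example:

Code:
# apt install sysstat
# iostat -x 5

A consistently high %iowait together with the two HDDs sitting near 100% %util would confirm that the disks are the bottleneck.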

And your SSD (the cache vdev, i.e. L2ARC) will only cache reads, and only once your ARC (RAM) is already full. It won't help you at all with writes. For that you would need a SLOG, but a SLOG only helps with sync writes, not async writes. Only a fraction of your writes will be sync (MySQL DBs, for example); most OS writes should be async.
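Just to illustrate the difference between the two vdev types (whether a SLOG is worth it for your workload is another question, and the device path below is a placeholder, not one of your actual disks):

Code:
# zpool remove rpool sda                                    # drops the L2ARC cache device
# zpool add rpool log /dev/disk/by-id/<ssd-or-partition>    # would add the SSD as a SLOG instead

Ideally a SLOG should be an SSD with power-loss protection, since its whole job is to guarantee that sync writes survive a crash.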
 
I've got my SSD configured as a SLOG, and it's barely getting any I/O through it in normal operation - my PVE cluster doesn't seem to do much sync writing - but sync writes (--sync=1) are exactly what your fio test is exercising.
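If you want to see how much traffic the log (or cache) device actually gets, watching the per-vdev stats is the easiest way; the pool name and interval below are just examples:

Code:
# zpool iostat -v rpool 5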

Running that command on my machines gets:
HOST1 (NVMe slog, JBOD underneath): 215k IOPS
HOST2 (SATA SSD slog, 4 disk RAIDZ1 underneath): 6553 IOPS
HOST3 (SATA SSD): 3142 IOPS

Turning off the slog for the first two, I get:
HOST1 (3 disk JBOD): 3568
HOST2 (4 disk RAIDZ1): 700

A quick Google search shows real-world testing on those drives giving 400 IOPS as an achievable goal, so in a mirror I would expect comparable IOPS to that. It definitely sounds like something is not running right - if you don't have data in the ZFS pool, I would run fio against just a single device to confirm there are no hardware issues (see the sketch below).
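Something like this, for example; the by-id path is a placeholder for one of your HGST disks, and I've switched it to a random-read test so it doesn't overwrite anything on what is, after all, your root pool:

Code:
# fio --filename=/dev/disk/by-id/ata-HGST_HUH721008ALE600_12345678 --direct=1 --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=single-disk --runtime=60 --time_based

Running it against both disks and comparing the results is a quick way to spot a single misbehaving drive, cable or controller port.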

Spinning-rust drives (HDDs) are never going to be stellar at random I/O - and whilst I use them myself, I don't expect great performance from them.
 
