ZFS HDD mirror of 2 WD Red Pro 8 TB slow even with SLOG (60 GB) and cache (400 GB) on SSD

watzr
Mar 3, 2022
Hey,

I set up a "Storage" ZFS pool on my server with two 8 TB WD Red Pros as a mirror. I gave parts of this pool to several VMs and created an ext4 filesystem inside each of them to make use of it. Performance was terrible, worse than a single one of these drives without any RAID: I got about 120-150 MB/s writes on average, but with extremely high IO delay of up to over 90% on the server. (The server has 64 GB of RAM installed, with roughly 40-45 GB of that in use.)
I then read up on ZFS a bit, installed an NVMe SSD (Samsung 970 Evo, I know it's not made for this) and gave 60 GB of SLOG and 400 GB of cache to the HDD mirror.
What's confusing me now is that nothing really changed: neither read nor write speed improved after adding the SLOG and cache.
I understand that not every write is a sync write, so the SLOG doesn't necessarily help in every situation, but shouldn't at least the cache help tremendously with write performance?!

Can someone help me understand this issue better? Did I misconfigure something?

Thanks a lot in advance!
 
I understand that not every write is a sync write, so the SLOG doesn't necessarily help in every situation, but shouldn't at least the cache help tremendously with write performance?!
No, the L2ARC isn't used as a write cache at all. It's just a read cache, and it will only be used once your RAM is full, because the L2ARC is much slower than the ARC, so it would be a bad idea to use it while the faster ARC can still hold the data. It's also recommended to keep the L2ARC at no more than about ten times your RAM, so with a 400 GB L2ARC you should give ZFS at least 40 GB of RAM. And the bigger the L2ARC, the less RAM is left for the ARC, because the L2ARC has to be indexed in RAM. So with an L2ARC you sacrifice very fast RAM cache to get more of the slower SSD cache. Depending on the situation, an L2ARC can even slow your pool down.
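If you want to check whether the L2ARC is actually being hit, or remove the SLOG and L2ARC again, something like the following should work. This is just a rough sketch: the pool name "Storage" and the 970 Evo partition names are placeholders, use whatever zpool status shows on your system.
Code:
# ARC / L2ARC sizes and hit rates
arc_summary

# Per-vdev activity, including the cache and log devices
zpool iostat -v Storage 5

# Remove the cache (L2ARC) and log (SLOG) devices again
# (device names are placeholders, copy them from "zpool status Storage")
zpool remove Storage nvme-Samsung_SSD_970_EVO_1TB-part2   # cache
zpool remove Storage nvme-Samsung_SSD_970_EVO_1TB-part1   # log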

If you want to see more performance, try the following (a few of these are shown in the sketch after this list)...
- enable relatime or disable atime for your pool
- make sure compression is set to LZ4
- make sure your pool's ashift is 12
- add two SSDs as a mirrored "special device" so metadata is stored on the SSDs instead of the HDDs and the HDDs aren't hit by that many IOPS
- never fill a pool more than 80%
- disable caching for the VM (cache mode "none" instead of "write back" or "write through"), as ZFS is already caching in the ARC, so you don't cache the same data twice in RAM
- use "VirtIO SCSI single" for your VMs and enable "SSD emulation" and "IO thread"
- don't mount an ext4 with the "discard" option; set up a daily fstrim -a in your guest's crontab instead
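Not part of the original reply, just a rough sketch of how a few of those points could be checked or applied; the pool name "Storage" and the SSD device paths are placeholders:
Code:
# atime / relatime
zfs get atime,relatime Storage
zfs set relatime=on Storage          # relatime only applies while atime=on
# or: zfs set atime=off Storage

# compression and ashift
zfs set compression=lz4 Storage
zpool get ashift Storage             # should be 12 for 4K-sector disks

# mirrored "special device" for metadata (placeholder device names;
# if this vdev is lost the whole pool is lost, so use reliable SSDs)
zpool add Storage special mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2

# keep an eye on pool usage (stay below ~80%)
zpool list Storage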
 
I then read up on ZFS a bit, installed an NVMe SSD (Samsung 970 Evo, I know it's not made for this) and gave 60 GB of SLOG and 400 GB of cache to the HDD mirror.
Well, it's not just that the "Samsung 970 Evo is not made for this", it's that the Samsung 970 Evo really should NOT be used for this: it has very poor sync write performance, so if you use it as a SLOG you get really bad performance.
And of course a SLOG does not improve async writes or bandwidth at all.
In any case, test your storage with pveperf /storagepath and tell us the FSYNCS/SECOND.
Here's mine with a ZFS mirror of 2 disks plus a Kingston DC500M SATA SSD as SLOG:
Code:
# zpool status hddp1
  pool: hddp1
 state: ONLINE
config:

        NAME                            STATE     READ WRITE CKSUM
        hddp1                           ONLINE       0     0     0
          mirror-0                      ONLINE       0     0     0
            wwn-0x5000cca291cbba49      ONLINE       0     0     0
            wwn-0x5000cca291c1d39f      ONLINE       0     0     0
        logs
          wwn-0x50026b768385302f-part4  ONLINE       0     0     0

# smartctl -a /dev/disk/by-id/wwn-0x5000cca291cbba49 | grep 'Device Model'
Device Model:     HGST HUH721212ALE600
# smartctl -a /dev/disk/by-id/wwn-0x50026b768385302f | grep 'Device Model'
Device Model:     KINGSTON SEDC500M960G

# pveperf /hddp1
CPU BOGOMIPS:      86237.52
REGEX/SECOND:      3562189
HD SIZE:           10168.04 GB (hddp1)
FSYNCS/SECOND:     10053.08
 
If you want to see more performance, try the following... [full list quoted above]
Thank you guys. I removed the SLOG and cache now.
- I need to read up on relatime and atime; I didn't set anything since I wasn't sure what either of them does.
- Compression is on lz4 and ashift is 12.
- I'm not sure I want to use a "special device", as it would mean putting the entire pool's health into the hands of my consumer-grade SSDs. Currently none is used.
- The pool is well below 50%: one VM is given 3 TB and another 0.5 TB, nothing more of the 8 TB pool is used.
- I changed the VMs to "VirtIO SCSI single" and enabled "SSD emulation" and "IO thread", no noticeable improvement though.
- On fstrim I also need to read up; I don't want to change things without at least roughly knowing what I'm doing.

This is the output of zpool status and pveperf:
Code:
root@pve:~# zpool status HDD-Storage-M1
  pool: HDD-Storage-M1
 state: ONLINE
config:

        NAME                                     STATE     READ WRITE CKSUM
        HDD-Storage-M1                           ONLINE       0     0     0
          mirror-0                               ONLINE       0     0     0
            ata-WDC_WD8003FFBX-68B9AN0_VGJHS94G  ONLINE       0     0     0
            ata-WDC_WD8003FFBX-68B9AN0_VDJSDNHM  ONLINE       0     0     0

errors: No known data errors
root@pve:~# pveperf /HDD-Storage-M1/
CPU BOGOMIPS:      124793.76
REGEX/SECOND:      3924969
HD SIZE:           3702.50 GB (HDD-Storage-M1)
FSYNCS/SECOND:     125.04

Thank you guys again, I would appreciate some more input on this.
 
FSYNCS/SECOND: 125.04
That's about right. The only way to improve spinning rust is by adding more spinning rust; two hard disks are just not fast, and no filesystem on the planet can change that. Use 4, 6 or 8 of them if you have to stick with HDDs. The other option is to go with enterprise SSDs (SATA/SAS or NVMe).
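Not from that reply, just a hedged sketch of what "more spinning rust" means in practice: adding a second mirror vdev stripes the pool across both mirrors and roughly doubles its IOPS. The device names are placeholders; also note that existing data is not rebalanced, only new writes are striped across both vdevs.
Code:
# Hypothetical: extend the existing two-disk mirror with a second mirror vdev
zpool add -o ashift=12 HDD-Storage-M1 \
    mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4
zpool status HDD-Storage-M1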
 
I understand that not every write is a sync write, so the SLOG doesn't necessarily help in every situation, but shouldn't at least the cache help tremendously with write performance?!
If you have fsync writes from a VM, you absolutely need to use the killer feature of ZFS: a SLOG on an SSD (enterprise level, with power-loss-protection capacitors). If you have many reads, you can use an L2ARC, tune it to also cache streaming data, and make it persistent; that is another killer feature of ZFS. If you have many writes in async mode, you can tune the block size in your VM or add more disks. If you need a mix, you can use two or more L2ARC SSDs. Just remember that SATA is a half-duplex protocol: you can't read and write at the same time, only one or the other. If you need more speed, you can switch from SATA to SAS.
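Not part of that reply, just a hedged guess at the tunables it refers to: OpenZFS 2.x module parameters for caching streaming data in the L2ARC and keeping it persistent across reboots, plus checking a zvol's block size. The zvol name is a placeholder.
Code:
# Let the L2ARC also cache sequential/prefetched data (default: skip it)
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch

# Persistent L2ARC, i.e. rebuild the cache device contents after a reboot
# (already the default in OpenZFS >= 2.0)
echo 1 > /sys/module/zfs/parameters/l2arc_rebuild_enabled

# Make it permanent (run update-initramfs -u afterwards if ZFS loads from the initramfs)
echo "options zfs l2arc_noprefetch=0 l2arc_rebuild_enabled=1" >> /etc/modprobe.d/zfs.conf

# Block size of a VM disk (zvol); volblocksize can only be set at creation time
zfs get volblocksize HDD-Storage-M1/vm-100-disk-0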
 
Thank you guys, that's what I had thought... double the speed. But how does the guy in the video I linked do it? He has Seagate IronWolf 8 TB drives that are also 7.2k rpm and SATA; how does he get nearly 10 Gbit read speed out of his pool without anything added?
 
One man's performance is another man's slowness. If you are talking about sequential reads, then yes: with 3-4 hard disks you can saturate a 10 GbE link, if all data is stored sequentially and read that way. If and only if this is true will you get those speeds.

If you have to hop around on the disk and read at the front, in the middle and at the back, you will be significantly (!) slower. Those fsync numbers you saw are IOPS (IO operations per second), and if you have e.g. 512n sectors on your disk and 125 IOPS per drive, a 3-disk setup will only read 3 * 125 * 512 bytes = 192 KB/sec in the worst case. Real-world performance lies somewhere between that and the super-duper sequential case above, and is hugely use-case and application specific.
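Just to make the two extremes concrete (a quick back-of-the-envelope check; the ~250 MB/s sequential figure per drive is an assumption, not something measured in this thread):
Code:
# Worst case: 3 drives * 125 IOPS each * 512-byte random reads
echo $((3 * 125 * 512))   # 192000 bytes/s, i.e. ~192 KB/s

# Best case: 4 drives doing ~250 MB/s sequential reads each
echo $((4 * 250))         # ~1000 MB/s, roughly a saturated 10 GbE link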

For decades, the IOPS battle for hard disks was won with disks, huge amounts of disks. One of the last giants before SSDs became affordable was the HP EVA system. I decommissioned and moved one a few years back: it had 1.5 racks full of disks, 48 U with 24 shelves of 12 disks each... and we got a few tens of thousands of IOPS. Nowadays a single consumer SSD has more (though not the same performance characteristics as that behemoth).
 
Alright, thank you guys again! Since pretty much all my questions got answered, this can be marked as solved or closed, or however it's done here :)
 
