ZFS Mirror HDD pool slow performance

sanek2k6

New Member
Feb 22, 2024
Hello!

I have used MDADM for years and would happily continue to do so, but I've heard it's not officially supported in Proxmox, and I do like ZFS's features, so I would really like to get ZFS working well.

I have the following system configuration:
  • Proxmox VE 8.1.4
  • AMD Ryzen 9 7940HS CPU
  • 32 GB RAM
  • 2x WD_BLACK 1TB SN770 SSDs in ZFS Mirror configuration (OS Disk, VM & LXC storage)
  • Sabrent DS-SC4B 4-bay 10-Gbit USB-C (UAS) enclosure (JBOD, not RAID) with 2x WD Red Pro 16TB drives in ZFS Mirror configuration (Data Storage, SMB)
  • 2.5 Gbit Ethernet
Code:
~# zpool status
  pool: naspool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sun Feb 11 00:24:02 2024
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b4a0de958-part3  ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b4a007bc3-part3  ONLINE       0     0     0

errors: No known data errors

~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0  14.6T  0 disk
├─sda1        8:1    0  14.6T  0 part
└─sda9        8:9    0     8M  0 part
sdb           8:16   0  14.6T  0 disk
├─sdb1        8:17   0  14.6T  0 part
└─sdb9        8:25   0     8M  0 part
nvme1n1     259:0    0 931.5G  0 disk
├─nvme1n1p1 259:1    0  1007K  0 part
├─nvme1n1p2 259:2    0     1G  0 part
└─nvme1n1p3 259:3    0 930.5G  0 part
nvme0n1     259:4    0 931.5G  0 disk
├─nvme0n1p1 259:5    0  1007K  0 part
├─nvme0n1p2 259:6    0     1G  0 part
└─nvme0n1p3 259:7    0 930.5G  0 part


When copying multiple large 5GB test files to the HDD-based pool over SMB, the copy runs at ~283 MB/s for the first 10-11 GB and then drops to ~160 MB/s for the rest of the test. At the same time, I can see the IO delay in Proxmox jump from ~5% to ~45%. Copying to the SSD-based pool maintained ~283 MB/s for the entire test.
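
If it helps to narrow this down, I can watch per-disk latency while the copy runs with something like this (the 5-second interval is arbitrary):
Code:
~# zpool iostat -vly naspool 5

The -l flag adds latency columns and -y skips the since-boot summary, so it should be obvious whether the HDDs themselves are saturating at the point where the speed drops.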

I also tested MDADM + RAID1 + LVM + EXT4 on those same two hard drives instead of ZFS (it took 24 hours to initialize that array! :eek:), and that performed at a constant ~230 MB/s with IO delay varying between 5% and 8%.
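
For reference, that comparison stack was built along these lines - a minimal sketch, not my exact commands, and the VG/LV names are just examples:
Code:
~# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
~# pvcreate /dev/md0
~# vgcreate nasvg /dev/md0
~# lvcreate -l 100%FREE -n nasdata nasvg
~# mkfs.ext4 /dev/nasvg/nasdata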

I used ashift=12 when creating the HDD-based pool and have not configured a ZIL/SLOG or L2ARC. I searched for any configuration I might be missing, but could not find anything definitive. My best guess is that ZFS writes its metadata/log to the HDDs (and perhaps also reads something) while the copy is in progress, which slows things down since they are spinning disks.
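
The only tuning suggestions I keep running into are dataset properties along these lines - I have not applied them yet, and the values are just the ones commonly quoted for large sequential files:
Code:
~# zfs get recordsize,atime,compression,sync naspool
~# zfs set recordsize=1M naspool
~# zfs set atime=off naspool
~# zfs set compression=lz4 naspool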

I do know that RAID with USB enclosures generally does not have the best track record, but I have tested this specific one extensively and it is very stable. It also uses UAS, so it performs very well. Still, maybe there are additional reasons this particular setup is not ideal for ZFS (a couple of quick checks follow the lsusb output below):
Code:
~# lsusb
...
Bus 002 Device 006: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 005: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge

~# lsusb -t
...
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 10000M
        |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 10000M
            |__ Port 1: Dev 5, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 2: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
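
A couple of quick checks I can run are whether the ASMedia bridges pass ATA commands through properly and whether the drives' write cache is enabled - both may or may not work through a given USB-SATA bridge:
Code:
~# smartctl -d sat -i /dev/sda
~# hdparm -W /dev/sda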

I do have those two NVMe SSDs in a mirrored ZFS pool available if we need to set up a ZIL/SLOG (how would I do that? would I have to shrink the existing pool?), but I don't really want to wear them out - they are pretty good SSDs, but not enterprise-grade, so I'm not sure that's a good idea.
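
From what I've read, adding a mirrored SLOG would be a single command along these lines, assuming I first carved out a spare partition on each NVMe (the partition names below are purely hypothetical - the existing rpool partitions use the whole disks, so I would have to shrink something first):
Code:
~# zpool add naspool log mirror /dev/disk/by-id/<nvme1-part4> /dev/disk/by-id/<nvme2-part4>

As I understand it, a log vdev can also be removed again later with zpool remove if it turns out not to help.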

Any help with this would be appreciated!
 
The SN770 does have poor sustained write performance: https://www.tomshardware.com/reviews/wd-black-sn770-ssd-review . ZFS, with its sync writes and write amplification, probably makes that worse. Caching sync writes (ZIL) on that same drive makes absolutely no sense. Get better drives (with PLP) or live with the poor sustained writes.
Well again, I'm only focusing on the hard drives at the moment ("naspool") - the SSD pool performance is actually sufficient for me. I created the ZFS pool for the hard drives without specifying any ZIL/L2ARC:
Code:
zpool create -f -o ashift=12 naspool mirror /dev/sda /dev/sdb

As such, I'm assuming the SSDs aren't even being used for that pool right now. Ideally, I would like to get close to the MDADM performance for that hard drive pool for sustained writes.
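
To double-check that assumption, this should show whether any log or cache vdev is attached and what the sync/logbias settings are:
Code:
~# zpool iostat -v naspool
~# zfs get sync,logbias naspool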
 
Well again, I'm only focusing on the hard drives at the moment - the SSD pool performance is actually sufficient for me. I created the ZFS pool for the hard drives without specifying any ZIL/L2ARC:
Right, sorry, your post disappeared (spam filter) and I responded too quickly. I think putting ZIL (and probably L2ARC) on those SSDs is a bad idea.
Ideally, I would like to get close to the MDADM performance for that hard drive pool for sustained writes.
Maybe don't use ZFS, since it does sync writes for metadata and has write amplification? If you add a ZFS special device to the HDD pool, you can put the metadata on those special devices and probably get better performance. It's best to use enterprise SSDs with PLP for that, and the special vdev needs at least the same redundancy as the pool (a mirror in your case), because if the special device fails you lose the pool.
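
Roughly something like this, with placeholder device names (special_small_blocks is optional and controls which small data blocks also land on the special vdev; only newly written data benefits):
Code:
~# zpool add naspool special mirror /dev/disk/by-id/<ssd1> /dev/disk/by-id/<ssd2>
~# zfs set special_small_blocks=16K naspool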
 
Right, sorry, your post disappeared (spam filter) and I responded too quickly. I think putting ZIL (and probably L2ARC) on those SSDs is a bad idea.

Maybe don't use ZFS, since it does sync writes for metadata and has write amplification? If you add a ZFS special device to the HDD pool, you can put the metadata on those special devices and probably get better performance. It's best to use enterprise SSDs with PLP for that, and the special vdev needs at least the same redundancy as the pool (a mirror in your case), because if the special device fails you lose the pool.
Unfortunately, I am limited in connectivity on this system (Minisforum UM790 Pro). I set up a homelab with this mini PC, which is pretty powerful yet very quiet and power-efficient. There are two M.2 slots holding the two NVMe SSDs, plus a third keyed for a Wi-Fi card (PCIe 2.0). There are several USB4/TB4 ports available, though, one of which I used to connect the external enclosure. The mini PC is behind a UPS as well, so it's not all bad. No critical data will be on it, so it's definitely not an enterprise system.

I guess ZFS is just not the right option for me here. Alternatively, I can go with MDADM, which I have already tested as working well, and accept the risk that a Proxmox update might one day break it. I could also look into a straight backup/SnapRAID setup, an LVM mirror (which I understand uses MDADM code behind the scenes)... or maybe even Btrfs?
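
For example, the Btrfs option would look roughly like this (just the basic shape, untested on this box; the mountpoint is arbitrary):
Code:
~# mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb
~# mount /dev/sda /mnt/nasdata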

I had an MDADM RAID10 with 4x 2TB WD enterprise drives running for ~10 years on my last NAS until it was killed by a lightning strike (the drives survived!). I then upgraded to this system and decided to see whether there are any better options.
 
Unfortunately, I am limited in connectivity on this system (Minisforum UM790 Pro). I set up a homelab with this mini PC, which is pretty powerful yet very quiet and power-efficient. There are two M.2 slots holding the two NVMe SSDs, plus a third keyed for a Wi-Fi card (PCIe 2.0). There are several USB4/TB4 ports available, though, one of which I used to connect the external enclosure. The mini PC is behind a UPS as well, so it's not all bad. No critical data will be on it, so it's definitely not an enterprise system.
The system still can't safely cache sync writes, because the drives don't know about the UPS. If you had enterprise NVMe drives with PLP instead, then you could use part of them as special devices.
I guess ZFS is just not the right option for me here. Alternatively, I can go with MDADM, which I have already tested as working well, and accept the risk that a Proxmox update might one day break it. I could also look into a straight backup/SnapRAID setup, an LVM mirror (which I understand uses MDADM code behind the scenes)... or maybe even Btrfs?
ZFS's many nice features do come with a cost in performance, space, and price. Btrfs also has write amplification, but maybe it works better for you. There are lots of threads on this forum about people with tiny systems wanting big performance; maybe someone else here has experience with what you want.
 
