New ZFS setup on new install feels "off" somehow

Jan 2, 2021
Current system -
CPU is a Xeon E5-2697 v4
384 GB of memory
x6 Seagate st8000nm0075 8TB drives connected in HBA mode through Dell PERC H730P

Just spun up and migrated over to this "new to me" server. I was previously running a hardware RAID 10 for many years with x4 of the 8TB Seagate drives. Decided this time I wanted to be like the cool kids and try out the ZFS perks, so I passed the disks through in HBA mode and set up a 6-disk RAID 10 through the Proxmox GUI. So far I feel kind of underwhelmed. I can't put my finger on it, but the drives feel way slower to me than on my much older server with hardware RAID 10. I get more IO delay than I would expect when writing data, e.g. copying files to a VM; not extreme amounts, but 15-20% at times, which seems like a lot compared to my last server.
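For reference, this is roughly what I've been running to sanity-check the pool while copying files (I'm using a placeholder pool name here; swap in whatever your pool is actually called):

Code:
# confirm the layout is three mirror vdevs and that all disks are healthy
zpool status tank

# per-vdev / per-disk ops and bandwidth, refreshed every 5 seconds (add -l for latency)
zpool iostat -v tank 5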

Also anecdotally (might be unrelated, but these are things I've never had issues with in the last ~10 years on Proxmox):
- I've had stability issues with services on a few of my VMs: services not starting, hanging, general errors
- I'm maxing out swap even though RAM utilization never really goes over about 70-75% (quick checks below)
- The PVE dashboard/webpage is VERY slow to load occasionally, kinda reminds me of when a website in IIS gets unloaded. Never experienced this before.
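The quick checks I've been doing on the memory/swap side, nothing fancy, just stock tools:

Code:
# overall memory and swap usage
free -h

# which swap devices exist and how full they are
swapon --show

# current swappiness setting
cat /proc/sys/vm/swappiness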

This feels really tough to explain objectively; my IT senses just tell me something isn't exactly right, I just don't know what rock to look under to find what's wrong.

Before I migrate back to a hardware RAID 10, I wanted to see if this is just a configuration issue somewhere. I'm very new to ZFS, so it very well could be.
 
In my experience ZFS has always been slower since it is a COW (copy-on-write) filesystem. It's even worse when running ZFS pools on top of another ZFS pool, as in the case of shared storage (SAN/NAS) that is running ZFS as well.

The real advantages of ZFS are mainly snapshotting volumes and the "thin" nature of provisioning storage. I have yet to see any performance advantage. In my testing I used fio random reads/writes, and ZFS was always about half the IOPS. Have you done any read/write tests to get a baseline of the IOPS? Just curious.
 
I haven't done any testing; I didn't have anything to compare to, and my other server is down now, so I'd have to spin it up to get some sort of comparative test. Any thoughts on a good way to test? I could at least get data on my current config, if nothing else.

Getting a bad feeling I should have just gone hardware RAID 10 and been done with it...
 
Also, here is my arc_summary output if that helps at all; I don't fully understand a lot of it or how to decipher what might be good/bad.


Had to post it as a txt file since the direct post exceeds the post character count...
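In case anyone wants the same output, I grabbed it with something like this (arc_summary and arcstat come with the ZFS tools on Proxmox):

Code:
# dump the full ARC report to a text file for attaching
arc_summary > arc_summary.txt

# live view of ARC size and hit rate, refreshing every 5 seconds
arcstat 5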
 


Ah ok, so it will be hard to reference.

Nothing jumps out in the arc report, all looks good there.

If you just want a baseline measurement, you can try some random read/write tests to get the IOPS, if for nothing else as a sanity check:


Code:
Random r/w
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread --ramp_time=4

Sequential r/w
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=read --ramp_time=4
 
Just for reference, here are those same tests on a test VM using NFS shared storage to a TrueNAS ZFS array that runs all-flash storage with a SLOG drive, over 10Gb uplinks. This may be way different from your setup, so keep that in mind! But it gives an idea of IOPS for some more modern junk, running ext4 over shared storage.



Code:
Random:
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
write: IOPS=122k, BW=476MiB/s (499MB/s)(2219MiB/4661msec); 0 zone resets

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread --ramp_time=4
read: IOPS=198k, BW=773MiB/s (811MB/s)(1293MiB/1672msec)

Sequential:
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4
write: IOPS=1254, BW=5020MiB/s (5263MB/s)(4096MiB/816msec); 0 zone resets

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=read --ramp_time=4
read: IOPS=1333, BW=5333MiB/s (5592MB/s)(4096MiB/768msec)
 
Here are my results


Code:
Random r/w
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
write: IOPS=14.8k, BW=57.7MiB/s (60.5MB/s)(3875MiB/67188msec); 0 zone resets


sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread --ramp_time=4
read: IOPS=61.7k, BW=241MiB/s (253MB/s)(3080MiB/12777msec)


Sequential r/w
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4
write: IOPS=796, BW=3188MiB/s (3342MB/s)(4096MiB/1285msec); 0 zone resets

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=read --ramp_time=4
read: IOPS=1122, BW=4491MiB/s (4709MB/s)(4096MiB/912msec)
 
That's not bad; I have seen worse in production. Hopefully a ZFS guru chimes in, as I don't see disk I/O as a crippling bottleneck from those small tests.
 
I don't know what caused some of the service issues I had post-migration, still fighting some of that... However, I think I found why things felt slow, notably the PVE UI. I mentioned swap was maxed out all the time; apparently vm.swappiness defaults to 60, so I set that to 10 and that made a huge difference in responsiveness.

I think swap was hitting 100% and the PVE web UI was getting pushed out of RAM or running out of swap, which was painfully slow.
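For anyone who finds this later, this is roughly what I changed (10 is just the value that worked for me, not a magic number):

Code:
# check the current value (default is 60)
cat /proc/sys/vm/swappiness

# apply the new value to the running system
sysctl vm.swappiness=10

# make it stick across reboots (a drop-in file under /etc/sysctl.d/ works too)
echo "vm.swappiness = 10" >> /etc/sysctl.conf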