ZFS mirror with two SSDs: I/O delay

rbe · New Member · Feb 13, 2023
Hey,

I'm facing a weird issue here. I have a NUC11i7 with one Crucial MX500 4TB SATA SSD and one Crucial CT4000P3PSSD8 4TB NVMe SSD in a ZFS mirror. I've capped ZFS to 8GB of memory (the host has 64GB in total) via the /etc/modprobe.d/zfs.conf file.

Every time there is a bit more I/O (I've especially observed this during writes; reads seem to be fine), I get an I/O delay of 30-50% and services start to freeze until the I/O is over. I just had the case again while restoring a VM from a QNAP on the network. The limiting factor there should have been the QNAP's disks, which don't allow more than 60MB/s read, yet during the restore the I/O delay jumped up to 50% and the two VMs on that host stopped responding until it was done.

Does anyone have an idea? I'm aware that the mix of NVMe and SATA SSD limits me to the slower drive's write speed, but that should still be around 500MB/s and not cause any I/O delay, given that the restore ran at less than 1/5 of that speed.
For reference, the restore even showed this:
restore image complete (bytes=34359738368, duration=2378.50s, speed=13.78MB/s)

/etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=8589934591
options zfs zfs_arc_max=8589934592
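
For reference, how these limits get applied (a sketch with the same byte values; assumes a standard OpenZFS install): they can be set at runtime via sysfs, and the initramfs needs a refresh for the file to take effect at boot.

# apply the ARC limits immediately, without a reboot
echo 8589934591 > /sys/module/zfs/parameters/zfs_arc_min
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# make /etc/modprobe.d/zfs.conf take effect on the next boot
update-initramfs -u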

arc_summary: https://pastebin.com/hUH2FzrS
pve_perf: https://pastebin.com/GeGxh8ig
iostat: https://pastebin.com/8WtMv8jV
zpool get all rpool: https://pastebin.com/sP4qQmYL
 
I'm aware that the mix of NVMe and SATA SSD limits me to the slower drive's write speed, but that should still be around 500MB/s and not cause any I/O delay, given that the restore ran at less than 1/5 of that speed.
No. Run some really unfavorable workloads (like 4K sync writes) and the write performance of that SSD can easily drop to a few MB/s. SSDs are only fast at reading; when writing, there are situations where an HDD can be faster. The SSD can only keep up the advertised write performance for a few seconds. Then the DRAM cache is full and write performance drops. Once the SLC cache fills up, performance drops again.
And sync writes will always be terrible with your SSDs, as they don't have power-loss protection, so they can't cache sync writes at all.
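
You can see that effect yourself with a short fio run that forces a sync after every write. A sketch only: the test file path is illustrative (pick a dataset on the pool), and the file should be removed afterwards.

# 60 seconds of 4K writes, each followed by an fsync, against the pool
fio --name=4k-sync --filename=/rpool/fio-test --size=1G \
    --time_based --runtime=60 --rw=write --bs=4k --ioengine=psync --fsync=1
rm /rpool/fio-test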

And that CT4000P3PSSD8 uses QLC NAND, so write performance is terrible, especially with ZFS. Your MX500 might actually be the SSD with the better write performance, as it at least uses TLC NAND.

FSYNCS/SECOND: 39.77
That's 40 sync write IOPS... every HDD would beat that. Something is really wrong here; even those slow SSDs should perform better.
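
That number comes from pveperf, which can simply be re-run against the pool's mount point after any change to compare:

pveperf /rpool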

Cache hit ratio: 69.2 % 154.6M
I would assign more RAM to the ARC. A 69% hit rate isn't great. I usually add more RAM to the ARC until I'm in the 85-95% hit range.
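
To watch the hit rate while tuning zfs_arc_max, arcstat (shipped with the ZFS tools) prints the statistics live; for example, sampling every 5 seconds:

# prints ARC reads, misses, miss% and current ARC size per interval
arcstat 5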
 
Thanks for the insights.
When I bought the NVMe back then, I was aware of the QLC but didn't rate it as that bad, because I read it has 1TB of TLC acting as a write cache; however, I can't find where I read that anymore.

I've used the exact same setup before with Ubuntu, mdraid, and ext4, and it ran flawlessly even under heavy load; that's why I'm wondering.

I'd add more memory and assign more to ZFS, but 64GB is the maximum supported on a NUC11.

I'll try to rule out the NVMe by temporarily evicting it from the mirror, to see if it might actually be the one causing the issues.
I won't get to it before tomorrow, though.
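
For the record, the plan is just a detach and a later re-attach; the device paths below are placeholders for the real /dev/disk/by-id names:

# drop the NVMe out of the mirror; the pool keeps running on the MX500
zpool detach rpool /dev/disk/by-id/nvme-CT4000P3PSSD8_<serial>
# later: rebuild the mirror by attaching it back to the remaining disk
zpool attach rpool /dev/disk/by-id/ata-CT4000MX500SSD1_<serial> \
    /dev/disk/by-id/nvme-CT4000P3PSSD8_<serial>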
 
I'm not sure I've ever read a success story about running ZFS on consumer flash drives!
Performance is bad and wear-out is fast.
 
I ran ZFS on a Supermicro box before with a mirror of MX500 SSDs (smaller ones though, I believe 1TB) and it didn't suffer from these issues.
I scaled down to the NUC because it has a much newer CPU, which performs way better and consumes a quarter of the power of the Supermicro-based server (that was a v2 Xeon, so pretty old already).
I would have used two MX500s in the NUC too, but I'm somewhat limited by the hardware: it has just one 2.5" slot and one 2280 NVMe slot, therefore I went this way.

As mentioned above, I'll evict one of the disks from the mirror today to see if I can maybe pin the issue down to one of them.

Edit: I would probably have gone the ext4/mdraid route, but it's not supported by Proxmox and I'd lose a few ZFS features that are nice to have, such as bitrot protection.
 
Edit: I would probably have gone the ext4/mdraid route, but it's not supported by Proxmox and I'd lose a few ZFS features that are nice to have, such as bitrot protection.
It's possible to install Debian with mdraid and then the proxmox-ve package on top. I used that some years ago, when it wasn't yet possible to boot from an encrypted ZFS pool. But yes, you would miss a lot of features... then again, all those features and the additional data integrity come at a cost...
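
Roughly like this for Debian 11 (Bullseye); suite and key names differ per release, so treat it as a sketch of the documented procedure:

# on a Debian system that already boots from mdraid
echo "deb [arch=amd64] http://download.proxmox.com/debian/pve bullseye pve-no-subscription" \
    > /etc/apt/sources.list.d/pve-install-repo.list
wget https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg \
    -O /etc/apt/trusted.gpg.d/proxmox-release-bullseye.gpg
apt update && apt full-upgrade
apt install proxmox-ve postfix open-iscsi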
 
As mentioned, I evicted the NVMe drive from the mirror on Thursday and since then haven't had a single occurrence of I/O delay > 3%; FSYNCS are at 760.55, so I assume that looks good too.
I intentionally didn't change the ZFS config in the meantime (so the cache hit rate is more or less the same). I might just order another NVMe with pure TLC to rule out the one I had in before, as everything seems to run fine with the sole MX500.
 
You can buy a small datacenter SSD (100-200GB) as a ZFS SLOG device; fsyncs are around 10,000-20,000 with those.
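
Adding (and later removing) a SLOG is one command each way; the device path is a placeholder:

# dedicate a small SSD with power-loss protection as SLOG for sync writes
zpool add rpool log /dev/disk/by-id/<datacenter-ssd>
# log vdevs can be removed again at any time
zpool remove rpool /dev/disk/by-id/<datacenter-ssd>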
 
I promised to come back, and here we are: since I evicted the P3 NVMe from the RAID and added another MX500 instead, I've had zero issues and less than 1% I/O delay at most.
 
