ZFS on RAID 10 with 4 x PCIe5 NVMe Drives: Performance Insights and Questions

fmoreira86

Hello everyone.

I recently acquired 2 HPE DL380 Gen11 servers. Each node has 512 GB of RAM and four 3.2 TB PCIe Gen5 NVMe drives (HPE-branded, manufactured by KIOXIA). Given my use case, where the goal was not shared storage but a cluster with VM replication, I created a ZFS pool of striped mirrors (RAID 10) for the VMs with the NVMe drives (attached directly to the CPU, of course). My boot volume consists of two Samsung SSDs, but that's not relevant for this discussion.

In some tests performed inside Windows VMs (deployed following the recommendations from the Proxmox wiki), I'm getting a maximum of around 6,500 MB/s.

Don't get me wrong, I am very satisfied with the overall performance of the system, but I feel that, given the hardware I have, I'm paying a significant penalty for ZFS. Even though I know ZFS is not a filesystem focused on performance, I think we are all always looking for something more ;) My ZFS configuration is the default, with ashift=12 and compression enabled. I have atime disabled.
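For reference, a layout and property set like the one described can be reproduced along these lines. This is only a sketch: the pool name and device paths are placeholders, not the exact ones used here.

    # striped mirrors ("RAID 10") over the four NVMe drives: this wipes the devices!
    zpool create -o ashift=12 nvmepool \
        mirror /dev/nvme0n1 /dev/nvme1n1 \
        mirror /dev/nvme2n1 /dev/nvme3n1

    # the properties mentioned above
    zfs set compression=on nvmepool
    zfs set atime=off nvmepool

    # verify
    zpool get ashift nvmepool
    zfs get compression,atime nvmepool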

For your reference, HPE states that these disks are capable of:

Max Sequential Read / Write Throughput (MiB/s): 13,852 / 6,802

Read IOPS
Random Read IOPS (4 KiB, Q=16): 215,995
Max Random Read IOPS (4 KiB): 1,159,100 @ Q=128

Write IOPS
Random Write IOPS (4 KiB, Q=16): 599,050
Max Random Write IOPS (4 KiB): 649,489 @ Q=16
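To check a single drive against these vendor figures outside of ZFS and the VM, a read-only fio run at the same queue depth would look roughly like this. The device path is a placeholder; randread with --readonly does not write, but double-check the target all the same.

    # 4 KiB random reads at QD16 against one raw namespace
    fio --name=rawread --filename=/dev/nvme0n1 --readonly \
        --direct=1 --ioengine=libaio --rw=randread --bs=4k \
        --iodepth=16 --numjobs=1 --runtime=60 --time_based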

Any ideas? :)

[benchmark screenshots attached]

EDIT:

Tested with: [screenshots attached]
ZFS indeed isn't comparable to a regular filesystem. Compare with LVM-thin to see the ZFS penalty.

Virtualization has a penalty too. Q1 single-thread is the worst case for virtualization.

IOPS are missing from your CDM screenshot.
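If you want to measure that worst case with something other than CrystalDiskMark, a Q1T1 fio job inside a Linux guest would be along these lines (file name, size and read/write mix are just examples). Running the same job on a ZFS-backed disk and on an LVM-thin-backed disk shows the ZFS penalty directly.

    # 4 KiB random 70/30 read/write, queue depth 1, single job
    fio --name=q1t1 --filename=testfile --size=8G --direct=1 \
        --ioengine=libaio --rw=randrw --rwmixread=70 --bs=4k \
        --iodepth=1 --numjobs=1 --runtime=60 --time_based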
 
Just added the IOPS screenshot ;)

Also added testing with: [screenshot attached]
Hello,
I'm reviving this topic a bit, but I came across it on my first search on the same subject as you, so maybe this can help. I've been experimenting with different options for the past week to optimize a server for mixed VMs and also for MSSQL.

I managed to get good performance on my MSSQL VMs after a day of optimization with ChatGPT, which suggested improvements after each new test for each VM (about 10 minutes per test, including throughput, mixed-IOPS, and SQL-oriented IOPS runs), in order to properly tune the zpool and the zvols.
The goal of the optimization was to improve performance while keeping the same guarantees regarding file integrity.
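For context, the kind of per-disk settings usually discussed for an MSSQL zvol are along these lines. This is only a sketch with placeholder storage and dataset names; the 64k block size is just the commonly cited match for SQL Server's 64 KiB extents, and sync is left at its default so integrity guarantees are unchanged.

    # volblocksize is fixed when a zvol is created; check what an existing VM disk uses
    zfs get volblocksize rpool/data/vm-101-disk-0

    # new disks created by Proxmox inherit the block size from the storage definition
    pvesm set local-zfs --blocksize 64k

    # log behaviour can be tuned without weakening sync semantics
    zfs set logbias=latency sync=standard rpool/data/vm-101-disk-0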

Perhaps you could see what ChatGPT suggests after each test to optimize your configuration... it saves time interpreting the results and lets you refocus your tests.

On the other hand, if you have the opportunity, you could try the latest kernel 6.17 (run your benchmark before and after).

Before optimization, in my case (RAID 10 Gen5, EPYC 9475F...), I saw a considerable improvement on BTRFS just from the kernel upgrade, and after further testing the same on ZFS, for 4k and 8k and for IOPS in general. I can't tell you the reason, and I can't say whether it's tied to something specific to my configuration.
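If anyone wants to reproduce that before/after comparison, the usual sequence is roughly the following. The opt-in kernel package name is an assumption here; check the Proxmox announcement for the exact name on your release.

    # record the current state
    uname -r
    pveversion

    # install the opt-in kernel (package name assumed), reboot, rerun the same benchmark
    apt update
    apt install proxmox-kernel-6.17
    reboot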

The PassMark result above was on BTRFS; on ZFS I exceed 50k (just to give an idea, since I focus on fio tests oriented toward the intended workload). But I have another problem that is not related to the filesystem or the RAID:
I noticed on my platform a drop in performance and an abnormal bottleneck as soon as I load all four NVMe drives at once. This has nothing to do with Proxmox, because I can reproduce it on any platform, and tickets are open with Supermicro (motherboard), AMD (CPU), and Kingston (NVMe) to try to find the origin. For now, this bottleneck prevents me from getting 100% of the performance out of my RAID 10.
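For anyone who wants to look for the same scaling issue, loading all four namespaces at once and comparing the aggregate with a single-drive run can be done roughly like this. Device paths are placeholders; --readonly keeps the run non-destructive.

    # 4 KiB random reads spread across all four drives in one job
    fio --name=scale4 --readonly --direct=1 --ioengine=libaio \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --runtime=60 --time_based \
        --filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1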
 
I ran benchmark tests on the virtual disks stored on the ZFS Mirror storage.

Previously, when benchmarking the same disk on a Xeon E5-2699 v4 system, the best 4k Q1T1 speed was around 24 MB/s.

I suspect this relates to CPU and memory, but I don't know the exact reason.

Benchmark results for an 8 GB test size with cache=writeback and cache=none:

This speed is achieved on a disk that is significantly slower than an NVMe SSD.
*However, the write performance is terrible.

【CPU】Intel Core Ultra 7 265K
【MEM】 Crucial CP2K48G56C46U5 x4
【MB】Asrock Z890 Pro RS WiFi White
【PCIE 4.0 x4】 Broadcom HBA9500-16i
【SAS】HUSMM8080ASS200 (MO0800JDVEV) x2

*The reason reads are fast with cache=none is that the data is already in the ARC.
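For reference, the cache mode is just part of the disk line in the VM config, so switching it for a test is a one-liner per mode. The VM ID, storage and volume names below are placeholders; the whole option string is replaced, so re-specify any other options you had on that disk, and do a full stop/start of the VM afterwards.

    # show the current disk line
    qm config 100 | grep scsi0

    # re-attach the same volume with a different cache mode
    qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none
    qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback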


https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache

Do not use the Virtio Balloon Driver

The Balloon driver has been a source of performance problems on Windows; you should avoid it. (see ...)
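Following that advice on Proxmox means setting the VM's balloon value to 0, which disables the balloon device entirely (the VM ID is a placeholder):

    # disable memory ballooning for this VM
    qm set 100 --balloon 0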
 

Attachments

  • IMG_0081.png
  • IMG_0079.png