Limit of ZFS speed on NVMe and SSD drives

Madhusudan

New Member
Jul 25, 2020
Hello Team,

We are using Dell R630 servers for our Proxmox deployment, version 6.4-4. We have one ZFS pool configured as RAID10 using SSD drives (Ultra SSD) and another ZFS pool configured as RAID10 using NVMe drives.

Server configuration: 512 GB RAM and an Intel Xeon E5-2699 CPU

Disk config:

4 * NVMe drives on PCIe cards, configured as RAID-10 with ZFS
8 * Ultra SSD drives, configured as RAID-10 with ZFS

When we run benchmarks on both pools using tools like ATTO and CrystalDiskMark, we get almost the same IOPS and read/write speeds on both, regardless of the drive type.

So my question is: does Proxmox limit ZFS speeds in any way, and is there any way we can fine-tune this configuration to achieve higher speeds?

Thanks in advance,
Madhu
 
When we run benchmarks on both pools using tools like ATTO and CrystalDiskMark, we get almost the same IOPS and read/write speeds on both, regardless of the drive type.
Do you run them inside a VM? If so, you have the virtualization layer in between.

When benchmarking storage, try to go from the very bottom to the top, adding one layer after the other to see where you lose potential performance.

First, try to benchmark a single disk (if still possible). Then benchmark ZFS directly. Be aware that ZFS has two types of datasets with different performance characteristics: filesystem datasets (used for containers) and volume datasets, which provide a block device (used for VM disks).
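
For example, one of each dataset type could be created for a quick test roughly like this (the pool name "tank" and the dataset names are just placeholders, adjust them to your pools):
Code:
# filesystem dataset, the type used for container storage
zfs create tank/bench-fs

# volume dataset (zvol), the type used for VM disks, 10 GiB in size
zfs create -V 10G tank/bench-zvol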

Then you can go ahead and benchmark inside a VM and see what you get and how different options affect the result.

See our ZFS benchmark whitepaper to get an idea of how to approach this hard topic. Doing benchmarking well and consistently is not easy and can be tedious at times. :)
 
Hi Aaron,

Thanks for the reply.

We are running ZFS directly on the Proxmox host. Proxmox can see the disks directly, and there is no virtualization layer in between.

Could you please guide us through this?

Thank you,
Madhu
 
Where did you run ATTO and CrystalDiskMark? I assume inside a Windows VM? That means you had the following stack:
Code:
disk -> zfs pool (volume dataset) -> qemu/kvm (virtualization) -> Windows -> ATTO / Crystal Benchmark

There are quite a few moving parts in there and caches at various levels, which can give you skewed results.
ZFS uses the ARC (its primary cache) to serve read requests. This means that if the VM has just written some data to a disk residing on the ZFS pool, ZFS will keep that data in its cache (ARC), which lives in RAM. Any subsequent read of that data will be served from RAM and not directly from the disks.

If you take a look at the previously linked ZFS benchmark paper, at the end of page 5, it mentions that for the dataset used for the benchmark, the property "primarycache" has been set to "metadata" so that ZFS will not cache any actual data in RAM.
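
On a test dataset that could look like this (dataset name again only an example):
Code:
# cache only metadata in the ARC for this dataset, not the actual data
zfs set primarycache=metadata tank/bench-zvol

# verify the current value
zfs get primarycache tank/bench-zvol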

Depending on how you configure the VM's disk(s), you might be using some caching there as well. The "IO thread" option can also have an impact on the performance inside the VM.
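
Just as a rough sketch (VM ID, storage and disk names are placeholders, not taken from your setup), the cache mode and IO thread option of an existing SCSI disk could be set like this:
Code:
# cache=none bypasses the host page cache, iothread=1 gives the disk its own IO thread
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none,iothread=1

As far as I know, the IO thread option only takes effect for SCSI disks if the "VirtIO SCSI single" controller is used.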

Please take a look at the paper and use fio to do benchmarks directly on the Proxmox VE host before you start benchmarking inside a VM.
Since fio write benchmarks are destructive, either create a new VM disk or create a volume dataset manually, for example with zfs create -V 10G <pool>/<dataset name>. That dataset will then be accessible via /dev/zvol/<pool>/<dataset>. And don't forget to set the primarycache property to avoid the ZFS cache, as mentioned above.
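
A minimal fio run against such a zvol could then look roughly like this (pool and dataset names are placeholders; this overwrites the zvol, so never point it at a volume that holds data you need):
Code:
fio --name=randwrite-4k --filename=/dev/zvol/tank/bench-zvol \
    --ioengine=libaio --direct=1 --sync=1 --rw=randwrite \
    --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based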

When you do storage benchmarking, you are usually interested in two characteristics: how many IOPS the storage/disk can handle and how much bandwidth is possible.

The block size parameter (--bs=) of fio determines which limit you will run into during the benchmark. If you select a small block size like 4K, you will run into the IOPS limit and the resulting bandwidth will be low, as each IO operation only reads/writes 4K. If you use a larger block size like 1M or 4M, you will run into the bandwidth limit and the IOPS will be much lower.
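
To see both limits, you could run the same job twice and only change the block size, for example (placeholder names again):
Code:
# small block size -> you will mostly see the IOPS limit
fio --name=iops-test --filename=/dev/zvol/tank/bench-zvol --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --runtime=60 --time_based

# large block size -> you will mostly see the bandwidth limit
fio --name=bw-test --filename=/dev/zvol/tank/bench-zvol --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4M --runtime=60 --time_based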

If you run multiple VMs on the same storage, IOPS is the metric that matters. Also keep in mind that if you run fio with --numjobs=1 --iodepth=1, the results show you the lower end of what is possible. Usually you have a larger queue and more than one "client" accessing the storage, and by far not all operations are written with sync and direct semantics.
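
To get an idea of the upper end, you could, for example, increase both parameters and compare with the single-job run (the values here are arbitrary examples):
Code:
fio --name=parallel --filename=/dev/zvol/tank/bench-zvol \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4K \
    --numjobs=4 --iodepth=32 --group_reporting --runtime=60 --time_based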
 
