ZFS performance tips!

ant12 · New Member · Sep 12, 2024
Hi,

I have 2 Proxmox hosts, each with a local ZFS pool: a dRAID2 of 8 SAS disks, 2 NVMe SSDs for ZFS cache (L2ARC) and 2 more NVMe SSDs for ZFS log (SLOG). The utilization of the ZFS cache is very low, only 458G out of 1.5TB. What can I do?

Thanks!
 
> I have 2 Proxmox hosts, each with a local ZFS pool: a dRAID2 of 8 SAS disks, 2 NVMe SSDs for ZFS cache (L2ARC) and 2 more NVMe SSDs for ZFS log (SLOG). The utilization of the ZFS cache is very low, only 458G out of 1.5TB. What can I do?
Regarding Proxmox and VMs: use stripes or mirrors instead of (d)RAIDZ(1/2/3) if you want better IOPS. This is not Proxmox-specific, so do some more research on ZFS; L2ARC and SLOG don't usually help with IOPS. Maybe a special device would help?
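For illustration only (the pool name and device names below are placeholders, and recreating a pool destroys its data): the same 8 disks laid out as striped mirrors would give 4 vdevs' worth of write IOPS instead of 1.
Code:
# hypothetical layout sketch -- adjust pool name and /dev/disk/by-id paths
zpool create tank \
  mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB \
  mirror /dev/disk/by-id/diskC /dev/disk/by-id/diskD \
  mirror /dev/disk/by-id/diskE /dev/disk/by-id/diskF \
  mirror /dev/disk/by-id/diskG /dev/disk/by-id/diskH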
 
L2ARC is an extension of ARC.
So when you say L2ARC usage is low, that could be a positive thing, because the data was never evicted from the ARC in the first place.

Why are you not happy with the low usage? Are you unhappy with performance?
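If you want to check whether the ARC is already absorbing most reads (which would explain the quiet L2ARC), the raw kstats are one quick way to look; this is only a sketch, reading the cumulative hit/miss counters that ZFS on Linux exposes under /proc:
Code:
# ARC and L2ARC hit/miss counters since boot
awk '$1 ~ /^(hits|misses|l2_hits|l2_misses)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats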
 
> 2 NVMe SSDs for ZFS cache and 2 more NVMe SSDs for ZFS log. The utilization of the ZFS cache is very low, only 458G out of 1.5TB

You're allocating way the hell too much L2ARC. Detach those and convert them to a mirrored special device.

https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954

Unless you're doing NFS / lots of sync writes, you probably don't even need a SLOG, either.
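A rough sketch of that conversion, with a placeholder pool name and device names (check zpool status for the real ones). Keep in mind that a special vdev added to a RAIDZ/dRAID pool cannot be removed again, and it only holds metadata written after it was added:
Code:
zpool remove tank nvme-cache-1 nvme-cache-2          # drop the L2ARC (cache) devices
zpool add tank special mirror nvme-ssd-1 nvme-ssd-2  # add a mirrored special vdev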
 
> I have 2 Proxmox hosts, each with a local ZFS pool: a dRAID2 of 8 SAS disks, 2 NVMe SSDs for ZFS cache (L2ARC) and 2 more NVMe SSDs for ZFS log (SLOG). The utilization of the ZFS cache is very low, only 458G out of 1.5TB. What can I do?
How much RAM cache is displayed in arc_summary?
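For reference, arc_summary prints the current ARC size next to its min/max limits; the exact labels vary a bit between OpenZFS versions, so the grep pattern below is only a suggestion:
Code:
arc_summary | grep -i -A 3 "arc size"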
 
> I only use it for VMs, so should I add a mirrored special device?
Really depends on what you want and what your use case is.
In general I would recommend two NVMe drives in a mirror for VMs.
If you need good sync write performance, you can add a SLOG.

A special vdev is more useful for RAIDZ pools where you want better metadata performance.
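If you go that route, the commands look roughly like this; pool, dataset and device names are placeholders, and special_small_blocks only affects blocks written after it is set:
Code:
zpool add tank log mirror /dev/disk/by-id/nvme-logA /dev/disk/by-id/nvme-logB  # mirrored SLOG
zfs set special_small_blocks=64K tank/vmdata  # with a special vdev: send small data blocks to it too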
 
16GB? My host has 1TB of RAM!
Your RAM cache (ARC) is at 100%. ZFS recommends roughly 1GB of RAM for every 1TB of data, but I found that this still bottlenecks even with NVMe L2ARC or SLOG disks, so I personally use 4GB of RAM per 1TB of data. Try increasing the ARC maximum limit.

L2 cached evictions: 16.2 GiB
L2 eligible evictions: 12.0 TiB
You could also add a bigger L2ARC disk, and you would benefit from a dedicated SLOG device for the ZIL.
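If you do raise the ARC limit, it can be changed at runtime through the module parameter; the 128 GiB below is only an example following the 4GB-per-TB rule of thumb, and the change does not survive a reboot (see the modprobe.d config further down the thread for that):
Code:
echo $(( 128 * 1024 * 1024 * 1024 )) > /sys/module/zfs/parameters/zfs_arc_max  # 128 GiB
cat /sys/module/zfs/parameters/zfs_arc_max                                     # verify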
 
> I would first set ARC to something reasonable like 64GB and get rid of L2ARC.
I don't know how much RAM his VMs use; I would probably set ARC to at least 128GB or even higher.

But again, it all depends on what the use case is. Even 1TB of ARC does not help with sync write performance.
 
If you want to write more to the L2ARC devices, you need to increase the maximum rate at which ZFS is allowed to write to them:

echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max

https://openzfs.github.io/openzfs-docs/Performance and Tuning/Module Parameters.html#l2arc-write-max

In /etc/modprobe.d/zfs.conf (settings defined here are loaded on boot):
Code:
options zfs zfs_arc_max=42949672960
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_min_prefetch_ms=12000
options zfs zfs_arc_min_prescient_prefetch_ms=10000
options zfs zfs_dirty_data_max_max=17179869184
options zfs zfs_dirty_data_max=8589934592


These are examples I am using. YMMV.
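Note that options in /etc/modprobe.d/ only take effect at boot; on a Debian/Proxmox install where the zfs module is loaded from the initramfs, the usual extra step is to rebuild it:
Code:
update-initramfs -u -k all   # include the new module options in the initramfs
# after the next reboot, verify:
cat /sys/module/zfs/parameters/zfs_arc_max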
 
  • Like
Reactions: leesteken
> echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max

Basically off-topic: I need to actually count the digits to know that it is 67 million. That's why I prefer to run "echo $[..." in my scripts:
Code:
~$ echo  $[ 64 * 1024 * 1024  ]
67108864

That's not possible inside of "zfs.conf", of course...
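In a shell script, the standard $(( ... )) arithmetic expansion gives the same result, and GNU coreutils numfmt can convert IEC suffixes if you prefer not to multiply at all:
Code:
~$ echo $(( 64 * 1024 * 1024 ))
67108864
~$ numfmt --from=iec 64M
67108864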
 
The difference between your "L2 cached evictions" and "L2 eligible evictions" indicates that your I/O rate is greater than what can be written to the L2ARC, or that the data changes too rapidly to be evicted to the L2ARC.

If the data is not present to be evicted then it is not written to the L2ARC.

Have you any information to share about the footprint of the data on the disks?

But anyway: more vdevs == more write IOPS. If you compared 80 disks in one vdev against 20 vdevs of 4 disks each, the many-vdev layout would be much faster.

Use as much ARC as possible, but leave some RAM unused. The ARC saves reads from the disks anyway, which leaves more idle time for writes, and writes would be your bottleneck with a single 8-disk vdev.

I'd test different layouts on a test machine. I doubt you are getting the most out of the running config.

I'd test a 4x2-drive mirror layout and a 2x4-drive RAIDZ layout. Depending on the amount of data, maybe add an L2ARC, and most definitely a SLOG. If you have a high write rate, the mirror layout would be better. The 2x RAIDZ will still outperform a single-vdev 8-disk dRAID. The SLOG will only help so much.
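One way to compare the layouts is a short random-write run with fio against a scratch dataset on each candidate pool; everything below (file path, size, runtime) is just an example, and the numbers are only meaningful relative to each other on the same hardware:
Code:
fio --name=vm-iops --filename=/testpool/fio.bin --size=16G \
    --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
    --runtime=60 --time_based --direct=1
# if your OpenZFS version rejects O_DIRECT, drop --direct=1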

Rgds,..
 