ZFS worth it? Tuning tips?

delebru

First of all, thanks for having this amazing community and making Proxmox possible! :) I've been using Proxmox for a couple of months on a small personal server and it's been a great, solid experience. All the information available here and in the wiki has been invaluable!!

Now I've been tasked with setting up a local server for our office, on which I plan to run Proxmox with a couple of Linux containers for services such as OpenVPN, Nextcloud and GitLab. We'll also need a Windows VM to set up an automated build system with Unity 3D and TeamCity.

We have a Dell R720xd with 2x Xeon E5-2667 v2, 128GB RAM, 8x 2TB SAS 3.5" drives, a 512GB Intel 545s SSD and a 256GB Samsung 860 Pro. The server has a PERC H310 which doesn't support IT mode but does allow the drives to be set to "non-RAID" mode. I know this is still not ideal for ZFS, but I gave it a try anyway since I'm still waiting on the controller and cables that will let me run the drives in IT mode.

I guess it's worth mentioning I'm not really worried about power interruptions and that's why I went with "consumer" grade SSDs. We have a fairly solid UPS setup which should allow us to gracefully shut down the system in case of a power cut.

On my first install I created a zpool of 4 mirrored vdevs (RAID10-like), added the Intel 512GB SSD as L2ARC and capped the ARC at 64GB. The vdevs were created with ashift=9 because the drives have a sector size of 512b, and I also specified ashift=13 when adding the L2ARC. Compression was set to LZ4. (I'm planning to dedicate a 64GB partition on the 860 Pro to use as a ZIL/SLOG device, but haven't done that yet.)
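For reference, the setup was roughly along these lines; the device names below are placeholders (in practice I used /dev/disk/by-id paths), and the ARC cap is the zfs_arc_max module parameter in bytes:

# RAID10-like pool of 4 mirrors, LZ4 compression (placeholder device names)
zpool create -o ashift=9 tank \
    mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
zfs set compression=lz4 tank

# L2ARC on the Intel 545s
zpool add -o ashift=13 tank cache sdi

# cap the ARC at 64GB (value in bytes), then rebuild the initramfs and reboot
echo "options zfs zfs_arc_max=68719476736" > /etc/modprobe.d/zfs.conf
update-initramfs -u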

I also read the best practices for Windows VMs and set the disk and network to VirtIO with the appropriate drivers. The CPU type is set to host with NUMA enabled and the appropriate flags ticked.
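The relevant bits of the VM config ended up roughly like this (the VM ID 100 and the storage/volume names are just placeholders):

# CPU type host with NUMA enabled
qm set 100 --cpu host --numa 1
# VirtIO disk on the ZFS storage and VirtIO network
qm set 100 --virtio0 local-zfs:vm-100-disk-0
qm set 100 --net0 virtio,bridge=vmbr0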


Now to the "please help me" section... The Windows VM had occasional freezes until I decided to try removing the L2ARC from the pool. We're still not running any services on the host, but the system still felt pretty slow, so I tried different settings from various tuning guides that are supposed to help with recent hard drives/SSDs. Unfortunately there didn't seem to be any difference in performance, so I tried reinstalling Proxmox and the VMs directly on the SSD with an ext4 filesystem. I know it sounds like I'm comparing hard drives to SSDs, but the performance difference was massive, way more than HDD vs SSD alone should explain.
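For what it's worth, I've been comparing the setups with quick fio runs like the one below (the target path is just an example; on ZFS datasets the results are also influenced by ARC caching, since direct I/O may not be honoured):

# 60 seconds of 4k random writes against a test file on the storage being compared
fio --name=randwrite --filename=/tank/fio-test --size=4G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
    --runtime=60 --time_based --group_reporting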

ZFS seems to be this great thing that lots of people swear by, but there isn't much consistency in the performance recommendations I found after hours of googling... Is anyone here using ZFS with a positive experience? Should the ZFS default settings work out of the box, or would anyone recommend tuning some parameters?

Maybe I'm misjudging my ZFS experience because I haven't been able to put the controller into IT mode yet? Or should I just give up on ZFS and go for hardware RAID, with RAID10 on the HDDs and RAID1 on the SSDs?


On another note, I found that a Windows 10 VM seemed to respond much quicker than Windows 7 which was a bit unexpected... Any experiences here?


This post turned out to be way longer than I expected, but I hope the background info helps you understand my existential questions. I must say I'm not expecting a definitive answer to any of them, but any little experience, thought or recommendation could go a long way for me. Thanks in advance!!
 
Hi,

as long as you use a RAID controller you will never see good performance with ZFS.
It does not matter whether you use the JBOD mode or the RAID mode of the HW RAID.
The problem is that a HW RAID controller has its own cache management to improve performance,
and this RAID HW cache massively reduces the speed of ZFS and often ends in stuck I/O requests.

Use ZFS only if you have an HBA or an onboard controller.
 
Thanks for the replies!
often ends in stuck I/O requests
This would explain why I was having freezes and such terrible performance during my tests. I already ordered an HBA, but everything takes weeks to arrive here in New Zealand... I'll redo the installation when it arrives and give ZFS a real chance :) Should I look into adjusting any ZFS parameters, or are the defaults a good starting point?

The L2ARC index will be taken from the ARC. Only add an L2ARC when you really need it, not before.
Sorry, what do you mean by "the L2ARC index will be taken from the ARC"?
By adding a 512GB L2ARC I was hoping the system would cache more data, am I wrong? How would you determine when an L2ARC is needed?
 
If a read request doesn't get cached in the ARC, it also won't get into the L2ARC.
If your ARC hit ratio is low in general, an L2ARC is useless.
The index is a map of what is in the L2ARC, and it is stored in the ARC itself for performance reasons; the bigger the L2ARC, the bigger the index.

How would you determine when L2ARC is needed?
On FreeBSD I have the MRU/MFU ghost stats, which tell me when my ARC is too small.

Just start without an L2ARC... you can always add it later if it is necessary.

https://www.freebsd.org/doc/en/books/faq/all-about-zfs.html#idp59536328
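On Proxmox (ZFS on Linux) you can look at roughly the same thing; something like the following should show the hit ratio and the ghost-list counters (depending on the ZFS version the tools may be named arc_summary/arcstat or arc_summary.py/arcstat.py):

# overall ARC statistics, including the hit ratio
arc_summary

# live per-second ARC hit/miss figures
arcstat 1

# MRU/MFU ghost hits; consistently high values suggest the ARC is too small
grep ghost /proc/spl/kstat/zfs/arcstats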
 
Oh I see, so ARC space will be "wasted" on storing the L2ARC index when the original ARC size may already be enough for my caching needs. Good tip, thanks!

Do you use ZFS on FreeBSD only? Not on Proxmox?
 
I'm using ZFS with hardware RAID. It performs really well.

I don't know why or how the idea spread that ZFS can't be used with a hardware RAID controller. It is such a myth.

This article explains why you can, for sure, use ZFS with hardware RAID: https://mangolassi.it/topic/12047/zfs-is-perfectly-safe-on-hardware-raid

In a nutshell, ZFS is perfectly safe with hardware RAID.

On Dell RAID controllers, even if you don't have an HBA, you can use "write-through" mode and "no read ahead"; this way you're avoiding the hardware RAID cache.
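On LSI-based PERCs this is typically done with MegaCli (or perccli); the binary name and exact syntax vary with the tool version, so treat the following only as a rough sketch:

# set all logical drives to write-through and disable read-ahead
MegaCli -LDSetProp WT -LAll -aAll
MegaCli -LDSetProp NORA -LAll -aAll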
 
Vdevs were created with ashift=9 because the drives have a sector size of 512b

This is not OK. Even if your HDDs have a sector size of 512b, it is far better to use ashift=12 (as for 4k disks), because at some point in the future you will need to replace at least one HDD (you know that any HDD has the bad habit of breaking...). At that moment it is quite likely you will get a new disk with a 4k sector size. In that case your pool will face performance degradation (each 512b sector write will land on a 4k HDD).
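For example, you can check what the disks report and force the alignment at pool creation time (pool and device names below are placeholders):

# logical/physical sector sizes as reported by the disks
lsblk -o NAME,MODEL,LOG-SEC,PHY-SEC

# force 4k alignment regardless of what the disks report
zpool create -o ashift=12 tank mirror sda sdb

# verify which ashift the pool is actually using
zdb -C tank | grep ashift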
 
In a nutshell, ZFS is perfectly safe with hardware RAID.


OK, let's see...! ZFS makes some write operations and flushes all the data to the disks/RAID controller. When this operation finishes (the RAID says: OK, the data is saved on the disks), ZFS also considers that all this data is on the disks (you know, it is a transaction). But the controller has not actually saved the data to the disks (the data is still in the controller cache). In the next moment the power goes down, and after the power comes back, the data from the controller cache is LOST (controller without BBU). Is this safe? Judge for yourself.

Let's see again... a controller with BBU, and the controller burns out/is destroyed. Is the data on the disks safe?
 
Is anyone here using ZFS with a positive experience

I am one of them, and I have been using ZFS for at least 5 years. Like anyone who starts using a new tool, I have had many bad and good experiences. As time passed, the bad experiences started to diminish. But I read countless documents about ZFS and tested many things. The most difficult part was understanding the many concepts related to ZFS. As you start to understand more about ZFS, your setup/performance will get better. My only regret is that I have not spent more time on ZFS.
Now all my systems/servers use ZFS.
 
Let's see again... a controller with BBU, and the controller burns out/is destroyed. Is the data on the disks safe?

Yes, it is safe, as long as you're using the controller batteries.

If the controller suffers a fault, just replace it, keeping the current batteries.
 
... if it did not burn at the same time as the controller. And replacing a RAID controller (buying a new one) can take many days!

Parts availability for replacement has nothing to do with what is under discussion here. The point is: is hardware RAID safe or not? Obviously, if you care about your service availability, you should always have spare parts, regardless of whether you use hardware RAID or not.

When you say "if the battery and controller burn at the same time", it is the same as if I said "if two or more of your disks burn at the same time" on your non-HW-RAID setup.
 
Obviously, if you care about your service availability, you should always have spare parts, regardless of whether you use hardware RAID or not.
Without HW RAID I can put all the HDDs into another server without RAID and be back online in 15 minutes.

it is the same as if I said "if two or more of your disks burn at the same time" on your non-RAID setup

I do not remember when I last saw an HDD that burned. But I can remember when I last saw a burned controller ;)
 
Without HW RAID I can put all the HDDs into another server without RAID and be back online in 15 minutes.



I do not remember when I last saw an HDD that burned. But I can remember when I last saw a burned controller ;)

I understand your arguments.

But you're just considering your own past experience by itself.

There are no statistics telling us that RAID controllers are more likely to burn out than hard disks.
 
In my own opinion, based on my own past experience:

- ZFS with HW RAID without BBU is risky (and I explained why before)
- ZFS with HW RAID without BBU, but with a SLOG and a big UPS, is low risk
- ZFS with HW RAID with BBU, a good UPS and a SLOG drive is OK

Of course many people can have different experiences (good and bad) with all of these situations. But in the end only time will validate whether our own choices were good or bad.
 
Now for the initiator of this post... I almost forgot about him ;) Some tricks, or ZFS voodoo spells ;)

- start with the ZFS defaults and see whether things are OK most of the time or not (use a monitoring system like LibreNMS for this), taking into account not only the ZFS storage (CPU, RAM, load and so on can have an impact on ZFS)
- focus your attention on VMs and less on CTs (VMs on ZFS are the ones most impacted performance-wise)
- for VMs, use the largest volblocksize that fits your use case (test values larger than the PMX default of 4k, like 16-64k)
- if most of the time you use large files, then a bigger volblocksize is better
- use cache=none for the VM disk, and primarycache=metadata on the zvol
- running a DB in a VM is tricky, because you need to adjust your volblocksize again to what the DB vendor recommends (16k for MySQL, for example)
- do not try to emulate your real load with various tools... use the real load that you have
- test various ZFS setups for many days, watch your graphs (LibreNMS), and document what you set up and what you got with those settings (for any bad/good results), including graphs, logs or anything that could be useful in the future (you will forget many details after a year, and you do not want to waste your time)
- look at your logs when something goes bad in the ZFS landscape
- remember that ZFS is self-tuning, so many settings will change when certain triggers occur (like free space dropping under 10%, fragmentation rising, and so on)
- read as much documentation as you can; at the beginning you will not understand many things, but in time your brain will start to put all the strange info in order and make countless connections... and one day your cloudy thoughts will become crystal clear ;)

- if you need different volblocksizes (like a big one for the OS and 16k for, let's say, a DB), then create 2 different vDisks with different volblocksizes (see the example commands right after this list)
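As a rough illustration of the volblocksize and primarycache points above (pool name, VM ID and sizes are placeholders): on Proxmox the block size for newly created disks can also be set with the "blocksize" option of the zfspool storage, or a zvol can be created by hand, e.g.:

# create a zvol with a 16k volblocksize (volblocksize can only be set at creation time)
zfs create -V 32G -o volblocksize=16k tank/vm-100-disk-1

# cache only metadata in the ARC for this zvol (the guest already caches its own data)
zfs set primarycache=metadata tank/vm-100-disk-1

# check the resulting values
zfs get volblocksize,primarycache tank/vm-100-disk-1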


Good luck!
 
In a nutshell, ZFS is perfectly safe with hardware RAID.

That all depends on your definition. ZFS has very heavy overhead in terms of memory. This is primarily because it is designed to provide stateful fault tolerance by incorporating the parity information at both the LV and file levels. By placing your LV on a RAID volume, you get the worst of both worlds: you have all the overhead without any of the fault tolerance benefits. The RAID provides parity at the device block level, but ZFS cannot use this for file-level parity.

I'm using ZFS with hardware RAID. It performs really well.
It may perform well enough for your use case, but compare it with LVM to see how much faster and less RAM-intensive that can be.
 
