ZFS config recommendation

Richard Goode

New Member
May 29, 2019
Hi all,

I'm fairly new to ZFS and I'm after a bit of advice on the best config for my setup.

I have some old-ish hardware, but it was right for my budget and I have plenty of spares to see me into the medium-term future:

* Dell R510
* 12 magnetic drives - 10 x 15k SAS 600G, 2 x 7.2k 2TB
* 16 cores
* 64GB RAM

I'm converting these servers from ESXi, and I installed my first one using my existing hardware RAID config (/dev/sda is RAID6 on the 10x15k drives, /dev/sdb is RAID 1 on the 2x2TB drives). PVE 5.4-3 is installed with root as ext4 and swap on the same LVM, and the remainder of /dev/sda is ZFS for VM images. /dev/sdb is entirely ZFS and I use it for backups/ISOs. ZFS has compression set to lz4. It's working great and my overall read/write performance is much better than on ESX, but I do get extremely slow performance when I'm doing intensive IO on the host (such as qemu-img work) and guests grind to a virtual halt. It recovers when the disk IO has calmed down, and it's not a huge issue as this operation is rare.

I'm about to convert my second server and I'm thinking of doing it differently now that I've learned some lessons and read a bit more. From what I've read, I believe I should disable the PERC and give the drives entirely to ZFS. If I were to replicate the same setup, I would RAIDZ2 the 10x15k's and RAIDZ1 the 2x2TB's. I've also just read that I should limit zfs_arc_max to 50% of my RAM and tune swappiness down to 10. But I'm unsure whether this drive config is best practice and whether it would help mitigate the IO choke I describe above. I'm on a limited budget so can't fork out for any SSD at this point (maybe in future).
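
For reference, this is roughly how I understand those two tunables get set on a PVE host; the 32 GiB figure is simply half of my 64GB of RAM expressed in bytes, so adjust to taste:

# /etc/modprobe.d/zfs.conf  (applied at module load; also run "update-initramfs -u" if root is ever on ZFS)
options zfs zfs_arc_max=34359738368

# /etc/sysctl.d/90-swappiness.conf
vm.swappiness = 10

# apply both immediately without a reboot
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
sysctl -w vm.swappiness=10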

As ZFS is new for me, and based on my previous disaster stories with software RAID, I'm nervous about disabling the PERC, but I'm open to changing my prejudice on that.

Any suggestions please?

Thank you,
Rich
 
For the PERC, delete any existing RAID volumes. This will make all the drives "Unassigned" and the PERC should pass the drives through.

I'm using 15K RPM SAS drives as well.
 
You may need to put the RAID controller into a non-RAID mode where it passes the disks straight to the OS; HBA is what I've seen this mode called on Dell servers. Whatever you do, you don't want to make a RAID 0 for each disk. That will prevent ZFS from seeing the hard drive directly. Proxmox will want to talk past the RAID controller straight to the disks. You can monitor the health of the disks with S.M.A.R.T quite easily. You can live without the SSD for now, but I highly recommend getting one for ZIL and L2ARC! I saw huge performance gains on my RAID10 of 4x 1TB 7200RPM disks after adding an 8GB ZIL and the remainder of the 256GB SSD as L2ARC.
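
If you do add an SSD later, attaching it is just two commands once it's partitioned (pool name and device paths below are only examples):

# ~8GB partition as a separate log device (SLOG), the rest as L2ARC cache
zpool add tank log /dev/disk/by-id/ata-EXAMPLE_SSD-part1
zpool add tank cache /dev/disk/by-id/ata-EXAMPLE_SSD-part2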
 
Hi,

I believe I should disable the PERC and give the drives entirely to ZFS

This is a must-have (HBA mode).

I would RAIDZ2 the 10x15k's

Regarding the IOPS spikes you see on the first server, you must understand that any raidzX vdev has roughly the same IOPS as a single disk. From this perspective you could instead make something like RAID10, or stripe two vdevs together (e.g. two RAIDZ1 vdevs of 5 HDDs each in a RAID0-like layout). The IOPS will be doubled, but you can only lose one disk per vdev (safety and performance at the same time are not possible in most cases, as you know).
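
As a rough sketch, a pool striped over two RAIDZ1 vdevs would be created like this (pool name and disk names are placeholders; in practice use /dev/disk/by-id paths):

zpool create tank \
  raidz1 sda sdb sdc sdd sde \
  raidz1 sdf sdg sdh sdi sdj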

Another important thing is to take into account the block size used in your pool. Search for my forum username and you will find many of my posts on this subject. You can try a bigger block size, like 32k or 64k, and you will improve your IOPS depending on your VM load. You can create different datasets on your ZFS pool with different block sizes, then use the best dataset/block size for each VM.
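
For example, something like this (pool, dataset, and size values are only illustrative):

# filesystem dataset with a larger record size, e.g. for backups or sequential loads
zfs create -o recordsize=64K tank/seq-data

# zvol for a VM disk with a larger volume block size (must be set at creation time)
zfs create -V 32G -o volblocksize=32K tank/vm-101-disk-0

If I remember correctly, in Proxmox the same thing can be set per storage entry via the "blocksize" option of a zfspool storage in /etc/pve/storage.cfg.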

Also take your time, run many tests, and read some documentation about ZFS before you go further down the ZFS road. I see many happy ZFS users on this forum, but also others who are not so happy (some of them because they did not try to inform themselves about ZFS).

Good luck, and enjoy ZFS and Proxmox.
 
Since your volumes are already defined with a RAID6, it means your server has an H700, which is not capable of passing through disks. To use ZFS effectively you'll want to replace it with an HBA (an LSI card in IT mode).

Your best performance option, especially with spinning disks, is a 10-disk striped mirror (5 sets of 2-disk vdevs), for exactly the reasons @guletz mentioned. I know it sounds like a waste of space, but slow storage that you're not using is definitely less preferable than fast storage that you actually utilize. As for the large drives, those should be used for non-VM disk storage, e.g. archives, ISOs, etc.
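
A quick sketch of that 5 x 2 layout, with placeholder device names (use /dev/disk/by-id paths on real hardware, and add -o ashift=12 if the disks are 4K-sector):

zpool create tank \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf \
  mirror sdg sdh \
  mirror sdi sdj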
 
From the brief install manual, here: https://pve.proxmox.com/wiki/ZFS_on_Linux

Do not use ZFS on top of hardware controller which has its own cache management. ZFS needs to directly communicate with disks. An HBA adapter is the way to go, or something like LSI controller flashed in “IT” mode.
If you are experimenting with an installation of Proxmox VE inside a VM (Nested Virtualization), don’t use virtio for disks of that VM, since they are not supported by ZFS. Use IDE or SCSI instead (works also with virtio SCSI controller type).
 
Since your volumes are already defined with a RAID6, it means your server has an H700, which is not capable of passing through disks. To use ZFS effectively you'll want to replace it with an HBA (an LSI card in IT mode).

Yes, that's correct, it's an H700. OK, interesting that it doesn't have a passthrough capability, and thanks for pointing that out. So it would seem that, no matter what, I will still need to create RAID arrays, even if they're RAID0.

Whatever you do, you don't want to make a RAID 0 for each disk. That will prevent ZFS from seeing the hard drive directly. Proxmox will want to talk past the RAID controller straight to the disks. You can monitor the health of the disks with S.M.A.R.T quite easily.

I do currently monitor the SMART status of my drives (graph temp, monitor health etc.) on server #1, even though the drives are in an H700 RAID6 array. So it does seem that I can still access the SMART data of each disk "through" a RAID array (using smartctl and smartd, for example), but maybe ZFS has specific problems with this. Given the above lack of passthrough functionality, I may not have a choice either way.
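
For what it's worth, this is roughly how I query a disk behind the H700 today (the H700 is MegaRAID-based; the ",0" index maps to a physical slot and is just an example):

smartctl -a -d megaraid,0 /dev/sda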

You can live without the SSD for now, but I highly recommend getting one for ZIL and L2ARC! I saw huge performance gains on my RAID10 of 4x 1TB 7200RPM disks after adding an 8GB ZIL and the remainder of the 256GB SSD as L2ARC.

Even with the 12 drive bays (filled with spinning drives at the moment), I do have 2 internal bays which I could fit SSD in the future. So will consider it at some point.

Regarding the IOPS spikes you see on the first server, you must understand that any raidzX vdev has roughly the same IOPS as a single disk. From this perspective you could instead make something like RAID10, or stripe two vdevs together (e.g. two RAIDZ1 vdevs of 5 HDDs each in a RAID0-like layout). The IOPS will be doubled, but you can only lose one disk per vdev (safety and performance at the same time are not possible in most cases, as you know).

OK, a tricky trade-off. I'm actually not too bothered about optimal performance as most VMs are low disk use (firewalls, routers, DNS server, NTP, RADIUS, etc.). However, I would like to avoid the choking that happens when I do a bulk copy or something like "qemu-img convert" on the host. If it weren't for this scenario causing the VMs to grind to a halt, I would be happy as-is (I get around a 130MB/s read rate).

As these servers are in a datacentre and sometimes I am very remote (depending on my work), reliability is of key importance for me. I would definitely like to have 2 redundant drives. I have way more space than I need - 4.5TB on the RAID6 and I use about 15% for my images.

I'm also wondering if ZFS is right for my hardware given the limitations. My main reason for ZFS was to be able to do VM replication between hosts - is this feature available using other filesystems? I didn't want to go Ceph (too complex for my requirements - trying to keep it simple).

Another important thing is to take into account the block size used in your pool. Search for my forum username and you will find many of my posts on this subject. You can try a bigger block size, like 32k or 64k, and you will improve your IOPS depending on your VM load. You can create different datasets on your ZFS pool with different block sizes, then use the best dataset/block size for each VM.

OK, will do, thanks.

Good luck, and enjoy ZFS and Proxmox.

Thanks, and I already am. PVE is a huge step up compared with free ESXi - backups, migration, replication, HA, VLAN-aware bridges, MAC learning bridges, LACP, and a full Linux OS on the hypervisor are all major plus points for me.

Thanks all for your advice so far. Much appreciated.

Rich
 
