What filesystem for boot/OS/swap (ext4, xfs, ZFS?) for a software mirror?

EdoFede

Member
Nov 10, 2023
Hi!

I'm planning an installation of a Proxmox VE node for some testing.

I want to use two NVMe drives for the boot/OS and I would also like to put swap on them.
(VMs will be placed on other disks using ZFS)

Since placing swap on a ZFS pool does not seem to be a great idea, I'm looking for the best possible solution.

I searched the forum for documentation on this topic, but I mainly found posts about installing a complete lab node on a single disk or a mirror (including VM image storage). I found nothing about an installation with dedicated boot disks or about these considerations on a swap partition versus a swap file.

There will be 2 drives for the purpose and I have to make a software mirror.
The system must be able to boot even without one of the drives.
If it's worth it, I want to use ZFS for the boot/OS (excluding swap).

So I see these possibilities:
- ext4 (or xfs) and a separate swap partition on top of mdraid SW raid1 (taking the whole drives)
- ext4 (or xfs) and a swap file on top of mdraid SW raid1 (taking the whole drives)
- ZFS mirror on top of a partition on both drives (instead of the whole drive) + a swap partition on top of mdraid SW raid1 (built on top of another partition created per-drive on the free space)
- More suggestions… :)

A more detailed explanation of the last option:
/dev/sda and /dev/sdb: NVMe drives on which I'll create two partitions each
- /dev/sda1 + /dev/sdb1 > ZFS mirror with boot/OS
- /dev/sda2 + /dev/sdb2 > mdadm software raid 1 > single mirrored swap partition
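The swap half of that layout could be sketched roughly as below. This is only a dry-run outline under assumptions: the helper names `run` and `swap_mirror_plan` and the array name /dev/md0 are made up for illustration, and the partition names are the ones from the list above. By default it only prints the commands; nothing touches the disks unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Set DRY_RUN=0 only after checking every device name against `lsblk`.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1, $2: member partitions (assumed names), $3: md device to create
swap_mirror_plan() {
    # Build a two-member RAID1 array from the two partitions
    run mdadm --create "$3" --level=1 --raid-devices=2 "$1" "$2"
    # Put a swap signature on the array and enable it
    run mkswap "$3"
    run swapon "$3"
}

swap_mirror_plan /dev/sda2 /dev/sdb2 /dev/md0
```

Making the array and the swap entry persistent (mdadm.conf, /etc/fstab) is left out here on purpose.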

For all of these options I have to go for a Debian base installation + PVE, since none of them is supported by the default installer.

What about GRUB for booting on these setups? (assuming a failed drive)

Thanks in advance!
Edoardo
 
Depending on your boot method (UEFI or legacy), you also need a partition for that at the start of the disk. Having the OS on NVMe is a total waste in my book; the PVE root is not that I/O intensive. Please be aware of the problems with non-enterprise SSDs and ZFS (just look in the forums).

In the end, it boils down to preference; technically the options are more or less equivalent. As you already pointed out, you need a custom Debian install anyway if you bring in mdadm. Having everything (except swap) on ZFS has its own benefits.

I have used all of them over the years and have these pointers:
  • Use proxmox-boot-tool for UEFI or GRUB so that the bootloader is highly available on both disks
  • The EFI partition should be 1G to leave wiggle room for a lot of kernels, which are not always cleaned up correctly
  • 4G for the root used to be enough, but it no longer is
  • Set a ZFS refreservation and a quota on your root ZFS dataset
  • Don't run a separate ZFS boot pool unless you really need it. Better to incorporate everything in one pool
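For the refreservation/quota and proxmox-boot-tool pointers, a hedged dry-run sketch follows. The dataset name rpool/ROOT/pve-1 is an assumption (it is what the PVE installer normally creates), the 8G/32G sizes and the /dev/sdb2 ESP are placeholders, and the helper names are invented for illustration. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Assumption: root dataset name as created by the PVE installer
ROOT_DS="${ROOT_DS:-rpool/ROOT/pve-1}"

# $1: space guaranteed to the root dataset, $2: upper limit it may grow to
protect_root_dataset() {
    run zfs set "refreservation=$1" "$ROOT_DS"
    run zfs set "quota=$2" "$ROOT_DS"
}

# $1: an ESP that is not yet registered (e.g. on a replaced disk)
register_esp() {
    run proxmox-boot-tool format "$1"
    run proxmox-boot-tool init "$1"
    run proxmox-boot-tool status
}

protect_root_dataset 8G 32G
register_esp /dev/sdb2
```

The reservation keeps the root dataset from being starved when other datasets fill the pool, and the quota keeps the root from eating the whole pool; the actual sizes depend on the pool.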
 
Thanks for your thoughts.

The choice of NVMe disks is made exclusively for reasons of slot availability in the chosen servers. The "front" slots are dedicated to the data pool and we'll add two internal PCIe > NVMe adapters to add internal drives.

I'm aware of ZFS problems on non-enterprise SSDs and I'm planning accordingly.

Can I kindly ask you to explain better what you mean in the last sentence?

  • Don't run a separate ZFS boot pool unless you really need it. Better to incorporate everything in one pool

Thanks!
 
Can I kindly ask you to explain better what you mean in the last sentence?
Managing two pools is harder than managing one. The only upside I see is that reinstalling PVE is easier when boot and non-boot are separated. I'm a huge fan of running multi-tiered ZFS pools with a SLOG and a special metadata device. None of that is doable via the default installer, but you've already settled on a manual install anyway. I have my SLOG on the same mirrored Optane NVMe as my swap, so both are very fast. The OS sits on the metadata device (because it is small), and the rest is on disks / slower SSDs. You can do that too, with two Optane NVMe drives and your "normal" NVMe drives as the metadata device; the data disks are then the backend.
 
I would use 2 pools anyway, because the storage for the VM disks resides on separate disks (plus SLOG and L2ARC).


Since the "custom" Debian installation with ZFS + Proxmox is a very long procedure (even though I went through all of it and it worked correctly), I made another attempt with the Proxmox VE installer and found a way to get the last scenario without going crazy.

Select a ZFS mirror on the two disks and limit the size (hdsize) in the advanced options.
The installer then creates the EFI partitions and a reduced-size partition for the zpool, leaving the rest of each disk free.

Later we can create either two swap partitions or a single swap partition on top of an mdadm mirror (though I read that the latter might not be the right choice: https://forum.proxmox.com/threads/n...-raid1-how-to-create-swap.103157/#post-478011).
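For the "two swap partitions" variant, a minimal sketch of the /etc/fstab entries could look like this. The partition names /dev/sda4 and /dev/sdb4 are assumptions (the free space left by the installer would typically become a fourth partition), `swap_fstab_lines` is a made-up helper, and the priority value is arbitrary. The function only prints the lines; it does not edit any file.

```shell
#!/bin/sh
# Print /etc/fstab entries for plain swap partitions that share one priority.
# With equal pri= values the kernel spreads swap pages across the devices.
# nofail lets the boot continue if one of the drives is missing.
# $1: priority, remaining arguments: swap partitions (assumed names)
swap_fstab_lines() {
    swap_pri="$1"
    shift
    for swap_part in "$@"; do
        printf '%s none swap sw,nofail,pri=%s 0 0\n' "$swap_part" "$swap_pri"
    done
}

swap_fstab_lines 10 /dev/sda4 /dev/sdb4
```

Each partition still needs mkswap first. Unlike the mdadm variant this swap is not redundant, so pages that live on a failed drive are lost.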

Simpler than expected :)

Thanks!
 
