Proxmox Disks / Storage Setup

the-gloaming

Member
Dec 30, 2021
I'm setting up a Proxmox server (for a homelab) and am not sure how best to set up disks and storage. I am a newbie with servers, storage, Proxmox, et al. My current plan is a TrueNAS setup; later I will also play around with other software/containers and see where it takes me.

The disks I have, and my plan for them, are as below (no M.2 connectors):
  • 2x 800GB Intel DC SSDs: in a ZFS mirror to install and run the Proxmox OS, as well as for VMs/containers, ISOs, and related storage. I also intend to install TrueNAS here.
  • 3x 4TB NAS HDDs: for TrueNAS storage (there's a rough sketch at the end of this post of how I imagine handing these over)
  • I also have 2x 4TB regular HDDs and 1x Samsung 850 128GB SSD (but no plans for using these yet; I can connect 7 disks to my setup in total)
My thoughts / questions / concerns, and where I am looking for help:
  1. Does the above look like a proper approach for disks and storage? A key concern I have is whether it is okay to use the SSDs for the Proxmox OS, VMs, containers, TrueNAS, etc. If so, do I need to do any specific setup/partitioning in advance?
  2. Or should I use a smaller SSD for the Proxmox OS (the Samsung 850 128GB SSD?) and use the Intel DC SSDs (in a ZFS mirror) for VMs/containers/TrueNAS/etc.? If I do, the Proxmox OS will not be on ZFS or mirrored disks (I don't think that matters much for my setup).
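Here is roughly how I imagine handing the NAS HDDs to TrueNAS if it ends up running as a VM. The VM ID and disk serials below are made up, so please correct me if this is the wrong approach:

# find stable device paths for the 3x 4TB NAS HDDs
ls -l /dev/disk/by-id/

# attach them whole to the TrueNAS VM (VM ID 100 and the serials are placeholders)
qm set 100 -scsi1 /dev/disk/by-id/ata-NAS_HDD_SERIAL_1
qm set 100 -scsi2 /dev/disk/by-id/ata-NAS_HDD_SERIAL_2
qm set 100 -scsi3 /dev/disk/by-id/ata-NAS_HDD_SERIAL_3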
 
TrueNAS is going to be a guest I take it?

I am a beginner looking to set up something very similar, so I'll share my learnings because they seem relevant. I went with RAIDZ1 across the entire SSDs, but I'm regretting it now because:
1. I don't have a thin pool on which to store thinly provisioned VMs. I could put my VMs on the HDDs, but I want an ext4 filesystem on RAID1 there so I can store files, backups, etc.
2. Apparently RAIDZ1 is not mirrored, so in the case of a disk failure the rebuild involves much heavier IO. This may not be a big deal given that I only have two disks, but it's not clear to me. This article [1] discusses mirrored vdevs vs RAIDZ and argues for the former: safer and faster recovery and easier extensibility, with some sacrifice in storage efficiency.

So now I'm thinking of rebuilding with mirror vdevs and a thin pool. I take it this requires that the root fs be partitioned separately, so I will set aside maybe 80GB so I never have to worry about running out of space. That still leaves 720GB (in your case) for the thin pool.
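Roughly the shape I have in mind, assuming "thin" here just means sparse zvols on ZFS. Pool name, partition numbers, and device IDs are placeholders, and I haven't tested this yet:

# Proxmox installer: ZFS RAID1 on both SSDs with hdsize capped at ~80GB,
# then create a partition in the leftover space on each SSD (e.g. with sgdisk)

# mirrored data pool on the leftover partitions
zpool create -o ashift=12 vmpool mirror \
    /dev/disk/by-id/ata-INTEL_SSD_1-part4 \
    /dev/disk/by-id/ata-INTEL_SSD_2-part4

# register it with Proxmox as thin-provisioned VM storage
pvesm add zfspool vmdata --pool vmpool --sparse 1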

[1] https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
 
@the-gloaming, for your 1x Samsung 850 128GB SSD, consider a Special VDEV aka Fusion Drive.

special vdev
https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954

more special vdev. this is gold.
https://klarasystems.com/articles/openzfs-understanding-zfs-vdev-types/
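If you go that route, the add itself is a one-liner. Pool name and device path below are placeholders, and special_small_blocks is optional tuning:

# attach the 850 as a special (metadata) vdev
zpool add tank special /dev/disk/by-id/ata-Samsung_SSD_850_SERIAL

# optionally steer small blocks (here anything <= 64K) onto the SSD as well
zfs set special_small_blocks=64K tank/data

# keep in mind the pool now depends on this vdev - losing it loses the pool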

--------------------------------------------------------

@charfix It looks like you are digging into ZFS. This is the path forward if you want to do Proxmox. Good.

For many folks who don't want to deal with the intricacies of ZFS, a RAIDZ1 or RAIDZ2 is the choice because of easier maintenance. You can type up some fairly simple commands for your staff and they can replace disks for you.

As someone used to working with ZFS, I've come to agree with the array-of-mirrors philosophy. It's how you achieve speed. If you have only one vdev, you get less than the speed of a single hard drive. An array of mirrors is faster, but you need to understand the zpool architecture in order to replace a disk.
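The replacement itself is only a couple of commands once you've identified the failed disk. Pool and device names below are placeholders:

zpool status -v tank                        # find the faulted disk
zpool offline tank ata-OLD_DISK_SERIAL      # optional if it has already dropped out
zpool replace tank ata-OLD_DISK_SERIAL /dev/disk/by-id/ata-NEW_DISK_SERIAL
zpool status tank                           # watch the resilver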
 
Thank you @tcabernoch. I may do a CACHE vdev if I can spare the SATA port, or perhaps if I can find an SSD for a PCIe slot.
In the meantime I've run across an interesting discussion [1] of `volblocksize`, `recordsize`, the VM filesystem's block size, and the way this all interacts with RAID. It seems the 16KB default `volblocksize` on PVE 8 is not ideal, because "...especially in wide vdevs, because each block gets split across drives which results in wasted space...". Some of these choices require rebuilding the pool from scratch. More importantly, I am learning how to diagnose whether this is an issue for me at all.
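The checks I've collected so far, in case they're useful to anyone. Storage and pool names are placeholders, and I may well be missing better ones:

# what existing zvols were created with
zfs get -r -t volume volblocksize rpool

# logical vs allocated space, to spot padding/parity overhead
zfs list -r -t volume -o name,volsize,used,referenced rpool

# what new zvols will get: the zfspool storage's 'blocksize' setting
cat /etc/pve/storage.cfg
pvesm set local-zfs --blocksize 16k    # example only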

Separately: while my main data storage is ext4 on RAID1, in parallel I will experiment with ZFS on LUKS, where the individual disks are LUKS-encrypted and the corresponding decrypted block devices form a mirror vdev.
What I like about this approach is that I seem to get the best of both worlds: LUKS, which is well-tested, and ZFS without encryption. I will stay away from ZFS native encryption because I see some open issues in the send/receive process [2].
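The layering I'm going to try, roughly. Disk paths, mapper names, and the pool name are placeholders, and keyfile/crypttab handling is left out:

# encrypt each member disk and open it to get a plain block device
cryptsetup luksFormat /dev/disk/by-id/ata-HDD_1
cryptsetup luksFormat /dev/disk/by-id/ata-HDD_2
cryptsetup open /dev/disk/by-id/ata-HDD_1 crypt_hdd1
cryptsetup open /dev/disk/by-id/ata-HDD_2 crypt_hdd2

# build the mirror vdev on top of the opened mappings
zpool create -o ashift=12 vault mirror /dev/mapper/crypt_hdd1 /dev/mapper/crypt_hdd2

# on boot, the LUKS devices have to be opened before the pool is imported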

[1] https://allthingsopen.org/articles/noisy-zfs-disks
[2] https://news.ycombinator.com/item?id=32340433
 
If the OP loses that drive, all the metadata is gone, and then the entire pool is gone. Perhaps you meant cache, i.e. L2ARC?

You are correct, of course. ... but ... The first thing OP said was that it was homelab. Ya run what ya brung and make do with scraps.

And L2ARC only helps out on specific, not very common workloads. Mostly serial data reads, right? I dunno, I read the research and testing, and decided to never do L2ARC. ... on the other hand ... I have personally and extensively tested the impact of Special VDEV aka Fusion Drive, and I can report that it rocks.
 
You are correct, of course. ... but ... The first thing OP said was that it was homelab. Ya run what ya brung and make do with scraps.

I just wanted to put it on the record - I'm not sure the OP would know this (important) part, and someone else may find the thread in the future. The OP never really followed up, and the second person was talking about intending a mirrored setup. In case redundancy does not matter for home use (of course keeping backups), I would then also suggest just doing a stripe, i.e. there's no point using a single point of failure vdev in a mirrored pool, at least I can't think of one.

And L2ARC only helps out on specific, not very common workloads. Mostly serial data reads, right?

I actually thought it helps a lot with random reads. I used it in setups with mostly static data (not running VMs), and while I never benchmarked it, the cache filled up pretty early. I remember that in some cases the drives could spin down and content was served from the cache just fine. That was at a time when special vdevs, dRAID, etc. were not even a thing. :)

I dunno, I read the research and testing, and decided to never do L2ARC. ... on the other hand ... I have personally and extensively tested the impact of Special VDEV aka Fusion Drive, and I can report that it rocks.

I have not really benchmarked it; I can believe storing metadata on an SSD helps a lot, but in my book that should always be a mirror, as should e.g. a SLOG.
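For anyone curious, the hit counters are easy to peek at without a full benchmark. Pool name is a placeholder:

# ARC and L2ARC hit/miss counters
awk '$1 ~ /^(hits|misses|l2_hits|l2_misses)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# per-vdev layout and traffic; shows whether cache/special devices actually see IO
zpool iostat -v tank 5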
 
...the second person was talking about intending a mirrored setup. In case redundancy does not matter for home use (of course keeping backups), I would then also suggest just doing a stripe, i.e. there's no point using a single point of failure vdev in a mirrored pool, at least I can't think of one.
Could you restate that please, I don't follow. My vdev is a mirror, so it does not have a single point of failure. What does "single point of failure vdev in a mirrored pool" mean?
 
Could you restate that please, I don't follow. My vdev is a mirror, so it does not have a single point of failure. What does "single point of failure vdev in a mirrored pool" mean?

I was only referring to the special vdev - i.e. if one has a special vdev that is not mirrored, it becomes a single point of failure. Special vdevs carry metadata; if you lose it, your pool is lost as well, and in that case it does not help that the regular vdevs were redundant. That's about it.
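In command form, the difference looks like this. Pool name and device paths are placeholders:

# L2ARC: a lone device is fine, losing it does not harm the pool
zpool add tank cache /dev/disk/by-id/ata-SPARE_SSD

# special vdev: pool-critical metadata lives here, so give it redundancy matching the data vdevs
zpool add tank special mirror /dev/disk/by-id/ata-SSD_1 /dev/disk/by-id/ata-SSD_2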