Help with designing a home cluster on consumer hardware

crembz

Member
May 8, 2023
Hi Everyone

I'm new to PVE and have been fiddling around for a few weeks. I come from an ESXi background, and I suppose there is a lot more to consider when designing a cluster given the options around local, networked and distributed storage. My main reason for moving to PVE is that I'm interested in running ZFS mirrors to get disk redundancy in case of failures, without needing to get into complex backup/recovery/HA. Keep in mind this is a home environment, not a business/production environment.

My cluster consists of 8 heterogeneous boxes, all consumer grade.

Box 1 is the main box - it will run virtualised TrueNAS using a SAS card and PCI passthrough, as well as a media server
5900x w/64gb ECC
10 spinning disks (pass through data pool xfs)
2 nvme (pve install zfs mirror)
4 sata ssd (pass through scratch pool zfs)
1gbE nic
Wifi nic for management failover

The remaining 7 servers each contain Intel CPUs of various generations and 2 NVMe/SSD drives (1TB of storage each) and 12gb NICs. These will run a bunch of lab environments (nested ESXi/Hyper-V/OpenStack/Nutanix) and some Docker/k8s, again for learning/testing/demo.

A few questions I have if someone could please help me understand:

  • I keep reading never to run consumer grade ssds with ZFS due to performance and poor lifespans. Taking performance off the table, should I be avoiding the use of ZFS mirrors for the pve local disks?
    • What about the scratch pool which is used as a
      • transcoding/downloading cache
      • VM disk share
  • If sharing ZFS TrueNAS shares over NFS for VM disks, should those disks use raw or qcow2? I have read that qcow2 on ZFS is a bad idea due to CoW on CoW, but does that change if NFS is in the mix? Using raw on NFS regardless of the backend seems to constrain the use of snapshots (see the storage sketch after this list).
  • Can pve be installed onto a usb stick, leaving the local drives as zfs mirrors?
    • usb stick failure shouldn't be any more complex than reinstalling pve onto a new usb stick and importing the zfs mirror.
  • Is ceph worth thinking about in such an environment rather than using ZFS local disks?
    • I've read ceph on zfs should be avoided
    • I'm not sure how 1GbE links will hold up running a distributed storage system, although I have run Nutanix on 1GbE switches in small 3-node environments
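For the raw-vs-qcow2 question above, here is a minimal sketch of what an NFS-backed VM store from TrueNAS could look like in /etc/pve/storage.cfg (the storage name, server address and export path are made-up placeholders):

    nfs: truenas-nfs
            server 192.168.1.50
            export /mnt/tank/vmstore
            path /mnt/pve/truenas-nfs
            content images

On a file-based storage like this, raw images don't support snapshots while qcow2 does, so choosing qcow2 is usually the trade-off for keeping snapshots on NFS. Allocating a disk in that format could look like this (VM ID and size are placeholders too):

    qm set 100 --scsi0 truenas-nfs:32,format=qcow2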
Or should I just stick to what I know hahahaha

Thanks in advance!!
 
Your post is fine, but it is complex. Too complex to generate a compact answer ;-)

...8 heterogeneous boxes, all consumer grade.
Box 1 is the main box - will run virtualised truenas
You have several hardware devices. In that case I wouldn't virtualize TrueNAS. This approach has several pros and cons...
should I be avoiding the use of ZFS mirrors
No. ZFS gives so many advantages! Data integrity checks, self-healing (= reliability), snapshots, compression and so on.
scratch pool
...might be an exception.
Can pve be installed onto a usb stick
Do not do that! "Classic" USB sticks will burn out quickly. Two USB-to-SSD/NVMe adapters (for a mirrored ZFS) might work. A normal installation will result (for example) in a ZFS pool named "rpool", which allows assigning virtual disks to VMs. Usually these disks are block devices (ZVOLs), but directory-based storages are also offered. A separate "rpool" for the OS/PVE is good practice though; in that case I would always use mirrored SSDs.
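As a sketch of what that default looks like (assuming an unmodified ZFS installation; the names below are the installer defaults), /etc/pve/storage.cfg ends up with entries along these lines, and each VM disk becomes a ZVOL under rpool/data:

    zfspool: local-zfs
            pool rpool/data
            content images,rootdir
            sparse 1

    dir: local
            path /var/lib/vz
            content iso,vztmpl,backup

Running zfs list on such a node then shows datasets like rpool/data/vm-100-disk-0 for the individual VM disks.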
Is ceph worth thinking about in such an environment rather than using ZFS local disks?
Ceph is great. But I refrained from using it because of the added complexity. You will experience problems you never guessed existed ;-) I have a test cluster with four machines and Ceph. When producing problems (on purpose, for learning), the debugging scenarios opened up a completely new world of terms and topics to me.

(My "small" solution is to use ZFS replication between a small number of nodes. It does not scale well, but it works great - if you can tolerate data loss between replication intervals!)
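To illustrate, such replication jobs can be created per guest in the GUI or with pvesr; the VM ID, target node name, schedule and rate limit below are placeholders only:

    # replicate guest 100 to node pve2 every 15 minutes, limited to ~100 MB/s
    pvesr create-local-job 100-0 pve2 --schedule "*/15" --rate 100
    # show the state of all replication jobs on this node
    pvesr status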

I've read ceph on zfs should be avoided
Definitely. If you go for Ceph, it should get complete devices = one disk per OSD. And you would want more than a single disk per node. Think three disks on five nodes as a starter. (The absolute lower limit is one OSD on each of three nodes, but I would never trust this in a "productive" setting.)

Regarding speed: 1 GBit/s will technically work - but the fun starts above 10 GBit/s. And it really should be an independent network, no VLANs or other tricks.
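For a rough idea of the moving parts, the setup with whole disks as OSDs and a dedicated network looks roughly like this (the subnet and disk names are placeholders):

    pveceph install                          # on every node: install the Ceph packages
    pveceph init --network 10.10.10.0/24     # once per cluster: dedicated Ceph network
    pveceph mon create                       # on (at least) three nodes
    pveceph osd create /dev/sdb              # one whole disk per OSD
    pveceph osd create /dev/sdc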

Just my 2€c...!
 
ZFS on consumer/prosumer SSDs (i.e. without datacenter PLP protection) will wear them out quickly, and the amount of data moved around for testing and learning adds to the wear.
If it's for learning within <1 year, why not. Get spares and monitor the wearout level, if SMART is accurate.
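A minimal sketch of watching wearout from the shell (device names are placeholders and the exact attribute name is vendor-specific):

    # NVMe drives report an overall "Percentage Used" estimate
    smartctl -a /dev/nvme0 | grep -i 'percentage used'
    # SATA SSDs expose vendor attributes such as Wear_Leveling_Count or Media_Wearout_Indicator
    smartctl -A /dev/sda | grep -Ei 'wear|percent'

The Proxmox GUI also shows a Wearout column under a node's Disks panel.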
 
Thanks heaps for the advice. I think I have a few design decisions to make.

Most of the boxes are tiny PCs with a single SATA and NVMe drive. Currently I have just set up a ZFS mirror on them and I'm running ESXi, Nutanix and Hyper-V virtualised. Nutanix seems to take a massive hit in performance running on Proxmox; I'm guessing it's a CoW-on-CoW issue. I'm wondering whether moving it to the TrueNAS share might be better than local ZFS/raw, and whether the ZFS mirrors on consumer drives are going to cause performance issues in the long run.

Perhaps ext4 and Proxmox replication is the way to go. If I lose a node, I have 7 more running while I rebuild it ... hrm, I wonder.

I think I'll avoid Ceph for now; I can't squeeze enough drives into the boxes to have more than a single drive per box.

I decided to virtualise TrueNAS to control the boot order between my media-streaming Docker box and the NAS box, ensuring that the NAS comes up before any containers try connecting to the NFS shares. That, and I have a ton of cores on the TrueNAS box that would otherwise go to waste.
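For reference, a minimal sketch of how that ordering can be enforced on the Proxmox side, assuming both guests run on the same node (VM IDs and the delay are placeholders):

    # the TrueNAS VM starts first; up=120 delays the next guest by 120 seconds
    qm set 100 --onboot 1 --startup order=1,up=120
    # the media/Docker VM follows once the delay has passed
    qm set 110 --onboot 1 --startup order=2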
 
