Help with designing a home cluster on consumer hardware

crembz

Member
May 8, 2023
Hi Everyone

I'm new to PVE and have been fiddling around for a few weeks. I come from an ESXi background, and I suppose there is a lot more to consider when designing a cluster given the options around local, networked and distributed storage. My main reason for moving to PVE is that I'm interested in running ZFS mirrors to get disk redundancy in case of failures, without needing to get into complex backup/recovery/HA. Keep in mind this is a home environment, not a business/production environment.

My cluster consists of 8 heterogeneous boxes, all consumer grade.

Box 1 is the main box - it will run virtualised TrueNAS using a SAS card and PCI passthrough, as well as a media server
5900x w/64gb ECC
10 spinning disks (pass through data pool xfs)
2 nvme (pve install zfs mirror)
4 sata ssd (pass through scratch pool zfs)
1gbE nic
Wifi nic for management failover

The remaining 7 servers each contain Intel CPUs of various generations and 2 NVMe/SSD drives (1TB of storage each) and 12gb NICs. These will run a bunch of lab environments (nested ESXi/Hyper-V/OpenStack/Nutanix) and some Docker/k8s, again for learning/testing/demo.

A few questions I have if someone could please help me understand:

  • I keep reading never to run consumer grade ssds with ZFS due to performance and poor lifespans. Taking performance off the table, should I be avoiding the use of ZFS mirrors for the pve local disks?
    • What about the scratch pool which is used as a
      • transcoding/downloading cache
      • VM disk share
  • If sharing ZFS TrueNAS shares over NFS for VM disks, should those disks use raw or qcow2? I have read that qcow2 on ZFS is a bad idea due to CoW on CoW, but does that change if NFS is in the mix? Using raw on NFS regardless of the backend seems to constrain the use of snapshots (see the storage sketch after this list).
  • Can pve be installed onto a usb stick, leaving the local drives as zfs mirrors?
    • usb stick failure shouldn't be any more complex than reinstalling pve onto a new usb stick and importing the zfs mirror.
  • Is ceph worth thinking about in such an environment rather than using ZFS local disks?
    • I've read ceph on zfs should be avoided
    • I'm not sure how 1GbE links will hold up running a distributed storage system, although I have run Nutanix on 1GbE switches in small 3-node environments
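For the raw-vs-qcow2 question above, here is a minimal sketch of what an NFS-backed VM store from TrueNAS could look like in /etc/pve/storage.cfg (the storage name, server address and export path are made-up placeholders):

    nfs: truenas-nfs
            server 192.168.1.50
            export /mnt/tank/vmstore
            path /mnt/pve/truenas-nfs
            content images

On a file-based storage like this, raw images don't support snapshots while qcow2 does, so choosing qcow2 is usually the trade-off for keeping snapshots on NFS. Allocating a disk in that format could look like this (VM ID and size are placeholders too):

    qm set 100 --scsi0 truenas-nfs:32,format=qcow2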
Or should I just stick to what I know hahahaha

Thanks in advance!!
 
Your post is fine, but it is complex. Too complex to generate a compact answer ;-)

...8 heterogeneous boxes, all consumer grade.
Box 1 is the main box - will run virtualised truenas
You have several hardware devices. In that case I wouldn't virtualize TrueNAS. This approach has several pros and cons...
should I be avoiding the use of ZFS mirrors
No. ZFS gives so many advantages! Data integrity checks, self-healing (= reliability), snapshots, compression and so on.
scratch pool
...might be an exception.
Can pve be installed onto a usb stick
Do not do that! "Classic" USB sticks will burn out quickly. Two USB-to-SSD/NVMe adapters (for a mirrored ZFS) might work. A normal installation will result (for example) in a ZFS pool named "rpool", which allows assigning virtual disks to VMs. Usually these disks are block devices (ZVOLs), but directory-based storages are also offered. A separate "rpool" for the OS/PVE is good practice though; in that case I would always use mirrored SSDs.
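As a sketch of what that default looks like (assuming an unmodified ZFS installation; the names below are the installer defaults), /etc/pve/storage.cfg ends up with entries along these lines, and each VM disk becomes a ZVOL under rpool/data:

    zfspool: local-zfs
            pool rpool/data
            content images,rootdir
            sparse 1

    dir: local
            path /var/lib/vz
            content iso,vztmpl,backup

Running zfs list on such a node then shows datasets like rpool/data/vm-100-disk-0 for the individual VM disks.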
Is ceph worth thinking about in such an environment rather than using ZFS local disks?
Ceph is great. But I refrained from using it because of the added complexity. You will experience problems you never guessed existed ;-) I have a test cluster with four machines and Ceph. When producing problems (on purpose, for learning), the debugging scenarios opened up a completely new world of terms and topics to me.

(My "small" solution is to use ZFS replication between a small number of nodes. It does not scale well, but it works great - if you can tolerate data loss between replication intervals!)
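To illustrate, such replication jobs can be created per guest in the GUI or with pvesr; the VM ID, target node name, schedule and rate limit below are placeholders only:

    # replicate guest 100 to node pve2 every 15 minutes, limited to ~100 MB/s
    pvesr create-local-job 100-0 pve2 --schedule "*/15" --rate 100
    # show the state of all replication jobs on this node
    pvesr status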

I've read ceph on zfs should be avoided
Definitely. If you go for Ceph, it should get complete devices = one disk per OSD. And you would want more than a single disk per node. Think three disks on five nodes as a starter. (The absolute lower limit is one OSD on each of three nodes, but I would never trust this in a "productive" setting.)

Regarding speed: 1 GBit/s will technically work - but the fun starts above 10 GBit/s. And it really should be an independent network, no VLANs or other tricks.
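For a rough idea of the moving parts, the setup with whole disks as OSDs and a dedicated network looks roughly like this (the subnet and disk names are placeholders):

    pveceph install                          # on every node: install the Ceph packages
    pveceph init --network 10.10.10.0/24     # once per cluster: dedicated Ceph network
    pveceph mon create                       # on (at least) three nodes
    pveceph osd create /dev/sdb              # one whole disk per OSD
    pveceph osd create /dev/sdc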

Just my 2€c...!
 
ZFS on consumer/prosumer SSDs (i.e. without datacenter PLP protection) will wear them out quickly, and the amount of data moved around for testing and learning adds to the wear.
If it's for learning within <1 year, why not. Get spares and monitor the wearout level, if SMART is accurate.
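A minimal sketch of watching wearout from the shell (device names are placeholders and the exact attribute name is vendor-specific):

    # NVMe drives report an overall "Percentage Used" estimate
    smartctl -a /dev/nvme0 | grep -i 'percentage used'
    # SATA SSDs expose vendor attributes such as Wear_Leveling_Count or Media_Wearout_Indicator
    smartctl -A /dev/sda | grep -Ei 'wear|percent'

The Proxmox GUI also shows a Wearout column under a node's Disks panel.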
 
Thanks heaps for the advice. I think I have a few design decisions to make.

Most of the boxes are tiny PCs with a single SATA and NVMe drive. Currently I have just set up a ZFS mirror on them and I'm running ESXi, Nutanix and Hyper-V virtualised. Nutanix seems to take a massive hit in performance running on Proxmox; I'm guessing it's a CoW-on-CoW issue. I'm wondering whether moving it to the TrueNAS share might be better than local ZFS/raw, and whether the ZFS mirrors on consumer drives are going to cause performance issues in the long run.

Perhaps ext4 and Proxmox replication is the way to go. If I lose a node, I have 7 more running while I rebuild it ... hrm, I wonder.

I think I'll avoid Ceph for now; I can't squeeze enough drives into the boxes to have more than a single drive per box.

I decided to virtualise TrueNAS to control the boot order between my media-streaming Docker box and the NAS box, ensuring that the NAS comes up before any containers try connecting to the NFS shares. That, and I have a ton of cores on the TrueNAS box that would otherwise go to waste.
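For reference, a minimal sketch of how that ordering can be enforced on the Proxmox side, assuming both guests run on the same node (VM IDs and the delay are placeholders):

    # the TrueNAS VM starts first; up=120 delays the next guest by 120 seconds
    qm set 100 --onboot 1 --startup order=1,up=120
    # the media/Docker VM follows once the delay has passed
    qm set 110 --onboot 1 --startup order=2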
 
