Should I Not?

J-dub

Member
Dec 20, 2023
Hello!
I come from the wonderful land of YouTube - where I saw all these tubers showing off their $50 - $200 Proxmox clusters. I thought to myself, "if these nobs can do it with old garbage, I can do it with new garbage!".

My goal is to host simple home services like PiHole DNS, a suite of Arrs, Jellyfin or Kodi, OPNsense, Home Assistant, etc. - basic stuff running in containers & VMs.
My plan was to use 2 clustered mini-PCs until I found I needed 3 to avoid issues. Fine, I bought 3 Minisforum NPB5s (Intel 13th gen for the encoding - 32GB DDR5, dual 2.5GbE NICs, 2x USB4 40Gbps) - slapped a 4TB Crucial NVMe and a 500GB 2.5" SSD in each and figured I was WAY overkill for what I needed.

HA, HCI, Here I Come!!!

And then I read the Proxmox docs...

Looks like I need 25Gbps networking for the data-side chatter - minimum, 10Gbps for the public & more data chatter, and a simple 1Gbps for the cluster talking to itself. On 3 separate switches to keep it all clean.
The USB4 on these PCs is not licensed for legacy Thunderbolt 3 networking, so I'm SOL on using those for a 20ish Gbps mesh. So it looks like I'm done right there, right?
It also looks like ZFS or Ceph is required for the HCI and that they will chew up and spit out my consumer-grade garbage. So all over, that's it, not worth doing?

Then I look at reddit, watch some more tube, and it still seems like everyone's 3rd cousin has had this working awesome for the last 5 years on 10-year-old mini PCs and I shouldn't worry about it - just do it.

So rather than wondering and rather than lurking I thought I'd ask here ... Should I just not?

Thank you!
 
My goal is to host simple home services like PiHole DNS, a suite of Arrs, Jellyfin or Kodi, OPNsense, Home Assistant, etc. - basic stuff running in containers & VMs.

Probably a common use case for lots of hobbyists on the forum who do not meet the criteria below.

My plan was to use 2 clustered mini-PCs until I found I needed 3 to avoid issues. Fine, I bought 3 Minisforum NPB5s (Intel 13th gen for the encoding - 32GB DDR5, dual 2.5GbE NICs, 2x USB4 40Gbps) - slapped a 4TB Crucial NVMe and a 500GB 2.5" SSD in each and figured I was WAY overkill for what I needed.

It is overkill, but you could have saved yourself a bit of cash by running 2 + a QDevice.
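For anyone wondering why 2 nodes + a QDevice avoids the usual two-node trouble, here is a quick Python back-of-envelope of the vote arithmetic (the strict-majority rule is standard corosync behaviour; the scenario labels and helper functions are just for illustration):

Code:
# Minimal sketch of corosync-style quorum arithmetic: each node and the
# QDevice contribute one vote, and a partition is quorate only with a
# strict majority of all votes.

def has_quorum(votes_present: int, votes_total: int) -> bool:
    return votes_present > votes_total // 2

def show(label: str, votes_present: int, votes_total: int) -> None:
    state = "quorate" if has_quorum(votes_present, votes_total) else "NO quorum"
    print(f"{label}: {votes_present}/{votes_total} votes -> {state}")

show("2 nodes, one down", 1, 2)                  # lone survivor loses quorum
show("2 nodes + QDevice, one node down", 2, 3)   # survivor + QDevice stay quorate
show("3 nodes, one down", 2, 3)                  # same effect as a third node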

Looks like I need 25Gbps networking for the data-side chatter - minimum, 10Gbps for the public & more data chatter, and a simple 1Gbps for the cluster talking to itself. On 3 separate switches to keep it all clean.

Not for your setup - you would be fine on a Gigabit switch. If you want to save those SSDs a bit (the 4TB probably has a high TBW rating, but the 500GB, if you meant it for your system, would be happier), you could add something like this on top of it (it's not official, not mine either) to save yourself lots of writes:
https://github.com/isasmendiagus/pmxcfs-ram

The USB4 on these PCs is not licensed for legacy Thunderbolt 3 networking, so I'm SOL on using those for a 20ish Gbps mesh. So it looks like I'm done right there, right?

What do you mean licensed? TB networking works quite alright with 3 nodes (it takes some playing with):
https://forum.proxmox.com/threads/intel-nuc-13-pro-thunderbolt-ring-network-ceph-cluster.131107/

It also looks like ZFS or Ceph is required for the HCI and that they will chew up and spit out my consumer-grade garbage. So all over, that's it, not worth doing?

As for Ceph, look above; as for ZFS, I would not have it on the 500GB - with the params tweaked a bit, ZFS on the 4TB should be fine for the VMs.

Then I look at reddit, watch some more tube, and it still seems like everyone's 3rd cousin has had this working awesome for the last 5 years on 10-year-old mini PCs and I shouldn't worry about it - just do it.

Try and compare for yourself.

So rather than wondering and rather than lurking I thought I'd ask here ... Should I just not?

You already got the Minisforums, didn't you?

Thank you!

Do it. :D
 
Not for your setup - you would be fine on a Gigabit switch. If you want to save those SSDs a bit (the 4TB probably has a high TBW rating, but the 500GB, if you meant it for your system, would be happier), you could add something like this on top of it (it's not official, not mine either) to save yourself lots of writes:
https://github.com/isasmendiagus/pmxcfs-ram
He was talking about the Ceph requirements for the clustered filesystem. And yes, while it might run with Gbit NICs, you really won't have fun below 10Gbit/s and it won't be reliable without a dedicated corosync NIC.
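As a rough illustration of why gigabit gets tight with Ceph (assuming a default 3-replica pool, where the primary OSD forwards each write to two replicas over the cluster network; the write rates and helper function are made-up examples, not measurements):

Code:
# Back-of-envelope for Ceph replication traffic on the cluster network.
# Assumes a replicated pool with size=3: every client write is forwarded
# by the primary OSD to two replicas. Figures are illustrative only.

GBIT = 1e9  # bits per second

def cluster_net_bits(vm_write_mb_s: float, replicas: int = 3) -> float:
    copies = replicas - 1                 # copies forwarded by the primary OSD
    return vm_write_mb_s * 1e6 * 8 * copies

for vm_writes in (50, 100, 250):          # MB/s of sustained VM writes
    print(f"{vm_writes:>4} MB/s of VM writes -> "
          f"~{cluster_net_bits(vm_writes) / GBIT:.1f} Gbit/s of replication traffic")

# 100 MB/s of guest writes already means ~1.6 Gbit/s of replication traffic,
# before any recovery or rebalance - hence the 10 Gbit/s recommendation.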

And then I read the Proxmox docs...

Looks like I need 25Gbps networking for the data-side chatter - minimum, 10Gbps for the public & more data chatter, and a simple 1Gbps for the cluster talking to itself. On 3 separate switches to keep it all clean.
The USB4 on these PCs is not licensed for legacy Thunderbolt 3 networking, so I'm SOL on using those for a 20ish Gbps mesh. So it looks like I'm done right there, right?
It also looks like ZFS or Ceph is required for the HCI and that they will chew up and spit out my consumer-grade garbage. So all over, that's it, not worth doing?
You are buying consumer mobile hardware (basically a laptop without the useful integrated UPS and terminal) for enterprise workloads like virtualization, Ceph, ZFS, ... don't expect too much.
Next time, better to read the official hardware recommendations for the stuff you want to run before buying the hardware for it.
If your hardware lacks the proper enterprise SSDs and NICs for ZFS/Ceph, you could still run those 3 nodes unclustered using local LVM-thin. This would be way less demanding on network and disks.
PiHole and OPNsense could be run highly available without a PVE cluster, and all your torrenting/streaming probably won't need high availability anyway.
 
He was talking about the Ceph requirements for the clustered filesystem. And yes, while it might run with Gbit NICs, you really won't have fun below 10Gbit/s and it won't be reliable without a dedicated corosync NIC.

I don't run Ceph myself on anything "homelab" unless I want to experiment, but do you really suggest he would be having issues with PiHole, Jellyfin and HA? I agree he might be better off with an extra switch for corosync if he saturates the other one with Ceph, but the link I gave was doing Ceph happily over FRR TB networking.

You are buying consumer mobile hardware for enterprise workloads like virtualization, Ceph, ZFS, ... don't expect too much.
Next time, better to read the official hardware recommendations for the stuff you want to run before buying the hardware for it.

I might be wrong but I was under the impression he bought it for that homelab only.

If your hardware lacks the proper enterprise SSDs and NICs for ZFS/Ceph, you could still run those 3 nodes unclustered using local LVM-thin. This would be way less demanding on network and disks.

I do not vouch for Ceph (I would not discourage him from trying it out either), but I think he can run a ZFS pool for VMs on that 4TB just fine.
 
I mean, he literally wrote "HA, HCI here I come" ... he can do that without Ceph: ZFS replication and a good gigabit managed switch (or two cheap ones). It will work. The only thing I would be worried about is shredding the 500GB SSDs.
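For a sense of scale (delta sizes are assumptions, not measurements): Proxmox's built-in ZFS storage replication only sends the blocks changed since the last snapshot, so a gigabit link handles typical homelab deltas quickly. A quick Python back-of-envelope:

Code:
# Back-of-envelope for scheduled ZFS replication over a gigabit link.
# Only the snapshot delta is sent; the delta sizes below are assumptions.

LINK_GBIT = 1.0
LINK_BYTES_S = LINK_GBIT * 1e9 / 8        # ~125 MB/s usable, ignoring overhead

for delta_gb in (0.5, 2.0, 10.0):         # changed data per replication interval
    seconds = delta_gb * 1e9 / LINK_BYTES_S
    print(f"{delta_gb:>4} GB delta -> ~{seconds:.0f} s at {LINK_GBIT} Gbit/s")

# Even a 10 GB delta moves in under two minutes, so a replication schedule of
# a few minutes for a handful of home VMs fits comfortably on gigabit.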
 
So rather than wondering and rather then lurking I thought I'd ask here ... Should I just not?
As a hardware vendor, I say go for it! actually, if 3 nodes is better than 2, a dozen is better than 12!!

seriousness aside (;)) consider what you're trying to accomplish.

My goal is to host simple home services like PiHole DNS, a suite of Arrs, Jellyfin or Kodi, Opensense, Home Assistant, etc.. basic stuff running in containers & VMs.
This can probably be accomplished with a Raspberry Pi. A single Minisforum NPB5 is massive overkill. Multiple of them... well, you get the idea.

"but I want to learn clustering!" you can easily do this virtualized on a SINGLE NPB5.

"but I want to brag on the forums!" like a said, a dozen!
 
  1. It is overkill, but you could have saved yourself a bit of cash by running 2 + a QDevice.
  2. Not for your setup - you would be fine on a Gigabit switch. If you want to save those SSDs a bit (the 4TB probably has a high TBW rating, but the 500GB, if you meant it for your system, would be happier), you could add something like this on top of it (it's not official, not mine either) to save yourself lots of writes: https://github.com/isasmendiagus/pmxcfs-ram
  3. What do you mean licensed? TB networking works quite alright with 3 nodes (it takes some playing with): https://forum.proxmox.com/threads/intel-nuc-13-pro-thunderbolt-ring-network-ceph-cluster.131107/
  4. As for Ceph, look above; as for ZFS, I would not have it on the 500GB - with the params tweaked a bit, ZFS on the 4TB should be fine for the VMs.
  5. You already got the Minisforums, didn't you?
  1. True - but I think Ceph still wants a 3-drive minimum and I originally planned to go that route.
  2. Ohh, fun find! I just have the PVE OS on the 500GB - at least that's the plan - is that still high writes for the OS drive (syslogs maybe)?
  3. If a manufacturer didn't opt in to Legacy Thunderbolt mode (free, but requires a licensing agreement) with Intel, Intel won't provide a Legacy Thunderbolt vendor key. That guide is using an "Intel" NUC with actual Thunderbolt 4 ports. I tried it (and failed) anyway ;)
  4. I was planning to just use Ceph (or ZFS?) on the NVMe drives as a single pool for the VMs. At least originally that's what I was going to do, until I couldn't get Thunderbolt-Net working.
  5. I can return them until January 31st ...
I can still grab a cheap 4-port 10Gbps switch for Ceph and use my current 8-port 2.5Gbps for the public network, and grab another 4-port 1Gbps for corosync. I do have a 1Gbps switch I could put the VMs on (my access points and firewall are using it too). I'd also have to grab a cheap USB-Ethernet adapter and a Thunderbolt-to-10Gbps Ethernet adapter to set that all up... for each node.

Or.. I could return the nodes and drives ($2100ish) and build a single server and just back it up to my old NAS.

I "thought" the cluster would be fun, a bit faster and remove a single mode failure risk. Not sure it's working out however!

Still tempted to try though. They are running Proxmox, just begging for further harassment. I am a bit worried about a 3-node failure in 6 months though lol.

I blame youtubers.
 
I can return them until January 31st ...

Get two NUCs and a QDevice on something, forget Ceph but stay with ZFS - you can have HA with that. At least that's what I would do. BTW the TB networking works about the same with a NUC11 already; there's not much gain with Alder or Raptor Lake. The other thing with 11th gen: you get all P-cores only, which I find more predictable for virtualisation.

I can still grab a cheap 4-port 10Gbps switch for Ceph and use my current 8-port 2.5Gbps for the public network, and grab another 4-port 1Gbps for corosync. I do have a 1Gbps switch I could put the VMs on (my access points and firewall are using it too). I'd also have to grab a cheap USB-Ethernet adapter and a Thunderbolt-to-10Gbps Ethernet adapter to set that all up... for each node.

To be honest, I never understood the 10G obsession when it's becoming "pedestrian" anyway, and for such a little homelab the TB networking is quite workable (as you discovered, with the right hardware) and gets you 20-30G for the price of a good cable (you already have the "network cards").

Or.. I could return the nodes and drives ($2100ish) and build a single server and just back it up to my old NAS.

Or. :)

I "thought" the cluster would be fun, a bit faster and remove a single mode failure risk. Not sure it's working out however!

It is "fun", but CEPH will kill the fun, in that @Dunuin is right. It would be just like a proof-of-concept.

Still tempted to try though. They are running Proxmox, just begging for further harassment. I am a bit worried about a 3-node failure in 6 months though lol.

You budget something for backups, cluster or not. ;)

I blame youtubers.

Make your own channel to recoup the cost.
 
As a hardware vendor, I say go for it! actually, if 3 nodes is better than 2, a dozen is better than 12!!

seriousness aside (;)) consider what you're trying to accomplish.


This can probably be accomplished with a Raspberry Pi. A single Minisforum NPB5 is massive overkill. Multiple of them... well, you get the idea.

"but I want to learn clustering!" you can easily do this virtualized on a SINGLE NPB5.

"but I want to brag on the forums!" like a said, a dozen!
I was running all that and more on a 2016 Synology 2-bay NAS (with networking on my Unifi gear and a couple of Pi 4s for PiHole). It was fighting the NAS's and Unifi's "walled garden" operating systems that got me looking at alternatives.
I've used MS Hyper-V and SCCM professionally before, so the idea of clustering isn't new - but the cheap consumer open-sauce method looked fun and useful. So here we are!
I'm not the bragging type, nor do I have the pockets for it! Hence the Minisforums PCs :-\
 
I was running all that and more on a 2016 Synology 2-bay NAS (with networking on my Unifi gear and a couple of Pi 4s for PiHole). It was fighting the NAS's and Unifi's "walled garden" operating systems that got me looking at alternatives.

Get an EdgeRouter, even the X for $50, keep the APs and switches. Sell the dream hardware and have Christmas without walls. ;)
 
He was talking about the Ceph requirements for the clustered filesystem. And yes, while it might run with Gbit NICs, you really won't have fun below 10Gbit/s and it won't be reliable without a dedicated corosync NIC.


You are buying consumer mobile hardware (basically a laptop without the useful integrated UPS and terminal) for enterprise workloads like virtualization, Ceph, ZFS, ... don't expect too much.
Next time, better to read the official hardware recommendations for the stuff you want to run before buying the hardware for it.
If your hardware lacks the proper enterprise SSDs and NICs for ZFS/Ceph, you could still run those 3 nodes unclustered using local LVM-thin. This would be way less demanding on network and disks.
PiHole and OPNsense could be run highly available without a PVE cluster, and all your torrenting/streaming probably won't need high availability anyway.
I should have deep-dived the docs - true! I knew a bit from reading reddit/lemmy posts, youtube vids, tech websites, etc. I just assumed there would be a good solution no matter what I bought (I mean, the posts are using mini PCs from 2012-18). I also figured if Proxmox didn't work out for it, another HCI solution might. I tried a few once I found that Proxmox wasn't going to be "right" for what I had (once the thunderbolt-net fell through). After using those, I have decided that I want to use Proxmox - so here I am taking suggestions for "J-dub's clusterf*k Version 2.0"!

My expectations have been fully tempered since day 1 - I've never expected enterprise-worthy performance or stability from my silly, random Chinese manufacturer, laptop-in-a-box, consumer-grade PCs. I did think they would be FAR beyond what I needed in a simple home server and (based on what I've read and watched) I had no reason to believe it wouldn't be a "robust" solution... that is, until I dug deeper during setup.

I've learned a lot (Debian is neat, eh!) - it's been fun. It would be fun to set up a virt stack cluster, and if I do - it'll be a Proxmox cluster. That said, I'm starting to think it's better to use a solo server and my old NAS... mostly due to the networking requirements for "any" HCI hypervisor.

The HA wasn't for media services - but I do want it for a private wiki, a couple of databases, and to move away from Unifi networking. I haven't even gotten started on self-hosting yet... since I want to nail down the bare metal and its OS first ;)

And yes for all to know - I'll be running proper backups too. 3-2-1 I get it :)
 
Get two NUCs and a QDevice on something, forget Ceph but stay with ZFS - you can have HA with that. At least that's what I would do. BTW the TB networking works about the same with a NUC11 already; there's not much gain with Alder or Raptor Lake. The other thing with 11th gen: you get all P-cores only, which I find more predictable for virtualisation.



To be honest, I never understood the 10G obsession when it's becoming "pedestrian" anyway, and for such a little homelab the TB networking is quite workable (as you discovered, with the right hardware) and gets you 20-30G for the price of a good cable (you already have the "network cards").



Or. :)



It is "fun", but CEPH will kill the fun, in that @Dunuin is right. It would be just like a proof-of-concept.



You budget something for backups, cluster or not. ;)



Make your own channel to recoup the cost.
I think I'll try this idea... I should have gone NUCs to start with. I had them in the cart... then I saw dual 2.5Gbps and DDR5 and a lower price... and made the wrong choice. C'est la vie!

I only brought up 10GbE since that's the most reasonable USB-Ethernet price/speed option available... I agree - it should be the base speed for everything now!

Ceph looks awesome - for Azure, AWS, etc. When I want a full-depth rack and $50k in drives and networking... I'll check it out again!
 
I think I'll try this idea... I should have gone NUCs to start with. I had them in the cart... then I saw dual 2.5Gbps and DDR5 and a lower price... and made the wrong choice. C'est la vie!

I was testing PVE on two NUC11s (just the i3s; it was some weird 30% off sale on the PAHi Lites, people did not realize they have TB and everything except Iris, of course no vPro) with qnetd running external to them and a ZFS pool for the VMs/CTs. I had them connected with a TB4 cable (even though the ports are TB3 only), no FRR, just point-to-point as the migration network, set to insecure to see how much throughput it would have (and to offload the NUCs; also, what's the point of encrypting a TB cable) - getting something like 18Gbps on that, I remember, stable. When I had it dropping before, it was a dodgy TB3 cable causing it. Some people reported issues with NUC8s for the TB connection, I remember.

If you have HA and replication with ZFS over that TB, it is workable, i.e. it does work well without all the complexity of something like FRR. I did not do Ceph. I did have backups done out of the "cluster" over gigabit - I do not like how vzdump works, so I was zfs sending it away instead, taking advantage of snapshots and saving CPU cycles for the VMs. I could not saturate the Gbit switch with the residential fibre connection out, so I did not worry about corosync issues, nor did I need any special settings on the switch. Oh, and I was zfs sending it all out over mbuffer through an IPsec site-to-site done by the said ER-X, though I think the NUC would have managed with AES-NI - why bother it when it has a cluster to run.

Now the fun part: as an initial load test I would launch an (unnamed) blockchain sync, one service on one node, another on the other; it was going to build up 2TB DBs, constantly shredding the "consumer" NVMes (but the TBW was like 2PB) - in fact I had to limit it to 1 core on each NUC because of course it would take it all up. Then I added regular "homelab" things totalling around 10 (it's just 4C 8T on those 2x i3s); out of habit I prefer CTs wherever possible, but probably 4 were VMs.

Now! ... the 24/7 run (and heat?) killed one of the NUCs within 2 weeks :D, so I RMA'd that and happily went on. :D Due to the cluster setup and backups, I did not have any manual recovery work to do. I would argue it would have died within 2 years for someone with "normal" use - a bad piece. The Pros are rated for 24/7 use but I did not notice any different TDPs. The NVMes held up, but I would not be surprised if they went too if I kept doing THAT. I would RMA them too - if it's declared to last 2PB and it gets killed in 2 months at 100TB written, what else does it deserve?

Basically just set it up with resiliency in mind, put any kind of load on it and see how it works for you. Before someone piles into me, this is all for "lab", I am not suggesting people run their production like this.

I only brought up 10GbE since that's the most reasonable USB-Ethernet price/speed option available... I agree - it should be the base speed for everything now!

I do not like the principle of adding crazy complexity, e.g. a USB adapter and an extra switch as a SPOF (which happens to be a heater too), just to end up connecting 2-3 "nodes" - especially if it's going to take up the USB-C ports that could just do TB with a direct cable. Of course I would not run a serious setup like that, but neither would we be talking USB NICs for that.

Ceph looks awesome - for Azure, AWS, etc. When I want a full-depth rack and $50k in drives and networking... I'll check it out again!

I think the linked thread with the 3 newest NUCs was basically showing what it can do in that kind of "lab". For a typical home setup, doing Ceph is more for bragging rights than practicality. But maybe some youtubers have something else in mind for "home".
 
Ohh, fun find! I just have the PVE OS on the 500GB - at least that's the plan - is that still high writes for the OS drive (syslogs maybe)?
Here it's 10GB per day per system disk. So too much for very cheap flash like USB pen drives/SD cards, but a consumer SSD (while not recommended) should last quite some time.
 
So for just those writes (not including updates, etc.) at 10GB a day that would be...
224.43 years for the NVMe with an 800TB TBW rating and a 5-year warranty.
44.89 years for the 500GB SSD with a 160TB TBW and a 5-year warranty.
I could get the same SSD for $110 with 2TB and a 700TB TBW... but that still seems needless for a home setup?
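That arithmetic checks out; a quick Python back-of-envelope (using 1 TB = 1024 GB, which is what the figures above imply, and the ~10 GB/day system-write estimate from the previous post):

Code:
# Reproducing the endurance estimate: rated TBW divided by ~10 GB/day of
# Proxmox system writes. Uses 1 TB = 1024 GB, matching the figures above.

DAILY_WRITES_GB = 10

def years_of_life(tbw_tb: float, daily_gb: float = DAILY_WRITES_GB) -> float:
    return tbw_tb * 1024 / daily_gb / 365

print(f"4TB NVMe, 800 TBW:  {years_of_life(800):.2f} years")   # roughly 224 years
print(f"500GB SSD, 160 TBW: {years_of_life(160):.2f} years")   # roughly 45 years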

Again, people seem to have great success with their used, old, junk mini PCs. My cheap "new" junk should be super "okay" for the small use case it has.
These Minisforum PCs draw about 20 watts and cost $480 (before drives, with RAM)... I'm still tempted to just use them w/ dedicated 2.5Gbps switches.

The NUCs are nice, but I lose compute (Intel 11 vs 13, DDR4 vs DDR5) and a 2.5GbE port and spend another $200ish per node (for old NUC 11s, and I have to buy RAM too)... but I gain thunderbolt-net...

Ohhh the first world problems! Arrrrrggggghhhh!
 
Yes, the PVE system disk usually isn't the problem. The bigger problem is ZFS for the VM storage. Here my home server is writing ~1TB per day while idling, without anyone actively using it. Most of the writes are storing logs and metrics. There the 160 TB TBW would be exceeded and the warranty lost after half a year...
And no, it's not creating lots of logs/metrics. It's just the damn average write amplification of factor 20. So the daily 50 GB of data (that's just ~600KB/s of writes) is amplified to 1TB.
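The same sum with write amplification included shows how quickly the picture changes (50 GB/day of logical writes and a factor-20 amplification, as described above; the helper function is just for illustration):

Code:
# Effect of write amplification on endurance: logical guest writes times the
# amplification factor gives the physical NAND writes that count against TBW.
# Uses 1 TB = 1000 GB here; close enough for an order-of-magnitude estimate.

def lifetime_days(tbw_tb: float, logical_gb_day: float, amplification: float) -> float:
    physical_gb_day = logical_gb_day * amplification
    return tbw_tb * 1000 / physical_gb_day

# 50 GB/day of logical writes, amplified 20x -> ~1 TB/day of physical writes.
print(f"160 TBW drive: ~{lifetime_days(160, 50, 20):.0f} days")  # ~160 days, roughly half a year
print(f"800 TBW drive: ~{lifetime_days(800, 50, 20):.0f} days")  # ~800 days, a bit over 2 years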
 
And that is idling. If I actually use that server it's multiple TBs per day. Maybe not that bad with your workload. The more sync writes or small random IO you get, the worse the write amplification will be. I've seen here a write amplification of factor 3 to 62 depending on the workload.
I replaced the consumer SSDs with enterprise SSDs that got 41500 TB TBW for the 2TB storage and now I don't really have to care about wear. Most likely the controller will fail way before the NAND is worn out.
 
Looks like:

No, I should not.

Thinking that returning 2 nodes, keeping one, and just being ready for one system to fail is the smart move here. Clustering would have been fun - but stupid.

Those pesky Youtubers! (I have a CNC Plasma table to use from them too ) :rolleyes:

I think I need to watch less YouTube - my ADHD can't handle it - Squirrel!
 
