accomplishing the same capacity efficiency with ceph requires ~10 nodes in an 8+2 EC configuration - I don't think you're looking to make a data center in your house.
I don't understand where you guys are getting this idea from.
As shown above, with the system that has four nodes where each node has six 3.5" HDD bays, as long as each of those four nodes presents its six HDDs as OSDs, I get 24 OSDs across four nodes, which, with either (6,2) (or (8,2), as you mention, in your case) EC, gives me 74.56% storage efficiency according to
this erasure coding calculator.
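(For reference, the nominal efficiency of a (6,2) profile is k/(k+m) = 6/8 = 75%; the calculator's 74.56% presumably just reflects a small amount of additional overhead.)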
Where do you guys get the idea that with (6,2) (or (8,2)) EC you need (k+m) nodes?
(Someone else on the Level1Techs forum said nearly the same thing and I have yet to see any justification for it. My thinking is that as long as the nodes can supply the OSDs, and I am running a minimum of three nodes for a quorum, then I can have (6,2) (or (8,2), as you mention) EC. Therefore, if I bought the aforementioned system, which has four nodes with six 3.5" HDD bays per node, then each node can contribute six OSDs to the Ceph cluster, for a total of 24 OSDs split across said four nodes. I don't understand where the idea that I would need eight nodes (for (6,2) EC) or ten (for (8,2) EC) comes from.
If I am running (8,2) EC, then I would need at least 10 OSDs, supplied by a minimum of three nodes (for a quorum). Since 10 doesn't divide evenly by 3, I could have two nodes supply 3 OSDs each and a third node supply 4 OSDs.
But if I want it to divide evenly, then I can have five nodes, each supplying two OSDs, and it can still be an (8,2) EC. So I am not sure where you guys are getting this from.)
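To be clear about what I mean: my understanding is that this works as long as the EC profile is created with its CRUSH failure domain set to osd rather than the default of host, so the chunks only have to land on separate OSDs instead of separate nodes. A minimal sketch of what I have in mind (the profile and pool names are just placeholders I made up):

```
# Create a (6,2) EC profile whose failure domain is the OSD, not the host,
# so the 8 chunks of a placement group do not each need a separate node.
ceph osd erasure-code-profile set ec62-osd k=6 m=2 crush-failure-domain=osd

# Create an EC pool that uses that profile (PG counts here are just examples).
ceph osd pool create tank-ec 128 128 erasure ec62-osd
```

The obvious trade-off with an OSD-level failure domain is that a single node going down can take out more than m chunks of a placement group, so it really only protects against individual disk failures rather than whole-node failures.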
I don't think you're looking to make a data center in your house.
Technically, I already have somewhere around 12 or 13 "nodes" that I can use (two Z420s, two 5950X towers, a 7950X, a 6700K, two 3930Ks, four dual Xeon E5-2690 (v1) systems, a 4930K, and my "do-it-all" Proxmox server (dual Xeon E5-2697A v4)), or whatever that works out to be.
The old towers can be repurposed for this, as some of my older tower systems have up to eight 3.5" HDD bays each.
(This also doesn't include my two 8-bay QNAP NASes (which use the Annapurna Labs AL832 processor, I think), nor my old 12-bay dual Xeon L5310 server.)
So I don't have to buy the aforementioned system; I was looking at it because it is relatively inexpensive (vs. EPYC, for example), but I could also just repurpose the older stuff that I already have if all it is going to be doing is running Ceph.
And all of this already fits in my office.
since I imagine the "bulk" of your storage fits in the "bulk" category, figure out how much payload storage you actually need, and size the ceph solution accordingly WITH FAST SSDs.
Yeah, it is too costly (especially now) to replace 288 TB of raw capacity with SSDs.
Wayyyy too expensive to do that now.
A lot of things technically hate HDDs, but we make it work anyways because, on a $/GB basis, HDDs still rule the bulk storage world.
and your 100GB backbone serves no purpose when the drives are good for 0.3
Oh, I know. But it's there and I already have it, and it takes the traffic load off the GbE PHY layers, since smart TVs come with 100 Mbps RJ45 ports, not QSFP28 IB ports. (At least both of my Samsung TVs do.)
speaking of... 100GB IB?!?! WHY?! do you already have switches that are free?
I used to run my own CAE company (HPC/CFD/FEA), and those applications work a lot better with 100 Gbps IB as the system interconnect than with GbE. And on a $/Gbps basis (even back then), it was cheaper for me to go with 100 Gbps IB (a 36-port MSB-7890 switch was ~$2230 USD at the time), whereas 10 GbE would have been cheaper on an absolute cost basis, but more expensive on a $/Gbps basis (either in total switching capacity, or per port) than said 100 Gbps IB.
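(Rough math: $2230 for 36 ports of 100 Gbps works out to about $62 per port, or roughly $0.62 per Gbps of port capacity; a 10 GbE switch would have to come in at under about $6 per port to match that on a $/Gbps basis.)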
Therefore, I skipped 2.5G, 5G, 10G, 25G, 40G, and 50G entirely and went straight to 100 Gbps IB.
The only thing that is under evaluation now is whether to swap out the MSB-7890 for an MSB-7800 at ~$700 USD. (I originally had an MSB-7800, but that switch had issues: it was rebooting itself on the hour, every hour, so I returned it.)
But yes, from an absolute cost perspective, it wasn't the most cost-efficient purchase.
I am getting the sense that this is a hobby and not a business.... WHERE are you putting all this equipment?
It's a hobby now, one where I am trying to keep the expenses/costs contained.
My office.
It doesn't generate that much heat/noise. The air exhaust temp about 150 mm away from the back of my "do-it-all" Proxmox server is nominally only around 34.5 C. Something like that. Maybe 40 C. It only gets hotter when the system is actively working on something (e.g. running pixz or something like that), but that's the nominal air exhaust temp.
Noise is similar in the sense that it only gets louder when the system is actively working on something; otherwise it's a nominal hum. Wife can close the door to my office if it really bothers her, but if she just hangs out/stays upstairs, she can't even hear it, with the noise being confined to my office downstairs.
I can only imagine how happy she'll be if you spend money on an always-on noisy space heater that doubles your electric bill
So... that's a bit ironic, in the sense that since we've moved to a bigger house, my systems are struggling to keep the house as warm as they did in our older, smaller house, where the heat from the systems was actually doing a fair bit of the lifting in terms of keeping the house warm.
From a cost efficiency perspective, natural gas heating is cheaper on a $/MJ basis than electric heating via my systems.
On the other hand, though, if we didn't have my servers, then we would've been paying for a variety of streaming services, which, after a while, would add up to more than the cost of the systems plus the electricity.
As such, the systems' ability to at least partially heat our newer, bigger house is a fringe benefit (in the winter) and a fringe detriment (in the summer, when that heat has to be cooled away).
But that's where the overall system power efficiency calculations come into play. Yes, I could spend money to buy newer-to-me old enterprise server systems (like said aforementioned server) that would do more computational work for a given power consumption; or, the flip side now (with how much RAM and storage cost), is for me to hold off on buying anything new-to-me and just re-use what I already have.
The older stuff is not as power efficient (a Z420 with its single E5-2690 (v1) idles at around 140 W, and my 3930K and 4930K systems idle at around 200 W), but it still means I don't need to buy new-to-me stuff.
Either way, I'm going to be paying, whether it comes out of initial capex (with reduced electricity costs over time) or out of electricity costs over time (by using the older systems that I already have).
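(Rough numbers, assuming an electricity rate of $0.15/kWh purely for illustration: a box idling at 140 W around the clock is about 1,200 kWh/year, or roughly $180/year, and the ~200 W towers are closer to $260/year each, just to idle.)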
Unfortunately, a "leap" in efficiency for me would mean spending somewhere around $4000-5000 USD to switch over to a dual EPYC 7763 system, and wife would definitely kill me for that.
So I am trying to keep the overall total system cost efficiency in check (because if I buy new-to-me, with RAM prices where they are now, the TARR on that looks really crappy right now), and my old(er) systems can run Ceph, but the X9 platform/systems can't do IOMMU, for example, which means no GPU passthrough.
(I bought my 5950X systems and my 7950X system because wife was complaining about the noise that my old quad half-width-node blade server (with dual E5-2690 (v1) CPUs per node) was making, so she has effectively banned me from turning that thing back on again. But as a piece of equipment, I mean, I do have it, and it can be turned on to serve up Ceph. Wife just doesn't want me to because of how loud that thing is. She put up with it when I had my CAE services company, because at least I was making money with said noise. But since I stopped doing that at the beginning of COVID (when companies were scaling back), I "replaced" it with two 5950X compute nodes and a 7950X compute node instead.)
The NH-D15 CPU HSF is a lot quieter. The towers are bigger and less physically dense than the 2U four-node Supermicro setup, but they make less noise, and they're more computationally efficient and also just computationally faster.
But yeah....