Hardware Feedback - Proxmox Ceph Cluster (3-4 nodes)

Interesting idea. That could be the right direction too. :)
It's just that most of us in the enterprise are going to want to deploy current-gen hardware to get more out of the life cycle. It's hard to do an apples-to-apples comparison when looking to the gray market for a mock-up. Then we're stuck with gray-market hardware that may not be suitable for production :oops:
 
It's just that most of us in the enterprise are going to want to deploy current-gen hardware to get more out of the life cycle
In my experience, purchasing in the enterprise often happens based on budget and vendor relations. Lifecycle is determined by the company's depreciation-schedule policy, not the relative "age" of the technology. The solution is either defined by the consulting arm of the vendor, or by engineering, who have to make all kinds of tradeoffs to fit within the above limitations. Most of the time the use cases are generic enough to fit. Sometimes they don't.

It's hard to do an apples-to-apples comparison when looking to the gray market for a mock-up.
It depends on where you are in the cycle. If you're the engineer, you have your use-case benchmarks set up; it's pretty easy then to include them with your RFP. If you have sufficient flexibility to test out "grey market" hardware, it just means you'd have to do the benchmarking yourself, but I have yet to see any company in the Fortune 1000 size range do that.

Then we're stuck with gray-market hardware that may not be suitable for production
If you did the legwork, you wouldn't be. This is a common issue with small companies that either can't afford (or don't want to do) proper labbing of a solution before deploying.

The moral of the story is: validate your use case, not the hardware.
 
What does that mean in practical terms? :)
Means to validate the hardware :)
I am dealing with an org that dropped $10M+ on Nutanix and underprovisioned the hell out of it. They have no failover resources, constant performance issues, etc., because they didn't take the time to review the hardware during the ordering process.

Use-case validation only takes you so far, as you will have many use cases and not all (or even many) of them will be well documented.
 
Use-case validation only takes you so far, as you will have many use cases and not all (or even many) of them will be well documented.
Garbage in, garbage out. This isn't a problem specific to computer hardware procurement. If you don't model the outcome before making decisions, you can end up disappointed.
What does that mean in practical terms? :)
If you are not prepared to fulfill the application requirements, you can't very well provide working infrastructure. I imagine your devs should be giving you baselines and HOPEFULLY the means to validate equipment against those baselines, at all levels of service (compute/instance, storage, networking, etc.).

It's of course possible to guess, but that opens you up to overpaying or underprovisioning on capex and/or opex. As long as your CFO doesn't mind, sure.
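To make that concrete, here's a minimal sketch of what "baselines plus the means to validate against them" could look like; the tiers, metric names, and numbers are all invented for illustration, not from any real RFP:

```python
# Rough sketch: compare measured benchmark results against required
# baselines per service tier. Metric names and numbers are illustrative.
REQUIRED = {
    "compute": {"specint_rate": 400},        # minimum acceptable
    "storage": {"rand_4k_iops": 150_000, "p99_latency_ms": 2.0},
    "network": {"throughput_gbps": 80},
}

MEASURED = {
    "compute": {"specint_rate": 452},
    "storage": {"rand_4k_iops": 120_000, "p99_latency_ms": 1.4},
    "network": {"throughput_gbps": 94},
}

def validate(required, measured):
    """Return (tier, metric, required, measured, ok) rows."""
    rows = []
    for tier, metrics in required.items():
        for name, want in metrics.items():
            got = measured[tier][name]
            # Latency-style metrics must come in *under* the baseline;
            # everything else must meet or exceed it.
            ok = got <= want if "latency" in name else got >= want
            rows.append((tier, name, want, got, ok))
    return rows

for tier, name, want, got, ok in validate(REQUIRED, MEASURED):
    print(f"{tier:8} {name:18} need {want:>10} got {got:>10} -> "
          f"{'PASS' if ok else 'FAIL'}")
```

Run that against each candidate box and the RFP answer falls out of the FAIL rows.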
 
Garbage in, garbage out. This isn't a problem specific to computer hardware procurement. If you don't model the outcome before making decisions, you can end up disappointed.
In my experience, most software vendors don't even know what their baselines are. You might get lucky and they can field an edge case your way. Also, no one said this was specific to hardware procurement, but it is absolutely where it's felt the most. Hell, I remember a time when storage vendors were more interested in selling space than IOPS, and changing that dialog over to IOPS required not only pulling in a new team from $vendor, but also jumping to a completely $new-vendor. This was just 8-12 years ago, too.
 
I have two comments about that:

1. Most software is generic in purpose. When I say benchmark, I mean benchmark for what you intend to be doing with it (see the sketch below). This is doubly relevant when discussing virtualization.
2. There are people who specialize in this kind of proving work. If there is enough time and money riding on picking and deploying the right solution for your business, pay them.

(And yes, I'm aware that paying VARs or consultants has a very spotty track record of actually yielding useful results without just milking the principal... but like you said, this isn't easy.)
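As a toy illustration of "benchmark what you intend to do": time a representative transaction mix instead of running a generic synthetic test. The representative_transaction stub below is a hypothetical stand-in for a real application path (an ERP posting, a report query, a file ingest):

```python
# Toy sketch: measure latency of a workload that looks like *your* app,
# not a generic synthetic benchmark. Replace the stub with a real path.
import random
import statistics
import time

def representative_transaction():
    # Hypothetical stand-in for an actual application operation.
    time.sleep(random.uniform(0.001, 0.005))

latencies = []
for _ in range(200):
    t0 = time.perf_counter()
    representative_transaction()
    latencies.append((time.perf_counter() - t0) * 1000)  # ms

latencies.sort()
print(f"median {statistics.median(latencies):.2f} ms, "
      f"p99 {latencies[int(len(latencies) * 0.99)]:.2f} ms")
```

Numbers like that, captured on known-good hardware, become the baseline you hand to the next vendor.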
 
Sure, but software starts generic and then becomes something completely different and undocumented over time. I'll throw one word on this for ya: Sage. That company ended up requiring an $800K storage solution to keep performance up with their undocumented changes, to say nothing of the insane mods other customers were paying for that shipped with each major revision they pushed out. This was an ERP software deployment that went into the millions over its life cycle(s), with support being both per-engagement and yearly for break/fix. But they still couldn't be bothered to spec out performance requirements on their newly minted builds.

Then we have the use-case issue, where almost no two deployments across customers are the same. Sure, we can cycle in VARs and partners to help with this, but unless they know your business model (because, let's face it, most businesses don't know what they are doing either), it's just another layer on the guessing game. My favorite is when finance is split between two autonomous teams that fight over ERP rights and lock each other out, because the devs who were cycled in did not include both teams in the mods/changes and didn't understand the core concept they were building on, which nets out to even more performance lost.

Also, devs do not benchmark; they just generalize about what it "should" be. I have yet to have a dev give me good target baselines that were legit. Devs are good at pushing hardware, not so good at making their code live within the constraints of their given platform/hardware.

I remember the time I had to show a dev that their "MS Access" magic code was single-threaded by showing them ESXTOP expanded world data, because the dev didn't believe Task Manager. I helped them flip over to multi-threaded workloads, and they ended up eating a full 128-core EPYC server by themselves due to inefficiencies in... well, it's MS Access... they would not even discuss SQL...

And it all nets out to the same result: buy more hardware, because the software isn't getting fixed.
 
they ended up eating a full 128-core EPYC server by themselves due to inefficiencies in... well, it's MS Access
As punishment they could be made to stand next to the server (with no ear protection) for a full minute while it did that. :D
 
But... do we really want finance "shadow IT" near our servers?
Errr, yes?

It might sound counter-intuitive. But sometimes helping people learn about stuff they're using ends up with them having a better understanding or appreciation of things. It can go the opposite way too, but that's super rare. ;)
 
Errr, yes?

It might sound counter-intuitive. But sometimes helping people learn about stuff they're using ends up with them having a better understanding or appreciation of things. It can go the opposite way too, but that's super rare. ;)
My experience with these types: you give them an inch and invite them into the thick of it, and they come out all-knowing and find a way to get a seat above your department to dictate what is and is not happening. I know not all finance report writers are like that, but these guys sure were.
 
Yea, this guy landed at the feet of the president of the company and started to dictate IT policy. It was fun times, and was part of why I moved on.
 
After going down many rabbit holes, I have finally come to the conclusion that the best solution (for my office) is a Proxmox cluster with 4 nodes. Depending on my final build, I might be able to get by with only 3.

For now, I will use both Proxmox Backup Server and Veeam to back up my VMs to a TrueNAS box and replicate those backups to two remote locations.

It's off-lease from a trusted supplier. I have been using their gear for 5 years now; excellent support.

Use case: 10 VMs running MS Exchange, Active Directory, SQL, reporting and accounting software, a file server, an anti-virus server, a Mattermost server, Nextcloud, and an array of LXCs with little services. Currently using Hyper-V on a single server, backed up with Veeam to TrueNAS. I had been thinking about TrueNAS with a Proxmox or XCP-ng hypervisor, but that's a single point of failure: even if I add 3 Proxmox nodes, if the TrueNAS goes down, I'm down. So I circled back to a Proxmox Ceph cluster: 3 nodes, but I realize 4 is better.

I was going to use 25GbE but realized I need a 100GbE mesh network, based on the excellent benchmark guide posted here.
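One back-of-the-envelope way to sanity-check 25GbE vs. 100GbE is comparing link speed against the aggregate NVMe bandwidth per node. The ~3 GB/s per-drive figure below is an assumed ballpark for datacenter NVMe, not a number from my actual quote:

```python
# Back-of-the-envelope: can the network keep up with the NVMe drives?
# Per-drive throughput is an assumed ~3 GB/s sequential read; real
# numbers depend on the exact model.
drives_per_node = 8
per_drive_gbps = 3 * 8          # ~3 GB/s -> 24 Gbit/s per drive

aggregate = drives_per_node * per_drive_gbps
print(f"Aggregate NVMe read bandwidth per node: ~{aggregate} Gbit/s")
for link in (25, 100):
    print(f"{link:>3} GbE link: drives oversubscribe it "
          f"~{aggregate / link:.1f}x")
```

With 8 NVMe drives per node even 100GbE is oversubscribed roughly 2x, so 25GbE would leave a lot of drive performance on the table.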

Still learning about Ceph: is 8 drives per node too many? Are the drives too big? According to https://florian.ca/ceph-calculator/ I'll have about 19TB of safely usable space. Correct?
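Here's my rough understanding of the kind of math that calculator does for a replicated pool; the 2 TB drive size, replica count of 3, and 0.85 near-full ratio below are assumptions for illustration (plug in the real build), and the calculator's exact model may differ:

```python
# Rough Ceph capacity math for a replicated pool. The idea: after losing
# a whole node, all data must still fit, replicated, on the survivors
# without crossing the near-full ratio (0.85 by default).
def safe_usable_tb(nodes, drives_per_node, drive_tb,
                   replicas=3, near_full=0.85):
    raw = nodes * drives_per_node * drive_tb
    raw_after_node_loss = raw - drives_per_node * drive_tb
    return raw_after_node_loss * near_full / replicas

# Assumed 2 TB drives purely for illustration:
print(f"~{safe_usable_tb(4, 8, 2.0):.1f} TB safely usable")
```

Swapping in the actual drive size from the quote should show whether the 19TB figure holds up.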

Went back and forth on Intel vs. AMD, SATA SSD vs. NVMe, etc. Settled on the following build.


**Add in 100GbE networking; still waiting on that quote.

Would appreciate any feedback on my concepts or hardware choices (like, am I a bonehead for going AMD?).

Thanks!

I bet this is around 30k USD/EUR?
For that money you can already get 3x Genoa servers, each with:
ASUS RS520A-E12-RS12U
EPYC 9274F + 12x 32GB DDR5 (384GB) + Intel E810 + 12x Micron 7450/7500 Pro 4TB + 2x PM9A3 2TB for the OS in M.2 22110 form factor.

I would go with the 9374F and 64GB DIMMs, but that's out of budget for you, and with 3 servers the 9274F is more than enough.
And you don't need a ton of memory anyway if you go with Ceph and such fast NVMe drives.
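As a rough sanity check on the memory point: recent Ceph releases target about 4 GiB per BlueStore OSD by default (osd_memory_target), so a per-node budget for this build would look something like the sketch below; the PVE overhead and VM reservation numbers are placeholders, not from the thread:

```python
# Rough per-node memory budget. osd_memory_target defaults to 4 GiB per
# OSD in recent Ceph releases; the other numbers are placeholders.
osds_per_node = 12               # one OSD per NVMe drive
osd_gib = 4                      # Ceph default osd_memory_target
os_and_services_gib = 8          # assumed overhead for PVE itself
vm_budget_gib = 256              # placeholder: whatever your VMs need

need = osds_per_node * osd_gib + os_and_services_gib + vm_budget_gib
print(f"~{need} GiB needed vs. 384 GiB installed")
```

That leaves comfortable headroom on 384GB even with a big VM footprint.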

But hey, go ahead and buy that old crap from Dell instead.
Cheers
 
Ah, and I forgot: they are very loud. On normal load, so 99% of the time, the fans spin at 7,280 RPM, which is somewhat okay.

On full load they spin at 17k RPM, and that sounds like a plane next to you.

But those are 400W CPUs, so it's expected.
 
ASUS? Just say no: https://www.youtube.com/watch?v=7pMrssIrKcY

Plenty of other places make good gear. No need to subsidize companies pulling that crap. :mad:
I have those servers; they are good. The BIOS updates are good and the hardware is good.
And besides, the server division has nothing to do with the consumer division; completely different departments, support, etc.

There are other barebones, from Gigabyte/ASRock and a lot more.
Even from Supermicro.

But they all provide BIOS updates far too late, like half a year after AMD releases a new AGESA firmware; some of them don't even care.
TBH, I find it really stupid that you post an absolutely unrelated video without first checking the alternatives.

Dell/HP are the only ones that provide BIOS updates very often, sometimes even too many.
ASUS is in roughly second place in the server space. And Supermicro's build quality of chassis and components is just cheap and sometimes even crap!
Cheers
 
I bet this is around 30k USD/EUR?
For that money you can already get 3x Genoa servers, each with:
Considering I just quoted this out with SMC, Dell, HPE, and Gigabyte: no, you really don't get what you claim for 30k. It's closer to 70k. The Micron 7450 4TB drives alone are 22k at 12 drives per server...
 
