Node hardware redundancy for cluster

Gizmot

New Member
Dec 20, 2020
For a simple standalone node, conventional wisdom says the minimum configuration is 2 boot drives (RAID1), 2 datastore drives (RAID1), 2 PSUs and 2 NICs.
The point of this redundancy is to maximize uptime.

Now, if you move to an HA setup with 3+ nodes, you end up with at least 3x of everything.

It doesn't hurt, except for your wallet, but is there a good argument for dual PSUs, dual NICs and dual boot drives in an HA setup? From a reliability and financial point of view, it seems to make more sense to spend the money on more nodes instead. Am I missing something?
 
There is a simple question you need to ask yourself: what impact are you willing to accept (e.g. treating any hardware failure as a node failure)?
And maybe: what is your SLA?
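To put a number on the SLA question, here is a minimal Python sketch that converts an availability target into a yearly downtime budget. It assumes a flat 365-day year and treats the SLA purely as a percentage of time; real SLAs often define measurement windows and exclusions differently. The function name `downtime_budget_hours` is hypothetical, used only for illustration.

```python
# Convert an availability target (SLA, in percent) into an allowed-downtime budget.
# Assumption: a flat 365-day year (8760 h), no leap years, no maintenance exclusions.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) allowed by an availability target."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    # The unavailable fraction of the period is (100 - SLA) percent.
    return period_hours * (1.0 - sla_percent / 100.0)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        hours = downtime_budget_hours(sla)
        print(f"{sla}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

For example, 99.9% works out to roughly 8.8 hours of downtime per year. That budget has to cover patch reboots as well as any hardware repair on a single node.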

If you run Kubernetes clusters / containerized apps, that might be a valid approach. Apps running in VMs are not always stateless, so either you spend on hardware or you spend on logic in the app layer (e.g. a guest cluster), which often adds a lot of administrative overhead.
 
There is something appealing about hardware failure == node failure: you simply take the failed node out for repair and replace it with another one.
I guess the deciding factor is node price. When your nodes are expensive dual-CPU Xeons with 200+ GB of RAM each, the cost of redundant hardware is probably irrelevant.
What I had in mind is inexpensive 1U servers for remote locations.
 
Again: what are your expectations, your SLA and your requirements?
There is no definitive answer to that question.
I have customers with awesome stability on single nodes, because it is just a stupidly simple setup.
On the other hand, patching is a pain, because you need to shut down services. That brings me back to the SLA question.

And especially for ROBO (remote office/branch office) sites, which you can't reach easily or without the hassle of travel, relying on node redundancy alone might not be favourable. Having a system that can withstand certain component failures relieves you of the need to jump right into the car and fix the gear before a second node fails and potentially takes the cluster down.
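The remote-site argument can be sketched as a rough probability estimate: once one node has failed, how likely is it that another node fails before the first one is repaired? This minimal Python sketch assumes independent node failures with a constant failure rate (exponential model) and uses a hypothetical per-node MTBF of 50,000 hours, not a vendor figure. The function name `p_second_failure` is also hypothetical. In a 3-node cluster, losing a second node means losing the quorum majority, so a longer repair window (e.g. waiting for someone to drive out) directly raises the risk.

```python
import math


def p_second_failure(nodes: int, mtbf_hours: float, repair_hours: float) -> float:
    """Probability that at least one surviving node fails while the first is being repaired.

    Assumption: independent failures with a constant failure rate (exponential model).
    """
    if nodes < 2:
        raise ValueError("need at least two nodes")
    if mtbf_hours <= 0 or repair_hours < 0:
        raise ValueError("mtbf_hours must be positive and repair_hours non-negative")
    surviving = nodes - 1
    # Combined failure rate of all surviving nodes (failures per hour).
    combined_rate = surviving / mtbf_hours
    # P(at least one failure within the repair window) = 1 - P(no failure).
    return 1.0 - math.exp(-combined_rate * repair_hours)


if __name__ == "__main__":
    MTBF = 50_000.0  # hypothetical per-node MTBF in hours, for illustration only
    for label, repair in (("on-site repair, 4 h", 4.0), ("remote site, 72 h", 72.0)):
        p = p_second_failure(3, MTBF, repair)
        print(f"{label}: {p:.3%}")
```

Under this model, redundant PSUs, NICs and boot drives act roughly like a higher effective per-node MTBF, because a single failed component no longer takes the whole node down. That lowers the probability for the same repair window, which matters most when the repair window is long.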