[Solved] Why do I need 3 nodes for an HA cluster?

If your NAS has redundancy, the network between the NAS and the PVE cluster has redundancy, and all of that has power redundancy, then using it for an HA setup can be totally fine in practice.

It's just that there are then more components involved in providing the basic system, like the network between each cluster node and the storage. Avoiding that avoids failure potential; a switch that is not there cannot fail. If you want, or need, the extra component, and thus the switch or network cable, then you should at least ensure that the whole thing still works if one switch or cable breaks.

This is mostly relevant for a greenfield project: with some good design you can reduce the number of components, and thus cost, while maybe even increasing the redundancy and reliability of your system. If you already own the NAS, it might well be the poorer (financial) decision to switch away from it.

To say more confidently what would be best in your specific case, one would need many more details. But I think that's not a good fit for this thread, and it's also a lot of work to gather that info, interpret it, and write an answer, so it is not the best fit for this community forum either (not saying you shouldn't try, just don't be surprised if nobody replies).

In general, I'd recommend setting up a test system and trying out everything you want to hedge against and whatever failure seems realistic in your environment.

Oh, and just a reminder: HA improves uptime (i.e., availability) on partial failure, but it is not a backup & recovery strategy, which one should always have, at least if one cares about the data (or one's payroll depends on it).
 
But if I used an enterprise-ready NAS as external storage with redundancy, then what is the technical issue? I really do not understand.
It's still a single point of failure, e.g. in case of issues with the NAS hardware. If you can live with that risk, this might not be a problem, but it should be taken into consideration.

In the end you will have to weigh the cost of more hardware against the cost of a NAS failure and the time needed for recovery.

As I said: there are businesses who do this, so they probably assessed the risk as low enough to be acceptable for them. Your needs might be the same, or not; only you can decide this.
 
It is not like a typical backup; I was concerned that the VM storage should be on separate hardware, apart from the server nodes. That's why.
 
But if I used an enterprise-ready NAS as external storage with redundancy, then what is the technical issue? I really do not understand.

Technically, there's no issue. Your setup would be as reliable as its least reliable non-redundant component. As long as you know this, it may be just fine for you after all.
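To put a rough number on "as reliable as its least reliable non-redundant component", the textbook availability arithmetic can be sketched in a few lines of Python (the 99.9% figures below are made-up illustration values, not from this thread):

```python
# Rough sketch: the availability of a chain of components is the product of
# each component's availability; redundant (parallel) components only fail
# if all of them fail at once.

def series(*avail):
    """All components must work: A = a1 * a2 * ... * an."""
    result = 1.0
    for a in avail:
        result *= a
    return result

def parallel(*avail):
    """At least one component must work: A = 1 - (1-a1)*(1-a2)*..."""
    fail = 1.0
    for a in avail:
        fail *= 1.0 - a
    return 1.0 - fail

# Example: a single switch and a single NAS, each ~99.9% available,
# versus redundant switches and a dual-controller NAS.
single_path = series(0.999, 0.999)               # switch and NAS in series
redundant_path = series(parallel(0.999, 0.999),  # two switches
                        parallel(0.999, 0.999))  # dual-controller NAS
print(round(single_path, 6))     # ~0.998001
print(round(redundant_path, 6))  # ~0.999998
```

The point of the sketch: every non-redundant component in series drags the whole chain down, which is exactly why the single NAS (or switch) keeps being called a single point of failure.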
 
If your NAS has redundancy, the network between the NAS and the PVE cluster has redundancy, and all of that has power redundancy, then using it for an HA setup can be totally fine in practice. ...
Then please suggest: how do I implement HA with external storage in a proper way?
 
Then please suggest: how do I implement HA with external storage in a proper way?

You are asking how to assemble a car with 3 wheels, since that is what your workshop has. You cannot; you can assemble a tricycle. It will be better in some ways than a car and worse in others. The typical answer for Proxmox VE is to use Ceph. It would be your homework to research that; it's an entirely separate product, and Proxmox just gives you a nice frontend if you want to run it on the nodes themselves.
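For reference, an external Ceph cluster shows up in Proxmox VE as an RBD entry in /etc/pve/storage.cfg. A minimal sketch (the storage name, pool name, and monitor addresses are made-up placeholders, not from this thread):

```
# /etc/pve/storage.cfg -- external Ceph RBD storage (illustrative values)
rbd: external-ceph
        pool vm-pool
        monhost 10.0.0.11 10.0.0.12 10.0.0.13
        content images,rootdir
        username admin
        krbd 0
```

Whether Ceph runs on the PVE nodes themselves or on separate hardware, the VM side only ever sees this storage definition.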
 
Then please suggest: how do I implement HA with external storage in a proper way?
Not possible. No one can make the tradeoff decisions in designing storage for you.

Start here:
https://pve.proxmox.com/wiki/Storage#_storage_types

These are your options. Then google whatever you're not familiar with among the different shared storage types. Once you start having some idea of what you'd like to pursue, you can search the forums for people's experiences with specific topologies (e.g., relative performance for a given device type/count/connectivity topology, cost, gotchas, etc.).
 
I'm still wondering why a NAS is so often called a single point of failure, when a NAS appliance itself is designed as an HA system and is a super combination with a PVE cluster. E.g., a NetApp consists of 2-16 nodes, an Isilon/PowerScale of 3-252 nodes, and if you pull a cable (or do OS updates with reboots) you don't miss one ping while doing it; metro clusters combine that across different sites in sync or async modes too. The question is whether one needs that level of HA, which is indeed more HA than a PVE cluster can deliver (!!) without losing a ping. That cannot be done for a PVE VM on a dead node, as the VM can only be started on another node, with a service outage of around 3 minutes (due to internal PVE logic) for that case!!

As for small installations: beginning with one NFS node with internal storage, you could extend that to a second one and even replicate the data. Or you start with one node with external storage, add a second node on the same storage, and can take over the filesystem to shut down the first node (when it fails, or for OS upgrades etc.), and even that could be replicated to a second NFS setup in another building.

NFS itself scales to a measured 40 GByte/s from a single node, so it is not a performance blocker either.

I really like NFS for PVE, as it allows very fast and smooth VM migrations, and it is a very easy-to-handle solution in case of any HW failures, which isn't the case with Ceph; see the many threads about problems that occur with it.

Even with a NAS you have snapshots and can go back to a pre-change VM/LXC version in seconds; not in the PVE GUI, but possible in another way.
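To illustrate the NFS route, a minimal sketch of such an entry in /etc/pve/storage.cfg (server address, export path, and storage name are made-up placeholders):

```
# /etc/pve/storage.cfg -- NFS storage shared by all cluster nodes
# (illustrative values only)
nfs: shared-nfs
        server 192.168.10.20
        export /export/pve-vms
        path /mnt/pve/shared-nfs
        content images,rootdir
        options vers=4.2
```

With shared storage like this, a live migration (`qm migrate <vmid> <target-node> --online`) only has to move RAM state, not disk data, which is what makes those migrations fast and smooth.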
 
I'm still wondering why a NAS is so often called a single point of failure, when a NAS appliance itself is designed as an HA system

Not "a" NAS, but "the" NAS in this thread, at least I understood it that way:

that storage must have RAID with a dual controller; you may suggest any alternative option, in the aspect of shared storage only

I apologise if I misunderstood, though.

The question is whether one needs that level of HA, which is indeed more HA than a PVE cluster can deliver

:D

with a service outage of around 3 minutes (due to internal PVE logic)!!

That's for another thread entirely, but it is a funny juxtaposition indeed.

As for small installations: beginning with one NFS node with internal storage, you could extend that to a second one and even replicate the data. Or you start with one node with external storage, add a second node on the same storage, and can take over the filesystem to shut down the first node (when it fails, or for OS upgrades etc.), and even that could be replicated to a second NFS setup in another building.

Yes, it's just that throughout the replies in this thread we were assuming that those 3 pieces of hardware effectively rely on one (albeit one with a dual controller).
 
Yes, but one could start that small, with 1x NFS, and go up to the highest requirements until the money runs out. And don't forget: you don't need to deal with all the block-storage confusion like "where is my space used?" (asked anew every week here). You are just moving regular directories and files from one small storage to an even better storage solution, without any caveats to that migration and without any heavy changes in the PVE cluster. There is just one exchange to do in the datacenter storage definition, on the same mountpoint, so absolutely nothing related to the LXC and VM configs changes with a storage exchange. It's that easy.
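A sketch of what that single exchange in the storage definition could look like (storage name, addresses, and paths are invented for illustration):

```
# /etc/pve/storage.cfg -- before: the old NFS box backs "shared-nfs"
nfs: shared-nfs
        server 192.168.10.20
        export /export/pve-vms
        content images,rootdir

# after: only the server/export lines change; the storage name, and thus
# every VM/LXC config that references "shared-nfs:...", stays untouched
nfs: shared-nfs
        server 192.168.10.30
        export /tank/pve-vms
        content images,rootdir
```

Since guest configs reference the storage by its name rather than by the backing server, swapping the appliance behind that name is transparent to them once the data has been copied over.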
 
Yes, but one could start that small, with 1x NFS, and go up to the highest requirements until the money runs out.

That's a valid point, but until that point it is only as resilient as that one appliance.

And don't forget: you don't need to deal with all the block-storage confusion like "where is my space used?" (asked anew every week here). You are just moving regular directories and files from one small storage to an even better storage solution, without any caveats to that migration and without any heavy changes in the PVE cluster. There is just one exchange to do in the datacenter storage definition, on the same mountpoint, so absolutely nothing related to the LXC and VM configs changes with a storage exchange. It's that easy.

For the record, I think I said above that if I was paid by Proxmox and wanted a happy customer, it would be Ceph on 5+ nodes. ;) I do not want to confuse the OP any further, but I do not see much point in e.g. running Ceph on the nodes themselves; it's really mostly made "convenient" in terms of GUI and limited hardware for as many users as possible. I like shared storage that is separate from the compute nodes, but resilient. It can be Ceph, but even then I would prefer it managed entirely separately. It absolutely does not have to be Ceph, at all. (You may notice I do not really have much to say in Ceph threads, for a reason too. ;)) I like storage to be fast, not just resilient.
 
BTW, if external shared storage were a first-class thing for Proxmox, maybe a STONITH block device could be a thing for fencing, and you would not need to wait the said 2 minutes to recover a VM either...
 
I never heard about STONITH in my last 30 years of computing, which doesn't mean it's bad, but I never stumbled over it with any of our customers, so I'd say it's a niche product; maybe really good, but I have perhaps never seen it in action in my life. :)
 
I never heard about STONITH in my last 30 years of computing, which doesn't mean it's bad, but I never stumbled over it with any of our customers, so I'd say it's a niche product;
STONITH (the principle) is a requirement for any cluster of resources. It's not a product; it's a method to ensure the survival of a known-good provider. That's like saying you've never heard of electricity and therefore it's a niche product ;)

--edit: STONITH is effectively the only way to handle a two-node cluster. Since PVE requires a minimum of three nodes, there are other options (e.g., self-fencing).
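For context on self-fencing: Proxmox VE's HA stack fences via a watchdog on the node itself, using the softdog kernel module by default; a hardware watchdog can be selected in /etc/default/pve-ha-manager (the module name below is just an example for certain Intel boards, not a recommendation):

```
# /etc/default/pve-ha-manager
# select the watchdog module used for self-fencing (default: softdog)
WATCHDOG_MODULE=iTCO_wdt
```

When a node loses quorum, the watchdog is no longer refreshed and the node resets itself, which is what makes it safe for the rest of the cluster to restart its HA guests elsewhere.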
 
STONITH (the principle) is a requirement for any cluster of resources. It's not a product; it's a method to ensure the survival of a known-good provider. That's like saying you've never heard of electricity and therefore it's a niche product ;)

Well, when PVE everywhere refers to this as "fencing" (strictly speaking it is not shooting "the other nodes"; the node shoots itself ;)), then of course it's possible never to come across STONITH. I wonder who is familiar with extended virtual synchrony, and yet most will have heard of corosync...
 
For some reason

This is my issue with the missing discussions: there is often no record of why some decision was taken. Anyhow, I used the term only because if you were going to search specifically for blockdev-based fencing, the first results would literally come up under SBD.
 
