[Solved] Why do I need 3 Nodes for an HA Cluster?
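For context on the title question: clusters like PVE's (built on corosync) require a strict majority of votes to stay quorate, which is why two nodes are not enough for HA. A minimal sketch of the arithmetic, with hypothetical helper names (this is not actual corosync code):

```python
# Majority-quorum arithmetic behind the "why 3 nodes" question.
# Hypothetical helpers for illustration, not corosync internals.

def votes_needed(total_votes: int) -> int:
    """Strict majority: more than half of all cluster votes."""
    return total_votes // 2 + 1

def has_quorum(total_votes: int, alive_votes: int) -> bool:
    """A partition is quorate only if it holds a strict majority."""
    return alive_votes >= votes_needed(total_votes)

# 2-node cluster: losing either node leaves 1 of 2 votes -> no quorum,
# so nothing may be recovered anywhere.
assert not has_quorum(2, 1)

# 3-node cluster: losing one node still leaves 2 of 3 votes -> quorate,
# so the surviving majority can safely fence and recover.
assert has_quorum(3, 2)
```

With two nodes, a strict majority of 2 means any single failure (or a split) stalls the whole cluster; three nodes is the smallest count where one node can fail while the rest remain a majority.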

If I had to guess, they had to offer soft watchdog to support the homelab crowd

I would bet my money on this one too.

, and since it worked "well enough" there was no point in supporting that AND multiple other options.

Yeah, that's a bit of a flawed rationale in my book.

BTW I now found the actual thread where I tried to ask about this:
https://forum.proxmox.com/threads/is-fence_pve-abandoned-by-proxmox.153448/#post-697384

So it's a mystery ... :)
 
open source, fix it ;)

That's what I heard here multiple times too, except it does not work that way. If I submitted such a piece, they would not want it. And that is setting aside the licensing issue (not only mine) with contributed code, discussed here:
https://forum.proxmox.com/threads/educational-content.152530/page-2#post-713164

I mean, I just came here and asked this and that (think: why does PVE not do this or that), or stated that I do not recommend ZFS for homelab users, and there's almost this weird feeling of "I should not have even asked" ...
 
except it does not work that way.
why not?
If I submitted such piece, they would not want it.
fork it.
This is abstracting from (not only mine) licensing issue of the contributed code, discussed here:
https://forum.proxmox.com/threads/educational-content.152530/page-2#post-713164
I skimmed through that, but the short version is: what is the concern? If the devs change the license upstream, that doesn't invalidate your existing fork.
state that I do not recommend ZFS
you can and do state whatever you want. Not sure what the connection is. Just because you have opinions doesn't obligate the devs or anyone else. If this was sufficiently important to you (to anyone), you're free to fork.
 

I heard this multiple times on this forum as a discussion-ending remark. Forking something meaningfully also means taking on providing support for it. Also, you are basically suggesting I would then become competition to the original. The freedom is great, but I would typically use it for modifications for my own needs. I do not have the ambition to be in the business of a hypervisor, though.

I skimmed through that, but the short version is: what is the concern? If the devs change the license upstream, that doesn't invalidate your existing fork.

That's correct, but see above. Also, the issue remains: I provided something of value for free under the condition that it stays that way (including derived works being GPL); at least that's how I provide value for free.

you can and do state whatever you want. Not sure what the connection is.

The connection basically is that Proxmox dropped e.g. mdadm support (and users who cannot grasp, let alone troubleshoot, ZFS suffer with the limited choice), but does not want to discuss it. It's just the no-discussion principle.

Just because you have opinions doesn't obligate the devs or anyone else. If this was sufficiently important to you (to anyone), you're free to fork.

Just in case you meant by "forking" running my own patches, that is something I can do. Who says I am not? :) BTW Proxmox does not make it exactly easy to keep rolling such patches on, because they do not adhere to principles that would make it easy; I do not know if that is on purpose or not, but it does not help.
 
Forking something meaningfully also means taking on providing support for it. Also, you are basically suggesting I would then become competition to the original.
AHHH here it is.

Yes, that is exactly what it means. You can bellyache all you like, but at the end of the day the existing provider has no responsibility to care about your wants/needs. THEY provide this support, and consequently only their priorities matter. You have two choices, both involving investment: do it yourself or pay someone to. Complaining is a third, I suppose, but I don't think it does you any good.

When you provide feedback, it is valuable and useful even if it isn't acted on; there could be a bunch of reasons for that. When you repeat your feedback because the rebuttal (or lack thereof) isn't to your satisfaction, you graduate to complaining. No one likes that.
 
AHHH here it is.

Yes, that is exactly what it means. You can bellyache all you like, but at the end of the day the existing provider has no responsibility to care about your wants/needs. THEY provide this support, and consequently only their priorities matter.

Yes, and in the case above (fencing method) they decided to prioritise homelab use. Noted.

You have two choices, both involving investment: do it yourself or pay someone to. Complaining is a third, I suppose, but I don't think it does you any good.

In this case, I am sure you are aware the easiest choice is to go for another solution that supports e.g. SBD. And if bringing an issue up means "complaining", that's basically a discussion-ending event.

When you provide feedback, it is valuable and useful even if it isn't acted on; there could be a bunch of reasons for that. When you repeat your feedback because the rebuttal (or lack thereof) isn't to your satisfaction, you graduate to complaining. No one likes that.

I get that; I know I am probably the most unlikable BZ reporter, etc. But consider that users, not just me, suffer from this lack of a proper rationale. Do they have to provide one? No. Would it be better for everyone? Yes. Note that I do accept discussions being ended when a reason such as "this is not a good ROI for us" is provided.

On a separate note, the license is not just my issue: if it were guaranteed that derived works stay GPL, then of course one would be even less motivated to fork anything.
 
which is indeed more HA than a PVE cluster can do (!!) without losing a ping, which cannot be done for a PVE VM on a dead node, as the VM can only be started on another node with a service failure in that case, which takes around 3 min (due to internal PVE logic) !!
This is apples to oranges: you compare a very specific application-level HA mechanism for just storage with a general HA mechanism that can handle arbitrary VMs/CTs and, as such, is made as a fallback to hedge against unknown unknowns.
It's like I'd state that your NAS can do less HA because it cannot recover VMs and CTs; that makes not much sense, as it is targeting something different. Application-level HA will always be more powerful and also cheaper, because it can make a lot of assumptions, has a more fine-grained view of things (no KVM/OS/user-land layers in between), and can take shortcuts that are only valid for some applications under specific circumstances.
 
That used to be an option a bunch of versions ago. For some reason the devs removed it.
See:
https://git.proxmox.com/?p=pve-ha-m...b=800a0c3e485f175d914fb7b59dfcd0cd375998de#l9

We did not remove it. As the above link describes, rgmanager, the one that had STONITH, was going EOL, so we developed our own HA stack with lessons learned from the previous one. The biggest lesson was that more complexity is certainly not what almost all users need, and also not what most users would be able to handle, so Pacemaker wasn't an option. Watchdog-based self-fencing was one of the simplifications; the brittleness of HA fence devices and their APIs, experienced first-hand in the support time we had to spend on them, was also a big reason not to favor STONITH. But we also looked into the latter; there are even some patches on the list implementing it as an opt-in option, but demand was always low, so development was not pushed further.
While it would allow recovering faster, it also increases the complexity of the code base, and the gain will always fall way short of what any application-level HA can achieve. That said, it would still be OK to add that if there's user demand.
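The watchdog-based self-fencing described above can be illustrated with a toy timing model: a node pets a watchdog while it holds quorum; once it loses quorum it stops, the watchdog fires after its timeout and resets the node, and the surviving nodes may only recover the services after waiting at least that long. All names and figures below are illustrative assumptions, not the actual PVE HA manager logic or its real timings:

```python
# Toy model of watchdog self-fencing timing (illustrative only; the real
# PVE HA stack uses its own internals and exact intervals differ).

WATCHDOG_TIMEOUT = 60.0  # seconds until an un-petted watchdog resets the node

def node_reset_time(last_pet_at: float) -> float:
    """A node that stops petting is hard-reset one timeout after its last pet."""
    return last_pet_at + WATCHDOG_TIMEOUT

def safe_recovery_time(last_pet_at: float, margin: float = 60.0) -> float:
    """Peers may only restart the workload once the watchdog is guaranteed
    to have fired, plus a safety margin for clock skew and lock expiry."""
    return node_reset_time(last_pet_at) + margin

# Node last petted at t=0 and then lost quorum: it is reset at t=60,
# and peers recover no earlier than t=120.
assert node_reset_time(0.0) == 60.0
assert safe_recovery_time(0.0) == 120.0
```

This is where the roughly two-minute window discussed later in the thread comes from: the surviving majority cannot distinguish a dead node from a merely isolated one, so it must wait out the full self-fencing deadline before it is safe to recover elsewhere.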
 
This is apples to oranges

I knew this reply would come; I know it's not addressed to me, and I also wrote it's for a separate thread, but he has a point in that the particular fencing mechanism of PVE does not allow for faster fencing even if he has highly available storage.

to hedge against unknown unknowns.

That's a very valid point, but if I e.g. STONITH the networking of the offending node, I do not need to know what happened there; I can just recover instantly elsewhere.

So you read and remember _every_ discussion about the HA manager?

Of course not, but even staff do not (I do not expect everyone to know everything). I do, however, remember what you wrote to me yourself on a related topic:
https://forum.proxmox.com/threads/4-node-clusters-with-qdevices.136756/#post-606973

Which is fine, and in line with what is quoted now:

Maybe save some time then on re-reading and just look in the README in the official git repo:
https://git.proxmox.com/?p=pve-ha-m...b=800a0c3e485f175d914fb7b59dfcd0cd375998de#l9

But neither of the two gives a rationale for the self-fencing, which I believe effectively means a ~60+60 sec failover.

Do note I am not arguing that it should be different or that softdog is problematic (it's actually very reliable), but it also means the grace period needs to be relatively high. There are other fencing options which could recover quicker; I know some people don't care, others do. I only asked why the fencing is not e.g. more of a "plugin" type solution.

Again, I am not arguing; this is just how I talk when I go fast.
 
the biggest lesson was that more complexity is certainly not what almost all users need, and also not what most users would be able to handle, so Pacemaker wasn't an option.

This is what I was after, thank you.

there are even some patches on the list implementing it as an opt-in option, but demand was always low, so development was not pushed further.

Fair enough, thank you for the exhaustive answers. Really appreciate that.
 
but he has a point in that the particular fencing mechanism of PVE does not allow for faster fencing even if he has highly available storage.
Maybe, but as in the other reply that you probably read just now: it's always a balance of complexity and ROI. While going from 120s at best to 60s at best is naturally something, it is still far from "don't notice anything, the pings just keep coming", and it also needs fence devices set up, so it is not exactly free for the user either. Albeit there are some IPMI/iKVM ones; I'm not sure how well they are supported nowadays, as I have not followed STONITH and fence-agents development too closely for a few years.

Anyhow, for a hedge against unknown unknowns, like some odd HW hiccup causing a node to hang overnight, 60s vs. 120s is not _that_ big of a difference compared to the hours of downtime until the one managing that HW can be reached to check it out (not every site has dedicated admins; most setups are done via consultancy). If 120s is too long, it's quite likely that 60s is also too long, and the sub-10s territory that is required for most users not to notice a hiccup is not really feasible with recovery-based HA. Even if we could detect outages in <1s while ruling out all false positives (which is basically impossible, but let's assume it can be done), recovery plus boot of the VM OS will still require tens of seconds, so too long again.
OTOH, targeted application-level HA, like some Postgres replication, or HAProxy in front of multiple VMs providing the same service, or software-defined storage like Ceph, can provide this sort of HA that feels borderline magic.
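The point that even perfect detection cannot reach sub-10s with recovery-based HA can be put into a rough back-of-the-envelope sum. The components and all figures below are illustrative assumptions, not measured PVE numbers:

```python
# Back-of-the-envelope: perceived outage for recovery-based HA.
# All component figures are illustrative assumptions, not measurements.

def perceived_outage(detect_s: float, fence_wait_s: float,
                     restart_s: float, guest_boot_s: float) -> float:
    """Total time from node failure until the service answers again."""
    return detect_s + fence_wait_s + restart_s + guest_boot_s

# Wildly optimistic case: 1 s failure detection, zero fencing wait,
# 5 s to schedule/start the VM elsewhere, 30 s for the guest OS and its
# services to boot.
best_case = perceived_outage(1.0, 0.0, 5.0, 30.0)

# Even then the outage is tens of seconds, well above the ~10 s most
# users would fail to notice; the guest boot alone exceeds the budget.
assert best_case > 10.0
```

This is why the argument lands on application-level HA: only a mechanism that already has a warm standby (replication, a load balancer, Ceph) can skip the restart and boot terms entirely.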

not arguing that [...] softdog is problematic (it's actually very reliable),
Not only that; from my experience it's more reliable than basically most external HW watchdogs, albeit that seems self-inflicted due to poor drivers.

Edit: added a bit more context
 
If 120s is too long, it's quite likely that 60s is also too long, and the sub-10s territory that is required for most users not to notice a hiccup is not really feasible with recovery-based HA. Even if we could detect outages in <1s while ruling out all false positives (which is basically impossible, but let's assume it can be done), recovery plus boot of the VM OS will still require tens of seconds, so too long again.

It will never be like the storage example, of course, but it does not have to be 60s + bootup; it could probably be 10s. But I understand that needs a different stack.

OTOH, targeted application level HA, like some Postgres replication or HA proxy in front of multiple VMs providing the same service, or software defined storage like Ceph, can provide this sort of HA that feels borderline magic.

I actually agree on this, as for e.g. PostgreSQL you would want k8s and an operator like CloudNativePG. This is not competition to PVE; it's a different solution and architecture entirely.

Not only that, from my experience it's more reliable than basically most external HW watchdog, albeit that seems self-inflicted due to poor drivers.

Edit: added a bit more context

I know it works well; I was literally nitpicking about the 120s window and the single method of fencing. Anyhow, glad we got the answer from the source.
 
