Can someone better explain 2-node cluster failure scenarios?

RolandK

Renowned Member
Mar 5, 2019
hello,

i come across many posts from people who seem to run their proxmox "cluster" as a "non-HA" two-node setup without quorum, just to make managing 2 nodes easier.

i'm trying to explain that this is the wrong way to proceed, but after a lot of searching i did not find a satisfying explanation of what exactly is bad about this and what exactly can go wrong, i.e. how you may screw up your nodes.

imho, at least with an inter-node network failure we get a split brain between the 2 instances of pmxcfs.

i wonder how that behaves when the network reconnects, or how the nodes can get so far out of sync that they won't re-join. i have to admit that i'm not deep enough into the clustering details to answer this on my own.

i'd be happy if somebody could better describe the failure scenarios and consequences (or point me to some description).

it's always good to learn about the details, and with that to be able to better convince newbies NOT to do this, or at least make them more aware of what can go wrong and what needs to be done to fix it.

it's that "it works for me - so what?" attitude which i dislike so much...

thank you!
 
Hello Roland,

as far as I know (I have never tried a 2-node cluster), you run into the following when there's no quorum:

- you can't change VM state (can't start, can't stop, etc.)
- you can't log in to the web UI (this has been the case for a few PVE versions)
- if you edit corosync.conf while the second node is offline, you get a split brain (this can be fixed, though)
- if you lose the corosync connection while both nodes are still running, you get a split brain (not sure whether the fix is the same as for the previous bullet point)
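
To illustrate why two nodes are the awkward case, here is a minimal sketch of the majority rule behind quorum. It assumes one vote per node and a plain "more than half of the expected votes" threshold, with no special two-node settings and no external vote (QDevice). The function names are made up for this example and are not part of any Proxmox or corosync API.

```python
def votes_needed(expected_votes: int) -> int:
    """Strict majority: more than half of all expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """A partition is quorate if it holds at least the majority of votes."""
    return votes_present >= votes_needed(expected_votes)


for total in (2, 3):
    # Worst case for a small cluster: one node is cut off from the rest.
    isolated, rest = 1, total - 1
    print(
        f"{total} nodes: need {votes_needed(total)} votes, "
        f"isolated node quorate={is_quorate(isolated, total)}, "
        f"remaining {rest} node(s) quorate={is_quorate(rest, total)}"
    )
```

Under these assumptions, a 2-node cluster needs both votes, so when the link between the nodes breaks neither side is quorate. With 3 votes, the side that still sees 2 of them keeps working, which is why a third vote is usually recommended.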

What did your research turn up on these bullet points? As far as I know, Proxmox is working on a way to manage multiple systems from a single UI, but this will take some time.

Greetings Jonas