[SOLVED] Split brain resolution after running one cluster member as standalone

Matache_Macelaru

New Member
Dec 10, 2022
I have a rather strange question and a nice use case: a cluster of 3 nodes on PVE 6.4-13 is being moved offsite, but not all at once.

CONTEXT:
For various reasons they cannot move all three nodes at once, so they will move one node first (A), then the other two (B&C).
On top of this mess, the first node to be moved has to run some VMs, and the other two will keep running in the old location for a few more days.
Luckily for me, there is no shared storage involved, because that would be another mess o_O

I know that for A to work in the new location I need pvecm e 1 (i.e. pvecm expected 1) to allow it to start; this is OK and working as expected in the new location.

Now, finally, the question: how should I proceed when B&C arrive in the new location?

I don't know whether PVE will be able to merge the two sets of diverging data (A + B&C), but I really don't care about C's data, so I'm thinking of helping the cluster by giving it something it knows how to handle: inverting the problem from two nodes out of sync to just one node out of sync ;)

In the new location:
- Power Off A
- Power On B&C
- Check B&C are in sync
- Then Power On A
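
For the "Check B&C are in sync" step, here is a minimal Python sketch of how one might check quorum from the output of pvecm status. The sample text is only my assumption of what the output looks like with A powered off (field names such as "Quorate" and "Total votes" should be compared against your own node), and the helper names are made up for illustration. Being quorate only tells you that B&C see each other and form a majority; it says nothing about VM data.

```python
import subprocess

# ASSUMPTION: illustrative text in the style of `pvecm status` with A
# powered off. Compare against the real output on your own nodes.
SAMPLE_STATUS = """\
Quorum information
------------------
Nodes:            2
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Total votes:      2
Quorum:           2
Flags:            Quorate
"""


def parse_status_fields(text: str) -> dict:
    """Collect 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def looks_quorate(text: str, nodes_online: int) -> bool:
    """True if the cluster reports quorum and the expected vote count."""
    fields = parse_status_fields(text)
    return (fields.get("Quorate") == "Yes"
            and fields.get("Total votes") == str(nodes_online))


def read_pvecm_status() -> str:
    """Run `pvecm status` on a PVE node and return its text output."""
    result = subprocess.run(["pvecm", "status"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On B or C, looks_quorate(read_pvecm_status(), 2) would be the check before powering A back on, and the same call with 3 after A has rejoined.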

Q1. Will this trick work?
(I am expecting that anything done on A in the meantime will be reverted, and that's ok)
Q1.1. How is the "latest" version of the cluster determined: by quorum/consensus of data, or by timestamp?

The alternate solution I am considering is:
- Isolate A's network
- Power On B&C
- Check B&C are in sync
- With A's network still isolated, follow the manual's instructions for removing a cluster node and remove node A
- For good measure, reboot A
- Reconnect A's network
- Follow the normal steps for joining A to the cluster
Q2. Can I rejoin the same cluster after using this method?
The manual isn't clear on this for the "Separate a Node Without Reinstalling" section; only the main section states that the node should not rejoin the same cluster.

There is of course the third option, reinstalling, which is clearly supported in the manual; however, I wish to avoid this if possible.

I am not afraid to get my hands dirty in the CLI, and would much rather learn and tinker than do a clean install.

After writing 90% of this, it hit me that I could spin up 3 VMs and simulate this scenario :mad:
 
Following up on myself after testing with VMs:

Q1. Yes, it will work; the latest version is determined by quorum/consensus.

Q1 Details
pvecm e 1 is only temporary; after a reboot the node again waits for quorum, based on the number of members in the initial cluster
Upon merging, any changes made while standalone are reverted
Q2. Not tested, as my goal has been reached.
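
The quorum behaviour in the Q1 details can be sketched with a bit of arithmetic. This is a simplified Python model, not PVE code: it assumes one vote per node and the usual majority rule (floor(expected / 2) + 1), and the function names are made up for illustration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a majority: floor(expected / 2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the nodes that can see each other hold a majority."""
    return votes_present >= quorum_threshold(expected_votes)


# ASSUMPTION: three nodes, one vote each, as in the cluster above.
CONFIGURED_EXPECTED = 3

# A alone in the new location: 1 of 3 votes, so no quorum.
a_alone = is_quorate(1, CONFIGURED_EXPECTED)

# A after `pvecm expected 1`: expected votes lowered to 1 at runtime.
a_with_override = is_quorate(1, 1)

# A after a reboot: the override is gone, expected votes are 3 again.
a_after_reboot = is_quorate(1, CONFIGURED_EXPECTED)

# B and C powered on together: 2 of 3 votes, so they form the majority.
b_and_c = is_quorate(2, CONFIGURED_EXPECTED)

print(a_alone, a_with_override, a_after_reboot, b_and_c)
```

This matches what I saw in the test: B&C together are the quorate side, and A on its own is not unless expected votes is lowered by hand.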



I'm leaving this here for anyone else with a similar predicament.
 
Oops, I just split-brained my two-node cluster by mistake. Node2 is about to be shut down, but node1 seems to have shut itself down. I don't mind reinstalling Proxmox, but crikey, that was a surprise. Luckily the VM that I care about has not migrated (it's on the powered-off machine), although I do see one VM on the running node2 that I assume is about to get deleted. I have no idea what will happen when I power up node1; luckily for me, that VM is of no significance. Now I see why they say a minimum of 3 physical nodes. Good to learn this the hard way before I got in too deep. I wish I had a way to verify whether I really do need to reinstall my "cluster-node2". The plan is: node2 gets powered down and node1 powered up, the cluster destroyed, and node2 reinstalled. Or am I better off having them both powered up for any part of the rest of this? I probably did not give you enough detail to answer.