[SOLVED] Split brain resolution after running one cluster member as standalone

Matache_Macelaru

Dec 10, 2022
I have a rather strange question, and a nice use case: a cluster of 3 nodes on PVE 6.4-13 is being moved offsite, but not all at once.

CONTEXT:
Because of reasons, they cannot move all three nodes at once, so instead they will first move only one (A), then the other two (B&C).
Adding to this mess, the first node to be moved has to run some VMs, and the other two will continue to run in the old place for a few more days.
Lucky for me, there is no shared storage involved, because that would be another mess o_O

I know that for A to work in the new location, I need to run pvecm e 1 (short for pvecm expected 1) so that it is quorate on its own and can start VMs. This is OK and working as expected in the new location.
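For anyone wondering why A refuses to start VMs on its own: as far as I understand it, the cluster needs a majority of the expected votes, i.e. floor(N/2) + 1. Here is a minimal shell sketch of that arithmetic. The function names are mine (not part of pvecm), and it assumes one vote per node.

```shell
#!/bin/sh
# Sketch only: majority arithmetic, assuming every node has exactly one vote.

# quorum_needed EXPECTED_VOTES -> prints the votes required for quorum
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum VOTES_PRESENT EXPECTED_VOTES -> exit status 0 if quorate
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}
```

With 3 expected votes, `has_quorum 1 3` fails (A alone) while `has_quorum 2 3` succeeds (B&C together). After `pvecm expected 1` on A, the check becomes `has_quorum 1 1`, which succeeds.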

Now, finally, let's get to the question: how should I proceed when B&C arrive in the new location?

I don't know whether PVE will be able to merge the two sets of diverging data (A + B&C), but I really don't care about C's data, so I'm thinking of helping the cluster by giving it something it knows how to handle: inverting the problem from two nodes out of sync to just one node out of sync ;)

In the new location:
- Power Off A
- Power On B&C
- Check B&C are in sync
- Then Power On A

Q1. Will this trick work?
(I am expecting that anything done on A in the meantime will be reverted, and that's OK.)
Q1.1. How is the "latest" version of the cluster state determined: by quorum/consensus of data, or by timestamp?
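For the "Check B&C are in sync" step, the simplest check I know of is whether the pair reports itself as quorate. A small sketch, assuming `pvecm status` prints a line of the form "Quorate: Yes" in its quorum information block (the helper name is made up, and being quorate is not a full proof that /etc/pve is identical on both nodes):

```shell
#!/bin/sh
# Sketch only: reads `pvecm status` output on stdin and succeeds
# if it contains a line starting with "Quorate:" followed by "Yes".
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}
```

On B or C this would be used as `pvecm status | is_quorate && echo "B&C are quorate"`, and only then would A be powered on.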

The alternate solution I am considering is:
- Isolate A's network
- Power On B&C
- Check B&C are in sync
- With A's network still isolated, follow the manual's "Remove a Cluster Node" instructions and remove node A
- For good measure, reboot A
- Reconnect A's network
- Follow the normal steps for joining A to the cluster
Q2. Can I rejoin the same cluster after using this method?
The manual isn't clear on this for the "Separate a Node Without Reinstalling" section; only the main node-removal section states that the node should not rejoin the same cluster.
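The remove and rejoin steps of this alternate solution could be sketched roughly as below. This is untested on my side and only a sketch: `pvecm delnode` and `pvecm add` are the real commands, but the wrapper functions, the node name and the IP address are placeholders, and the cleanup of A's old corosync state (the "Separate a Node Without Reinstalling" part) is deliberately not shown. It only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: dry-run by default, so each command is printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on B or C while A's network is still isolated.
# $1 = name of the node to remove (placeholder, e.g. nodeA)
remove_node() {
    run pvecm delnode "$1"
}

# Run on A after reconnecting its network.
# $1 = IP address of an existing cluster member (placeholder)
rejoin_cluster() {
    run pvecm add "$1"
}
```

For example, `remove_node nodeA` on B prints `+ pvecm delnode nodeA`, and `rejoin_cluster 192.0.2.10` on A prints `+ pvecm add 192.0.2.10`.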

There is of course the third option, reinstall, clearly supported in the manual - however I wish to avoid this if possible.

I am not afraid to get my hands dirty in the CLI, and would much rather learn and tinker than do a clean install.

After writing 90% of this, it hit me that I could spin up 3 VMs and simulate this scenario :mad:
 
Following up on my own question after testing with VMs:

Q1. Yes, it will work. The latest version is determined by quorum/consensus, not by timestamp.

Q1 Details
pvecm e 1 is only temporary; after a reboot, the node again waits for quorum based on the number of members in the original cluster.
Upon merging, any changes made on the node while it ran standalone are reverted.
Q2. I have not tested this, as my goal had already been reached.



I'm leaving this here for anyone else with a similar predicament.
