Splitbrain 1 Node after Hardware Error

Hello all,

We had a 3-node Proxmox and Ceph HCI cluster that worked fine. After a hardware error that we could not really pinpoint happened on node 2, we shut that node down, took it home and repaired it. It was turned on a few times without network to check the hardware issue.
In the meantime, 3 additional nodes have been added to the cluster while that 1 node was offline. Kind of stupid on our part, I know, but it has happened now... :(

Now we put the previously defective node back into the cluster.
All network links work and all corosync networks are pingable and fully working.
The Ceph cluster could rebuild and is 100% green, but the Proxmox cluster is in split-brain.
On one side I have a cluster that shows 6 nodes with 5 online, and on the other side a 3-node cluster with only 1 online...
Changing /etc/pve/corosync.conf is not possible on the broken one as it is read-only. Stopping the corosync service did not help, and neither did rebuilds.
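The two views can be compared with the standard status commands on a node from each side:

pvecm status
pvecm nodes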

Do I have any chance to get it back into the cluster?
Would really appreciate help:)
 
Changing /etc/pve/corosync.conf is not possible on the broken one as it is read-only.
Yes, that's the culprit.

I've been there once. The lesson I learned was to only modify the structure of a cluster when all nodes are online :-)

The workaround is to make corosync.conf editable. As that node has no quorum, you need to mount the configuration database locally and edit the file so that it is exactly identical to the one on the intact part of the cluster. I would try to do:

systemctl stop pve-cluster
systemctl stop corosync

Start the cluster file system again in local mode:

pmxcfs -l

Edit the corosync configuration file:

vim /etc/pve/corosync.conf
# OR: vim /etc/corosync/corosync.conf ???
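One way to make sure the file really ends up identical to the intact side is to copy it over from a healthy node and to keep the local corosync copy in sync as well (pve1 is just an example name for a healthy node):

scp root@pve1:/etc/pve/corosync.conf /etc/pve/corosync.conf
cp /etc/pve/corosync.conf /etc/corosync/corosync.conf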

You can now start the file system again as a normal service:

killall pmxcfs
systemctl start pve-cluster   # or simply reboot and hope for the best!
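Afterwards I would check whether the node actually rejoined and is quorate again, for example with:

pvecm status
journalctl -b -u corosync -u pve-cluster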

(Basically copied from https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_separate_node_without_reinstall - with a different background!)

Note that I ignored Ceph...
 
Am I able to just add it back?
The short answer is yes. The longer answer is that you need to take into consideration what Ceph daemons are running on the node and account for them in the interim.

Moving everything except the OSDs is trivial: just create new daemons on other nodes and delete the ones on the "broken" one. OSDs add a wrinkle; it's possible to reimport them, but that's a tricky proposition. Depending on how large your payload is (data on disk) and how fast the disks are, the simplest solution would be to delete and recreate the OSDs once rejoined to the cluster.
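For completeness, a rough sketch of what that could look like (OSD id 2, the mon name node2 and /dev/sdX are placeholders, adjust to your setup):

# on a healthy node that does not run a mon/mgr yet: create replacements
pveceph mon create
pveceph mgr create
# then remove the mon that belonged to the broken node
pveceph mon destroy node2

# per OSD of the broken node, once it has rejoined the cluster:
ceph osd out osd.2                # let the data rebalance away
ceph osd safe-to-destroy osd.2    # verify before destroying
pveceph osd destroy 2 --cleanup   # remove the old OSD and clean the disk
pveceph osd create /dev/sdX       # recreate it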

Having said all that, if you do have substantial data (and/or slow OSDs), fixing the cluster may be the better way. @UdoB gave you some of the steps, but I would also check network connectivity anyway, especially if you have bonds/multiple switches. An active/passive bond whose active component cannot reach the other nodes on your corosync network can cause the very symptom you describe. Regardless of your specific issue, best practice is to have two corosync rings, each on its own interface.
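A quick way to check that is to look at what corosync itself sees on each link and which bond member is currently active (bond0 is just a placeholder for your bond name):

corosync-cfgtool -s               # per-link/ring status as corosync sees it
cat /proc/net/bonding/bond0       # currently active slave of an active/passive bond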
 
But I would also check network connectivity anyway, especially if you have bonds/multiple switches. An active/passive bond whose active component cannot reach the other nodes on your corosync network can cause the very symptom you describe. Regardless of your specific issue, best practice is to have two corosync rings, each on its own interface.
I already checked all the network connectivity and could not find any issues with it.
I also already have two corosync rings on different network cards, and those don't seem to have any issues in general either.
My guess is that the biggest issue was adding the additional nodes while this node was offline, so the discrepancy between the old configuration and the new configuration is too big.
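That should be easy to check by comparing the config version on both sides of the split:

grep config_version /etc/corosync/corosync.conf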
Will try @UdoB's suggestion tomorrow and get back to you.