split brain recovery

adambialy

Jan 24, 2019
Hi all,
I've got a problem, I think with split brain. I can't find much about it, or about how to recover from it, on the Internet, so I'm asking you guys for help.
What happened is that I tried to remove one node (it was wrongly named), and after that (I think after following https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node and running `pvecm expected 1`)
I ended up in a weird situation:
[Screenshot: split.png]
In the first screenshot I'm logged into pve-b16-c2, in the second into pve-b16-c1 (c1/c2 correspond to the two Dell chassis).
So from c1 I can see all the c1 servers, and from c2 I can see all the c2 servers.
Could anybody point me in the right direction on how to recover from this, please?
Any advice is much appreciated.
Thanks
 
It seems the cluster network is flaky or got botched. You need to make sure that all nodes (in both chassis) can see each other via corosync.
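Why this matters: corosync only considers a partition quorate if it holds a strict majority of the expected votes. The sketch below illustrates that arithmetic under simplifying assumptions that are not from this thread: one vote per node, no special votequorum options (two_node, wait_for_all, last_man_standing), and no runtime adjustment of expected votes by corosync. The 6-node cluster is hypothetical.

```python
"""Simplified model of corosync votequorum majority arithmetic.

Assumptions: every node has 1 vote and no special votequorum options are
enabled; runtime adjustments of expected votes are ignored.
"""


def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be >= 1")
    return expected_votes // 2 + 1


def is_quorate(votes_in_partition: int, expected_votes: int) -> bool:
    """True if a partition holding `votes_in_partition` votes reaches quorum."""
    return votes_in_partition >= quorum(expected_votes)


if __name__ == "__main__":
    # Hypothetical 6-node cluster split 3/3 across two chassis:
    # quorum is 4, so neither half reaches it on its own.
    print(quorum(6), is_quorate(3, 6))
    # In this simplified model, lowering expected votes to 1 (as with
    # `pvecm expected 1`) makes a single vote enough for quorum.
    print(quorum(1), is_quorate(3, 1))
```

This is also why `pvecm expected` should be used with care: lowering the expected votes lets an isolated part of the cluster act as if it were quorate.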
Check the output of:
* `pvecm status`
* `pvecm members`
* `journalctl -r` (general logs)
* `journalctl -r -u corosync.service` (corosync logs)
* `journalctl -r -u pve-cluster.service` (pmxcfs logs)
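To compare both sides of the split, it can help to gather these outputs on one node in each chassis. A minimal sketch, assuming Python 3.7+ on the nodes; `--no-pager` and `-n 200` are added so journalctl returns a bounded amount of text instead of opening a pager, and the 200-line limit and 30 s timeout are arbitrary choices.

```python
"""Collect the cluster diagnostics into one text report for this node."""
import socket
import subprocess

COMMANDS = [
    ["pvecm", "status"],
    ["pvecm", "members"],
    ["journalctl", "-r", "-n", "200", "--no-pager"],
    ["journalctl", "-r", "-n", "200", "--no-pager", "-u", "corosync.service"],
    ["journalctl", "-r", "-n", "200", "--no-pager", "-u", "pve-cluster.service"],
]


def collect(commands=COMMANDS, timeout=30):
    """Run each command and return {command line: stdout + stderr}."""
    report = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            res = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            report[key] = res.stdout + res.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            # Tool missing (not a PVE node) or command hung: record it and go on.
            report[key] = f"<failed: {exc}>"
    return report


if __name__ == "__main__":
    print(f"### {socket.gethostname()}")
    for cmd, output in collect().items():
        print(f"\n$ {cmd}\n{output}")
```

Redirect the output to a file on a c1 node and a c2 node, then diff the membership sections to see where each side loses the other.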

With `pvecm status` (and, deeper down, `/etc/corosync/corosync.conf`) you can see which IP addresses the nodes use and where the connection might have been lost.
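To list which address each node is configured with, the `nodelist` section of `corosync.conf` can be read out. This is a simplified sketch, not a full parser for the corosync configuration format: it assumes the `nodelist { node { key: value ... } }` layout with one `key: value` per line, as Proxmox VE writes it.

```python
"""List node entries (name, nodeid, ring0_addr, ...) from corosync.conf."""
import os
import re

PAIR = re.compile(r"^\s*(\w+)\s*:\s*(\S+)\s*$")


def parse_nodes(text):
    """Return one dict per `node { ... }` block inside `nodelist { ... }`."""
    nodes, current, depth, in_nodelist = [], None, 0, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            depth += 1
            if name == "nodelist" and depth == 1:
                in_nodelist = True
            elif name == "node" and in_nodelist and depth == 2:
                current = {}
            continue
        if line == "}":
            if current is not None and depth == 2:
                nodes.append(current)  # end of a node block
                current = None
            if depth == 1:
                in_nodelist = False
            depth -= 1
            continue
        m = PAIR.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return nodes


if __name__ == "__main__":
    path = "/etc/corosync/corosync.conf"
    if os.path.exists(path):
        with open(path) as fh:
            for node in parse_nodes(fh.read()):
                print(node.get("nodeid"), node.get("name"), node.get("ring0_addr"))
```

Running it on a node in each chassis shows whether both sides still agree on the node list and addresses, which is worth checking after a node removal.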

Keep in mind that corosync usually runs over multicast (the default for Proxmox VE 5.x clusters), so maybe the switch between the two chassis currently has a problem?

Hope this helps!
 
