[SOLVED] Broke my cluster

radar

Member
May 11, 2021
Hi,
I have a 3-node cluster that was configured as follows:
Code:
node1 with ip 192.168.1.120
node2 with ip 192.168.1.121
node3 with ip 192.168.1.122

I wanted to change this so the last octet matches the node number (node1 with IP 192.168.1.121, node2 with IP 192.168.1.122, and node3 with IP 192.168.1.123).
I started with node3: I edited `/etc/network/interfaces` and `/etc/hosts`, but forgot to edit `/etc/pve/corosync.conf`.
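For context, the node addresses also live in `/etc/pve/corosync.conf` (as `ring0_addr` entries), which is why forgetting that file broke things. The relevant parts look roughly like this (abbreviated, cluster name and values illustrative):
Code:
nodelist {
  node {
    name: node3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.122
  }
}

totem {
  cluster_name: mycluster
  config_version: 3
  ...
}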

Of course, after rebooting this node, it was no longer part of the cluster, and I was no longer able to edit `/etc/pve/corosync.conf` on it.
I made the corresponding changes on the other nodes and they were working fine, but node3 was still not in the cluster. At that point, `config_version` was set to 4 in `/etc/pve/corosync.conf` on nodes 1 and 2, but still at 3 on node3.

So I decided to revert to the original configuration: I downgraded `config_version` back to 3 in `/etc/pve/corosync.conf` on node1 (which was replicated to node2), reverted the IP addresses of all the nodes, and rebooted them.
But now each node is isolated from the others and my cluster no longer works. I can't start any service on the cluster because of the lack of quorum.

Edit: to start services, I have lowered the required quorum to 1.
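For reference, this is done with:
Code:
# temporarily tell corosync that a single vote is enough for quorum (a workaround, not a fix)
pvecm expected 1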

Any help here for fixing my cluster?
Thanks a lot.
 
Solved the issue by editing the corosync.conf file on all nodes. I was able to do that thanks to:
Code:
# stop the cluster stack so the configuration can be changed
systemctl stop pve-cluster
systemctl stop corosync
# restart the cluster filesystem in local mode, which makes /etc/pve writable without quorum
pmxcfs -l
This allowed me to edit the corosync.conf file.
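The edit itself was to make the file identical on every node, with consistent addresses and a `config_version` bumped higher than any version used before, along these lines (a sketch, your numbers will differ):
Code:
totem {
  ...
  config_version: 5
}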
Then, I did:
Code:
# stop the local-mode instance and bring the normal stack back up
killall pmxcfs
systemctl start pve-cluster
systemctl start corosync
One of the nodes needed a reboot, but everything is okay now.
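For anyone hitting the same problem: after bringing the stack back up, you can confirm that quorum is restored with:
Code:
# shows cluster membership and quorum state
pvecm status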