I broke my quorum: misconfigured corosync.conf

Veikko

Dec 4, 2017
Finland
Hi!
I am upgrading from corosync 2 to corosync 3 and used the checklist script to verify that my settings are OK. I have a two-ring corosync setup. On two of my older nodes, the corosync ring addresses were defined in the hosts file, and corosync.conf referred to those nodes by host name. Everything had worked fine, but the script advised me to use IP addresses in corosync.conf instead.

That's when I made a mistake. While editing corosync.conf, I raised a version number from 2 to 3, but in the WRONG PLACE: not in "config_version:", which would have been correct, but in the "version:" line at the very end of the totem section:

...
ip_version: ipv4
rrp_mode: passive
secauth: on
version: 3

I saved the config, ran the script again and everything was green, so I proceeded. When I finally updated the packages and the corosync service restarted, it hung and would not start. And because I have no quorum, /etc/pve is read-only, so I cannot fix the configuration file.
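A likely reason corosync refused to start: per corosync.conf(5), the totem "version" directive only accepts the value 2, in corosync 3 as in corosync 2, while the counter you raise on each edit is "config_version". Below is a minimal Python sketch that flags a wrong totem version before restarting anything. It assumes a simplified file layout (one "key: value" per line, "{" at the end of a section header, "}" alone on its line); the function names and the sample values are hypothetical.

```python
import re


def totem_settings(conf_text: str) -> dict:
    """Collect the key/value lines directly inside the top-level totem { } block.

    Simplified parser (an assumption, not corosync's own grammar): one
    'key: value' per line, '{' at the end of a section header line, '}' alone
    on its line. Nested blocks such as interface { } are skipped.
    """
    settings = {}
    depth = 0  # brace depth counted from the totem header
    in_totem = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if not in_totem:
            if re.match(r"^totem\s*\{$", line):
                in_totem, depth = True, 1
            continue
        if line.endswith("{"):
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:  # end of the totem block
                break
        elif depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_totem(conf_text: str) -> list:
    """Return human-readable problems found in the totem section."""
    totem = totem_settings(conf_text)
    problems = []
    if totem.get("version") != "2":
        problems.append(f"totem version is {totem.get('version')!r}, expected '2'")
    if "config_version" not in totem:
        problems.append("totem section has no config_version")
    return problems


# Hypothetical excerpt reproducing the mistake from the post.
sample = """
totem {
  cluster_name: example
  config_version: 7
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 3
  interface {
    ringnumber: 0
  }
}
"""

print(check_totem(sample))  # ["totem version is '3', expected '2'"]
```

Running such a check against the edited file, alongside the upgrade checklist script, would have caught the wrong key before the packages were upgraded.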

I tried "pvecm expected 1" but the system says "Cannot initialize CMAP service". What would be the correct procedure to fix the config?

Thanks,
-Veikko
 
This is from the manual:

"This is not enough if corosync cannot start anymore. Here it is best to edit the local copy of the corosync configuration in /etc/corosync/corosync.conf so that corosync can start again. Ensure that on all nodes this configuration has the same content to avoid split brains. If you are not sure what went wrong it’s best to ask the Proxmox Community to help you."

My case is exactly that. But because corosync is not running due to the error in the config, fixing the settings in the local copy (/etc/corosync/corosync.conf) does not help, because the local copy never gets propagated to the cluster file (/etc/pve/corosync.conf). So I find this hint a bit misleading or incomplete.
 
To answer my own question, I just remembered how it's done:

systemctl stop pve-cluster
/usr/bin/pmxcfs -l
[main] notice: forcing local mode (althought corosync.conf exists)

Then fix the configuration in the cluster file (/etc/pve/corosync.conf), and the cluster heals.
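With pmxcfs in local mode, /etc/pve/corosync.conf is writable again, so the fix can be made there. Below is a minimal Python sketch of that edit on the file's text: reset the totem "version" to 2 and raise "config_version" by one, which was the change originally intended. It assumes the caller reads and writes the file. It also assumes each key sits alone on its line and appears once, as in the totem section Proxmox VE generates. `fix_totem` is a hypothetical name.

```python
import re


def fix_totem(conf_text: str) -> str:
    """Hypothetical helper: put totem 'version' back to 2 and bump config_version.

    Assumes each key sits alone on its own line and appears once, as in the
    totem section Proxmox VE generates; other lines are left untouched.
    """
    # Undo the mistake: the totem 'version' directive must stay 2.
    text = re.sub(r"(?m)^([ \t]*)version:[ \t]*\d+[ \t]*$", r"\g<1>version: 2", conf_text)

    # Do what was intended: raise config_version by one, as Proxmox advises
    # for every corosync.conf edit.
    def bump(match):
        return f"{match.group(1)}config_version: {int(match.group(2)) + 1}"

    return re.sub(r"(?m)^([ \t]*)config_version:[ \t]*(\d+)[ \t]*$", bump, text)


broken = """totem {
  config_version: 7
  secauth: on
  version: 3
}
"""
print(fix_totem(broken))
```

The post does not spell out what comes after the edit. Presumably the local-mode pmxcfs then has to be stopped (for example with `killall pmxcfs`), and pve-cluster started normally again with `systemctl start pve-cluster`.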

I have done this a few times when troubleshooting, but it was a long time ago, and I could not easily find it in the corosync manual. Maybe it could be added there? Or if you have decided that it's too powerful a command and can break things, I understand. Anyway, I'm happy this got solved. Aiming to upgrade to 6 soon!

Cheers,
-Veikko