corosync out of sync

richieman

Member
Apr 16, 2021
Hello. I had a problem with a node, so it was powered off. While it was off, I added a new node to the cluster to replace it. The original node is now fixed and I have turned it back on, but corosync on it is out of sync because the new node was added while it was offline. I have important VMs on it. How can I get it back in sync?
Thanks for any help!
Richard

journalctl -u corosync.service:


Code:
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Oct 21 12:35:30 ripr corosync[8417]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Oct 21 12:35:30 ripr corosync[8417]:   [QUORUM] Sync members[6]: 1 2 3 4 5 6
Oct 21 12:35:30 ripr corosync[8417]:   [QUORUM] Sync joined[5]: 1 3 4 5 6
Oct 21 12:35:30 ripr corosync[8417]:   [TOTEM ] A new membership (1.61921) was formed. Members joined: 1 3 4 5 6
Oct 21 12:35:30 ripr corosync[8417]:   [CMAP  ] Received config version (44) is different than my config version (43)! Exiting
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Unloading all Corosync service engines.
Oct 21 12:35:30 ripr corosync[8417]:   [QB    ] withdrawing server sockets
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Oct 21 12:35:30 ripr corosync[8417]:   [QB    ] withdrawing server sockets
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync configuration map access
Oct 21 12:35:30 ripr corosync[8417]:   [QB    ] withdrawing server sockets
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync configuration service
Oct 21 12:35:30 ripr corosync[8417]:   [QB    ] withdrawing server sockets
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Oct 21 12:35:30 ripr corosync[8417]:   [QB    ] withdrawing server sockets
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync profile loading service
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync resource monitoring service
Oct 21 12:35:30 ripr corosync[8417]:   [SERV  ] Service engine unloaded: corosync watchdog service
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 6 joined
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Oct 21 12:35:31 ripr corosync[8417]:   [KNET  ] link: Resetting MTU for link 0 because host 4 joined
Oct 21 12:35:31 ripr corosync[8417]:   [MAIN  ] Corosync Cluster Engine exiting normally
 
Could you please send us the output of

Code:
corosync-cfgtool -n

from the node with issues and from one without issues?

Please also send us the contents of the corosync config files at

Code:
/etc/pve/corosync.conf
/etc/corosync/corosync.conf

from both nodes.
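
Since the log above ends with `Received config version (44) is different than my config version (43)`, it may help to compare the `config_version` value in the two files directly. The following is only a minimal sketch: the function names `get_config_version` and `compare_config_versions` are made up for illustration, and it assumes the usual `config_version: N` line inside the `totem` section.

```shell
#!/bin/sh
# Hypothetical helper: print the config_version value found in a
# corosync.conf-style file (assumes a "config_version: N" line).
get_config_version() {
    awk -F: '/^[[:space:]]*config_version[[:space:]]*:/ {
        gsub(/[[:space:]]/, "", $2)   # strip whitespace around the number
        print $2
        exit
    }' "$1"
}

# Hypothetical helper: compare the config_version of two files.
# Returns 0 when they are equal, 1 otherwise.
compare_config_versions() {
    a=$(get_config_version "$1")
    b=$(get_config_version "$2")
    if [ "$a" = "$b" ]; then
        echo "match: config_version $a"
    else
        echo "mismatch: $1 has $a, $2 has $b"
        return 1
    fi
}

# Example call on a node (paths as mentioned in this thread):
# compare_config_versions /etc/pve/corosync.conf /etc/corosync/corosync.conf
```

Running the commented call on both the broken node and a healthy one should show which copy is behind.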
 
In the meantime I seem to have resolved the issue. Here is what I did. First I copied /etc/corosync/corosync.conf from a working machine. Corosync still would not start and failed with the same error.

I figured I had to edit /etc/pve/corosync.conf but it was read-only. So I tried:
Bash:
pvecm expected 1
But it gave an error: Cannot initialize CMAP service

Then I restarted "pve-cluster" and, to my surprise, it was working again and /etc/pve/corosync.conf was back in sync. After that I had to fix a lot of other unrelated issues on the node, but it seems to work now.
 
OK, good to hear, but make sure to revert the `pvecm expected 1`; it is a sure way to run into an out-of-sync state. Make sure your corosync config files match byte for byte on all hosts; you can use `sha256sum` to verify that they do.
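
As a rough sketch of the `sha256sum` check, assuming the copies to compare are all readable as local files: the function name `files_match` is hypothetical, and fetching the files from the other hosts is left out.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if every given file has the same
# SHA-256 digest, 1 if they differ, 2 if a file cannot be read.
files_match() {
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    done
    # Keep only the digest column and count the distinct values.
    distinct=$(sha256sum "$@" | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
    [ "$distinct" -eq 1 ]
}

# Example call on one node (paths as mentioned in this thread):
# files_match /etc/pve/corosync.conf /etc/corosync/corosync.conf \
#     && echo "identical" || echo "NOT identical"
```

To compare across hosts, you could run plain `sha256sum` on each node and compare the printed digests by eye.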

You can use `corosync-cfgtool -n` to check whether all the connections are OK (each link should report both 'enabled' and 'connected'). You can also use `pvecm status` to see the current corosync status.
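
If you want to automate the 'enabled' and 'connected' check, something along these lines might work. It is a sketch only: the function name `report_bad_links` is made up, and it assumes each link appears on its own line containing the word `LINK`, which may differ between corosync versions.

```shell
#!/bin/sh
# Hypothetical helper: read `corosync-cfgtool -n` output on stdin and
# print every line mentioning LINK that does not contain both the word
# "enabled" and the standalone word "connected".
# Exit status is 1 if any such line was found, 0 otherwise.
report_bad_links() {
    awk '
        /LINK/ && !(/enabled/ && /(^|[^a-z])connected([^a-z]|$)/) {
            print
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Example call on a node:
# corosync-cfgtool -n | report_bad_links || echo "some links need attention"
```

The word-boundary match on "connected" is there so that a value such as "disconnected" would not be counted as healthy.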
 
