I've upgraded corosync in my cluster in preparation for the upgrade to Proxmox 6.x (I followed https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0).
I did have some difficulties, but in the end it worked.
After upgrading the first node I got:
Code:
# pvecm status
Cannot initialize CMAP service
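In case anyone else hits this error: as far as I can tell it just means pvecm can't talk to corosync because corosync isn't running, so the next thing to look at is why corosync exited, e.g. with:
Code:
# systemctl status corosync
# journalctl -b -u corosync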
Then it started logging this very frequently:
Code:
Mar 10 22:19:12 host pmxcfs[13939]: [quorum] crit: quorum_initialize failed: 2
Mar 10 22:19:12 host pmxcfs[13939]: [confdb] crit: cmap_initialize failed: 2
Mar 10 22:19:12 host pmxcfs[13939]: [dcdb] crit: cpg_initialize failed: 2
Mar 10 22:19:12 host pmxcfs[13939]: [status] crit: cpg_initialize failed: 2
After that I also found an earlier log:
Code:
Mar 10 22:06:26 host corosync[13992]: [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Mar 10 22:06:26 host corosync[13992]: [MAIN ] Please migrate config file to nodelist.
Mar 10 22:06:26 host corosync[13992]: [MAIN ] parse error in config: crypto_cipher & crypto_hash are only valid for the Knet transport.
Mar 10 22:06:26 host corosync[13992]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1386.
After some research I found "transport: udpu" in corosync.conf, which needed to be changed to "transport: knet" (thanks to https://forum.proxmox.com/threads/unicast.56141/).
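For reference, this is roughly what the totem section ends up looking like after the change (cluster name and version number below are just placeholders, not my actual values):
Code:
totem {
  cluster_name: mycluster
  config_version: 16
  ip_version: ipv4
  secauth: on
  # this used to be "transport: udpu"
  transport: knet
  version: 2
}
As far as I know knet is the default transport in corosync 3 anyway, so simply dropping the transport line should work too.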
Maybe a check for "transport: udpu" would be a nice extra to add to the pve5to6 tool?
There were multicast issues some years ago when the cluster was set up, which is probably why udpu was set. It seems to work now.
It's a bit noisy in the log, though. The lines below are repeated every minute ... is that normal?
Code:
Mar 10 22:42:09 hostname corosync[24995]: [KNET ] pmtud: Starting PMTUD for host: 1 link: 0
Mar 10 22:42:09 hostname corosync[24995]: [KNET ] udp: detected kernel MTU: 1500
Mar 10 22:42:09 hostname corosync[24995]: [KNET ] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 1397
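For what it's worth, the cluster itself seems healthy when I check it with:
Code:
# pvecm status
# corosync-cfgtool -s
so I'm guessing these lines are just knet periodically re-probing the path MTU, but confirmation would be nice.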
The pve5to6 tool also suggested updating the ring0_addr of a few nodes to an IP (it still had a hostname there).
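For anyone doing the same, a node entry in the nodelist ends up looking roughly like this (name, id and IP below are placeholders):
Code:
node {
  name: node1
  nodeid: 1
  quorum_votes: 1
  # ring0_addr used to be the hostname here
  ring0_addr: 192.0.2.11
}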
Can we add a WARNING in the file itself that the config_version number needs to be incremented?
I'm not sure if this contributed to the trouble I had here, but I came across https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_edit_corosync_conf while investigating, and updated it manually while also changing the transport value.
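For the record, the edit procedure itself is roughly this (from memory, double-check against the linked docs):
Code:
# cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# nano /etc/pve/corosync.conf.new
# cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
# mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
i.e. edit the copy (and bump config_version!), keep a backup, then move it into place so pmxcfs distributes it and corosync picks it up.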