Today we had an issue that I suspect is related to corosync.
The whole cluster disintegrated and nodes started rebooting. I even powered off all nodes and started them again, which seemed to fix the issue for the moment.
There are 4 nodes, and sometimes they just stopped seeing each other. Sometimes they saw each other in pairs, or saw only themselves.
I looked at "corosync-quorumtool -m -a" and the Ring ID sometimes jumped quickly: it was in the 400s and is now in the 12000s.
It also displayed "Activity blocked" at Quorate.
This cluster was upgraded from 5.4 to 6. Corosync was updated according to the wiki prior to the cluster upgrade.
Corosync logs are attached.
For now I had to stop pve-ha-lrm and pve-ha-crm to prevent random reboots, which effectively disables HA.
No changes were made to the networking; it just started happening during normal operation.
The corosync conf file ends with this section (what does the version line mean? Corosync is at version 3):
Code:
totem {
  cluster_name: clustername
  config_version: 18
  interface {
    bindnetaddr: 172.22.1.50
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
Is this a bug related to corosync 3?