Hello.
We continue to have the issue of cluster nodes showing red in the Datacenter section of the PVE web interface.
Changes since the long thread from December:
We do not use any shared storage besides /etc/pve.
I have eliminated all NFS.
NTP (systemd-timesyncd) has been configured to get time updates from our pfSense hardware.
There are no backups or heavy network traffic leading up to the red issue; I scheduled all backups for Saturday.
I set up a central rsyslog server for the cluster nodes. This is useful for checking what leads up to the issue, but so far I have not found the cause of the cluster problem.
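To confirm the NTP setup on each node (the server itself is set with the NTP= option in /etc/systemd/timesyncd.conf), a small check like the one below could be run. This is a minimal sketch: it only parses `timedatectl status` output, and it assumes that output contains either "NTP synchronized: yes" (older systemd) or "System clock synchronized: yes" (newer systemd). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if `timedatectl status` output (read from stdin) reports
# a synchronized clock. Both the older "NTP synchronized:" and the newer
# "System clock synchronized:" spellings are accepted (assumed formats).
clock_synced() {
    grep -Eq '(NTP|System clock) synchronized:[[:space:]]*yes'
}
```

Usage on each node: `timedatectl status | clock_synced && echo "$(hostname): clock synchronized"`.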
In the logs, the issue is always preceded by lines like this:
Code:
Jan 29 05:14:47 sys3 corosync[7088]: [TOTEM ] A processor failed, forming new configuration.
..
Jan 29 05:14:47 dell1 corosync[7575]: [TOTEM ] A processor failed, forming new configuration.
..
Jan 29 05:14:47 sys4 corosync[8688]: [TOTEM ] A processor failed, forming new configuration.
..
Jan 29 05:14:48 sys5 corosync[5069]: [TOTEM ] A processor failed, forming new configuration.
..
Shortly after:
Code:
Jan 29 05:14:48 sys5 corosync[5069]: [MAIN ] Corosync main process was not scheduled for 3947.5273 ms (threshold is 1840.0000 ms). Consider token timeout increase.
Jan 29 05:14:48 sys4 corosync[8688]: [TOTEM ] A new membership (10.1.10.21:15808) was formed. Members
Jan 29 05:14:48 sys3 corosync[7088]: [TOTEM ] A new membership (10.1.10.21:15808) was formed. Members
Jan 29 05:14:48 dell1 corosync[7575]: [TOTEM ] A new membership (10.1.10.21:15808) was formed. Members
Jan 29 05:14:48 sys4 corosync[8688]: [QUORUM] Members[4]: 4 2 1 3
Jan 29 05:14:48 sys3 corosync[7088]: [QUORUM] Members[4]: 4 2 1 3
Jan 29 05:14:48 dell1 corosync[7575]: [QUORUM] Members[4]: 4 2 1 3
Jan 29 05:14:48 sys4 corosync[8688]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 29 05:14:48 sys3 corosync[7088]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 29 05:14:48 dell1 corosync[7575]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 29 05:14:48 sys5 corosync[5069]: [TOTEM ] A processor failed, forming new configuration.
Jan 29 05:14:48 sys5 corosync[5069]: [TOTEM ] A new membership (10.1.10.21:15808) was formed. Members
Jan 29 05:14:48 sys5 corosync[5069]: [QUORUM] Members[4]: 4 2 1 3
Jan 29 05:14:48 sys5 corosync[5069]: [MAIN ] Completed service synchronization, ready to provide service.
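The 1840 ms threshold in the sys5 warning can be reproduced from corosync's token settings. Assuming corosync.conf sets no explicit token value, so the corosync 2.x defaults token = 1000 ms and token_coefficient = 650 ms apply, corosync.conf(5) gives an effective token timeout of 1000 + (4 - 2) x 650 = 2300 ms for four nodes, and 1840 ms is exactly 80% of that. The 80% ratio is inferred from these numbers, not taken from corosync documentation. A sketch of the arithmetic (function names are hypothetical):

```shell
#!/bin/sh
# Effective token timeout per corosync.conf(5): when the nodelist has at
# least 3 nodes, token + (number_of_nodes - 2) * token_coefficient.
# Defaults assumed here: token=1000 ms, token_coefficient=650 ms.
effective_token_ms() {
    token=${1:-1000}
    coeff=${2:-650}
    nodes=$3
    if [ "$nodes" -ge 3 ]; then
        echo $((token + (nodes - 2) * coeff))
    else
        echo "$token"
    fi
}

# 80% of a millisecond value, in integer ms. This ratio is inferred from the
# "threshold is 1840.0000 ms" log line, not from corosync documentation.
pause_threshold_ms() {
    echo $(( $1 * 8 / 10 ))
}
```

`effective_token_ms 1000 650 4` prints 2300 and `pause_threshold_ms 2300` prints 1840. The 3947.5273 ms stall logged on sys5 is longer than the whole 2300 ms token timeout, which would be consistent with the other members declaring "A processor failed" at 05:14:47.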
I've set up rsyslog to send an email alert every time a 'processor failed' line appears. Lately this has been happening once a day.
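Besides the email alert, the same lines on the central rsyslog server can be summarised to see how often, and on which nodes, the event is logged. A minimal sketch, assuming the traditional syslog timestamp format shown in the logs above ("Mon DD HH:MM:SS host corosync[pid]: ..."); the function name is hypothetical:

```shell
#!/bin/sh
# Count corosync "A processor failed" lines per day and host from
# syslog-format input on stdin. Fields: $1=month, $2=day, $3=time, $4=host.
count_processor_failed() {
    awk '/corosync\[[0-9]+\]: .*A processor failed/ {
        key = $1 " " $2 " " $4
        n[key]++
    }
    END {
        for (k in n) print k, n[k]
    }' | sort
}
```

Usage: `count_processor_failed < /path/to/central.log`, where the log path is a placeholder for wherever the rsyslog server writes the cluster logs. Each output line is "month day host count".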
From the CLI, pvecm commands show the cluster as OK.
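Since the CLI reports the cluster as OK while the GUI shows red, it may help to record the CLI view at the moment an alert fires. A minimal sketch that reads `pvecm status` output and succeeds only if the cluster is quorate with the expected number of nodes; it assumes the "Nodes:" and "Quorate:" lines of the "Quorum information" section, and the function name is hypothetical:

```shell
#!/bin/sh
# Succeed if `pvecm status` output on stdin shows "Quorate: Yes" and a
# "Nodes:" count equal to the expected value passed as $1.
cluster_ok() {
    expected=$1
    awk -v want="$expected" '
        $1 == "Nodes:"   { nodes = $2 }
        $1 == "Quorate:" { quorate = $2 }
        END { exit !(quorate == "Yes" && nodes == want) }'
}
```

Usage: `pvecm status | cluster_ok 4 || echo "cluster not OK on $(hostname)"`.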
After yesterday's issue I updated all 4 nodes to the latest PVE testing software. That did not fix the issue.
To make the red nodes turn green, I run this on every node:
Code:
systemctl restart pve-cluster
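Running that restart on all four nodes from one shell could be sketched as follows. The node names are taken from the corosync logs above; root SSH access between the nodes is assumed, and the function name and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Restart pve-cluster on each node in turn over SSH.
# Node names come from the corosync logs; root SSH access is assumed.
# Set DRY_RUN=1 to only print the commands instead of running them.
NODES="sys3 dell1 sys4 sys5"

restart_pve_cluster_all() {
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node systemctl restart pve-cluster"
        else
            ssh "root@$node" systemctl restart pve-cluster \
                || echo "restart failed on $node" >&2
        fi
    done
}
```

With `DRY_RUN=1`, `restart_pve_cluster_all` prints the four ssh commands without running them.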
I am not sure what to do next. Any suggestions?