Fencing and cluster state logging

Jun 8, 2016
Johannesburg, South Africa
We had a node fence itself when the IPMI reset counter reached zero (according to the motherboard logs), but we can't locate any log entries leading up to this event.

Surely corosync or pvecm logs events such as losing quorum, occasional heartbeat messages being lost or discarded?

We previously appeared to identify an issue where a node was fenced after systemd-timesyncd jumped the time because of a problem with one or more members of pool.ntp.org. We have since switched to ntp, which gradually drifts the clock to the correct time instead of jumping, and which discards answers from NTP servers that do not agree with the majority...
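
To illustrate the idea of discarding time sources that disagree with the majority, here is a toy sketch in Python. It is not ntpd's actual selection algorithm; the function name `agreeing_sources`, the median-based rule and the 0.1 second tolerance are all assumptions made for illustration only.

```python
from statistics import median
from typing import Dict


def agreeing_sources(offsets: Dict[str, float], tolerance: float = 0.1) -> Dict[str, float]:
    """Keep only the sources whose offset is close to the median offset.

    `offsets` maps a (hypothetical) server name to its measured clock
    offset in seconds. A source is kept when its offset lies within
    `tolerance` seconds of the median of all offsets. This is a
    simplification, not the algorithm used by any real NTP daemon.
    """
    if not offsets:
        return {}
    mid = median(offsets.values())
    return {name: off for name, off in offsets.items() if abs(off - mid) <= tolerance}
```

With three servers that roughly agree and one that is far off, only the three agreeing servers are returned, so a single bad pool member cannot drag the clock along with it.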
 
Surely corosync or pvecm logs events such as losing quorum, occasional heartbeat messages being lost or discarded?
They are logged to the syslog, but the last messages before a reset might not make it into the log.
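
To narrow the syslog down to cluster-related entries, a small filter like the following may help. This is only a sketch: the list of daemon names (corosync, pmxcfs, pve-ha-lrm, pve-ha-crm, watchdog-mux), the default path /var/log/syslog and the helper names are assumptions, so adjust them to whatever your nodes actually log.

```python
import re
from typing import Iterable, List, Sequence

# Assumed program names as they appear in the syslog tag; adjust as needed.
CLUSTER_TAGS = ("corosync", "pmxcfs", "pve-ha-lrm", "pve-ha-crm", "watchdog-mux")


def cluster_lines(lines: Iterable[str], tags: Sequence[str] = CLUSTER_TAGS) -> List[str]:
    """Return the lines whose syslog program tag is one of `tags`.

    A line matches when it contains "<tag>:" or "<tag>[<pid>]:" preceded
    by whitespace, which is how the program name appears in the
    traditional syslog line format.
    """
    alternatives = "|".join(re.escape(tag) for tag in tags)
    pattern = re.compile(r"(?:^|\s)(?:%s)(?:\[\d+\])?:" % alternatives)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def last_cluster_lines(path: str = "/var/log/syslog", count: int = 50) -> List[str]:
    """Read `path` (an assumed default) and return the last `count` matches."""
    with open(path, errors="replace") as handle:
        return cluster_lines(handle)[-count:]
```

Calling `last_cluster_lines()` on the fenced node after it comes back up shows the most recent cluster-related entries that did reach the log. Anything emitted in the final moments before the reset may still be missing.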

We previously appeared to identify an issue where a node was fenced after systemd-timesyncd jumped the time because of a problem with one or more members of pool.ntp.org. We have since switched to ntp, which gradually drifts the clock to the correct time instead of jumping, and which discards answers from NTP servers that do not agree with the majority...
For a stable time, it is best to use a local NTP server for the cluster; the time is then synced from only one source, close to the cluster.