Update:
I see occasional retransmits, sometimes 1-3 a day on one or two nodes and none on the others. Sometimes it happens daily, and sometimes there are no corosync log entries for 5 months.
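For anyone wanting to put numbers on this, here is a rough sketch of how the retransmits could be tallied per day. It assumes the log lines come from journalctl in short-iso format (ISO timestamp as the first field) and that the entries contain the word "retransmit"; the function name count_retransmits_per_day is just something made up for this example.

```shell
# Count corosync retransmit entries per day from log lines on stdin.
# Assumption: the first whitespace-separated field is an ISO timestamp
# such as 2024-01-02T10:00:00+0100 (journalctl -o short-iso).
count_retransmits_per_day() {
    grep -i 'retransmit' |
        awk '{ split($1, t, "T"); n[t[1]]++ }      # key on the date part only
             END { for (d in n) print d, n[d] }' |
        sort
}
```

It could be fed with something like `journalctl -u corosync -o short-iso --no-pager | count_retransmits_per_day` on each node, which makes it easier to compare quiet nodes against noisy ones.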
I actually have another cluster on that subnet (different cluster name)...
Hi all. I found this works well to completely remove Ceph and its config:
systemctl stop ceph-mon.target
systemctl stop ceph-mgr.target
systemctl stop ceph-mds.target
systemctl stop ceph-osd.target
rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon...
I get the misalignment of the times the links were down, but the time frames before the node was fenced and after the node joined are telling. Since February, when we had the ECC RAM issue, I do not see any corosync entries for "link down" and since node joined...
We had node 5 in a 6-node cluster fenced due to excessive RAM ECC errors. HA worked great and all VMs started on other nodes. The cluster worked with no corosync issues for the last year since it was put in production (we had ecc errors in...