I have a (possibly unrelated) problem where TOTEM on one cluster member keeps generating retransmits from other nodes or local timeouts. When I checked, nodeC kept joining and leaving the cluster, about 50 times per second. Restarting corosync gave the same result.
It turned out, however, that corosync's memory consumption on the other nodes was 18 gigabytes at that point, all 18 gigabytes of it resident. This is clearly suboptimal in itself, but it also turned out that until I had restarted corosync on all nodes, corosync on nodeC could not be started without generating thousands of join/leave events per minute (and eating even more memory on the others).
Restarting all of them brought memory consumption back to 600M (normal), and nodeC was able to join without errors.