We have a 3-node hyperconverged Ceph cluster at my work, and the one thing that kept biting us in the ass from time to time, seemingly at random, was corosync totem retransmissions.
The best thing we did was to separate the corosync network. To save on costs and since we won't ever need to...