Ahhh... while no one likes to see problems, I'm actually glad I just spotted this thread. Same issues, started with the v6 upgrade from v5. I thought my problem was because my cluster (which actually started on v3, with a rebuild to v4 and then upgrades from there) is running in multiple locations with a layer 2 VPN tunnel between them.
Same exact symptoms others are reporting: cluster quorum is lost, and restarting the corosync service fixes it -- sometimes I have to restart it on one node, sometimes on more or even all nodes to get it back. The fix then holds for a random stretch, anywhere from 5-10 minutes to 5-10 hours. It hasn't lasted an entire day since the upgrade.
My cluster is "production", but it is production for my company with no other customers on it other than some sites being monitored via omd/check_mk, so I can try things.
Is there any way to "tune" the timing on corosync? My guess is that this is a factor -- at least in mine.