Hi!
Are there any known failover issues for corosync?
The issue I am reporting here was caused by me, the tester, while trying to confirm that my corosync network is stable; so far I don't believe it is.
The test consisted of shutting down the majority of the links for ring0 (the primary ring) and then shutting down one link for ring1 (the standby ring).
Rings total: 2
Nodes total: 19
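For context, a two-ring setup like this is typically declared in corosync.conf along these lines (the addresses and the rrp_mode value below are placeholders for illustration, not my actual settings):

```
totem {
    version: 2
    # rrp_mode value shown here is an assumption; ours may differ
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # placeholder network for ring0
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.2.0   # placeholder network for ring1
        mcastport: 5407
    }
}
```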
* During the test I shut down 16 of the 19 links for ring0 and quorum was not lost. I then recovered 4 ring0 links and shut down 1 ring1 link on a host which had both links up; after that the cluster lost quorum and all servers were rebooted.
* Interestingly enough, I had previously run the failover test above successfully, but in that test only 12 ring0 links were shut down along with 1 ring1 link, and no ring0 links were recovered. In that case the only server that rebooted was the one which lost its ring1 link.
Was cluster failure expected in the first scenario I described? From what I see/understand in the logs, the cluster failed back over to ring0 once I shut down a link from ring1, but I did not see this happening in the second scenario.
Can someone help me understand this issue?
Attached are the logs for the first scenario described above.
Thanks!