corosync was not able to come back

ednt

Mar 16, 2017
Hi,

Last Friday we unfortunately had a loop in our network.
It lasted maybe 60 to 120 seconds, until I pulled the cable that caused the loop.
After that everything looked fine again, until ...
someone wanted to log in to our Proxmox cluster: it was not possible via the web interface.
Luckily it was still possible via SSH.

We noticed that the corosync quorum was not reached:
1 vote out of 17, and a lot of traffic on port 5405.
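For context on why 1 vote out of 17 blocks everything: assuming one vote per node and corosync votequorum's default simple-majority rule (no two_node or other special options), a partition needs floor(n/2) + 1 votes to be quorate. A minimal Python sketch of that arithmetic; the function names are hypothetical helpers, not part of any Proxmox or corosync API:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple-majority rule: floor(n/2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(current_votes: int, expected_votes: int) -> bool:
    """True if the visible votes reach the majority threshold."""
    return current_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Situation from the post: 17 nodes, one vote each, only 1 vote visible.
    print(quorum_threshold(17))  # 9
    print(is_quorate(1, 17))     # False
    # Lowering expected votes to 2 with two nodes running:
    print(quorum_threshold(2))   # 2
    print(is_quorate(2, 2))      # True
```

The same rule explains why lowering the expected votes to 2 lets a pair of nodes become quorate on their own.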

We tried restarting corosync on all nodes, but it didn't help.

Then we stopped corosync on all nodes, started it on the first 2 and set the expected votes to 2 (pvecm expected X).
This worked.
Then we added the nodes one after another.
That did it.
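The manual recovery described above can be written down as an ordered command list. This is a minimal sketch that only builds the plan and executes nothing; the node names are hypothetical, and it assumes corosync runs as the systemd unit `corosync` and that `pvecm expected` is issued on one of the seed nodes. Between steps you would check `pvecm status` before continuing.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (node, shell command to run on that node)


def staged_restart_plan(nodes: List[str], seed_count: int = 2) -> List[Step]:
    """Build the ordered command list for a staged corosync restart.

    1. stop corosync on every node
    2. start it on the first `seed_count` nodes
    3. lower the expected votes on the first node so the seeds are quorate
    4. start the remaining nodes one at a time
    """
    if not 1 <= seed_count <= len(nodes):
        raise ValueError("seed_count must be between 1 and len(nodes)")
    plan: List[Step] = [(n, "systemctl stop corosync") for n in nodes]
    seeds, rest = nodes[:seed_count], nodes[seed_count:]
    plan += [(n, "systemctl start corosync") for n in seeds]
    plan.append((seeds[0], f"pvecm expected {seed_count}"))
    # In practice, wait and verify membership before starting each further node.
    plan += [(n, "systemctl start corosync") for n in rest]
    return plan


if __name__ == "__main__":
    for node, cmd in staged_restart_plan(["pve1", "pve2", "pve3", "pve4"]):
        print(f"{node}: {cmd}")
```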

But ...
I think it is horrible that corosync is not able to come up by itself with (only) 17 nodes.

In my opinion this is a design flaw.

Or is there anything that can be done via configuration?
Maybe a random delay for each node before it tries to join the corosync cluster.
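The random-delay idea can be sketched as a jittered start schedule, so the 17 nodes do not all try to rejoin at the same instant. This only illustrates the suggestion; it is not a corosync option, the 30-second window and node names are hypothetical, and whether jitter actually helps corosync's membership protocol after a loop is not established here.

```python
import random
from typing import Optional


def join_delay(max_delay_s: float, rng: Optional[random.Random] = None) -> float:
    """Pick a uniformly random start delay in [0, max_delay_s] seconds."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0.0, max_delay_s)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo reproducible
    nodes = [f"pve{i}" for i in range(1, 18)]  # 17 hypothetical node names
    schedule = sorted((join_delay(30.0, rng), n) for n in nodes)
    for delay, node in schedule:
        print(f"{node}: start corosync after {delay:.1f} s")
```

Passing a seeded `random.Random` keeps the schedule reproducible for testing; in real use each node would draw its own delay independently.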
 
Without any logs, config or version information, this will be hard to troubleshoot.