Hi,
Last Friday we unfortunately had a loop in our network.
It lasted maybe 60 to 120 seconds, until I pulled the connector that caused the loop.
After that everything looked fine again, until ...
someone wanted to log in to our Proxmox cluster: it was not possible via the web interface.
Luckily it was still possible via SSH.
We noticed that corosync had not reached quorum:
1 vote out of 17, and a lot of traffic on port 5405.
We tried restarting corosync on all nodes, but that didn't help.
Then we stopped corosync on all nodes, started it on just 2 of them, and set the expected votes to 2.
This worked. (pvecm expected X)
Then we added the remaining nodes one after another.
That did it.
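
For anyone who ends up in the same situation, the sequence above can be sketched roughly as follows. This is a dry run only: it builds and prints the list of commands and executes nothing. The node names are made up, the function name is hypothetical, and it assumes corosync is managed as the systemd unit `corosync`.

```python
def staged_recovery_plan(nodes, initial=2):
    """Return (node, command) steps for a staged corosync restart.

    Dry run only: nothing is executed, the caller decides what to do
    with the steps. `nodes` is a list of (hypothetical) host names.
    """
    if not 2 <= initial <= len(nodes):
        raise ValueError("initial must be between 2 and the number of nodes")

    steps = []

    # 1. Stop corosync on every node.
    for node in nodes:
        steps.append((node, "systemctl stop corosync"))

    # 2. Start corosync on a small initial group only.
    first = nodes[:initial]
    for node in first:
        steps.append((node, "systemctl start corosync"))

    # 3. Lower the expected votes so the small group can become quorate.
    steps.append((first[0], f"pvecm expected {initial}"))

    # 4. Bring the remaining nodes back one after another.
    for node in nodes[initial:]:
        steps.append((node, "systemctl start corosync"))

    return steps


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 18)]  # 17 hypothetical names
    for node, cmd in staged_recovery_plan(nodes):
        print(f"{node}: {cmd}")
```

In practice you would wait after each start and check `pvecm status` before continuing with the next node.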
But ...
I think it is horrible that corosync is not able to come up by itself with (only) 17 nodes.
In my opinion this is a design flaw.
Or is there anything that can be done via configuration?
Maybe a random delay for each node when it tries to connect to the corosync cluster?
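
To illustrate what is meant by a random delay, here is a minimal sketch. As far as I know this is not an existing corosync option; `startup_delay` and the 30-second upper bound are assumptions made up for the example, and the idea would be to run something like this on each node before corosync is started.

```python
import random


def startup_delay(max_delay_s=30.0, rng=random):
    """Pick a uniformly distributed delay between 0 and max_delay_s seconds.

    `max_delay_s` is an arbitrary example value, not a recommended setting.
    `rng` can be a seeded random.Random instance for reproducible tests.
    """
    if max_delay_s < 0:
        raise ValueError("max_delay_s must not be negative")
    return rng.uniform(0.0, max_delay_s)


if __name__ == "__main__":
    # Show how 17 nodes would spread out their start times.
    # A real wrapper would call time.sleep(delay) and then start corosync.
    for i in range(1, 18):
        delay = startup_delay()
        print(f"node{i:02d} would wait {delay:.1f} s before starting corosync")
```

Whether spreading out the start times would actually have helped in our case is exactly what I am unsure about.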