corosync was not able to come back

ednt

Active Member
Mar 16, 2017
Hi,

Last Friday we unfortunately had a loop in our network.
It lasted maybe 60 to 120 seconds, until I pulled the connector that caused the loop.
After that everything looked fine again, until ...
someone tried to log in to our Proxmox cluster: it was not possible via the web interface.
Luckily it was still possible via SSH.

We noticed that corosync had not reached quorum:
1 vote out of 17, and a lot of traffic on port 5405.

We tried restarting corosync on all nodes, but that didn't help.

Then we stopped corosync on all nodes, started it on just 2 of them, and set the expected votes to 2 (pvecm expected X).
This worked.
So we added the remaining nodes one after another.
That did it.
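
For anyone who ends up in the same situation, the steps above can be sketched roughly as follows. This is only a sketch under assumptions, not exactly what we typed: the node names in NODES, root SSH access, and the SETTLE wait between nodes are placeholders I made up for illustration. It is dry-run by default, so it only prints the commands it would run.

```shell
#!/bin/sh
# Staged corosync restart, dry-run by default (commands are printed, not run).
# NODES, root SSH access, and SETTLE are hypothetical; adjust to your cluster.
NODES="${NODES:-node1 node2 node3}"
DRY_RUN="${DRY_RUN:-1}"
SETTLE="${SETTLE:-10}"   # assumed wait in seconds between nodes

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

staged_restart() {
    # Split the node list into positional parameters (local to this function).
    set -- $NODES
    [ "$#" -ge 2 ] || { echo "need at least two nodes" >&2; return 1; }

    # 1. Stop corosync on every node.
    for n in "$@"; do
        run ssh "root@$n" systemctl stop corosync
    done

    # 2. Start corosync on the first two nodes only.
    run ssh "root@$1" systemctl start corosync
    run ssh "root@$2" systemctl start corosync

    # 3. Lower the expected votes so those two nodes become quorate.
    run ssh "root@$1" pvecm expected 2
    shift 2

    # 4. Bring the remaining nodes back one at a time.
    for n in "$@"; do
        run ssh "root@$n" systemctl start corosync
        run sleep "$SETTLE"
    done
}
```

Calling staged_restart with DRY_RUN=1 prints the command list for review; setting DRY_RUN=0 would actually execute it. Checking pvecm status after each node is probably a good idea before moving on to the next one.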

But ...
I think it is horrible that corosync is not able to come back up by itself with (only) 17 nodes.

In my opinion this is a design flaw.

Or is there anything that can be done via configuration?
Maybe a random delay on each node before it tries to connect to the corosync cluster.
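
To make the random-delay idea concrete, here is a minimal sketch of what I mean. It is purely an illustration: as far as I know this is not an existing corosync option, the MAX_DELAY value and the random_delay helper are made up, and I have not verified that staggering the starts would actually have avoided our problem.

```shell
#!/bin/sh
# Hypothetical helper: print a random integer in the range 0..max-1.
# MAX_DELAY is an assumed upper bound in seconds, not a corosync setting.
MAX_DELAY="${MAX_DELAY:-30}"

random_delay() {
    max="$1"
    # Read two random bytes as an unsigned integer (0..65535).
    n="$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')"
    # Reduce it to the requested range.
    echo $(( n % max ))
}
```

On each node the start would then look something like: sleep "$(random_delay "$MAX_DELAY")" && systemctl start corosync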
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Without any logs, config, or version information this will be hard to troubleshoot.
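
One possible way to gather that information on a node is sketched below. The collect_report helper, the report file name, and the 7-day log window are assumptions for illustration; the commands it calls are the usual Proxmox VE, corosync, and systemd tools, and any that are missing on the machine are simply noted in the report.

```shell
#!/bin/sh
# Hypothetical helper: write version, cluster status, config, and corosync
# logs into a single report file that can be attached to a post.
collect_report() {
    out="$1"
    : > "$out"   # create or truncate the report file
    for cmd in \
        "pveversion -v" \
        "pvecm status" \
        "corosync-cfgtool -s" \
        "cat /etc/pve/corosync.conf" \
        "journalctl -u corosync --no-pager --since '7 days ago'"
    do
        echo "### $cmd" >> "$out"
        tool="${cmd%% *}"   # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            # Capture stdout and stderr; note failures instead of aborting.
            sh -c "$cmd" >> "$out" 2>&1 || echo "(command failed)" >> "$out"
        else
            echo "(not installed: $tool)" >> "$out"
        fi
    done
}
```

For example, collect_report "/tmp/corosync-report-$(hostname).txt" run on one or two affected nodes. The log window should be widened if the incident is older than a week, and the file should be checked for anything sensitive before posting it.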
 
