Quorum timeout

Anthony-Frnog

New Member
Dec 15, 2020
Hi,

I run a Proxmox cluster of about 10 nodes (PVE 5.3; I know it's an old release, but I can't upgrade this cluster).
The nodes are spread across 3 datacenters.

If one of the datacenters is unavailable for one or two minutes, the nodes in that datacenter reboot.


Is it possible to configure the timeout before Proxmox considers nodes dead and triggers a reboot?

Thanks in advance for your answer.

Best regards
Anthony
 
This seems to be a bug, as one node going down shouldn't impact the other nodes (you should still have quorum).

Can you reproduce it 100% of the time?

There was a bug fixed recently in pve-cluster (in pmxcfs), where /etc/pve could sometimes be locked on all nodes after one node went down, but I don't think the fix was backported to 5.3, as it's end of support.
 
Thanks for your quick answer.
Yes, I can reproduce this 100% of the time.

For example, if I disconnect a node from my switch for one minute and then reconnect it, the node reboots :(

I would like to know whether I can configure the timeout before the node is considered dead for quorum purposes.

Anthony
 
Sorry, I think I misunderstood.
"If one of them (datacenter) is unavailable during one or 2 minutes, nodes presents on this datacenter will reboot."
vs
" if I disconnect a node from my switch during one minute, then I reconnect, this node will reboot "

When one node is disconnected, is it only that node that reboots, or all nodes in the datacenter?

If it's only that node, this is normal: without network, the node doesn't have quorum, and if HA is enabled, the node is rebooted.

It's not possible to change the timeout (it's hardcoded).



Also, with 3 datacenters, you should have 2 different links between each datacenter.
(Otherwise, if a datacenter loses its only link, the whole datacenter loses quorum, and all nodes in that DC reboot.)
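
To make the quorum argument concrete, here is a minimal sketch of the majority rule, assuming one vote per node. The 4/3/3 split across datacenters is hypothetical (the thread only says "about 10 nodes" in 3 datacenters), and the function names are made up for illustration.

```python
def has_quorum(votes_visible: int, total_votes: int) -> bool:
    """Majority rule: strictly more than half of all votes must be visible."""
    return votes_visible > total_votes // 2


# Hypothetical layout, one vote per node (not taken from the thread).
nodes_per_dc = {"DC1": 4, "DC2": 3, "DC3": 3}
total_votes = sum(nodes_per_dc.values())


def quorum_after_isolating(dc: str) -> dict:
    """Check both sides of the partition when one datacenter is cut off."""
    isolated = nodes_per_dc[dc]
    return {
        "isolated_side": has_quorum(isolated, total_votes),
        "remaining_side": has_quorum(total_votes - isolated, total_votes),
    }


if __name__ == "__main__":
    for dc in nodes_per_dc:
        print(dc, quorum_after_isolating(dc))
    # A single disconnected node sees only its own vote.
    print("single node:", has_quorum(1, total_votes))
```

With this layout, the isolated datacenter never holds a majority on its own, so its nodes are the ones that lose quorum, while the remaining side keeps it.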
 
I have both cases, but the more important one is:
If one of them (datacenter) is unavailable during one or 2 minutes, nodes presents on this datacenter will reboot.

It's very strange that the "quorum" timeout can't be modified...

Last question: is it possible to add something like this to corosync.conf:

token: 120000

(2 minutes), for example?

Maybe I'm totally wrong :/
 
It's not related to corosync. It's related to HA (pve-ha-crm and pve-ha-lrm), the watchdog reset, and hardcoded values in pmxcfs. So without recompiling and making a lot of changes in different Perl files, it's not possible to change it.


You shouldn't use HA if you don't have redundant links between your 3 DCs.
You should have something like:

Code:
DC1--------DC2
  |           |
  |-----DC3---|
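
As a rough illustration of why the triangle layout helps, the sketch below checks which single link failures would split the datacenters into separate partitions. The site names follow the diagram; the function names and the chain layout used for comparison are assumptions made for this example only.

```python
def reachable(start, links):
    """Return the set of sites reachable from start over undirected links."""
    seen = {start}
    stack = [start]
    while stack:
        site = stack.pop()
        for a, b in links:
            if site == a:
                other = b
            elif site == b:
                other = a
            else:
                continue
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


def partitioning_links(sites, links):
    """List the links whose individual failure disconnects at least one site."""
    critical = []
    for failed in links:
        remaining = [link for link in links if link != failed]
        if reachable(sites[0], remaining) != set(sites):
            critical.append(failed)
    return critical


sites = ["DC1", "DC2", "DC3"]
triangle = [("DC1", "DC2"), ("DC2", "DC3"), ("DC1", "DC3")]  # as in the diagram
chain = [("DC1", "DC2"), ("DC2", "DC3")]  # hypothetical layout without redundancy

if __name__ == "__main__":
    print("triangle:", partitioning_links(sites, triangle))
    print("chain:", partitioning_links(sites, chain))
```

In the triangle, no single link failure isolates a datacenter, whereas in the chain either link failing cuts a datacenter off from the others.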
 
