How to increase cluster communication timeout?

M-SK

Member
Oct 11, 2016
Hello,

We are currently having issues with a switch stack connecting our cluster nodes. The switch restarts intermittently, and the cluster nodes then fence themselves and restart. Each loss of communication lasts between 60 and 120 seconds.

Until we replace the switch, is there any way to temporarily increase the timeout after which cluster nodes start to fence themselves?

Thanks!

EDIT - It's a Proxmox 4.4 cluster.
 
Not really, AFAIK, but you could disable HA and reboot the nodes (or restart the pve-ha-lrm and pve-ha-crm services).
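For reference, the service restart mentioned above could look roughly like the sketch below. It assumes HA has already been disabled for the guests, that systemd manages both services, and that it is run as root on each node. The `run` helper, the `restart_ha_services` function and the `DRY_RUN` variable are made up for this example and are not part of Proxmox. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: restart the Proxmox HA services named in the reply above.
# Assumption: HA has already been disabled for all guests on this node.
# DRY_RUN, run and restart_ha_services are hypothetical names for this example.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restart the local resource manager first, then the cluster resource manager,
# in the order the reply lists them.
restart_ha_services() {
    run systemctl restart pve-ha-lrm
    run systemctl restart pve-ha-crm
}

restart_ha_services
```

Set `DRY_RUN=0` to actually execute the commands once the printed output looks right.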
 
All my NICs are in use, and I was thinking of a more temporary solution until we replace the switch. And I still need HA. This issue happens twice per month.
 
And I still need HA.
But if the cluster communication is not reliable, disabling the fencing/timeout can lead to dangerous split-brain situations (which is why fencing exists in the first place). So you should either use HA with fencing on (a) reliable network(s), or disable HA/fencing.
 
I understand what you're saying. I was just hoping there was a way to tune the timeouts.