We have a three-node cluster, and our datacenter has been having minor networking issues lately. Part of the traffic is lost (resulting in ping loss), and the watchdog on each node, which is the default software watchdog softdog, sees that the machine cannot contact the others and shuts down the server. As a result, every node in the cluster goes down at the same time (which is kind of sad when you are trying to have HA).
My question is: is it possible to manually configure softdog to increase the number of polls before it pulls the plug? Say we want to reset the timer every 5 seconds instead of using the default values.
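For reference, "resetting the timer" means sending a keepalive to the watchdog device. Below is a minimal Python sketch of the generic Linux watchdog ioctl interface from `<linux/watchdog.h>` (`WDIOC_SETTIMEOUT`, `WDIOC_KEEPALIVE`, magic close). It assumes the x86/arm64 ioctl number encoding and that no other process already holds `/dev/watchdog`; the device accepts only one opener, so on a node where a cluster daemon is already feeding it, `os.open` should fail with `EBUSY`. The function name `feed_watchdog` and its parameters are hypothetical, for illustration only.

```python
import fcntl
import os
import struct
import time

# ioctl numbers from <linux/watchdog.h>, encoded by hand with the generic
# _IOC layout: dir << 30 | size << 16 | type << 8 | nr.
# Assumption: x86/x86_64/arm64 encoding; PowerPC, MIPS, SPARC etc. differ.
_IOC_WRITE = 1
_IOC_READ = 2
INT_SIZE = struct.calcsize("i")


def _ioc(direction, type_char, nr, size):
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


WDIOC_KEEPALIVE = _ioc(_IOC_READ, "W", 5, INT_SIZE)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)


def feed_watchdog(device="/dev/watchdog", timeout_s=60, interval_s=5, rounds=None):
    """Hypothetical helper: arm the watchdog with timeout_s, then send a
    keepalive every interval_s seconds. Opening the device arms the timer.
    Loops forever if rounds is None; returns the timeout the driver applied."""
    fd = os.open(device, os.O_WRONLY)
    try:
        # The driver writes back the timeout it actually applied.
        buf = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", timeout_s))
        effective = struct.unpack("i", buf)[0]
        n = 0
        while rounds is None or n < rounds:
            # Each keepalive restarts the countdown from `effective` seconds.
            fcntl.ioctl(fd, WDIOC_KEEPALIVE, struct.pack("i", 0))
            time.sleep(interval_s)
            n += 1
        return effective
    finally:
        # Magic close: writing "V" before closing disarms the timer,
        # unless the driver was loaded with nowayout=1.
        os.write(fd, b"V")
        os.close(fd)
```

The interval only controls how often the countdown is restarted; the timeout passed to `WDIOC_SETTIMEOUT` is what decides how long a node survives without a keepalive.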
Also, how many polls does the watchdog make before triggering the shutdown, and what is the time interval between the start of the countdown and the watchdog firing?
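As a worked example of how the two numbers relate: the countdown restarts on every keepalive, so the reset fires a full timeout after the last successful ping, regardless of the ping interval, and the number of pings that can be missed is roughly the timeout divided by the interval. The sketch below assumes, for illustration, softdog's compiled-in default `soft_margin` of 60 seconds and a 5-second interval; the process that owns the device may set a different timeout, and the helper name `watchdog_budget` is hypothetical.

```python
def watchdog_budget(timeout_s: int, interval_s: int) -> tuple[int, int]:
    """Hypothetical helper: return (missed_pings_tolerated, seconds_until_reset).

    The timer restarts on every keepalive, so the reset happens timeout_s
    seconds after the last successful ping. A ping scheduled exactly at the
    expiry instant is assumed to lose the race, hence the "- 1".
    """
    missed = (timeout_s + interval_s - 1) // interval_s - 1  # ceil(T / i) - 1
    return missed, timeout_s


print(watchdog_budget(60, 5))  # → (11, 60)
```

With these assumed values, pings at 5, 10, ..., 55 seconds after the last good one can all be lost, and the node resets at the 60-second mark.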
The only file I found is /lib/modules/4.4.8-1-pve/kernel/drivers/watchdog/softdog.ko, and it is a compiled kernel module.