Modify the HA triggering time

smaker

Mar 4, 2022
I want to reduce the HA triggering time from 120s to about 60s. These are my modifications:

pve-ha-manager:src/PVE/HA/NodeStatus.pm
-my $fence_delay = 60;
+my $fence_delay = 30;

pve-ha-manager:src/watchdog-mux.c
-int client_watchdog_timeout = 60;
+int client_watchdog_timeout = 30;


pve-ha-manager:src/PVE/HA/Env/PVE2.pm
@@ -241,7 +241,7 @@ sub get_pve_lock {
     my $last_lock_time = $last->{lock_time} // 0;
     my $last_got_lock = $last->{got_lock};
-    my $retry_timeout = 120; # hardcoded lock lifetime limit from pmxcfs
+    my $retry_timeout = 60; # hardcoded lock lifetime limit from pmxcfs


pve-cluster:data/src/memdb.c
-#define CFS_LOCK_TIMEOUT (60*2)
+#define CFS_LOCK_TIMEOUT (60)


pve-cluster:data/PVE/Cluster.pm
@@ -1076,7 +1076,7 @@ my $cfs_lock = sub {
     # fixed command timeout: cfs locks have a timeout of 120
     # using 60 gives us another 60 seconds to abort the task
     local $SIG{ALRM} = sub { die "got lock timeout - aborting command\n"; };
-    alarm(60);
+    alarm(40);
     cfs_update(); # make sure we read latest versions inside code()

Please give us some advice.

version:
pve-ha-manager 2.0-5
pve-cluster 5.0-27
 
advice: don't do that

more serious: the current 60/120 values inform a lot of other timeouts across the code base. Those will then be wrong, which in turn means error handling will not have an opportunity to run. Furthermore, reducing the values will cause more frequent fencing events (potentially catastrophic if you trigger a 'cascade'), since corosync/knet does need some time to detect links going down and coming back up.