Modify the HA triggering time

smaker · May 17, 2022

I want to reduce the HA triggering time from 120s to about 60s. These are my modifications:


pve-ha-manager:src/PVE/HA/NodeStatus.pm
-my $fence_delay = 60;
+my $fence_delay = 30;


pve-ha-manager:src/watchdog-mux.c
-int client_watchdog_timeout = 60;
+int client_watchdog_timeout = 30;


pve-ha-manager:src/PVE/HA/Env/PVE2.pm
@@ -241,7 +241,7 @@ sub get_pve_lock {
     my $last_lock_time = $last->{lock_time} // 0;
     my $last_got_lock = $last->{got_lock};
 
-    my $retry_timeout = 120; # hardcoded lock lifetime limit from pmxcfs
+    my $retry_timeout = 60; # hardcoded lock lifetime limit from pmxcfs


pve-cluster:data/src/memdb.c
-#define CFS_LOCK_TIMEOUT (60*2)
+#define CFS_LOCK_TIMEOUT (60)


pve-cluster:data/PVE/Cluster.pm
@@ -1076,7 +1076,7 @@ my $cfs_lock = sub {
        # fixed command timeout: cfs locks have a timeout of 120
        # using 60 gives us another 60 seconds to abort the task
        local $SIG{ALRM} = sub { die "got lock timeout - aborting command\n"; };
-       alarm(60);
+       alarm(40);
 
        cfs_update(); # make sure we read latest versions inside code()

Please give us some advice.

Versions:
pve-ha-manager 2.0-5
pve-cluster 5.0-27

fabian · May 17, 2022

advice: don't do that

More seriously: the current 60/120 values inform a lot of other timeouts across the code base. Those will then be wrong, which in turn means error handling will not get the opportunity to run. Furthermore, reducing the values will cause more frequent fencing events (potentially catastrophic if you trigger a 'cascade'), since corosync/knet needs some time to detect links going down and coming back up.
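
To make the point about dependent timeouts concrete, here is a minimal sketch of just one such relationship, the one described in the Cluster.pm comment quoted above ("using 60 gives us another 60 seconds to abort the task"). The helper name `abort_margin` is hypothetical and not part of pve-cluster; it only does the arithmetic on the values shown in the diffs and says nothing about the many other timeouts that depend on them.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of pve-cluster): seconds left to abort a
 * task after the command alarm fires and before the cfs lock expires.
 *
 * lock_timeout   - lock lifetime in seconds (CFS_LOCK_TIMEOUT in the diff)
 * command_alarm  - value passed to alarm() in the cfs_lock diff
 */
int abort_margin(int lock_timeout, int command_alarm)
{
    /* Assumption: the alarm must fire no later than the lock expires. */
    assert(command_alarm <= lock_timeout);
    return lock_timeout - command_alarm;
}
```

With the stock values, `abort_margin(120, 60)` is 60 seconds. With the proposed values, `abort_margin(60, 40)` is 20 seconds, so the window for aborting cleanly shrinks to a third even though the lock lifetime was only halved.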
