Proxmox HA increase fence delay

EuroDomenii

TEST SETUP

My Proxmox cluster test setup:

Node1 - dedicated OVH production server, reboot time 110 sec
Node2 - dedicated OVH backup server
Node3 - OVH cloud VPS, reboot time a few seconds
Shared storage: Ceph

HA groups:
HA12 - node1 priority 2, node2 priority 1, restricted, nofailback unchecked
HA32 - node3 priority 2, node2 priority 1, restricted, nofailback unchecked
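
For reference, this is roughly how those groups end up looking in /etc/pve/ha/groups.cfg (a sketch, assuming the standard groups.cfg section format):

Code:
cat /etc/pve/ha/groups.cfg
group: HA12
        nodes node1:2,node2:1
        restricted 1
        nofailback 0

group: HA32
        nodes node3:2,node2:1
        restricted 1
        nofailback 0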

Hardware watchdog configured with ipmi_watchdog:

cat /etc/modprobe.d/ipmi_watchdog.conf
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10
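
To double check that the IPMI watchdog (and not softdog) is actually in use, something like this can be run on the node (a sketch; on a standard install the module is also selected via WATCHDOG_MODULE in /etc/default/pve-ha-manager):

Code:
grep WATCHDOG_MODULE /etc/default/pve-ha-manager   # expect WATCHDOG_MODULE=ipmi_watchdog
lsmod | grep -E 'ipmi_watchdog|softdog'            # which watchdog module is loaded
dmesg | grep -i watchdog                           # which driver grabbed /dev/watchdog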

Test 1 (working):

1) Trigger a kernel crash on node1 with echo c > /proc/sysrq-trigger
2) The hardware watchdog manages to reboot the dedicated server (the same setup with softdog fails)
3) Keep node1 fenced by entering the BIOS setup during the reboot
4) The VMs restart successfully on node2
5) Reset node1
6) Node1 is active again (it is the production server, more powerful, with nofailback unchecked on purpose in HA12)
7) The VMs return to node1
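
For anyone reproducing this, the failover can be watched from a surviving node while node1 is down, e.g. (a sketch using the standard tools):

Code:
ha-manager status          # shows quorum, the current CRM master and the state of each HA service
journalctl -fu pve-ha-crm  # on the master node this follows the fence/recovery decisions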

THE ISSUE

In the above test, if I skip step 3 and let node1 reboot via the hardware watchdog after the kernel crash, I want to keep the VMs on node1.

This is a genuine reboot attempt, within the fence delay window.

My problem is that the fence delay is only 60 seconds, while my server's reboot time is 110 seconds. Before node1 has a chance to come back, the VMs restart on node2.
$fence_delay = 60; in /usr/share/perl5/PVE/HA/NodeStatus.pm

Repeating the same test with node3 (which boots very fast, being a VPS), all the VMs stay on node3 and restart there after it grabs the lock.

INCREASE FENCE DELAY

I saw a commit by Dietmar increasing fence_delay from 30 to 60 seconds: https://git.proxmox.com/?p=pve-ha-m...ff;h=ceac1930e8747b758982396949e14d9f0c8b13fd

Btw, should this option be configurable from the GUI?

The option currently lives in /usr/share/perl5/PVE/HA/NodeStatus.pm. Increasing it on node1, node2 and node3 (edited with vim, then rebooted) did not change the test workflow; it won't work!

Thank you!
 
Hi,
In the above test, if I skip step 3 and let node1 reboot via the hardware watchdog after the kernel crash, I want to keep the VMs on node1.

AFAIK, we addressed that in ha-manager version 1.0-39.
Since then, a service gets frozen on a reboot, as we expect that a node comes up (relatively fast) after a graceful reboot.

IMHO, this is a saner solution than increasing the timeout, because the boot time can differ, so a user would set the worst-case timeout, which then degrades the availability of the HA stack: recovery on a real failure would get delayed unnecessarily.
A possibility would be allowing the user to set the wanted behavior (reboot -> freeze, or too long a reboot -> recovery).

If you are unable to update for now, you could execute the following command before you reboot the server gracefully, as a workaround:

Code:
systemctl stop pve-ha-lrm

This freezes the HA services actively, so they won't get touched by the manager and thus won't be fenced/recovered.
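
A usage sketch of that workaround (assuming pve-ha-lrm is an enabled unit, so it comes back on its own after the reboot):

Code:
# on the node that is about to be rebooted gracefully
systemctl stop pve-ha-lrm   # freezes this node's HA services
reboot
# pve-ha-lrm starts again automatically with the node and picks the frozen services back up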
 
My pve-ha-manager version is 1.0-40, on Proxmox 4.4.

Thank you for the tips on graceful reboot, but that is not my use case. The default behaviour on a graceful reboot is to freeze the VMs, and it works as expected, even in the case of long reboots.

In my situation, systemctl stop pve-ha-lrm is not usable, because on node1 I trigger a kernel crash with echo c > /proc/sysrq-trigger. Afterwards, node1 is rebooted by the hardware watchdog.

If I try to hack the code, is the correct variable to increase $fence_delay in /usr/share/perl5/PVE/HA/NodeStatus.pm? It seems to have no effect in my case.

It is not logging to syslog either…

Code:
sub node_is_offline_delayed {
    my ($self, $node, $delay) = @_;

    $delay = $fence_delay if !defined($delay);
    # debugging line I added; syslog() is not imported in NodeStatus.pm (see the error later in the thread)
    syslog('warning', "%s", "fence delay value $delay\n");
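
To check whether that extra log line ever fires, the manager log can be grepped on the current master (a sketch):

Code:
journalctl -u pve-ha-crm | grep 'fence delay'
grep 'fence delay' /var/log/syslog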
 
For the sake of simplicity, I described only the 3 nodes of the cluster that participate in the test.

In fact, it is a 4-node cluster, and node4 is the master at this moment! I guess I should also increase $fence_delay in /usr/share/perl5/PVE/HA/NodeStatus.pm on node4, because that is the node that triggers the fencing event. Let me try this and get back with the result.
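
The current master is easy to spot from any node (a sketch; the output below is what I would expect, not a verbatim capture):

Code:
ha-manager status | head -n 2
# quorum OK
# master node4 (active, ...)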
 
Yes, this was the trick. In order to increase the $fence_delay value, you must modify /usr/share/perl5/PVE/HA/NodeStatus.pm on all cluster nodes, because any node could play the master role at a certain moment and trigger the fencing process.
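
One way to roll the change out is sketched below; it assumes the stock line reads $fence_delay = 60; and that restarting the HA daemons (or rebooting, as I did) is enough to reload the changed module:

Code:
# repeat on every cluster node
sed -i 's/fence_delay = 60;/fence_delay = 160;/' /usr/share/perl5/PVE/HA/NodeStatus.pm
grep -n 'fence_delay = ' /usr/share/perl5/PVE/HA/NodeStatus.pm   # verify the new value
systemctl restart pve-ha-crm pve-ha-lrm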

Now the working test is:

1) Trigger a kernel crash on node1 with echo c > /proc/sysrq-trigger
2) The hardware watchdog manages to reboot the dedicated server (the same setup with softdog fails)
3) Due to the increased fence delay (160 sec), the cluster does not fence node1 (reboot time 110 sec)
4) Node1 is active again (it is the production server, more powerful, with nofailback unchecked on purpose in HA12)
5) The VMs restart on node1 (without moving to node2, the backup)

In this respect, as a feature request, the fence delay should be configurable from the GUI.

My response was also “delayed” by my previous hacks with syslog, which led me to a lot of false positives (like https://forum.proxmox.com/threads/master-old-timestamp-dead.26489/ , VMs not starting, etc.):

Code:
systemctl status pve-ha-crm -l
● pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
  Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
  Active: active (running) since Fri 2017-02-10 18:05:04 EET; 44min ago
 Process: 2902 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
Main PID: 2912 (pve-ha-crm)
  CGroup: /system.slice/pve-ha-crm.service
          └─2912 pve-ha-crm
Feb 10 18:48:11 gina pve-ha-crm[2912]: got unexpected error - Undefined subroutine &PVE::HA::NodeStatus::syslog called at /usr/share/perl5/PVE/HA/NodeStatus.pm line 52.

Fixed with apt install --reinstall pve-ha-manager on each node.
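
Worth noting (my assumption): the reinstall also restores the stock NodeStatus.pm, so the $fence_delay change has to be reapplied afterwards; dpkg can show which packaged files are still locally modified:

Code:
dpkg -V pve-ha-manager                                           # lists files that differ from the package
grep -n 'fence_delay = ' /usr/share/perl5/PVE/HA/NodeStatus.pm   # confirm the value after the reinstall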
 
kernel crash with echo c > /proc/sysrq-trigger

Note that, while this is a valid test, things may happen totally differently in real-world situations.
E.g., you now also need longer for a recovery in the case of a hardware failure, a node-local power failure, or a node-local network failure.

If you expect that most failures come from a node crashing and then coming up totally unharmed, then yes, your measure will have a positive effect.
But, IMO, while still possible, this is not the usual failure case. However, as you did not increase the timeout too much, you should not run into major problems.
I just wanted to explain why this may not always have advantages and why it could have disadvantages.
 
