Is it possible for HA to simply monitor a network link?

ulysse31 · Oct 29, 2021

Hi,

It may be a silly question, but I'll ask it anyway.
I've built a 3-node cluster. Each node has two bonds of two physical interfaces each: one bond is a VLAN trunk for VM network assignment (via a VLAN-aware bridge on top of the bond), and the other bond carries the Proxmox interface and management traffic (one management VLAN).
The cluster was set up to use both links (the trunk bond has an IP address on the native VLAN for the LACP bond).
If the "VM bond" goes down (let's say a cable failure on both fibers), the VMs that were added to HA are not migrated to another cluster node.
The node itself sees that the bond is down (a message declaring the bond down appears in dmesg).
But in the web GUI, the interfaces (the bond as well as its physical interfaces) are shown as "LINK - Yes" in the network section of the affected node.
Is it possible to set up HA (or something on each node, I don't know) to stop all running VMs if the "VM bond" link goes down, and let HA start them on another node where the "VM bond" is up?
Thanks for the kind answers and help.

Regards,

--
Ulysse31

czechsys · Nov 1, 2021

Not from PVE side.

You need to implement something like STONITH (locally or remotely), for example shutting down the affected node.

ulysse31 · Nov 2, 2021

Hello @czechsys!

Thanks for the reply. Is there documentation somewhere on how to implement that?

Thanks a lot.

ulysse31 · Nov 3, 2021

Hello All,

I've been searching the documentation to solve this issue, and may have found some leads around watchdog and softdog.
It seems that softdog is enabled by default on all Proxmox nodes in an HA cluster: a service (watchdog-mux) opens /dev/watchdog and writes to it regularly, at intervals of less than 60 seconds; otherwise the node reboots.
I've built a small Perl daemon that monitors the link on the bond0 interface (the crucial interface for VM hosting; without that bond, the node is useless).
That daemon works in 2 consecutive phases:
- arming phase: it polls the interface and checks that the link is up and steady. The interval and number of checks are configurable; for now I check 3 times, every 20 seconds, which means the link has been steady for 60 seconds. Once this step is validated, it moves to the watching phase.
- watching phase: it polls the interface and checks that the link is up, for now every 10 seconds. If the link is down 5 consecutive times (50 seconds total), it kills the watchdog-mux process, forcing the machine to reboot.
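
The two phases above can be sketched roughly as follows. This is not the Perl daemon itself, just a minimal Python illustration of the same logic. The names `link_is_up`, `LinkWatcher` and `run` are made up for the example, and reading the link state from `/sys/class/net/<iface>/carrier` is an assumption (the post does not say how the daemon polls the interface).

```python
import time
from pathlib import Path


def link_is_up(iface: str = "bond0", sysfs_root: str = "/sys/class/net") -> bool:
    """Return True when the kernel reports carrier on the interface.

    Assumption: the link state is read from sysfs. Reading 'carrier' fails
    when the interface is missing or administratively down, which is
    treated as "link down" here.
    """
    try:
        return (Path(sysfs_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        return False


class LinkWatcher:
    """Arming/watching state machine (hypothetical helper, defaults from the post)."""

    def __init__(self, arm_checks: int = 3, max_down: int = 5) -> None:
        self.arm_checks = arm_checks  # consecutive "up" polls needed to arm
        self.max_down = max_down      # consecutive "down" polls before fencing
        self.phase = "arming"
        self.streak = 0

    def feed(self, up: bool) -> bool:
        """Process one poll result; return True when the node should be fenced."""
        if self.phase == "arming":
            # Count consecutive "up" polls; any "down" poll resets the count.
            self.streak = self.streak + 1 if up else 0
            if self.streak >= self.arm_checks:
                self.phase, self.streak = "watching", 0
            return False
        # Watching phase: count consecutive "down" polls.
        self.streak = 0 if up else self.streak + 1
        return self.streak >= self.max_down


def run(iface: str = "bond0", arm_interval: int = 20, watch_interval: int = 10) -> None:
    """Poll until the watcher asks for fencing; the fencing action is left out."""
    watcher = LinkWatcher()
    while not watcher.feed(link_is_up(iface)):
        time.sleep(arm_interval if watcher.phase == "arming" else watch_interval)
    print(f"{iface} down for {watcher.max_down} consecutive polls: fence this node")
```

Keeping the state machine separate from the polling makes the thresholds easy to test without touching a real interface. Calling `run()` would start the polling loop with the intervals mentioned above.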

Since this daemon starts at boot, if the link stays down it simply waits in the arming phase until the link comes up and is steady.
I do not like killing watchdog-mux; I would have preferred something cleaner / softer. If someone has an idea or suggestion, I would be really thankful ^^'

Thanks for your help,

--
Ulysse31

ulysse31 · Nov 3, 2021

UPDATE:

It seems I can also set the "Default maintenance policy" of the HA to "Migrate" and issue a "reboot" command instead of killing the watchdog-mux process.
That seems cleaner, no?
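
As a rough illustration of that swap, here is a minimal Python sketch of the action step only. The function names are hypothetical, and `systemctl reboot` and `pkill -x watchdog-mux` are assumed stand-ins for "send a reboot command" and "kill watchdog-mux"; the post does not give the exact commands. It defaults to a dry run so nothing is executed by accident.

```python
import subprocess
from typing import List, Optional


def fence_command(clean: bool = True) -> List[str]:
    """Pick the action to take once the link is considered dead.

    clean=True  -> ask for an orderly reboot, so the HA policy can act
                   on the guests before the node goes down.
    clean=False -> kill watchdog-mux and let the watchdog reset the node.
    Both command lines are assumptions made for this example.
    """
    if clean:
        return ["systemctl", "reboot"]
    return ["pkill", "-x", "watchdog-mux"]


def trigger(cmd: List[str], dry_run: bool = True) -> Optional[int]:
    """Run the command, or only print it when dry_run is set."""
    if dry_run:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode
```

For example, `trigger(fence_command())` only prints the reboot command; passing `dry_run=False` would actually run it.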
Thanks for the help

--
Ulysse31
