HA fencing with softdog vs IPMI watchdog — which one is actually used when both are configured?

atlas32 · 2026-07-02T11:45:23+0200

Hi,

I'm trying to understand the watchdog selection logic in PVE HA. My 3-node
cluster has:

- IPMI/iLO present on all nodes (HPE ProLiant, ipmi_watchdog module loadable)
- softdog also available (default fallback)
- /etc/default/pve-ha-manager: WATCHDOG_MODULE=ipmi_watchdog

Question: if I configure ipmi_watchdog but the IPMI BMC becomes unresponsive
(stuck firmware, network partition affecting the dedicated IPMI port, etc.),
does pve-ha-manager fall back to softdog automatically, or does HA simply not
work until I fix IPMI?

I'd like to understand the failure modes before I rely on this in production.
Reading the source got me partway but I'd love confirmation from people who
have actually seen it fail.

mkoeppl · 2026-07-02T14:50:57+0200

Hi!

The default fallback (softdog) is only selected if no other watchdog is configured when the watchdog-mux daemon is started. There is no mechanism to fall back at runtime if a hardware watchdog fails. Such a failure is something you'd probably want to know about and fix instead of being silently moved to softdog, since you're relying on a hardware watchdog as an independent component.
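As a rough way to see from the OS which module is configured and which one is actually loaded, something like the following could be used. It is only a sketch: the function names are made up for illustration, it assumes the file uses plain KEY=value lines as in the first post, and it treats an unset WATCHDOG_MODULE as "the softdog default applies". It looks at module names only and says nothing about whether the BMC itself is healthy.

```python
def configured_module(defaults_text):
    """Return the WATCHDOG_MODULE value from the defaults file text.

    Returns None when the variable is unset or commented out, which
    (per the explanation above) means the softdog default applies.
    """
    for line in defaults_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "WATCHDOG_MODULE":
            return value.strip().strip("\"'") or None
    return None


def loaded_watchdog_modules(proc_modules_text,
                            candidates=("ipmi_watchdog", "softdog")):
    """Return the candidate watchdog modules present in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in candidates if name in loaded]
```

On a node you would feed it the contents of /etc/default/pve-ha-manager and /proc/modules, e.g. `configured_module(open("/etc/default/pve-ha-manager").read())`, and compare the two results. If ipmi_watchdog is configured but only softdog shows up as loaded (or the other way round), that is worth investigating before trusting fencing.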

aaron · 2026-07-02T14:56:06+0200

I recommend sticking with the soft watchdog. It works well, and you avoid any issues caused by the questionable quality of OOBM hardware and software.

g.fiore · 2026-07-03T15:44:55+0200

@atlas32 — just to add one practical bit on top of what Michael and Aaron
already said, in case you do end up going the ipmi_watchdog route despite
Aaron's (very sensible) recommendation:

If you commit to it, don't just configure it and forget it. Monitor the
BMC-side state from the OS explicitly:

ipmitool mc watchdog get

and alert on two things:
- "Watchdog Timer Is:" showing "Stopped" while HA thinks the node is
active
- "Watchdog Timer Actions:" not matching what you configured (e.g. the wiki
example uses "Hard Reset")

Reason for the second check: some BMC firmware updates silently reset the
watchdog configuration back to defaults. There's a thread on this forum
where an iDRAC9 v6 -> v7 upgrade broke ipmi_watchdog integration entirely,
so it's not a theoretical concern.
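The two checks above could be scripted roughly like this. It is a minimal sketch, not a finished monitoring plugin: the function names are hypothetical, it assumes the command prints "Label: value" lines with the two labels quoted above, and the exact value strings may differ between ipmitool versions and BMC vendors, so check it against real output from your own nodes first.

```python
import subprocess


def read_watchdog_output():
    """Run the command from above and return its stdout (needs ipmitool and root)."""
    result = subprocess.run(
        ["ipmitool", "mc", "watchdog", "get"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_watchdog(output):
    """Turn 'Label: value' lines into a dict keyed by label."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_watchdog(output, expected_action="Hard Reset"):
    """Return a list of problem descriptions; an empty list means both checks passed."""
    fields = parse_watchdog(output)
    problems = []

    # Check 1: the timer must not be stopped (a missing field is also a problem).
    state = fields.get("Watchdog Timer Is", "")
    if not state or "Stopped" in state:
        problems.append(f"watchdog timer not running: {state or 'field missing'}")

    # Check 2: the configured action must be the one we expect.
    action = fields.get("Watchdog Timer Actions", "")
    if not action.startswith(expected_action):
        problems.append(
            f"unexpected watchdog action: {action or 'field missing'} "
            f"(expected {expected_action})"
        )
    return problems
```

Calling `check_watchdog(read_watchdog_output())` from a cron job or monitoring agent and alerting on a non-empty result would cover both cases. Note that the first check is only meaningful while HA is active on the node, so you may want to skip it on nodes without HA resources.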

That way you're not blindly trusting a component whose health you can't
observe from the OS side — which was, I think, the core of Michael's point
about "you'd want to know about it".

Nothing to add to the architecture side, Michael already nailed it.
