Hi,
I'm trying to understand the watchdog selection logic in PVE HA. My 3-node
cluster has:
- IPMI/iLO present on all nodes (HPE ProLiant, ipmi_watchdog module loadable)
- softdog also available (default fallback)
- /etc/default/pve-ha-manager: WATCHDOG_MODULE=ipmi_watchdog
Question: if I configure ipmi_watchdog but the IPMI BMC becomes unresponsive
(stuck firmware, network partition affecting the dedicated IPMI port, etc.),
does pve-ha-manager fall back to softdog automatically, or does HA simply not
work until I fix IPMI?
I'd like to understand the failure modes before I rely on this in production.
Reading the source got me partway, but I'd love confirmation from people who
have actually seen it fail.