Is softdog better/worse/same as watchdog_ipmi?

trilljester

Member
Oct 9, 2018
Hi,

I configured the servers in a new Proxmox cluster we're building to use the BIOS-based watchdog, which in turn enabled the softdog kernel module in Proxmox VE. Is this watchdog better than, worse than, or the same as the IPMI watchdog?

Also, my system gives me three options for what to do when the watchdog timer expires: Do Nothing, Reset, or Poweroff. Should I set it to Do Nothing so that Proxmox can handle that?

Thanks for any assistance.
 
Is this watchdog better than, worse than, or the same as the IPMI watchdog?

Not the same; they are somewhat different concepts.

In our experience, it depends entirely on the hardware watchdog, i.e., which model and which driver. We had terrible experiences with hpt-based watchdogs in the past. Whether the cause was just the kernel driver or also the hardware itself was never clear, but a watchdog that crashes the host under normal use is not acceptable.

There are lots of other hardware-based watchdogs that work fine, though.

The softdog in the kernel is a very simple design. It needs no real locking or anything else that could hang it once it is running, even if the kernel itself were deadlocked by a bug or the like. So while there is a chance that it fails to trigger in some scenario, we have never heard of that happening. Hardware watchdogs have possible deficits too: things can always malfunction, although the chance of that is rather small.
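To make the general idea concrete, here is a minimal toy model of a watchdog timer in Python. It is not the softdog implementation, only a sketch of the concept both kinds of watchdog share: something must keep resetting ("petting") a countdown, and if the resets stop for a full timeout, the watchdog fires. The names `SimulatedWatchdog` and `first_expiry` are made up for this illustration, and time is simulated in whole seconds rather than read from a real clock.

```python
class SimulatedWatchdog:
    """Toy model of a watchdog timer (illustration only, not kernel code)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.deadline = None  # None means the watchdog has not been armed yet

    def pet(self, now):
        """Arm the timer, or push its deadline out by one full timeout."""
        self.deadline = now + self.timeout_s

    def expired(self, now):
        """True once an armed timer has gone a full timeout without a pet."""
        return self.deadline is not None and now >= self.deadline


def first_expiry(watchdog, pet_times, horizon_s):
    """Step through simulated seconds 0..horizon_s.

    Returns the first second at which the watchdog fires, or None if it
    never fires within the horizon.
    """
    pets = set(pet_times)
    for now in range(horizon_s + 1):
        if watchdog.expired(now):
            return now
        if now in pets:
            watchdog.pet(now)
    return None


# A healthy node pets every 5 s with a 10 s timeout: the watchdog never fires.
healthy = first_expiry(SimulatedWatchdog(10), range(0, 60, 5), 60)

# A node that hangs after its pet at t=10 is "fenced" 10 s later.
stalled = first_expiry(SimulatedWatchdog(10), [0, 5, 10], 60)
```

With a real watchdog the "fire" action is a host reset. The difference between softdog and a hardware watchdog is where the countdown runs (in the kernel versus on an independent component), not the basic contract shown here.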

Also, my system gives me three options for what to do when the watchdog timer expires: Do Nothing, Reset, or Poweroff. Should I set it to Do Nothing so that Proxmox can handle that?

Normally all watchdog kernel modules are blacklisted anyway (it is dangerous to just activate one if nothing constantly resets it, or if it is a known bad one). So even if you choose to use the hardware watchdog, you will probably need to load its module yourself; see: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_configure_hardware_watchdog
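If you want to check what a node is actually set up to use, a small Python sketch like the following could help. It rests on assumptions you should verify against the linked documentation: that the module is selected with a `WATCHDOG_MODULE=...` line in `/etc/default/pve-ha-manager`, and that softdog is used when that line is absent or commented out. The function names and the candidate module list are made up for this example; `/proc/modules` is the standard kernel list of loaded modules.

```python
def configured_watchdog_module(config_text, default="softdog"):
    """Return the WATCHDOG_MODULE value from shell-style config text.

    Falls back to `default` when the variable is missing, empty, or
    commented out. The variable name is an assumption taken from the
    Proxmox HA documentation.
    """
    module = default
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "WATCHDOG_MODULE":
            module = value.strip().strip("\"'") or default
    return module


def loaded_watchdog_modules(proc_modules_text,
                            candidates=("softdog", "ipmi_watchdog", "iTCO_wdt")):
    """Return the candidate watchdog modules present in /proc/modules text."""
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines() if line.strip()}
    return [name for name in candidates if name in loaded]


# Sample inputs standing in for the real files (contents are illustrative).
sample_config = "# select watchdog module (default is softdog)\nWATCHDOG_MODULE=ipmi_watchdog\n"
sample_proc = "softdog 16384 2 - Live 0x0000000000000000\next4 999424 1 - Live 0x0000000000000000\n"

wanted = configured_watchdog_module(sample_config)
loaded = loaded_watchdog_modules(sample_proc)
```

On a real node you would pass in the contents of `/etc/default/pve-ha-manager` and `/proc/modules` instead of the sample strings. A mismatch like the one in the sample (IPMI configured, only softdog loaded) would suggest the setting has not taken effect yet.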

TL;DR: We have never heard of issues with the softdog, but we have heard of issues with some specific hardware watchdogs. A hardware watchdog could be more reliable in theory (it is an independent component), so if yours works OK you will do fine with that one too.
 
From the point of view of a hardware engineer, I always use an independent (hardware) watchdog.
 
