[SOLVED] Single-node watchdog configuration possible?

leesteken

May 31, 2020
While I was away, my system recently hung at the end of a shutdown with a stack trace mentioning _raw_spin_lock and dvb_frontend_release. I assume the watchdog noticed this, as it reported:
watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [TVRecEvent:5422]
A manual reboot fixed the issue, and the system again shuts down automatically when I'm not using it and starts up when a new TV recording is scheduled. Can I configure the softdog to reboot the system automatically (after a short time) when this happens?

Debian guides suggest installing the watchdog package, but that is incompatible with Proxmox. Also, all the threads and information I could find warn users not to mess with the watchdog (when using HA) and to fix the underlying problem instead. That is of course good advice, but I have no cluster and no HA, and I fear that such crashes sometimes just happen (due to bugs or hardware hiccups), so it would be nice if the system restarted by itself.
It's not a big or regular problem, but can someone tell me whether (and how) I can configure the watchdog that comes with Proxmox and/or Debian?
 
Maybe I should leave the watchdog/softdog alone, as Proxmox does not seem to want you to touch it. It might not even reboot the system when a soft lockup happens anyway.

Instead, I have added kernel.softlockup_panic = 1 and kernel.panic = 60 to /etc/sysctl.d/71-local.conf in an attempt to reboot automatically on a soft lockup. If I have this issue again in the future, I'll report back whether this helps.
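For reference, the file would look roughly like this. The comments are my own reading of the two settings, and loading the file with sysctl --system (or a reboot) is an assumption about how it gets applied, so treat this as a sketch:

```conf
# /etc/sysctl.d/71-local.conf
# Turn a detected soft lockup into a kernel panic (0 leaves it as a warning only).
kernel.softlockup_panic = 1
# After a panic, wait this many seconds and then reboot (0 means never reboot).
kernel.panic = 60

# To load without rebooting (as root): sysctl --system
```

Note that this does not involve the softdog at all; it only relies on the kernel's own soft-lockup detector and panic timeout.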
 
kernel.softlockup_panic = 1 is not a good idea. Soft lockups do not only happen on shutdown but can also have other causes, and the resulting reboots were very annoying today (especially while trying to troubleshoot why an AMD HD7750 suddenly crashes the amdgpu driver and shows no output).
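For anyone who wants to back this out again, a minimal sketch of the same file with the soft-lockup panic switched off. Keeping kernel.panic = 60 for genuine panics is my assumption here, not something tested above:

```conf
# /etc/sysctl.d/71-local.conf
# Soft lockups are only logged again and no longer trigger a panic.
kernel.softlockup_panic = 0
# Assumption: still reboot 60 seconds after a real kernel panic.
kernel.panic = 60

# To change the running system right away (as root): sysctl -w kernel.softlockup_panic=0
```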