Watchdog will not trigger on Intel System

Supaman

Hello,

I want to activate the hardware watchdog function.

I have an Intel-based system with an N5105 CPU. The BIOS has a watchdog on/off option, which is set to "enabled".
The WD module for Intel systems is "iTCO_wdt".


# My Setup Steps so far:

Define the WD module.
Edit:
/etc/default/pve-ha-manager
WATCHDOG_MODULE=iTCO_wdt


Disable the NMI watchdog, which is embedded in the CPU's APIC.
Edit:
/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

After editing:
> update-grub
> reboot
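
After the reboot it may be worth confirming that both settings actually took effect. The following is only a minimal sketch: the helper names `configured_wd_module` and `nmi_watchdog_disabled` are made up for illustration, and the parsing assumes a plain, unquoted `WATCHDOG_MODULE=...` line as shown in the steps above.

```shell
#!/bin/sh
# Print the value of WATCHDOG_MODULE from a pve-ha-manager defaults file.
# Commented-out lines are ignored; if several assignments exist, the last wins.
# (Hypothetical helper; the path defaults to the file edited above.)
configured_wd_module() {
    file="${1:-/etc/default/pve-ha-manager}"
    sed -n 's/^[[:space:]]*WATCHDOG_MODULE=//p' "$file" | tail -n 1
}

# Succeed if a kernel command line contains nmi_watchdog=0.
# Reads /proc/cmdline unless a command line string is passed as $1.
nmi_watchdog_disabled() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" nmi_watchdog=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Called without arguments, `configured_wd_module` reads the real config file and `nmi_watchdog_disabled && echo "NMI watchdog disabled"` checks the running kernel's command line.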


Checking WD Service Status:

root@pve:~# dmesg | grep iTCO_wdt
[ 5.359228] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[ 5.359988] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

root@pve:~# journalctl -b --grep iTCO_wdt
Aug 03 13:28:53 pve watchdog-mux[690]: Loading watchdog module 'iTCO_wdt'
Aug 03 13:28:53 pve kernel: iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
Aug 03 13:28:53 pve kernel: iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Aug 03 13:28:53 pve watchdog-mux[690]: Watchdog driver 'iTCO_wdt', version 6


Up to here everything looks good and should work, but when I test with

> echo c > /proc/sysrq-trigger

the system hangs, but does not reboot/reset.

Any ideas?
 
Full content of the directory:

Code:
root@bbox:~# cd  /sys/firmware/acpi/tables
root@bbox:/sys/firmware/acpi/tables# ls -l
total 0
-r-------- 1 root root    476 Aug  4 09:15 APIC
drwxr-xr-x 2 root root      0 Aug  4 09:15 data
-r-------- 1 root root     92 Aug  4 09:15 DBG2
-r-------- 1 root root     52 Aug  4 09:15 DBGP
-r-------- 1 root root    136 Aug  3 18:39 DMAR
-r-------- 1 root root 480389 Aug  4 09:15 DSDT
drwxr-xr-x 2 root root      0 Aug  4 09:15 dynamic
-r-------- 1 root root    276 Aug  3 18:39 FACP
-r-------- 1 root root     64 Aug  4 09:15 FACS
-r-------- 1 root root    156 Aug  4 09:15 FIDT
-r-------- 1 root root     68 Aug  1 15:20 FPDT
-r-------- 1 root root     56 Aug  4 09:15 HPET
-r-------- 1 root root    204 Aug  4 09:15 LPIT
-r-------- 1 root root     60 Aug  4 09:15 MCFG
-r-------- 1 root root     45 Aug  4 09:15 NHLT
-r-------- 1 root root   1553 Aug  3 18:39 PHAT
-r-------- 1 root root    908 Aug  4 09:15 SSDT1
-r-------- 1 root root   6425 Aug  4 09:15 SSDT10
-r-------- 1 root root   1850 Aug  4 09:15 SSDT11
-r-------- 1 root root  15082 Aug  4 09:15 SSDT12
-r-------- 1 root root  14810 Aug  4 09:15 SSDT13
-r-------- 1 root root    324 Aug  4 09:15 SSDT14
-r-------- 1 root root  23819 Aug  4 09:15 SSDT2
-r-------- 1 root root  10549 Aug  4 09:15 SSDT3
-r-------- 1 root root  13271 Aug  4 09:15 SSDT4
-r-------- 1 root root  54407 Aug  4 09:15 SSDT5
-r-------- 1 root root   7962 Aug  4 09:15 SSDT6
-r-------- 1 root root    183 Aug  4 09:15 SSDT7
-r-------- 1 root root  10883 Aug  4 09:15 SSDT8
-r-------- 1 root root   9047 Aug  4 09:15 SSDT9
-r-------- 1 root root     76 Aug  4 09:15 TPM2
-r-------- 1 root root     72 Aug  3 18:39 UEFI
-r-------- 1 root root     40 Aug  4 09:15 WSMT
root@bbox:/sys/firmware/acpi/tables#
 
So this would be a bit of a brainstorming, but:

1. I would try using the wdat_wdt driver instead first.

2. Can you check only one WD module is loaded, e.g. lsmod | grep -e wdt -e dog?

3. Did they make it into the initramfs? lsinitramfs initrd.img-... | grep -e wdt -e dog

4. If you resort to iTCO_wdt, do you also load i2c_smbus?

Also, what chipset/hardware was this exactly?
 
@esi_y

I have a freshly installed PVE including the latest updates. All modifications are listed in my initial post; everything else is default. I tried hard to get the WD working, but there is not much to find beyond the things I have already done, especially on how to identify the hardware and on a checklist of prerequisites.

1. I would try using the wdat_wdt driver instead first.
You mean: change WATCHDOG_MODULE=wdat_wdt ?

2. Can you check only one WD module is loaded, e.g. lsmod | grep -e wdt -e dog?
Code:
root@pve:~# lsmod | grep -e wdt -e dog
iTCO_wdt               16384  1
intel_pmc_bxt          16384  1 iTCO_wdt
iTCO_vendor_support    12288  1 iTCO_wdt
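
Note that a plain grep also matches the "Used by" column, which is why intel_pmc_bxt and iTCO_vendor_support show up above even though their own names contain neither "wdt" nor "dog". A small sketch that only looks at the module name column (the function name `wd_modules` is hypothetical):

```shell
# List only modules whose own name (first column) contains "wdt" or "dog".
# Reads lsmod-style output on stdin, so it can be fed from `lsmod` directly.
wd_modules() {
    awk '$1 ~ /wdt|dog/ { print $1 }'
}
```

Usage would be `lsmod | wd_modules`; more than one line of output would suggest that several watchdog drivers are loaded at the same time.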

3. Did they make it into the initramfs? lsinitramfs initrd.img-... | grep -e wdt -e dog
Code:
root@pve:~# lsinitramfs /boot/initrd.img-6.8.8-4-pve | grep -e wdt -e dog
usr/sbin/watchdog

4. If you resort to iTCO_wdt, do you also load i2c_smbus?
How do I check that / what do I need to do?

Code:
root@pve:~# i2cdetect -l
i2c-0   i2c             Synopsys DesignWare I2C adapter         I2C adapter
i2c-1   i2c             Synopsys DesignWare I2C adapter         I2C adapter
i2c-2   smbus           SMBus I801 adapter at efa0              SMBus adapter
i2c-3   i2c             i915 gmbus dpa                          I2C adapter
i2c-4   i2c             i915 gmbus dpb                          I2C adapter
i2c-5   i2c             i915 gmbus dpc                          I2C adapter
i2c-6   i2c             i915 gmbus tc1                          I2C adapter
i2c-7   i2c             i915 gmbus tc2                          I2C adapter
i2c-8   i2c             i915 gmbus tc3                          I2C adapter
i2c-9   i2c             i915 gmbus tc4                          I2C adapter
i2c-10  i2c             i915 gmbus tc5                          I2C adapter
i2c-11  i2c             i915 gmbus tc6                          I2C adapter
i2c-12  i2c             AUX C/DDI C (TC)/PHY C                  I2C adapter
i2c-13  i2c             AUX D/DDI D (TC)/PHY A                  I2C adapter

Also, what chipset/hardware was this exactly?
The system is a mini PC box from A*iExpr*ss, a Topton with 4x LAN and an Intel N100 CPU/chipset.

Maybe this is helpful:


Code:
root@pve:~# systemctl status watchdog-mux.service
● watchdog-mux.service - Proxmox VE watchdog multiplexer
     Loaded: loaded (/lib/systemd/system/watchdog-mux.service; static)
     Active: active (running) since Sat 2024-08-03 17:17:54 CEST; 18h ago
   Main PID: 673 (watchdog-mux)
      Tasks: 1 (limit: 76825)
     Memory: 276.0K
        CPU: 3.311s
     CGroup: /system.slice/watchdog-mux.service
             └─673 /usr/sbin/watchdog-mux

Aug 03 17:17:54 pve systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Aug 03 17:17:54 pve watchdog-mux[673]: Loading watchdog module 'iTCO_wdt'
Aug 03 17:17:54 pve watchdog-mux[673]: Watchdog driver 'iTCO_wdt', version 6
 
You mean: change WATCHDOG_MODULE=wdat_wdt ?

Yes, how about trying that first?

Code:
root@pve:~# lsinitramfs /boot/initrd.img-6.8.8-4-pve | grep -e wdt -e dog
usr/sbin/watchdog

OK, I just realised something. PVE only activates the WD when watchdog-mux starts. But you even have ...

GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

So you have no watchdog at all till the service starts up.

Did you do this just so you can test with?

echo c > /proc/sysrq-trigger

You know you can test the watchdog with killall -9 watchdog-mux and then watch dmesg -w or journalctl -f? That way you do not have to literally crash your system with a null pointer dereference, and you find out whether, e.g., the hardware watchdog does something other than rebooting by itself (maybe crash? :D).
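
To see what the kernel thinks about the watchdog before and after such a test, the sysfs attributes can be dumped. This is just a sketch under assumptions: it presumes the kernel exposes /sys/class/watchdog/watchdogN with attributes such as identity, state, timeout and nowayout (this depends on watchdog sysfs support being built in), and `show_watchdogs` is a made-up helper name. Attributes that are missing are simply skipped.

```shell
# Print selected attributes of every registered watchdog device.
# The base directory can be overridden via $1 (handy for testing).
show_watchdogs() {
    base="${1:-/sys/class/watchdog}"
    for dev in "$base"/watchdog*; do
        [ -d "$dev" ] || continue          # no watchdog registered at all
        for attr in identity state timeout nowayout; do
            if [ -r "$dev/$attr" ]; then
                printf '%s %s=%s\n' "${dev##*/}" "$attr" "$(cat "$dev/$attr")"
            fi
        done
    done
}
```

Running `show_watchdogs` once while watchdog-mux is up and once right after killing it, with journalctl -f open in a second terminal, should show whether the device is considered active and which timeout it is counting down from.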

How do i check that / what do i need to do?

If you fail with WDAT (and keep trying with iTCO_wdt), I would just check with lsmod | grep i2c_smbus whether i2c_smbus is loaded as well.
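
For an exact check that does not accidentally match other modules' dependency lists, the name column of /proc/modules can be compared directly. A minimal sketch; `module_loaded` is a hypothetical helper, and the second argument only exists so the function can be tried against a sample file:

```shell
# Succeed if a module with exactly this name is currently loaded.
# /proc/modules lists one module per line with the name in the first column.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

For example: `module_loaded i2c_smbus && echo loaded || echo "not loaded"`.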

The system is a mini PC box from A*iExpr*ss, a Topton with 4x LAN and an Intel N100 CPU/chipset.

BTW, do you think the softdog does not do as good a job? I mostly have experience with the BMC ones. The softdog is very reliable, and the NMI one also does its job just fine.

Code:
Aug 03 17:17:54 pve systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Aug 03 17:17:54 pve watchdog-mux[673]: Loading watchdog module 'iTCO_wdt'
Aug 03 17:17:54 pve watchdog-mux[673]: Watchdog driver 'iTCO_wdt', version 6

Yeah, that's fine, especially together with what you listed earlier:

Aug 03 13:28:53 pve kernel: iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

Did you test (before all this) that the softdog worked, btw? But I have no reason to believe it would not, as you say the install was fresh.
 
@esi_y

Thanks a lot for your detailed advice! I am a Linux beginner and am digging deeper step by step, mostly with lots of Google research.

I will try the above and report the results, but it will take a few days.
 