[SOLVED] NUC6CAY iTCO_wdt Watchdog support killed by last (kernel?) updates?

Apollon77

Well-Known Member
Sep 24, 2018
Hi,

I have a set of 4 NUCs that all run PVE and work together in an HA cluster. I enabled iTCO_wdt as the built-in hardware watchdog device on all of them, and it worked well. Yesterday I installed all current updates, and afterwards I had problems bringing one of the nodes back up. It is the only one that is a NUC6CAY. The reason is that the kernel no longer finds the watchdog device, so the LRM does not start.
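For anyone who wants to compare a working and a non-working node, the check can be scripted. This is only a minimal sketch, assuming the standard Linux locations `/proc/modules` and `/dev/watchdog*`; the function names `loaded_modules` and `watchdog_status` are made up for this example and are not part of PVE.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    size, reference count and other fields.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def watchdog_status(module="iTCO_wdt",
                    proc_modules=Path("/proc/modules"),
                    dev_dir=Path("/dev")):
    """Report whether `module` is loaded and which watchdog nodes exist.

    The default paths are assumptions about a typical Linux system.
    """
    text = proc_modules.read_text() if proc_modules.exists() else ""
    return {
        "module_loaded": module in loaded_modules(text),
        "devices": sorted(p.name for p in dev_dir.glob("watchdog*")),
    }


if __name__ == "__main__":
    print(watchdog_status())
```

Running it on each node and comparing the output shows whether the module is missing entirely or is loaded without creating a device node.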

I now have the following software versions:

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-8-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 4.1.5-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1

Before the update I was running the package versions that had been published previously.

Could there be anything in the PVE or kernel updates that prevents the watchdog device from being discovered? It worked before, and it still works on the other NUCs :-( I also tried powering the device off completely.

Any idea?

I have now switched this node to softdog, but I am not sure I want to trust that for my operations.

Thank you for your support.
 
Hi,

Please try booting the old kernel and check whether the behavior still occurs.

You can select the kernel in the GRUB boot menu, or set it as the default in /etc/default/grub.
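After rebooting, it is worth confirming that the node really came up on the older kernel before drawing conclusions. A minimal sketch, assuming kernel images are installed as `/boot/vmlinuz-<version>`; the helper names `installed_kernels` and `is_running` are hypothetical, and the version string is taken from the package list in the first post.

```python
import platform
from pathlib import Path


def installed_kernels(boot_dir=Path("/boot")):
    """List kernel versions that have a vmlinuz-<version> image in boot_dir.

    The result is sorted as plain strings, not by version semantics.
    """
    prefix = "vmlinuz-"
    return sorted(p.name[len(prefix):] for p in boot_dir.glob(prefix + "*"))


def is_running(expected, running=None):
    """Return True if the running kernel release equals `expected`.

    `running` defaults to platform.release(), which reports the same
    string as `uname -r`.
    """
    if running is None:
        running = platform.release()
    return running == expected


if __name__ == "__main__":
    print("installed:", installed_kernels())
    print("running:", platform.release())
    print("on old kernel:", is_running("4.15.18-5-pve"))
```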
 
Yes, great idea (sorry that I did not try this before). I will try it tonight and report back.
 
OK, it no longer works with the old kernel either, so it seems to be a hardware issue :-( Thank you for your support!
 
Did you upgrade the firmware?
 
I downgraded to the version where it last worked, and it still does not work. I will upgrade to the latest available kernel in the next few days and retest. If that fails too, I will assume the hardware is the cause.

It is already on the latest BIOS/firmware.
 
