[SOLVED] Any more NUC users with unusable watchdog here?

Apollon77

Sep 24, 2018
Hey, and sorry for the slightly off-topic post. I use Proxmox successfully and was on a NUC5PPYH so far, and happy with it. My HA setup is working great.
But the NUC5PPYH is limited to 8GB RAM and is in fact EOL, so I decided to start upgrading and ended up on a NUC8i5BEH2 ... but here the watchdog is not working because of Intel BIOS crap ... and now Intel answered that no support will come because

is not supported by the Intel® NUC Kit NUC8i5BEH
and the ROI is not enough for the feature implementation.
The feature will not be implemented due to the Operating System support limitations.

If someone also wants to post something about that, you're welcome to do so in the Intel Forum thread: https://forums.intel.com/s/question...-watchdog-wronlgy-and-so-watchdog-is-unusable

Is the watchdog working on NUC7? Does anyone know, and is anyone using the watchdog successfully there? If yes, which model?

Thank you!

Ingo
 
Which watchdog are you using? The iTCO or some other? There is also the softdog available (default).
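
A minimal sketch for answering "which watchdog?" on a given node, assuming the kernel exposes the watchdog sysfs class (this depends on the kernel being built with watchdog sysfs support, which may not hold everywhere). The function name list_watchdogs and its sysfs_root parameter are made up for this example; only the /sys/class/watchdog/<device>/identity layout comes from the kernel.

```python
from pathlib import Path


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return {device_name: identity} for each watchdog device found.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real node the default path is the one to use.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No watchdog class in sysfs: nothing registered, or sysfs support is off.
        return result
    for dev in sorted(root.iterdir()):
        identity_file = dev / "identity"
        if identity_file.is_file():
            # The identity attribute holds the driver's self-reported name.
            result[dev.name] = identity_file.read_text().strip()
    return result
```

Calling list_watchdogs() on the node and printing the result should show whether the registered device is the hardware driver or the software one.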
 
iTCO ... it's the only one available ... I know, and I also use the softdog at the moment ... but I have no idea how reliable it is :)
 
;-)) exactly this is the risk :) It is really bad of Intel that they cannot manage to fix the watchdog in the newer devices :-( It's a shame
 
;-)) exactly this is the risk
If the kernel is stuck, no VM/CT will run either. ;)

It is really bad of Intel that they cannot manage to fix the watchdog in the newer devices :-( It's a shame
Understandable. But NUCs are consumer hardware, where HW watchdogs are rarely needed.
 
If the kernel is stuck, no VM/CT will run either. ;)

Yes, but I have an HA cluster ... so they should simply move then, and the machine should reboot ... :)
So the question is how fencing would work when only the softdog is used. If fencing and moving VMs/CTs basically also works in the case where the node "hangs", then I could work around it.

Understandable. But NUCs are consumer hardware, where HW watchdogs are rarely needed.

Yes and no ... I have a NUC5PPYH and there it works perfectly ... with the NUC6CAYH it also worked. NUC7 and NUC8 also have the chips and stuff, but the BIOS destroyed the feature because of an invalid memory mapping ...
 
Yes, but I have an HA cluster ... so they should simply move then, and the machine should reboot ... :)
So the question is how fencing would work when only the softdog is used. If fencing and moving VMs/CTs basically also works in the case where the node "hangs", then I could work around it.
The softdog fences the node once quorum is lost. The services (VM/CT) recover on the other nodes. The same happens if the kernel is stuck. It is always the quorate part of the cluster that makes the decision.
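
To illustrate "the quorate part decides", here is a small sketch of a strict-majority check. It assumes one vote per node and plain majority voting; the function name is_quorate is made up for this example and is not a Proxmox or corosync API.

```python
def is_quorate(votes_present, expected_votes):
    """True when a partition holds a strict majority of the expected votes."""
    return votes_present > expected_votes // 2


# Three-node cluster, one node hangs or drops off the network:
survivors_quorate = is_quorate(2, 3)   # the two remaining nodes
lone_node_quorate = is_quorate(1, 3)   # the isolated or stuck node
```

Under these assumptions the two surviving nodes stay quorate and recover the services, while the single node does not and is the one that gets fenced.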
 
So the "only" risk is that if the kernel hangs, the soft watchdog does not trigger a reboot and the machine hangs and needs a manual power-off ... but fencing is done by the other hosts ... so OK ...

One side question that you might know the answer to: when a node gets fenced, an email is sent out ... is there any way to execute a script or such when a node gets fenced, or to get this info somewhere else, to maybe be able to "shoot that node in the head" by rebooting it or such?
 
So the "only" risk is that if the kernel hangs, the soft watchdog does not trigger a reboot and the machine hangs and needs a manual power-off ... but fencing is done by the other hosts ... so OK ...
Yes.

One side question that you might know the answer to: when a node gets fenced, an email is sent out ... is there any way to execute a script
You mean external monitoring.

such when a node gets fenced, or to get this info somewhere else, to maybe be able to "shoot that node in the head" by rebooting it or such?
We are talking about a stuck kernel. ;) That's why hardware watchdogs are preferred.

But as an example, you will need to cut power, by using an IP-based PDU or a reset switch connected to the reset pins inside the NUC.
 
Does the Proxmox API return a "fenced" status on a node?
You can get the state through the API (pvesh get /cluster/ha/status/current) or the CLI (ha-manager status --verbose).
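
A rough sketch of how an external monitor could use that status call to spot a pending fence. Assumptions, not confirmed in this thread: the JSON is a list of entries, service entries carry the fields "type", "sid", "state" and "node", and a service waiting for its node to be fenced reports the state "fence". Both function names are made up for this example.

```python
import json
import subprocess


def services_in_fence_state(entries):
    """Return (service id, node) pairs for HA services whose state is 'fence'.

    The field names used here are assumptions about the JSON shape.
    """
    return [
        (e.get("sid", e.get("id")), e.get("node"))
        for e in entries
        if e.get("type") == "service" and e.get("state") == "fence"
    ]


def fetch_ha_status():
    """Run pvesh on a healthy cluster node and decode its JSON output."""
    out = subprocess.run(
        ["pvesh", "get", "/cluster/ha/status/current",
         "--output-format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

A monitoring script on a surviving node could call services_in_fence_state(fetch_ha_status()) periodically and, when the list is non-empty, trigger whatever out-of-band power cut is available for the affected node.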
 
BTW: This issue has been fixed in the latest PVE kernels ... so it seems it was a Linux issue :)
 
Sorry to hijack this thread, but it seems the original problem has been resolved, so it might be okay to re-appropriate this thread...

To the OP:

You said PVE was working fine on your 5PPYH. I am trying to install PVE on my 5PPYH and it gets stuck during installation due to a CPU lockup. Did you ever encounter similar issues?

Thanks
 
Hey,

I have replaced the NUC5PPYH in the meantime, but I had one working some weeks ago without any issues.
 
I solved my problem: There is a boot option in the NUC BIOS that lets you tell the NUC what type of OS you are booting. It says there that this should be switched to Linux when installing Linux. Once I switched this to Linux, the installation went through smoothly.

I did encounter another issue but that is the topic of another hijacked thread...
 
