My watchdog crashes the server during reboot. Any idea how to fix that?

kwinz

Every time I try to cleanly reboot my Proxmox server, either with the reboot button in the PVE GUI or with `reboot` on the command line,
the watchdog power cycles my server!
I am afraid that crashing my server during reboot will eventually corrupt something.
We are using Dell servers with iDRAC.

Any idea how to diagnose or fix this? @proxmox team?

System log after two reboots:
[Screenshot: 1631156982312.png]

PS: I made a previous thread [1] where I described how I set it up and tested that the servers actually power cycle if I purposely crash the watchdog daemon. But this is a new problem: the reboot itself is not working cleanly.

[1] https://forum.proxmox.com/threads/testing-the-watchdog.86689
 
Hi,

do you use HA?

Is this starting only now, e.g., since a recent (kernel) update?

The syslog/journal from around the time when the reboot is triggered would be interesting too.
 
Hi, thanks for the reply!
> do you use HA?

Yes, we do use HA.

> Is this starting only now, e.g., since a recent (kernel) update?

No, this is not a regression. But I only noticed it now, since we rarely reboot our Proxmox servers.
We are still on Proxmox 6.3 with kernel 5.4.

> The syslog/journal from around the time when the reboot is triggered would be interesting too.

Will collect that and get back to you next week.
 
> Yes, we do use HA.
> No, this is not a regression. But I only noticed it now, since we rarely reboot our Proxmox servers.

Hmm, OK. This could be either a kernel issue or a problem that occurs when the system tries to cleanly shut down the HA LRM service. Was the HA shutdown policy changed from the default? (You can check under Datacenter -> Options -> HA Settings.)

But the logs would definitely help to determine which component is involved.
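
As a rough aid for picking the relevant lines out of a saved log, the sketch below keeps only the lines that mention a few keywords. The keyword list is an assumption (the watchdog plus the names of the HA services on a PVE node), and `filter_log` is a hypothetical helper, not part of any Proxmox tooling.

```python
from typing import Iterable, Iterator

# Assumed keywords; adjust to whatever components you suspect.
# "watchdog" also matches longer names that contain it.
KEYWORDS = ("watchdog", "pve-ha-lrm", "pve-ha-crm")


def filter_log(lines: Iterable[str],
               keywords: Iterable[str] = KEYWORDS) -> Iterator[str]:
    """Yield the lines that contain any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    for line in lines:
        text = line.lower()
        if any(k in text for k in lowered):
            yield line
```

It could be fed an open log file, for example `filter_log(open("/var/log/syslog"))`, and the surviving lines printed in order. Messages that only ever reach the console will of course not show up this way.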

> We are still on Proxmox 6.3 with Kernel 5.4

In general, I'd recommend updating to the latest PVE 6.4 soon. While I have no clear indication that this specific issue will be fixed by that, quite a few kernel and other important updates are still missing, and those would be good to have regardless.
 
Hi @t.lamprecht!

1. We have now updated to the latest PVE 6.

2. "Datacenter -> Options -> HA Settings" is still set to "default"

3. The logs are telling: "IPMI Watchdog: Unexpected close, not stopping watchdog" appears at the end of the reboot process.
However, that line only appears on the VGA output and is not persisted to /var/log/syslog.
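
For context on that message: under the Linux kernel's watchdog driver API, a driver that supports "magic close" disarms the timer only if the character `V` is written to the device immediately before it is closed. Any other close is treated as a crashed daemon, so the timer keeps running and eventually expires. The sketch below shows that handshake in Python. It is not how Proxmox's own watchdog handling is implemented, the function names are made up for illustration, and the device path is a parameter so the logic can be tried against an ordinary file.

```python
import os

MAGIC_CLOSE = b"V"  # magic character defined by the Linux watchdog API


def kick(fd: int) -> None:
    """Reset the countdown; any write other than the magic character will do."""
    os.write(fd, b"\0")


def close_watchdog(fd: int, disarm: bool) -> None:
    """Close the device, writing the magic character first if disarm is set."""
    if disarm:
        os.write(fd, MAGIC_CLOSE)
    os.close(fd)


def run_session(path: str, kicks: int, disarm: bool) -> None:
    """Hypothetical helper: open the device, kick it a few times, then close."""
    fd = os.open(path, os.O_WRONLY)
    for _ in range(kicks):
        kick(fd)
    close_watchdog(fd, disarm)
```

On a real system the path would be `/dev/watchdog`, and opening it arms the timer, so only try that on a machine you are willing to have reset. A process that is killed, or that closes with `disarm=False`, leaves the timer armed, which would be consistent with the "Unexpected close" line above.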


[Screenshot: Screenshot_20210923-170611-rd.jpg]

[Screenshot: Screenshot_20210923-170344-rd.jpg]

And in the IPMI logs this ALERT appears:
Thu Sep 23 2021 16:01:20
The watchdog timer expired.

As far as I can tell, the attached syslog shows nothing interesting. The snippet that I attached contains two reboots. I think I also updated a few packages between the reboots. You can see that a shutdown was initiated, that the server is rebooting, and that after the reboot it finds the IPMI watchdog, loads the appropriate kernel driver, and disables the software watchdog.

Any idea why the watchdog kicker is simply killed instead of cleanly shut down?
 

Hello Kwinz,

I just set up a new Proxmox 7.4 cluster with 3 x Dell PowerEdge R630 and the DRAC hardware watchdog, as you did here, and I don't see this problem when I manually reboot one node. (I don't see "The watchdog timer power cycled the system." in my DRAC log, so I think the node restarts normally.)

- Maybe this has been fixed in Proxmox 7?
- Can you tell me if you still have the problem?

Thanks :)
 