Watchdog.mux rebooting the server

albgen

Dear all,

I have a very strange issue. In the logs I see the following:
Code:
root@SrvAuct2807206 ~ # journalctl --since "2025-10-10 00:00" --until "2025-10-11 23:59" | grep -i "watchdog"
Oct 10 13:24:33 SrvAuct2807206 watchdog-mux[898]: got terminate request
Oct 10 13:24:33 SrvAuct2807206 watchdog-mux[898]: clean exit
Oct 10 13:24:33 SrvAuct2807206 systemd[1]: Stopping watchdog-mux.service - Proxmox VE watchdog multiplexer...
Oct 10 13:24:33 SrvAuct2807206 systemd[1]: watchdog-mux.service: Deactivated successfully.
Oct 10 13:24:33 SrvAuct2807206 systemd[1]: Stopped watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 10 13:24:33 SrvAuct2807206 systemd[1]: watchdog-mux.service: Consumed 1.364s CPU time, 1.8M memory peak.
Oct 10 13:24:45 SrvAuct2807206 systemd[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 10 13:24:45 SrvAuct2807206 systemd[1]: Watchdog running with a hardware timeout of 10min.
Oct 10 13:24:45 SrvAuct2807206 kernel: watchdog: watchdog0: watchdog did not stop!
Oct 10 13:24:45 SrvAuct2807206 systemd-shutdown[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 10 13:24:45 SrvAuct2807206 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Oct 10 13:26:06 SrvAuct2807206 kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Oct 10 13:26:07 SrvAuct2807206 systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 10 13:26:07 SrvAuct2807206 watchdog-mux[887]: Watchdog driver 'Software Watchdog', version 0
Oct 11 09:16:54 SrvAuct2807206 watchdog-mux[887]: got terminate request
Oct 11 09:16:54 SrvAuct2807206 watchdog-mux[887]: clean exit
Oct 11 09:16:54 SrvAuct2807206 systemd[1]: Stopping watchdog-mux.service - Proxmox VE watchdog multiplexer...
Oct 11 09:16:54 SrvAuct2807206 systemd[1]: watchdog-mux.service: Deactivated successfully.
Oct 11 09:16:54 SrvAuct2807206 systemd[1]: Stopped watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 11 09:16:54 SrvAuct2807206 systemd[1]: watchdog-mux.service: Consumed 1.232s CPU time, 2M memory peak.
Oct 11 09:16:57 SrvAuct2807206 systemd[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 11 09:16:57 SrvAuct2807206 systemd[1]: Watchdog running with a hardware timeout of 10min.
Oct 11 09:16:57 SrvAuct2807206 kernel: watchdog: watchdog0: watchdog did not stop!
Oct 11 09:16:57 SrvAuct2807206 systemd-shutdown[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 11 09:16:57 SrvAuct2807206 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Oct 11 09:18:14 SrvAuct2807206 kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Oct 11 09:18:15 SrvAuct2807206 systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 11 09:18:15 SrvAuct2807206 watchdog-mux[895]: Watchdog driver 'Software Watchdog', version 0
Oct 11 09:26:05 SrvAuct2807206 watchdog-mux[895]: got terminate request
Oct 11 09:26:05 SrvAuct2807206 watchdog-mux[895]: clean exit
Oct 11 09:26:05 SrvAuct2807206 systemd[1]: Stopping watchdog-mux.service - Proxmox VE watchdog multiplexer...
Oct 11 09:26:05 SrvAuct2807206 systemd[1]: watchdog-mux.service: Deactivated successfully.
Oct 11 09:26:05 SrvAuct2807206 systemd[1]: Stopped watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 11 09:26:09 SrvAuct2807206 systemd[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 11 09:26:09 SrvAuct2807206 systemd[1]: Watchdog running with a hardware timeout of 10min.
Oct 11 09:26:09 SrvAuct2807206 kernel: watchdog: watchdog0: watchdog did not stop!
Oct 11 09:26:09 SrvAuct2807206 systemd-shutdown[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
Oct 11 09:26:09 SrvAuct2807206 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Oct 11 09:27:25 SrvAuct2807206 kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Oct 11 09:27:26 SrvAuct2807206 systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Oct 11 09:27:26 SrvAuct2807206 watchdog-mux[886]: Watchdog driver 'Software Watchdog', version 0
root@SrvAuct2807206 ~ #

This is Proxmox 9.0.11. The vendor ran a hardware test and found the server healthy, so the problem seems to be in the kernel, or possibly a driver or firmware. I have just installed the latest kernel in the hope that it fixes the issue:
Code:
root@SrvAuct2807206 ~ # uname -a
Linux SrvAuct2807206 6.17.1-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.17.1-1 (2025-10-06T16:20Z) x86_64 GNU/Linux

Does anybody have the same issue?

Thank you.
 
Have you been able to find the reason for the reboot?
It looks to me like an issue with the Corosync setup.
As you know, Corosync is sensitive to latency changes and network stability, and therefore needs a dedicated NIC.
The dedicated NIC does not need much bandwidth; 1G is usually recommended.
From the Proxmox VE 9 documentation [1]:
  • We recommend using a dedicated NIC exclusively for Corosync communication.
  • We recommend giving Corosync at least a secondary link for redundancy.
  • We recommend not using a network bond for Corosync's primary link; see [2] for more details.
Additionally, when a node hosts HA resources and loses Corosync quorum for more than about 50-60 seconds, it will fence itself (reboot). If the network becomes unusable for a majority of the nodes in the cluster, the entire cluster can reboot. See [3] for more info about fencing.
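
If it helps, here is a rough way to tell an ordinary service stop from a possible fence in the journal. This is only a sketch: the `clean exit` line is taken from the log posted above, but the `client watchdog expired` pattern is my assumption about what watchdog-mux logs before a fence, and the function name `summarize_watchdog` is made up for this example. Adjust the patterns to whatever your journal actually shows.

```shell
#!/bin/sh
# Summarize watchdog-mux journal lines read from stdin.
# "clean exit" appears in the log posted in this thread.
# "client watchdog expired" is an ASSUMED message for a fence event;
# verify it against your own journal before relying on the count.
summarize_watchdog() {
    awk '
        /watchdog-mux\[[0-9]+\]: clean exit/              { clean++ }
        /watchdog-mux\[[0-9]+\]: client watchdog expired/ { expired++ }
        END { printf "clean_exits=%d expired=%d\n", clean, expired }
    '
}

# Usage on a live node (requires journalctl):
#   journalctl -u watchdog-mux --since "2025-10-10" | summarize_watchdog
```

A non-zero `expired` count would point towards fencing, while only `clean_exits` suggests the service was stopped as part of a regular shutdown. Either way, the Corosync and HA service logs around the same timestamps are worth reading as well.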

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_cluster_requirements
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_corosync_over_bonds
[3] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#ha_manager_fencing