One of our clusters used to have HA enabled, whilst not anymore it seems one of our pve-nodes had been rebooted by watchdog-mux.
Looking at the logs, the machine rebooted at 15:31:10, not even 4 seconds earlier this message is displayed:
A couple minutes after the reboot completed, journalctl shows us this:
We did notice network traffic was quite high @ ~9Gbit/s during this period. Hardware resources were below 60% usage.
Any idea what could've caused this sudden reboot?
Looking at the logs, the machine rebooted at 15:31:10, not even 4 seconds earlier this message is displayed:
Jan 06 15:31:06 watchdog-mux[2482]: client watchdog expired - disable watchdog updates
A couple minutes after the reboot completed, journalctl shows us this:
Jan 06 15:32:27 node10 kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Jan 06 15:32:28 node10 systemd[1]: Started Proxmox VE watchdog multiplexer.
Jan 06 15:32:28 node10 watchdog-mux[2442]: Watchdog driver 'Software Watchdog', version 0
Jan 06 15:32:30 node10 corosync[3311]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
We did notice network traffic was quite high @ ~9Gbit/s during this period. Hardware resources were below 60% usage.
Any idea what could've caused this sudden reboot?