I've had a Proxmox install running rock solid for months and all of a sudden it's randomly rebooting. It can run for 5 minutes or 5 hours. The weird thing is, I've basically used up all of the collective posts on the subject and still can't track it down. I'm now at the point where I really just want to know what the cause is for my own sanity. haha
I can't find anything in syslog. I literally watched it reboot and it just goes right from nothing to --REBOOT--.
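To show what "nothing before the reboot" looks like across several crashes, here is a rough Python sketch that pulls the last few log lines ahead of every boot separator in an exported log. The function name and the marker pattern are my own assumptions: the separator is written as --REBOOT-- above, and the regex also accepts variants like "-- Reboot --" or "-- Boot <id> --".

```python
import re

# Assumption: a boot separator is a line that starts and ends with "--"
# and begins with the word "reboot" or "boot" (any case).
BOOT_MARKER = re.compile(r"^--\s*(re)?boot\b.*--\s*$", re.IGNORECASE)

def tail_before_each_reboot(lines, n=5):
    """Return the last n log lines seen before every boot separator."""
    tails, window = [], []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER.match(line):
            # Keep only the final n lines logged before this reboot.
            tails.append(window[-n:])
            window = []
        else:
            window.append(line)
    return tails
```

Feeding it an exported log (for example a hypothetical journal.txt opened with open("journal.txt")) gives one short list per reboot. If none of them contains a shutdown or panic message, that at least confirms the box is dying without the kernel getting a chance to log anything.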
Although it seems random, it does seem to happen more often while I'm around, so it might be tied to load. However, it still reboots without the fast NICs installed and with only a 1 gig web interface going.
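The "more often while I'm around" impression could be checked rather than guessed. A minimal sketch, assuming the reboot times have been written down as strings in a "YYYY-MM-DD HH:MM" format (the function name and the format are placeholders of mine, not something Proxmox produces):

```python
from collections import Counter
from datetime import datetime

def reboots_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count reboots per hour of day from a list of timestamp strings."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    # Sort by hour so clusters are easy to spot when printed.
    return dict(sorted(hours.items()))
```

If the counts pile up in the hours when the machine is actually in use, that points toward load. A flat spread would point away from it.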
Things I have tried already:
Updated Proxmox to the newest version. Reinstalled the Proxmox-specific packages with apt.
Firmware on all the NICs and the SAS card is up to date.
BIOS is up to date.
Tried both stock 5.x kernel and the optional 6.2 kernel.
Tested RAM. No issues.
Replaced PSU. (seems to be the most commonly suggested issue)
Power is rock solid and is being fed by a line-interactive UPS.
Removed NICs
Restarts even if no VMs are running.
Changed the watchdog settings to 0 in... whatever config has them - I don't even remember anymore.
Motherboard doesn't have a built in watchdog reset timer from what I can tell.
I can provide any logs or outputs anyone can think of to try and figure this out. The last resort is to reinstall Proxmox, but I think that's avoiding the problem (which may still exist if it's somehow related to hardware).
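To narrow down which log excerpts are worth posting, a small filter like the one below could be run over saved kernel log text (for instance the output of journalctl -k -b -1, assuming the journal is persistent). The keyword list is only my guess at strings that tend to show up around hardware faults. It is illustrative, not exhaustive, and the function name is made up.

```python
import re

# Assumption: these substrings commonly appear in kernel messages about
# machine checks, hardware errors, watchdogs and NMIs. Not exhaustive.
PATTERNS = re.compile(
    r"mce:|machine check|hardware error|watchdog|\bnmi\b",
    re.IGNORECASE,
)

def suspicious_lines(lines):
    """Return the log lines that mention any of the keywords above."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]
```

An empty result doesn't prove the hardware is fine, since a hard reset can happen before anything is written, but any hits would be a concrete place to start.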
Hardware:
Ryzen 5650G Pro
4x Timetec 16GB 2666 MHz ECC
ASRock X570 Steel Legend motherboard
Aquantia 10 GbE NIC
LSI SAS controller (passed through to a VM - system crashes without the VM running though)
Mellanox ConnectX-4 25 GbE NIC (10 and 25 GbE NICs are bridged in Proxmox to act as a switch)
Misc SSDs for booting, VM storage. All mirrored.
Originally an 850W Seasonic PSU, now an 860W Fractal
Not that it matters - Rosewill 4U server case.
All the PCIe cards have a fan directly blowing on them since they all tend to get toasty.