PVE "random" reboots

vaclavku

Dec 17, 2024
Dear community,
I'd kindly ask you to share your views on the potential causes of these random reboots.
Output of journalctl --list-boots:
-8 Tue 2025-05-06 12:45:09 CEST Thu 2025-05-08 09:55:52 CEST
-7 Thu 2025-05-08 10:36:49 CEST Thu 2025-05-08 19:40:30 CEST
-6 Thu 2025-05-08 19:41:17 CEST Mon 2025-05-12 16:33:05 CEST
-5 Mon 2025-05-12 16:42:20 CEST Thu 2025-05-15 22:22:39 CEST
-4 Thu 2025-05-15 22:23:17 CEST Fri 2025-05-16 10:08:37 CEST
-3 Fri 2025-05-16 10:09:33 CEST Sat 2025-05-17 04:59:55 CEST
-2 Sat 2025-05-17 05:00:39 CEST Sun 2025-05-18 04:33:47 CEST
-1 Sun 2025-05-18 04:34:26 CEST Sun 2025-05-18 15:04:18 CEST
0 Sun 2025-05-18 15:04:57 CEST Sun 2025-05-18 19:10:59 CEST
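A small Python sketch can turn the boot list above into per-boot uptimes and the downtime between consecutive boots. It assumes the listing is pasted exactly as shown. A real --list-boots listing usually also includes a 32-character boot ID, which the pattern tolerates. It also assumes that all timestamps share one timezone, because the zone abbreviation is ignored. The function names are illustrative.

```python
import re
from datetime import datetime

# Matches lines such as
#   "-8 Tue 2025-05-06 12:45:09 CEST Thu 2025-05-08 09:55:52 CEST".
# Real output usually has a 32-hex-digit boot ID after the index; it is
# accepted but ignored. The timezone abbreviation is also ignored, which is
# only safe when every entry uses the same zone (all CEST here).
LINE_RE = re.compile(
    r"^\s*(?P<idx>-?\d+)\s+(?:[0-9a-f]{32}\s+)?"
    r"\w{3} (?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+\s+"
    r"\w{3} (?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+\s*$"
)
FMT = "%Y-%m-%d %H:%M:%S"

SAMPLE = """\
-8 Tue 2025-05-06 12:45:09 CEST Thu 2025-05-08 09:55:52 CEST
-7 Thu 2025-05-08 10:36:49 CEST Thu 2025-05-08 19:40:30 CEST
-6 Thu 2025-05-08 19:41:17 CEST Mon 2025-05-12 16:33:05 CEST
-5 Mon 2025-05-12 16:42:20 CEST Thu 2025-05-15 22:22:39 CEST
-4 Thu 2025-05-15 22:23:17 CEST Fri 2025-05-16 10:08:37 CEST
-3 Fri 2025-05-16 10:09:33 CEST Sat 2025-05-17 04:59:55 CEST
-2 Sat 2025-05-17 05:00:39 CEST Sun 2025-05-18 04:33:47 CEST
-1 Sun 2025-05-18 04:34:26 CEST Sun 2025-05-18 15:04:18 CEST
0 Sun 2025-05-18 15:04:57 CEST Sun 2025-05-18 19:10:59 CEST
"""

def parse_boots(text):
    """Return (index, first_entry, last_entry) tuples, oldest boot first."""
    boots = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # header lines and anything unexpected are skipped
            boots.append((int(m["idx"]),
                          datetime.strptime(m["start"], FMT),
                          datetime.strptime(m["end"], FMT)))
    return sorted(boots)

def summarize(boots):
    """Yield (index, uptime, gap until next boot); gap is None for the last boot."""
    for i, (idx, start, end) in enumerate(boots):
        gap = boots[i + 1][1] - end if i + 1 < len(boots) else None
        yield idx, end - start, gap

if __name__ == "__main__":
    for idx, uptime, gap in summarize(parse_boots(SAMPLE)):
        print(f"boot {idx:>3}: up {uptime}, next boot after {'n/a' if gap is None else gap}")
```

With this sample, most gaps between the end of one boot and the start of the next are under a minute. Two are noticeably longer: about 41 minutes before boot -7 and about 9 minutes before boot -5. One possible reading, not verified here, is that the short gaps are automatic resets and the long ones are freezes that needed manual intervention.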

The logs (syslog, journalctl, dmesg) contain no error information, only entries about the reboot itself.
In about 90% of cases the host simply reboots and recovers on its own. Sometimes, however, PVE freezes instead.
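The logs reportedly show nothing beyond the reboot itself, so it may help to look at the very last entries each previous boot wrote. The sketch below wraps journalctl -b OFFSET -n LINES -o short-iso --no-pager. These are real journalctl options, but the helper names are hypothetical. It assumes a persistent journal, which the boot history above suggests is enabled. If a boot's journal stops mid-stream with no shutdown messages, that usually points to a hard reset (watchdog, power or hardware) rather than an orderly reboot.

```python
import subprocess

def tail_of_boot_cmd(offset, lines=50):
    """Build a journalctl command for the last `lines` entries of one boot.

    offset is the relative index shown by journalctl --list-boots
    (-1 = previous boot, -2 = the one before, and so on).
    """
    return ["journalctl", "-b", str(offset), "-n", str(lines),
            "-o", "short-iso", "--no-pager"]

def show_last_words(offsets, lines=50):
    """Print the tail of each listed boot; command errors are printed, not raised."""
    for off in offsets:
        cmd = tail_of_boot_cmd(off, lines)
        print("#", " ".join(cmd))
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        except FileNotFoundError:
            print("journalctl not found on this system")
            return
        print(result.stdout or result.stderr)
```

For example, show_last_words(range(-8, 0)) prints the last 50 entries of boots -8 through -1.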

I've run hardware tests (memtest, disk SMART, CPU stress, power) and found nothing wrong.

I'm running PVE 8.4.1 (stable, enterprise) and have updated the kernel to 6.11.11-2-pve.

My hardware:

Supermicro CSE-732I-500B, 4x 3,5" SATA/SAS, 2x 5,25", 500W, black
ASRock - X470D4U
AMD Ryzen 9 5950X
Mushkin Proline 32GB (1x32GB) 2666MHz, 4x
SSD disk Samsung 980 PRO 1TB
SSD adapter AXAGON PCEM2-N / PCIe x4 - M.2 NVMe M-key
Seagate IronWolf Pro 16TB CMR - RAID1
EVGA SuperNOVA 650 G7

I have the latest firmware:

BMC firmware version: 3.02.00
BIOS firmware version: P4.20
PSP firmware version: 0.14.0.27

I use a ZFS pool for data storage; the systems themselves run on the SSD.
There are only two VMs, both running Docker, and nothing else.
I have monitoring in place, but it doesn't show anything unusual (such as usage peaks).
During some of the freezes, the console showed a watchdog “BUG: soft lockup” message (CPU stuck).
In general, the reboot almost always happens during a backup (PBS).
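To check whether the soft lockup, or other hang-related kernel messages, appears in the journals of the boots that ended unexpectedly, export the kernel log of a previous boot with journalctl -k -b -1 --no-pager and scan it. The pattern list below is an assumed, non-exhaustive selection of common kernel messages: soft and hard lockups, hung tasks, machine-check errors, RCU stalls and NVMe timeouts. The dictionary keys and the function name are illustrative.

```python
import re

# Illustrative, non-exhaustive patterns for kernel messages that often
# accompany hangs or hardware-triggered resets.
PATTERNS = {
    "soft_lockup": re.compile(r"watchdog: BUG: soft lockup"),
    "hard_lockup": re.compile(r"Watchdog detected hard LOCKUP"),
    "hung_task": re.compile(r"INFO: task .+ blocked for more than \d+ seconds"),
    "mce": re.compile(r"\bmce: |Machine check|Hardware Error"),
    "rcu_stall": re.compile(r"rcu: INFO: rcu_(?:sched|preempt) (?:self-)?detected stall"),
    "nvme": re.compile(r"nvme\S*: .*(?:timeout|controller is down)", re.I),
}

def scan(lines):
    """Return {pattern name: [matching lines]} for every pattern that matched."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, rx in PATTERNS.items():
            if rx.search(line):
                hits[name].append(line.rstrip())
    return {name: found for name, found in hits.items() if found}
```

Usage, with a hypothetical file name: with open("prev-kernel.log") as f: print(scan(f)).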

I have tried the following:
vm.swappiness = 1
vm.min_free_kbytes = 128/64
set the Core Watchdog Timer Enable setting to Disable
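Values set with sysctl -w do not survive a reboot unless they are also stored in /etc/sysctl.conf or in a file under /etc/sysctl.d/. Given how often this host restarts, it may be worth confirming that the settings are actually active. The sketch reads them from /proc/sys, where a sysctl name maps to a path with the dots replaced by slashes. The expected values are assumptions. In particular, 131072 for vm.min_free_kbytes presumes the "128" above meant 128 MiB, since the parameter is expressed in kilobytes.

```python
from pathlib import Path

# Hypothetical expected values; 131072 kB assumes "128" meant 128 MiB.
EXPECTED = {"vm.swappiness": 1, "vm.min_free_kbytes": 131072}

def sysctl_path(name, root="/proc/sys"):
    """Map e.g. vm.swappiness to /proc/sys/vm/swappiness (simple names only)."""
    return Path(root, *name.split("."))

def read_sysctl(name, root="/proc/sys"):
    return sysctl_path(name, root).read_text().strip()

def check(expected, root="/proc/sys"):
    """Return {name: (expected, actual)} for every setting that differs."""
    diffs = {}
    for name, want in expected.items():
        got = read_sysctl(name, root)
        if got != str(want):
            diffs[name] = (str(want), got)
    return diffs

if __name__ == "__main__" and Path("/proc/sys/vm").exists():
    print(check(EXPECTED) or "all settings active")
```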

https://www.reddit.com/r/Proxmox/comments/1hanfrj/possible_fix_for_random_reboots_on_proxmox_83/

I've also checked the ZFS memory cache (ARC).
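With 4x 32 GB of RAM, it can be useful to see how much memory the ZFS ARC currently holds and what its limits are. OpenZFS exposes these counters in /proc/spl/kstat/zfs/arcstats (fields size, c_min and c_max, in bytes). The configured cap is also visible in /sys/module/zfs/parameters/zfs_arc_max, where 0 means the built-in default applies. The parser below is a minimal sketch of reading that table, and its function names are illustrative.

```python
from pathlib import Path

ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")
GIB = 1024 ** 3

def parse_arcstats(text):
    """Parse the kstat table: two header lines, then 'name type data' rows."""
    stats = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[2])
    return stats

def arc_summary(stats):
    """Current ARC size and its min/max targets, in GiB (rounded to 2 places)."""
    return {k: round(stats[k] / GIB, 2) for k in ("size", "c_min", "c_max") if k in stats}

if __name__ == "__main__" and ARCSTATS.exists():
    print(arc_summary(parse_arcstats(ARCSTATS.read_text())))
```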

Do you have any ideas about what else I could test or update?

Many thanks.