2 identical nodes: 1 stable, 1 kernel panics every day

lightmaster

New Member
Feb 8, 2023
I've got 2 identical nodes running Proxmox (with another device acting as a qdevice). The only difference between the 2 nodes is that one has more memory than the other. One of them has been rock stable since I first installed Proxmox 7 on it. The second can't go more than 24 hours without a kernel panic, and sometimes crashes as little as an hour after boot. I have tried swapping in a known good set of RAM sticks, and I've run the computer's built-in diagnostic suite (RAM, CPU, SSD tests). I've also installed Pop_OS and ran it for roughly a week without a crash. After each crash, I've looked in syslog and messages, and I don't see anything that stands out before the crash. I do have a picture of what little information is shown on the screen while it's in the kernel panic state and waiting to be powered off.

Is this picture enough to figure out what's causing the constant crashing? If not, what can I do to narrow down the cause?


20230217-060034.jpg
 
Could be a clock/timer issue or a powerstate/idle problem (or something else completely). Are both systems on the same/latest BIOS version? Do they have identical BIOS settings (C-states etc.)?
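If it helps with comparing the two nodes, here is a rough Python sketch that dumps the BIOS version and the idle (C-)states the kernel exposes for cpu0, so the output from each node can be compared. It assumes the usual Linux sysfs locations (`/sys/class/dmi/id` and `/sys/devices/system/cpu/cpu0/cpuidle`); the function names are made up for this example. It only shows what the kernel sees, not the actual BIOS setup menu values, so treat a matching result as a hint rather than proof that the settings are identical.

```python
import json
from pathlib import Path


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def collect_firmware_info(root="/"):
    """Collect BIOS identification and cpu0 idle-state names from sysfs.

    `root` is only a parameter so the function can be pointed at a copy of
    the tree; on a real node leave it at "/".
    """
    root = Path(root)
    dmi = root / "sys/class/dmi/id"
    cpuidle = root / "sys/devices/system/cpu/cpu0/cpuidle"

    info = {
        "bios_vendor": read_text(dmi / "bios_vendor"),
        "bios_version": read_text(dmi / "bios_version"),
        "bios_date": read_text(dmi / "bios_date"),
    }

    # Idle states appear as state0, state1, ...; sort numerically.
    state_dirs = [p for p in cpuidle.glob("state*") if p.name[5:].isdigit()]
    state_dirs.sort(key=lambda p: int(p.name[5:]))
    info["cpu0_idle_states"] = [read_text(p / "name") for p in state_dirs]
    return info


def diff_info(a, b):
    """Return only the keys whose values differ, as (a_value, b_value) pairs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Run on each node and compare the two outputs by eye or with diff_info().
    print(json.dumps(collect_firmware_info(), indent=2))
```

Run it on both nodes and compare the JSON. A different `bios_version`, or a different list of idle states, would be a reasonable place to start looking.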