Kernel segfault - Proxmox slowly dying afterwards

namebereitsvergeben

Feb 17, 2025
Hi,

I have a problem with my Proxmox server. It runs fine for a few hours, but then more and more kernel errors appear until the server stops responding.
It is still running, but the VMs are unresponsive and the UI shows them in an undefined state.

These are the most relevant logs, one spoiler per occurrence.

kernel: perf: interrupt took too long (2695 > 2500), lowering kernel.perf_event_max_sample_rate to 74000


Feb 16 19:30:00 yubbMox kernel: [celeryd: celer[5242]: segfault at 5e93f6995a08 ip 000000000050c660 sp 00007ffd9d08a138 error 4 in python3.11[41f000+2b5000] likely on CPU 1 (core 1, socket 0)
Feb 16 19:30:00 yubbMox kernel: Code: 25 00 02 00 00 85 c0 0f 85 e8 fe ff ff e9 b6 fe ff ff 48 89 c3 e9 a5 fe ff ff 48 89 ef ff d0 eb e4 66 0f 1f 84 00 00 00 00 00 <48> 8b 47 08 f6 80 a9 00 00 00 40 75 03 31 c0 c3 48 8b 80 48 01 00


Feb 16 19:55:57 yubbMox kernel: .NET BGC[279856]: segfault at 5f93a9145838 ip 00007f942727aabd sp 00007f52c27fb790 error 4 in libcoreclr.so[7f9426fa9000+4b7000] likely on CPU 7 (core 7, socket 0)
Feb 16 19:55:57 yubbMox kernel: Code: f4 49 c1 ec 09 47 8b 2c a7 44 89 f1 c1 e9 04 bf 01 00 00 00 d3 e7 41 0f a3 cd 72 c1 41 09 fd 47 89 2c a7 4d 8b 3e 49 83 e7 f8 <41> 8b 0f 85 c9 78 04 31 f6 eb 0b 41 8b 7e 08 0f b7 f1 48 0f af f7


Feb 16 20:39:02 yubbMox kernel: gunicorn: worke[308149]: segfault at 5f1f2e0b9848 ip 000000000050c4a3 sp 00007f1f5d7f6490 error 4 in python3.11[41f000+2b5000] likely on CPU 1 (core 1, socket 0)
Feb 16 20:39:02 yubbMox kernel: Code: 3d e0 c6 50 00 75 54 8b 85 a8 00 00 00 25 00 02 00 00 85 c0 75 2d 90 48 83 eb 01 78 ce 49 8b 45 18 48 8b 2c d8 48 85 ed 74 ed <48> 8b 45 08 f6 80 a9 00 00 00 40 74 e0 48 8b 80 48 01 00 00 48 85
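
For anyone comparing similar logs: all three lines report `error 4`, which on x86 is the page-fault error code for a user-mode read of a page that is not mapped, and they come from unrelated binaries (python3.11 and libcoreclr.so). Below is a minimal Python sketch for pulling those fields out of such lines. The regex is only fitted to the log format shown above, and the names `parse_segfault`, `decode_error` and `crashes_per_binary` are made up for illustration, not part of any tool.

```python
import re
from collections import Counter

# Fitted to the "segfault at ... ip ... sp ... error ..." lines shown above.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<comm>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[)?"
)


def decode_error(code):
    """Decode the x86 page-fault error code bits printed after 'error'."""
    return {
        "page_present": bool(code & 1),        # 0 = page not mapped
        "write": bool(code & 2),               # 0 = read access
        "user_mode": bool(code & 4),           # 1 = fault came from user space
        "instruction_fetch": bool(code & 16),  # 1 = fault while fetching code
    }


def parse_segfault(line):
    """Return a dict of fields for a segfault line, or None for other lines."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["pid"] = int(info["pid"])
    # The kernel prints the error code in hexadecimal.
    info["error"] = decode_error(int(info.pop("err"), 16))
    return info


def crashes_per_binary(lines):
    """Count segfaults per mapped object (e.g. python3.11, libcoreclr.so)."""
    return Counter(p["obj"] for p in map(parse_segfault, lines) if p)
```

Feeding the output of `journalctl -k` line by line into `crashes_per_binary` gives a quick view of whether the crashes are concentrated in one program or spread across many.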


Could you help me out with this?
I've read about disabling the C6 state and tried it, without success (obviously).

Any help is appreciated.
 
I tried a lot, including a kernel downgrade, before posting here.

In the end it seems to be a temperature issue: the mini PC running Proxmox is cooled entirely passively (the case is made of metal). And, as I found out, the SSD sits at the bottom of the case, where air can't circulate well.
After turning the PC by 90 degrees, so that it stands on its side and both the top and bottom are exposed to air, I had no more crashes or kernel errors.
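
In case someone wants to check temperatures on a similar box, here is a rough Python sketch that reads the kernel's hwmon sensors from sysfs. It assumes the drive exposes a sensor under `/sys/class/hwmon` (NVMe drives usually do on recent kernels; SATA drives may need the `drivetemp` module loaded), and the function name `read_temps` is just something picked for this example.

```python
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return (chip name, sensor file, degrees Celsius) for each hwmon temp input."""
    readings = []
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip has no 'name' file.
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                millideg = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            readings.append((name, temp_file.name, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for name, sensor, celsius in read_temps():
        print(f"{name:12s} {sensor:14s} {celsius:5.1f} °C")
```

Running it once in the old orientation and once after turning the case would show whether the SSD really runs noticeably cooler.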