Proxmox server mystery reboot - CPU80: Core temperature is above threshold, CPU clock throttled

linuxteam · Nov 17, 2025

Hi There,

I have a Proxmox node (Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz, running Proxmox VE 8.2.2) whose syslog often shows the following errors throughout the day. The node has also rebooted unexpectedly twice. We checked the logs but could not trace any reboot/reset event, so we raised a request with our OEM, who says there are no hardware-related issues.

2025-11-15T00:59:04.489054+05:30 proxmox-soc1 pvestatd[2748900]: Use of uninitialized value $size in int at /usr/share/perl5/PVE/Storage/LVMPlugin.pm line 133.
2025-11-15T00:59:04.489134+05:30 proxmox-soc1 pvestatd[2748900]: Use of uninitialized value $free in int at /usr/share/perl5/PVE/Storage/LVMPlugin.pm line 133.
2025-11-15T00:59:04.489159+05:30 proxmox-soc1 pvestatd[2748900]: Use of uninitialized value $lvcount in int at /usr/share/perl5/PVE/Storage/LVMPlugin.pm line 133.

[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.2: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[Sat Nov 15 01:07:17 2025] i40e 0000:5e:00.2: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF

[Sat Nov 15 01:32:36 2025] CPU69: Core temperature is above threshold, cpu clock is throttled (total events = 4)
[Sat Nov 15 01:32:36 2025] CPU13: Core temperature is above threshold, cpu clock is throttled (total events = 4)
[Sat Nov 15 01:32:36 2025] CPU107: Core temperature is above threshold, cpu clock is throttled (total events = 1)
[Sat Nov 15 01:32:36 2025] CPU51: Core temperature is above threshold, cpu clock is throttled (total events = 1)
[Sat Nov 15 01:32:38 2025] CPU31: Core temperature is above threshold, cpu clock is throttled (total events = 16)
[Sat Nov 15 01:32:38 2025] CPU87: Core temperature is above threshold, cpu clock is throttled (total events = 16)
[Sat Nov 15 01:32:38 2025] CPU100: Core temperature is above threshold, cpu clock is throttled (total events = 1)
[Sat Nov 15 01:32:38 2025] CPU44: Core temperature is above threshold, cpu clock is throttled (total events = 1)
[Sat Nov 15 01:32:39 2025] CPU32: Core temperature is above threshold, cpu clock is throttled (total events = 8)
[Sat Nov 15 01:32:39 2025] CPU88: Core temperature is above threshold, cpu clock is throttled (total events = 8)
[Sat Nov 15 01:32:41 2025] CPU71: Core temperature is above threshold, cpu clock is throttled (total events = 6)
[Sat Nov 15 01:32:41 2025] CPU15: Core temperature is above threshold, cpu clock is throttled (total events = 6)
[Sat Nov 15 01:32:42 2025] CPU4: Core temperature is above threshold, cpu clock is throttled (total events = 8)
[Sat Nov 15 01:32:42 2025] CPU60: Core temperature is above threshold, cpu clock is throttled (total events = 8)
[Sat Nov 15 01:32:43 2025] CPU78: Core temperature is above threshold, cpu clock is throttled (total events = 15)
[Sat Nov 15 01:32:43 2025] CPU22: Core temperature is above threshold, cpu clock is throttled (total events = 15)
[Sat Nov 15 01:32:43 2025] CPU61: Core temperature is above threshold, cpu clock is throttled (total events = 25)
[Sat Nov 15 01:32:43 2025] CPU5: Core temperature is above threshold, cpu clock is throttled (total events = 25)
[Sat Nov 15 01:32:44 2025] CPU16: Core temperature is above threshold, cpu clock is throttled (total events = 2)
[Sat Nov 15 01:32:44 2025] CPU72: Core temperature is above threshold, cpu clock is throttled (total events = 2)
[Sat Nov 15 01:32:45 2025] CPU19: Core temperature is above threshold, cpu clock is throttled (total events = 16)
[Sat Nov 15 01:32:45 2025] CPU75: Core temperature is above threshold, cpu clock is throttled (total events = 16)
[Sat Nov 15 01:32:45 2025] CPU64: Core temperature is above threshold, cpu clock is throttled (total events = 29)
[Sat Nov 15 01:32:45 2025] CPU8: Core temperature is above threshold, cpu clock is throttled (total events = 29)
[Sat Nov 15 01:32:45 2025] CPU73: Core temperature is above threshold, cpu clock is throttled (total events = 21)

I have 3 other nodes in the same cluster running the same hardware, and none of them show this issue.

I’m trying to determine what exactly triggered the reset. Below is the relevant section of the BMC/kernel logs:

2025 Nov 14 19:26:49 UTC

4.1(3c)):kernel:-:[platform_reset_cb_handler]:77

Platform Reset ISR -> ResetState: 1
2025 Nov 14 19:26:49 UTC

4.1(3c)):cipmi: Intel ME Operating State:
2025 Nov 14 19:26:49 UTC

4.1(3c)):cipmi: Intel ME is initializing.
2025 Nov 14 19:26:49 UTC

4.1(3c))

DOCTOR-BMC: Stopping Application [tsa_server]
2025 Nov 14 19:26:49 UTC

4.1(3c)):doctor-bmc: Application tsa_server is stopped.
2025 Nov 14 19:26:49 UTC

4.1(3c)):cipmi: IPMI Request Message --> Chan:11, Netfn:0x00, Cmd:0x06, Data: 0x30 0x01, CC:0x00
2025 Nov 14 19:26:49 UTC

4.1(3c)):kernel:-:[platform_reset_cb_handler]:77

Platform Reset ISR -> ResetState: 0

What I need help with

Where in Proxmox (or on the host) should I look to find the exact reset reason?
Which logs are most reliable for detecting watchdog resets or kernel panics?
Is there a recommended way to correlate BMC reset triggers with Proxmox journal logs?
Do Intel ME state changes like “M0 without UMA” indicate any deeper firmware problem?

guruevi · Nov 17, 2025

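Well, I think it's pretty clear: your CPU is overheating. The second log, with the IPMI request, is just the system talking to the BMC's chassis functions around the restart (NetFn 0x00 is Chassis). Note that in the IPMI 2.0 chassis command table, Cmd 0x06 is Set Power Restore Policy and Get System Restart Cause is Cmd 0x07, so this particular request does not by itself report the reset cause. The data field is likely proprietary, and you'd need the BMC vendor's documentation to say what it means.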
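To cross-check a request like this against the IPMI 2.0 chassis command table, here is a minimal sketch. It assumes the log's `Netfn` field is the raw network function number and that the line format matches the one in the BMC log above; `decode_ipmi_line` is a hypothetical helper, not part of any BMC tooling.

```python
import re

# Chassis (NetFn 0x00) command codes from the IPMI 2.0 command table.
CHASSIS_COMMANDS = {
    0x00: "Get Chassis Capabilities",
    0x01: "Get Chassis Status",
    0x02: "Chassis Control",
    0x03: "Chassis Reset",
    0x04: "Chassis Identify",
    0x05: "Set Chassis Capabilities",
    0x06: "Set Power Restore Policy",
    0x07: "Get System Restart Cause",
    0x08: "Set System Boot Options",
    0x09: "Get System Boot Options",
}

# Assumed line format, copied from the request line quoted in this thread.
LINE_RE = re.compile(
    r"Chan:(?P<chan>\d+), Netfn:(?P<netfn>0x[0-9A-Fa-f]+), "
    r"Cmd:(?P<cmd>0x[0-9A-Fa-f]+), Data:(?P<data>(?: 0x[0-9A-Fa-f]+)*), "
    r"CC:(?P<cc>0x[0-9A-Fa-f]+)"
)


def decode_ipmi_line(line):
    """Parse one BMC 'IPMI Request Message' line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    netfn = int(m["netfn"], 16)
    cmd = int(m["cmd"], 16)
    if netfn == 0x00:
        name = CHASSIS_COMMANDS.get(cmd, "unknown chassis command")
    else:
        name = "non-chassis NetFn (not decoded here)"
    return {
        "channel": int(m["chan"]),
        "netfn": netfn,
        "cmd": cmd,
        "cmd_name": name,
        "data": [int(b, 16) for b in m["data"].split()],
        "completion_code": int(m["cc"], 16),
    }
```

Calling `decode_ipmi_line` on the request line from the BMC log returns the channel, NetFn, command name, data bytes, and completion code as a dict. On a host with `ipmitool` installed, `ipmitool chassis restart_cause` asks the BMC for the restart cause directly.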

Presuming this is SuperMicro (because they are shit at surfacing these kinds of issues), I would suggest logging the temperature readings from IPMI on an external system. On better systems, with iLO/iDRAC-type management controllers, you can retrieve at least several weeks' worth of temperature data.

Use something like Prometheus collectors if you need to. Your CPU is getting hot. Why? Most likely a broken fan or a cooler that came loose from the CPU, or, less likely, dust or something else clogging the intake. External factors such as datacenter cooling issues are also possible. That calls for a proper investigation of the intake and exhaust temperatures as well as the core/CPU/motherboard and chassis temperatures, provided your system exposes them (again, SuperMicro is bad at this).
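If a full Prometheus setup is too much to start with, the kernel's own throttle counters can be sampled and shipped off the box. This is a minimal sketch, assuming an Intel x86 kernel that exposes `thermal_throttle/core_throttle_count` under `/sys/devices/system/cpu/cpuN/`; `read_throttle_counts` and `changed_counts` are hypothetical helper names.

```python
import glob
import os


def read_throttle_counts(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_number: core_throttle_count} for every CPU exposing the counter."""
    counts = {}
    pattern = os.path.join(
        sysfs_root, "cpu[0-9]*", "thermal_throttle", "core_throttle_count"
    )
    for path in glob.glob(pattern):
        # path looks like <root>/cpu13/thermal_throttle/core_throttle_count
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as fh:
            counts[int(cpu_dir[3:])] = int(fh.read().strip())
    return counts


def changed_counts(previous, current):
    """Return {cpu: increase} for CPUs whose counter grew since the previous sample."""
    return {
        cpu: n - previous.get(cpu, 0)
        for cpu, n in current.items()
        if n > previous.get(cpu, 0)
    }
```

Run from a timer (once a minute, say) with the deltas sent to a remote syslog or time-series store, this keeps a record that survives a cold reset of the node.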

linuxteam · Nov 20, 2025

Hi guruevi,

Thanks for your reply. We ran stress tests but did not find any issues. The CPU temperature also looks fine in IPMI.

We found nothing in dmesg, syslog, or the cluster log that gives us a proper root cause for the reboot.

Has anyone in the community faced this type of issue and found the root cause of the reboot? If so, can you show us how to dig into it?

guruevi · Dec 1, 2025

Again, this is happening with one machine out of four identical machines. The errors are quite straightforward: the CPU is telling the kernel that it is getting too hot and is throttling. The reason for that is unknown. It could be a bug in the UEFI, or a specific bug in the CPU firmware or motherboard, but it's not the software; the software is doing what it is supposed to. At some point the CPU resets cold, so there will be no logs, because the CPU resets before anything can be written to disk. Given the logs, that's a potential cause: the CPU is overheating, or it "thinks" it is overheating. If everything else works and you can verify that the CPU is actually being cooled, a faulty CPU would be my guess.
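Since the host itself may log nothing at the moment of a cold reset, it helps to line up the BMC timestamps (UTC) with the host's journal (local time, +05:30 in the syslog lines above). This is a minimal sketch, assuming the BMC timestamp format quoted in this thread; `bmc_to_host_time` and `journal_window` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# UTC offset taken from the syslog lines in this thread (+05:30).
HOST_TZ = timezone(timedelta(hours=5, minutes=30))


def bmc_to_host_time(bmc_stamp):
    """Convert a BMC stamp such as '2025 Nov 14 19:26:49 UTC' to host local time."""
    utc = datetime.strptime(bmc_stamp, "%Y %b %d %H:%M:%S UTC")
    return utc.replace(tzinfo=timezone.utc).astimezone(HOST_TZ)


def journal_window(bmc_stamp, minutes=10):
    """Return (since, until) strings usable with journalctl --since/--until."""
    local = bmc_to_host_time(bmc_stamp)
    fmt = "%Y-%m-%d %H:%M:%S"
    since = (local - timedelta(minutes=minutes)).strftime(fmt)
    until = (local + timedelta(minutes=minutes)).strftime(fmt)
    return since, until
```

For the reset logged at 2025 Nov 14 19:26:49 UTC, the host-local time is 2025-11-15 00:56:49 +05:30. If the BMC clock is accurate, that is a couple of minutes before the 00:59:04 pvestatd lines and well before the 01:32 throttle messages quoted in the first post. With a persistent journal, `journalctl --list-boots` and `journalctl -k -b -1` show where the previous boot's kernel log ends.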

I would start by updating the firmware for the motherboard and the CPU microcode (the intel-microcode package on Proxmox). If that still doesn't fix it, replace the CPU.

alexskysilk · Dec 1, 2025

linuxteam said:
I have 3 other nodes in the same cluster running the same hardware, and none of them show this issue.

Remove the heatsinks and reinstall them with fresh thermal compound. It would also be a good opportunity to clean out the dust while you're there.

LnxBil · Dec 1, 2025

I also had this temperature phenomenon in one of many HPE servers and never figured out why. The first occurrence was years ago, and it still logs a lot of messages. iLO is not reporting it, so the problem seems to be only "visual"; we just considered it noise and ignored it.
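One way to tell noise from a trend is to tally the kernel's per-CPU event counters and compare them between days or boots. This is a minimal sketch, assuming the message format quoted in the first post; `max_events_per_cpu` is a hypothetical helper name.

```python
import re

# Assumed format, copied from the kernel lines quoted in this thread.
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): Core temperature is above threshold, "
    r"cpu clock is throttled \(total events = (?P<events>\d+)\)"
)


def max_events_per_cpu(lines):
    """Return {cpu: highest 'total events' value seen} from kernel log lines."""
    highest = {}
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            cpu = int(m["cpu"])
            events = int(m["events"])
            highest[cpu] = max(events, highest.get(cpu, 0))
    return highest
```

Feeding it the output of `dmesg -T` or `journalctl -k` gives one number per logical CPU. Counters that keep climbing on the same cores while the BMC sensors stay flat point in a different direction than counters that rise together with the BMC readings.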
