Fan Control, LM-Sensors HP Workstation | CPU Temp too high??

gs800uk

New Member
Oct 2, 2023
Hi all,

I wanted to reach out to anyone with experience running Proxmox on HP workstations (in my case a Z8 G4 with 48 x Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz, 2 sockets) about how to control the fans and CPU temperatures.

I currently run around 5 Windows VMs on this machine at a fairly low load, about 5% host CPU usage on average, so nothing major at this point. However, watch sensors shows CPU temperatures averaging 60-65°C, which seems pretty high considering this CPU has a maximum Tcase of 85°C. This is a concern because I will be adding more VMs with heavier loads.

Investigating further, all the system fans 'sound' like they are at their lowest RPM. To test, I created another Windows VM (assigned 24 cores) and ran Prime95 to put more stress on the host and see whether the fans would kick in. They didn't, and I stopped the test when I got near the 85°C mark. The fans seem to stay at the lowest RPM regardless of load, so I looked in the BIOS for a way to adjust the fan curve. The only option there is a 'min fan speed' setting, which I set to 20% to be safe.

What I noticed is that the sensors output seems to show the wrong high and critical temperatures. As mentioned above, Intel says these chips are rated for 85°C max:

Code:
coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +59.0°C  (high = +93.0°C, crit = +103.0°C)
Core 0:        +59.0°C  (high = +93.0°C, crit = +103.0°C)
Core 1:        +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 2:        +47.0°C  (high = +93.0°C, crit = +103.0°C)
Core 3:        +50.0°C  (high = +93.0°C, crit = +103.0°C)
Core 9:        +49.0°C  (high = +93.0°C, crit = +103.0°C)
Core 10:       +47.0°C  (high = +93.0°C, crit = +103.0°C)
Core 16:       +50.0°C  (high = +93.0°C, crit = +103.0°C)
Core 18:       +47.0°C  (high = +93.0°C, crit = +103.0°C)
Core 19:       +51.0°C  (high = +93.0°C, crit = +103.0°C)
Core 24:       +49.0°C  (high = +93.0°C, crit = +103.0°C)
Core 26:       +47.0°C  (high = +93.0°C, crit = +103.0°C)
Core 27:       +48.0°C  (high = +93.0°C, crit = +103.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +93.0°C, crit = +103.0°C)
Core 16:       +49.0°C  (high = +93.0°C, crit = +103.0°C)
Core 9:        +49.0°C  (high = +93.0°C, crit = +103.0°C)
Core 10:       +49.0°C  (high = +93.0°C, crit = +103.0°C)
Core 11:       +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 2:        +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 17:       +50.0°C  (high = +93.0°C, crit = +103.0°C)
Core 18:       +51.0°C  (high = +93.0°C, crit = +103.0°C)
Core 19:       +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 24:       +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 25:       +48.0°C  (high = +93.0°C, crit = +103.0°C)
Core 26:       +50.0°C  (high = +93.0°C, crit = +103.0°C)
Core 27:       +47.0°C  (high = +93.0°C, crit = +103.0°C)
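
For anyone who wants to watch the gap between the current readings and the reported limits rather than eyeballing it, here is a minimal Python sketch that parses text in the format shown above. The names parse_sensors, LINE_RE and SAMPLE are made up for illustration and are not part of lm-sensors, and the regex assumes every temperature line carries both a high and a crit value as in this output.

```python
import re

# Matches lines such as:
#   "Core 0:        +59.0°C  (high = +93.0°C, crit = +103.0°C)"
# Assumption: the line format is exactly the one in the output above.
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s+"
    r"\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C, "
    r"crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C\)"
)


def parse_sensors(text):
    """Return one dict per temperature line found in sensors-style text."""
    readings = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # chip name, adapter line, or blank line
        temp = float(match["temp"])
        high = float(match["high"])
        readings.append(
            {
                "label": match["label"],
                "temp": temp,
                "high": high,
                "crit": float(match["crit"]),
                "headroom": high - temp,  # degrees left before "high"
            }
        )
    return readings


# A few lines copied from the output above.
SAMPLE = """coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +59.0°C  (high = +93.0°C, crit = +103.0°C)
Core 0:        +59.0°C  (high = +93.0°C, crit = +103.0°C)
Core 1:        +48.0°C  (high = +93.0°C, crit = +103.0°C)
"""

if __name__ == "__main__":
    for reading in parse_sensors(SAMPLE):
        print(f'{reading["label"]}: {reading["temp"]} C, '
              f'{reading["headroom"]} C below high')
```

To use it against a live host, the text could come from running the sensors command and capturing its output instead of the SAMPLE string.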

I've tried the fancontrol setup, hoping I could change these incorrect high and crit values to what they should be, or at least put my own custom behaviour in place, but I get the error 'There are no pwm-capable sensor modules installed'.
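
As far as I understand it, that error means none of the loaded hwmon drivers exposes a PWM output, and coretemp only reports temperatures. A rough way to check this by hand is to list which hwmon devices have pwmN files. The sketch below assumes the usual /sys/class/hwmon layout; find_pwm_controls is a made-up helper name, and on some systems the attributes may sit in a device/ subdirectory that this sketch does not look at.

```python
from pathlib import Path


def find_pwm_controls(hwmon_root="/sys/class/hwmon"):
    """Map each hwmon device to the pwmN files it exposes (possibly none)."""
    result = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return result  # no hwmon tree at this path
    for dev in sorted(root.glob("hwmon*")):
        name_file = dev / "name"
        # The "name" file holds the driver/chip name, e.g. "coretemp".
        name = name_file.read_text().strip() if name_file.is_file() else dev.name
        # Keep only pwm1, pwm2, ... and skip helpers such as pwm1_enable.
        pwms = sorted(
            p.name for p in dev.glob("pwm[0-9]*") if p.name[3:].isdigit()
        )
        result[f"{dev.name} ({name})"] = pwms
    return result


if __name__ == "__main__":
    for device, pwms in find_pwm_controls().items():
        print(device, "->", pwms if pwms else "no PWM outputs")
```

If every device comes back with an empty list, there is nothing for fancontrol to drive from the OS side, which would match the BIOS being the only place to change fan behaviour on this machine.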

What I would like to know is whether there is any other way I can control the fans or edit those values. Perhaps I don't have the correct modules installed? I'm at a loss as to where to look next and would greatly appreciate any help :)
 
Did you ever get this figured out? Right now I have to use the BIOS, but that means an unnecessary restart.
 
I did figure it out.

I believe the values reported by that command are junction (die) temperatures, measured against TjMax, as opposed to TCase, which is measured at the case. From what I have read, the two can differ by 15-20°C.

According to https://ark.intel.com/content/www/u...old-6136-processor-24-75m-cache-3-00-ghz.html my CPU's TCase is 85°C.
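
If you want to see how far the junction readings are from the driver's critical limit without going through sensors, the same numbers can be read from sysfs. This is only a sketch under assumptions: it expects a hwmon directory containing tempN_input, tempN_crit and tempN_label files with values in millidegrees Celsius, the helper names read_millidegrees and junction_margins are made up, and the hwmon0 path in the example is a guess because the numbering varies between machines.

```python
from pathlib import Path


def read_millidegrees(path):
    """Read a hwmon temperature file (integer millidegrees C) as degrees C."""
    return int(Path(path).read_text().strip()) / 1000.0


def junction_margins(hwmon_dir):
    """Return {label: (current, crit, crit - current)} for one hwmon device."""
    dev = Path(hwmon_dir)
    margins = {}
    for input_file in sorted(dev.glob("temp*_input")):
        prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
        crit_file = dev / f"{prefix}_crit"
        label_file = dev / f"{prefix}_label"
        if not crit_file.is_file():
            continue  # this sensor reports no critical limit
        label = label_file.read_text().strip() if label_file.is_file() else prefix
        current = read_millidegrees(input_file)
        crit = read_millidegrees(crit_file)
        margins[label] = (current, crit, crit - current)
    return margins


if __name__ == "__main__":
    # Assumption: hwmon0 is a coretemp device; check its "name" file first.
    for label, (current, crit, margin) in junction_margins(
        "/sys/class/hwmon/hwmon0"
    ).items():
        print(f"{label}: {current} C now, crit {crit} C, margin {margin} C")
```

Note that the margin here is against the junction limit the driver reports, not against the 85°C TCase figure from Intel, which is measured at a different point.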

So it makes sense that I panicked over the wrong type of reading. To take it a step further, I ran Prime95 in one of my VMs for 30 minutes, which raised the reported (junction) temperatures to 80-85°C with usage on both CPUs bouncing between 60%, 80% and 100%. The temperature stayed around that level with no thermal shutdown. I ran this test a good 5 times and the machine still performed well.

I still have the minimum fan speed set to 20% in the HP BIOS to be safe, and I have been running this machine for a good 8 months now with no problems.