[SOLVED] Dell PowerEdge R6615 fans revving up and down constantly

Hello,

we've got three Dell PowerEdge R6615 servers, each with an AMD EPYC 9174F, and the fans are doing something weird. Most of the time they run at 8-9k RPM and are fairly quiet. But every minute or two they rev up to full speed (~24k RPM) or half speed (~13k RPM) and then back down.
There is almost no load on these systems (we're still in the evaluation phase rn), just Proxmox + Ceph and 3 CTs idling.

I'm looking at the temps via IPMI and the CPU temp is around 50 to 55°C, but fan speed and CPU temperature don't seem to be connected. Sometimes the fans are at 24k RPM while the CPU is at 50°C, and at other times the CPU is at 53°C and the fans are at 8k RPM.
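
To put a number on "don't seem connected", one option is to log temperature and fan speed side by side for a while and compute their correlation. Below is a minimal Python sketch of that idea. The sample values are made up to resemble what I described above (they are not real measurements), and the function name `pearson` is just my own helper, nothing Dell- or IPMI-specific.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples (CPU temperature in °C, fan speed in RPM),
# hand-made to resemble the behaviour described in this post.
# They are NOT real measurements.
cpu_temp = [50, 53, 51, 55, 52, 50]
fan_rpm = [24000, 8000, 13000, 9000, 8500, 8200]

r = pearson(cpu_temp, fan_rpm)
print(f"correlation between CPU temp and fan speed: {r:.2f}")
```

A value close to +1 would mean the fans follow the CPU temperature. Anything near zero or negative suggests that some other sensor or component is driving the fan curve.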

I've set the system profile in iDRAC to "performance per watt (OS)" and the workload profile to "unconfigured". The Thermal Profile Optimization is set to "Default" with fan speed offset "off".

Does anybody know how we can manage these fans better? It'd be okay if they run at the same speed constantly, but this revving up and down all the time is really problematic I think.

Thanks for your help!
 
Same server, same issue here. It did seem to improve a little bit when setting the Thermal Profile Optimization to Maximum Performance.
 
I did some tests with a Live ISO (Rocky Linux) from Dell themselves.
Running sysbench cpu or sysbench memory increases the CPU temp, but the fans stay quiet.
Running iperf over the LOM ports -> fans instantly go to 100%.
This morning I rebooted into Proxmox and moved the network off the LOM ports (to another network card). I think it got slightly better, but not really. I migrated it back to the LOM for now.
Can you run some tests too, lalders? Can you confirm or deny these findings with regard to the network cards being the culprit?
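
To make such test runs comparable, it may help to capture the fan readings while each test is running and sort them into the speed bands mentioned in this thread. Here is a rough Python sketch, assuming the readings come as pipe-separated text in the style that `ipmitool sdr type Fan` prints. The sample text, the sensor names and the RPM cut-offs are my own assumptions for illustration. The script only parses a string and does not call ipmitool itself.

```python
import re


def classify(rpm):
    """Map an RPM value to a rough band.

    The cut-offs are assumptions derived from the speeds mentioned in this
    thread (~8-9k quiet, ~13k "half", ~24k "full").
    """
    if rpm >= 20000:
        return "full"
    if rpm >= 11000:
        return "half"
    return "quiet"


def parse_fan_lines(text):
    """Extract (sensor name, RPM) pairs from pipe-separated sensor lines.

    Lines whose last field is not '<number> RPM' are skipped.
    """
    readings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*RPM", fields[-1])
        if len(fields) >= 2 and match:
            readings.append((fields[0], float(match.group(1))))
    return readings


# Made-up sample in the pipe-separated style of `ipmitool sdr type Fan`.
# Sensor names and values are illustrative only.
SAMPLE = """\
Fan1A            | 30h | ok  |  7.1 | 8640 RPM
Fan2A            | 32h | ok  |  7.1 | 13200 RPM
Fan3A            | 34h | ok  |  7.1 | 24000 RPM
Fan4A            | 36h | ns  |  7.1 | No Reading
"""

for name, rpm in parse_fan_lines(SAMPLE):
    print(name, int(rpm), classify(rpm))
```

Saving one such snapshot per test (idle, sysbench, iperf over the LOM, iperf over another card) would show at a glance which load pushes the fans into the "full" band.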

Does somebody have the 2U variant (Dell R7615)? Does it have the same problem? How loud is it?
 
Can you please confirm which ports you mean by LOM? I have tried both physical controllers but didn't notice any difference either. It didn't even require running iperf, as the fans already started to blow like crazy when logging in to Proxmox.

[Attached image: 1741017195797.jpeg]

I am reaching out to our supplier to find out if they can confirm that this is a known issue.
 
The LOM is the NIC connected to the LAN-on-Motherboard port. In the illustration, it's the one with the 4 ports, of which port 4 (1?) is connected.

FTR, there are two variants of the LOM for these servers (well, two chipset vendors for the RJ45 port models; there are a bunch of variants): Intel and Broadcom (BCM). There are variants of both that cause issues on Debian, although more commonly the Broadcom ones.
"as the fans already started to blow like crazy when logging in to Proxmox."
If this happens with thermal management in the BIOS NOT set to application controlled, check for firmware updates and/or open a support ticket with Dell.
 
Oh, I thought the LOM was the two ports on the left? They're called eno8303 and eno8403 in PVE.
But it seems like I'm wrong: we've got the BCM 57508 (2x 100G) LOM plus two BCM 57454 (4x 10G RJ45) network cards.
According to OME, all firmware is up to date.
We have a meeting with Dell today to discuss what options we have.