Unexplainable CPU load on Intel NUC

jbr

Jun 29, 2024
Hi all,

I've been running a two-NUC Proxmox environment for about 4 years now. Love it!

The NUCs (NUC7I7BNH) are nearly identical; the only thing that's different is the SSD.

Anyhow, lately I saw that one of the two had a continuous CPU load of 25% (as seen in the summary screen of Proxmox). It's a system with 2 cores and HT, so 4 virtual cores.
I opened a shell, started top and saw that ksoftirqd and kworker threads were using 25% and 75% (of 1 vcore, I assume), which explains the 25% overall. There were NO VMs or LXC containers running on that node.
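
To check the "one vcore fully busy" assumption without relying on top, the per-CPU counters in /proc/stat can be compared between two points in time. The following is a minimal Python sketch, not something from the original post: the function names `parse_proc_stat` and `busy_shares` are made up for illustration, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). One vcore at 100% on a 4-vcore box would show up as roughly 25% in the Proxmox summary.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line.
        if len(parts) >= 9 and parts[0].startswith("cpu") and parts[0] != "cpu":
            cpus[parts[0]] = dict(zip(FIELDS, map(int, parts[1:9])))
    return cpus


def busy_shares(before, after):
    """Fraction of each CPU's elapsed time spent in system, irq and softirq."""
    shares = {}
    for cpu, now in after.items():
        prev = before.get(cpu)
        if prev is None:
            continue
        delta = {key: now[key] - prev[key] for key in FIELDS}
        total = sum(delta.values())
        if total <= 0:
            continue  # no ticks elapsed between the two snapshots
        shares[cpu] = {key: delta[key] / total for key in ("system", "irq", "softirq")}
    return shares
```

To use it, read /proc/stat twice a few seconds apart, pass both texts through `parse_proc_stat`, and hand the results to `busy_shares`. Kernel worker threads are accounted as system time, so a vcore with high system plus softirq shares would match what top showed.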

After some further digging (/proc/interrupts) I saw a whole load of thermal interrupts, and the temperature of both CPU cores was nearly 100 degrees Celsius (according to "sensors"). So I replaced the fan and applied new thermal paste.
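
Eyeballing /proc/interrupts only shows totals since boot, so it can help to diff two snapshots and see which interrupt sources are actually still counting up. Below is a rough Python sketch of that idea; `parse_interrupts` and `fastest_growing` are hypothetical helper names, and the parsing assumes the standard layout (a header row of CPU columns, then one row per interrupt with per-CPU counts followed by a description).

```python
def parse_interrupts(text):
    """Return {label: total count across CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # reached the description early (rows like ERR/MIS)
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        label = name.strip() + (" " + description if description else "")
        totals[label] = sum(counts)
    return totals


def fastest_growing(before, after, top=5):
    """Return up to `top` (label, increase) pairs, largest increase first."""
    deltas = {label: after[label] - before.get(label, 0) for label in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return [(label, delta) for label, delta in ranked[:top] if delta > 0]
```

Reading /proc/interrupts twice with a few seconds in between and passing both parsed snapshots to `fastest_growing` gives a short list of the busiest sources during that window, which makes it easier to see whether the thermal events or something else (for example a USB controller line) is responsible.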

After that the load was still 25%, but the temperatures didn't get up to 100 degrees anymore; they stayed somewhere in the 70-80 range.

Next thing I did was upgrade the BIOS. It went from version 0.88 to 0.93 (I know, should have done that earlier). Rebooted the system... and voilà: only 0.5-1% load on the system! Hurray!

However, when I placed the NUC back into the cabinet (where my router, switches and NAS are located)... hello: the problem was back!

After some more fiddling it turns out that as long as there is a USB stick attached to the NUC, the problem is gone. Remove it and reboot, and the problem is back.
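
One way to narrow down what differs between "stick attached" and "stick removed" is to record the runtime power-management state of the USB devices in both situations and compare them. This is only a sketch under the assumption that USB power management might be involved, which is a guess and not something established here. The helper names `read_usb_power_state` and `diff_states` are invented; the sysfs attributes `power/control` and `power/runtime_status` are the standard ones exposed under /sys/bus/usb/devices.

```python
from pathlib import Path


def read_usb_power_state(base="/sys/bus/usb/devices"):
    """Return {device: {"control": ..., "runtime_status": ...}} from sysfs."""
    states = {}
    for device in sorted(Path(base).glob("*")):
        entry = {}
        for attr in ("control", "runtime_status"):
            try:
                entry[attr] = (device / "power" / attr).read_text().strip()
            except OSError:
                entry[attr] = None  # attribute missing or unreadable
        if any(value is not None for value in entry.values()):
            states[device.name] = entry
    return states


def diff_states(with_stick, without_stick):
    """Return {device: (state_with, state_without)} for entries that differ."""
    changed = {}
    for name in sorted(set(with_stick) | set(without_stick)):
        first = with_stick.get(name)
        second = without_stick.get(name)
        if first != second:
            changed[name] = (first, second)
    return changed
```

Calling `read_usb_power_state()` once with the stick plugged in and once after a reboot without it, then feeding both results to `diff_states`, lists the devices (including the root hubs usb1 and usb2) whose state changed. If a root hub sits in "suspended" only in the bad case, that would be a hint worth following up.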

I've been working in IT for about 30 years and have seen a lot, but this is one of the weirdest things I've seen. I'm at a medium level when it comes to Linux, btw.

Here is some more info on the system:
root@nuc2:~# pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-2-pve)

root@nuc2:~# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
    |__ Port 4: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 480M
    |__ Port 8: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 8: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
root@nuc2:~#

root@nuc2:~# modinfo xhci_hcd
filename: /lib/modules/6.8.8-2-pve/kernel/drivers/usb/host/xhci-hcd.ko
license: GPL
author: Sarah Sharp
description: 'eXtensible' Host Controller (xHC) Driver
srcversion: 9C7CA91D1F59F5E4006743C
depends:
retpoline: Y
intree: Y
name: xhci_hcd
vermagic: 6.8.8-2-pve SMP preempt mod_unload modversions

The other NUC, which has 5 LXC containers on it, doesn't have this problem (same version of Proxmox).

Does anyone here have a simple (or complex) answer? Or maybe some debugging tips?

Ciao,
Jos
 