Hi all,
I'm running a two-NUC Proxmox environment, for about 4 years now. Love it!
The NUCs (NUC7i7BNH) are nearly identical; the only thing that differs is the SSD.
Anyhow, lately I noticed that one of the two had a continuous CPU load of 25% (as seen in the summary screen of Proxmox). It's a system with 2 cores and HT, so 4 virtual cores.
I opened a shell, started top and saw that ksoftirqd and kworker threads were using 25% and 75% (of 1 vcore, I assume), which explains the 25% overall. There were NO VMs or LXCs running on that node.
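For reference, the kind of commands I mean (roughly, from memory):

top -H                              # per-thread view, ksoftirqd/N and kworker/* threads at the top
watch -d -n1 'cat /proc/softirqs'   # shows which softirq counters keep climbing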
After some further digging (/proc/interrupts) I saw a whole load of thermal interrupts, and saw that the temperature of both CPU cores was nearly 100 degrees Celsius (according to "sensors"). So I exchanged the fan and applied new thermal paste.
After that the load was still 25%, but the temperatures no longer got up to 100 degrees; they stayed somewhere in the 70-80 range.
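In case it helps, this is roughly how I was watching the thermal side of it:

watch -d -n1 'grep -i -e trm -e thermal /proc/interrupts'   # thermal event interrupt counters
sensors                                                      # core temperatures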
Next thing I did was upgrade the BIOS. It went from version 0.88 to 0.93 (I know, I should have done that earlier). Rebooted the system... and voila: only 0.5-1% load on the system! Hurray!
However, when I placed the NUC back into the cabinet (where my router, switches and NAS are located)... hello: the problem was back!
After some more fiddling it turns out that as long as there is a USB stick attached to the NUC, the problem is gone. Remove it and reboot, and the problem is back.
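I have no explanation for the USB stick yet; my (unverified) guess is that it simply keeps the USB controller awake. So the next thing I want to compare, with and without the stick, is the runtime power management state of the xHCI controller and its devices, something like this (the PCI address below is just an example, typically 00:14.0 on Intel boards, it may differ here):

lspci | grep -i usb                                         # find the xHCI controller's PCI address
cat /sys/bus/pci/devices/0000:00:14.0/power/control        # "auto" vs "on" for the controller
grep . /sys/bus/usb/devices/*/power/control                 # runtime PM setting per USB device
echo on > /sys/bus/pci/devices/0000:00:14.0/power/control   # test: keep the controller awake without a stick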
I've been working in IT for about 30 years and I've seen a lot, but this is one of the weirdest things I've come across. I'm at a medium level when it comes to Linux, by the way.
Here is some more info on the system:
root@nuc2:~# pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-2-pve)
root@nuc2:~# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 4: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 480M
|__ Port 8: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 8: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
root@nuc2:~#
root@nuc2:~# modinfo xhci_hcd
filename: /lib/modules/6.8.8-2-pve/kernel/drivers/usb/host/xhci-hcd.ko
license: GPL
author: Sarah Sharp
description: 'eXtensible' Host Controller (xHC) Driver
srcversion: 9C7CA91D1F59F5E4006743C
depends:
retpoline: Y
intree: Y
name: xhci_hcd
vermagic: 6.8.8-2-pve SMP preempt mod_unload modversions
The other NUC, which has 5 LXC containers running on it, doesn't have this problem (same version of Proxmox).
Anyone here with a simple (or complex) answer? Or maybe some debugging tips?
Ciao,
Jos