[SOLVED] CPU frequency drops when a certain VM is running

diedenieded

Nov 12, 2022
Hello all.

I'm running into an issue where the host's idle clocks drop when a certain VM is running (in this case, a VM running TrueNAS), and the clocks go even lower under load (e.g. a stress test).

My setup is a Mini PC with a Ryzen 5 4500U and 16 GB of RAM, and I have 3 VMs running, as shown below.

truenas (screenshot: 1668388773496.png)

untruenas (screenshot: 1668388785229.png)

services (screenshot: 1668388798311.png)

I've done quite a bit of testing myself, and it seems that once the truenas VM is started, the CPU clocks drop to base clock (~2400 MHz), and even lower (~1400 MHz) under any kind of load.
This does not happen if no VMs are running, or if only untruenas and services are running: the CPU then boosts to ~3900 MHz, which is roughly the rated boost clock.
The drop in clock speed does not resolve itself when the truenas VM is stopped; the host needs to be restarted to fix it.

Below are the stress tests I ran using s-tui (https://github.com/amanusk/s-tui); I've confirmed that the CPU is not thermal throttling. The tests in the images below are short, just to show the boost clocks working; if left running longer, the first test settles at around 3000 MHz.
Test with untruenas and services running
1668390003666.png

Test with only truenas running, after a host restart
1668390475139.png

Also notice that the package and core powers are no longer being reported correctly.

The CPU governor is set to performance:
Code:
root@proxmox:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
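
The governor only selects a policy; the clocks the cores actually run at can be read from the same cpufreq sysfs directory. Below is a minimal sketch, assuming the kernel exposes scaling_cur_freq (in kHz) for each core. The function name print_cpu_mhz and its optional root-directory argument are made up for illustration.

```shell
#!/bin/sh
# Print the current frequency of every core in MHz.
# The optional first argument overrides the sysfs root; it exists only so the
# function can be tried against a fake directory tree.
print_cpu_mhz() {
    local root="${1:-/sys/devices/system/cpu}"
    local f khz cpu
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        # Skip cores without a readable cpufreq entry (or an unmatched glob).
        [ -r "$f" ] || continue
        khz=$(cat "$f")
        cpu=${f#"$root"/}   # e.g. cpu0/cpufreq/scaling_cur_freq
        cpu=${cpu%%/*}      # e.g. cpu0
        printf '%s %d MHz\n' "$cpu" "$((khz / 1000))"
    done
}

print_cpu_mhz
```

Running it once before and once after starting the truenas VM (or in a loop such as `while sleep 1; do print_cpu_mhz; done`) should make the drop visible without relying on s-tui.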

The CPU is also not power-limited. Here's the output of RyzenAdj:
Code:
CPU Family: Renoir
SMU BIOS Interface Version: 18
Version: v0.11.1
PM Table Version: 370005
|        Name         |   Value   |      Paramter      |
|---------------------|-----------|--------------------|
| STAPM LIMIT         |    15.000 | stapm-limit        |
| STAPM VALUE         |     5.472 |                    |
| PPT LIMIT FAST      |    30.000 | fast-limit         |
| PPT VALUE FAST      |     3.352 |                    |
| PPT LIMIT SLOW      |    25.000 | slow-limit         |
| PPT VALUE SLOW      |     5.335 |                    |
| StapmTimeConst      |   275.000 | stapm-time         |
| SlowPPTTimeConst    |     5.000 | slow-time          |
| PPT LIMIT APU       |    25.000 | apu-slow-limit     |
| PPT VALUE APU       |     5.335 |                    |
| TDC LIMIT VDD       |    33.000 | vrm-current        |
| TDC VALUE VDD       |     3.119 |                    |
| TDC LIMIT SOC       |    13.000 | vrmsoc-current     |
| TDC VALUE SOC       |     1.370 |                    |
| EDC LIMIT VDD       |    50.000 | vrmmax-current     |
| EDC VALUE VDD       |    15.256 |                    |
| EDC LIMIT SOC       |    17.000 | vrmsocmax-current  |
| EDC VALUE SOC       |     0.000 |                    |
| THM LIMIT CORE      |   100.000 | tctl-temp          |
| THM VALUE CORE      |    39.119 |                    |
| STT LIMIT APU       |     0.000 | apu-skin-temp      |
| STT VALUE APU       |     0.000 |                    |
| STT LIMIT dGPU      |     0.000 | dgpu-skin-temp     |
| STT VALUE dGPU      |     0.000 |                    |
| CCLK Boost SETPOINT |    50.000 | power-saving /     |
| CCLK BUSY VALUE     |    18.130 | max-performance    |


I suspect it has something to do with the PCIe passthrough, but I don't know enough to diagnose it, so here is the lspci output for all the PCIe devices that are being passed through on the host.
On untruenas
05:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
On truenas
07:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)
07:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)

I have not tested removing the SATA controllers from truenas as I don't want to mess up my storage pools.

Does anyone know what's going on or how to fix this? Thanks in advance.
 
Please post the full output of the following command, run on the PVE host, in code tags:
Bash:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

My guess is that other devices share the IOMMU group(s) of the SATA controller(s), and are therefore also no longer available to the PVE host once the SATA controller(s) are actually passed through (that is, when the VM starts).

This is mostly to be expected, as onboard/integrated SATA controllers are normally either integrated in or connected to the chipset.
 
Here it is.

Code:
root@proxmox:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 10 01:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. OM3PDP3 NVMe SSD [2646:500d] (rev 01)
IOMMU group 11 02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
IOMMU group 12 03:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
IOMMU group 13 04:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 01)
IOMMU group 14 05:00.0 USB controller [0c03]: VIA Technologies, Inc. VL805 USB 3.0 Host Controller [1106:3483] (rev 01)
IOMMU group 1 00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 4 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 5 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 6 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 7 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 7 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU group 7 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU group 7 06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c3)
IOMMU group 7 06:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
IOMMU group 7 06:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
IOMMU group 7 06:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU group 7 06:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU group 7 06:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2] (rev 01)
IOMMU group 7 06:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
IOMMU group 7 06:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Sensor Fusion Hub [1022:15e4]
IOMMU group 7 07:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)
IOMMU group 7 07:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)
IOMMU group 8 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
IOMMU group 8 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 9 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0 [1022:1448]
IOMMU group 9 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1 [1022:1449]
IOMMU group 9 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2 [1022:144a]
IOMMU group 9 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3 [1022:144b]
IOMMU group 9 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4 [1022:144c]
IOMMU group 9 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5 [1022:144d]
IOMMU group 9 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6 [1022:144e]
IOMMU group 9 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7 [1022:144f]
 
Indeed many devices are in the big Ryzen chipset IOMMU group 7. When doing passthrough of a single device from one group, the host loses all devices from that group.
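
To see in advance which devices would leave the host together with a given one, the group membership can be read directly from sysfs. This is a rough sketch, assuming the standard /sys/bus/pci/devices/<address>/iommu_group layout; the function name iommu_peers and its optional second argument are hypothetical.

```shell
#!/bin/sh
# List every device that shares an IOMMU group with the given PCI address.
# The address must include the PCI domain, e.g. 0000:07:00.0.
# The optional second argument overrides the sysfs root for dry runs.
iommu_peers() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local group_dir="$root/$addr/iommu_group/devices"
    local d
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $addr" >&2
        return 1
    fi
    for d in "$group_dir"/*; do
        [ -e "$d" ] || continue
        echo "${d##*/}"
    done
}

# Example (on the PVE host): describe every device grouped with the SATA controller.
#   iommu_peers 0000:07:00.0 | while read -r dev; do lspci -nns "$dev"; done
```

For the output above, this should list all of the group 7 devices, including the GPU, the USB controllers and the Sensor Fusion Hub.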
 
I see, that also explains why s-tui stopped reporting the CPU power once truenas started. The host loses access to 06:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Sensor Fusion Hub [1022:15e4] in IOMMU group 7.

Is there a way around this, or does the SATA controller need to be removed from the VM?
 
Yes, but it will break the (security) isolation between host and VMs: search for pcie_acs_override.
Note that dynamic memory (ballooning) won't work with PCIe passthrough, and there is no guarantee that the SATA controller resets properly (search for this particular controller) and is usable inside the VM.
Given that you are using a mini PC, do you really need to pass through the controller with all (one?) connected drive(s)? Why not pass through the disk instead and avoid all the PCIe passthrough troubles?
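
For reference, passing through whole disks rather than the controller is usually done with `qm set` and a stable /dev/disk/by-id path. The sketch below only prints candidate commands and executes nothing. The VM ID 100, the scsi slot numbering starting at 1, and the function name are placeholders, and it assumes the SATA disks show up as ata-* links.

```shell
#!/bin/sh
# Dry run: print (do not execute) one `qm set` line per whole SATA disk.
# The optional second argument overrides the by-id directory for testing.
print_disk_passthrough_cmds() {
    local vmid="$1"
    local by_id="${2:-/dev/disk/by-id}"
    local slot=1
    local link
    for link in "$by_id"/ata-*; do
        [ -e "$link" ] || continue
        # Skip partition links such as ata-...-part1; only whole disks are wanted.
        case "$link" in
            *-part[0-9]*) continue ;;
        esac
        printf 'qm set %s -scsi%d %s\n' "$vmid" "$slot" "$link"
        slot=$((slot + 1))
    done
}

# 100 is a placeholder VM ID; replace it with the ID of the target VM.
print_disk_passthrough_cmds 100
```

Review the printed lines before running any of them, so that only the intended data disks are handed to the VM and the chosen scsi slots are free.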
 
Thanks, I'll look into that. There are two disks inside my Mini PC that are in a vdev in truenas. I don't think ZFS will be too happy if I pass the disks instead of the controllers.
 
