[TUTORIAL] Fix always-high CPU frequency on a Proxmox host

chchia

Aug 18, 2020
I noticed that even with every CPU's governor set to powersave under the default intel_pstate driver, the CPU still jumps to its maximum frequency as soon as a VM starts, even when the VM is almost idle. The host reports all cores boosting up to 4 GHz, with the CPU temperature at about 50 °C.

So I replaced the intel_pstate driver with the acpi-cpufreq driver.

Since this is a homelab, I'd call this a great improvement: lower CPU voltage and lower temperature, with no really noticeable performance impact unless you care about benchmark results.


How to do it:

1. Follow this guide and reboot the host:
https://silvae86.github.io/2020/06/13/switching-to-acpi-power/

In summary:
Code:
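apt-get update
apt-get install acpi-support acpid acpi

# Edit /etc/default/grub and append intel_pstate=disable to GRUB_CMDLINE_LINUX_DEFAULT,
# keeping whatever is already there (on Proxmox this is usually "quiet"):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable"
# (Hosts booting via systemd-boot, e.g. ZFS root on UEFI, use /etc/kernel/cmdline
#  and "proxmox-boot-tool refresh" instead of GRUB.)

update-grub

reboot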

2. Change your Proxmox host's CPU governor:

Code:
# This sets every CPU to the conservative governor. With acpi-cpufreq, the
# available governors are usually:
# conservative ondemand userspace powersave performance schedutil
# To apply it at every boot, run crontab -e and add this command with @reboot.
echo "conservative" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor


You can use i7z to monitor frequency and temperature, or cpupower to check frequencies. I hope this helps if you also have the CPU frequency stuck high on Proxmox.
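After the reboot from step 1, it is worth confirming that the kernel actually received the parameter and that every CPU now reports acpi-cpufreq as its scaling driver. Below is a minimal sketch of such a check; it only reads /proc/cmdline and the standard cpufreq sysfs files. The function name check_acpi_cpufreq and the PROC_ROOT/SYSFS_ROOT overrides are hypothetical, added only so it can be tried against a fake directory tree; by default it looks at the real /proc and /sys.

```shell
#!/bin/sh
# Hypothetical helper: confirm that intel_pstate=disable took effect.
# PROC_ROOT and SYSFS_ROOT default to the real /proc and /sys; they only
# exist so the function can be exercised against a fake directory tree.
check_acpi_cpufreq() (
    proc=${PROC_ROOT:-/proc}
    sys=${SYSFS_ROOT:-/sys}

    # 1. Did the running kernel receive the parameter?
    case " $(cat "$proc/cmdline") " in
        *" intel_pstate=disable "*) echo "cmdline: intel_pstate=disable present" ;;
        *) echo "cmdline: intel_pstate=disable missing"; exit 1 ;;
    esac

    # 2. Does every CPU now report acpi-cpufreq as its scaling driver?
    status=0
    for f in "$sys"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_driver; do
        [ -r "$f" ] || { echo "no cpufreq scaling driver found"; exit 1; }
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        drv=$(cat "$f")
        echo "$cpu: $drv"
        [ "$drv" = "acpi-cpufreq" ] || status=1
    done
    exit "$status"
)
```

After sourcing the file, `check_acpi_cpufreq && echo OK` is enough; a non-zero exit status means the parameter is missing or some CPU still reports a different driver.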


This also fixed the CPU usage reporting issue for me: previously, the Proxmox GUI reported my Windows 10 VM using 15% to 20% CPU while the VM was idle. Now the gauge shows much less, and I hope it is more accurate. (OS Type: Other)
 
I am not sure if the "replace the intel-pstate CPU power management driver with the acpi-cpufreq one" part is still valid for recent CPUs. What I have done is simply set the powersave state and let the CPU do the rest.

Code:
echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

If you are happy with the result, you can make it persistent by adding this line to root's crontab (crontab -e):
Code:
@reboot echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

You can use cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors to see which governors are available.
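Building on the scaling_available_governors hint above, a slightly more defensive variant of the echo | tee one-liner could first check that every CPU actually offers the requested governor and only then write it. This is a sketch, not a tested tool: set_governor and SYSFS_ROOT are hypothetical names, and writing to the real /sys requires root.

```shell
#!/bin/sh
# Hypothetical helper: set one cpufreq governor on every CPU, but only after
# checking that each CPU lists it in scaling_available_governors.
# Usage (as root): set_governor powersave
set_governor() (
    gov=$1
    base=${SYSFS_ROOT:-/sys}/devices/system/cpu

    [ -n "$gov" ] || { echo "usage: set_governor <governor>" >&2; exit 2; }

    # Validate every CPU before writing anything.
    for avail in "$base"/cpu[0-9]*/cpufreq/scaling_available_governors; do
        [ -r "$avail" ] || { echo "no cpufreq support found" >&2; exit 1; }
        case " $(cat "$avail") " in
            *" $gov "*) ;;
            *) echo "'$gov' is not offered by $avail" >&2; exit 1 ;;
        esac
    done

    # Apply the governor to every CPU.
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        echo "$gov" > "$f" || exit 1
    done
)
```

In root's crontab this could become `@reboot . /root/cpufreq-helpers.sh && set_governor powersave` (the file path is hypothetical). Validating first gives one clear error message for a mistyped governor instead of one write error per CPU from tee.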
 

That's what I do as well, but using the configuration file is a bit nicer.

Code:
## CPU scaling
# Proxmox uses the performance governor by default; switch to powersave to enable CPU scaling.
# Check the current governor with:
# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
apt install cpufrequtils

cat << 'EOF' > /etc/default/cpufrequtils
GOVERNOR="powersave"
EOF
 
Well, I tried intel_pstate with powersave, and it still pushes my CPU to its highest frequency, making it run about 10 °C hotter than acpi-cpufreq with conservative, even though my VM is not doing anything intensive.

 
The CPU load percentage might be displayed incorrectly when the CPU is scaling its frequency down. When running on anything other than "performance", you should look at temperature and power consumption (roughly, voltage) instead. I use netdata for this, but you can choose any tool, since netdata might be a little overkill.
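If netdata feels like overkill, both signals can be read straight from sysfs. A minimal sketch, assuming the kernel exposes thermal zones under /sys/class/thermal (temp in millidegrees Celsius) and the Intel RAPL powercap interface at /sys/class/powercap/intel-rapl:0 (energy_uj, a cumulative CPU package energy counter in microjoules). Average power is simply the energy difference divided by the sampling interval. The function names show_temps, rapl_watts and sample_package_watts are hypothetical; on recent kernels energy_uj may be readable only by root, and the sketch ignores counter wrap-around.

```shell
#!/bin/sh
# Hypothetical helpers for watching temperature and package power from sysfs.
# Assumptions: thermal zones report millidegrees Celsius, and the Intel RAPL
# powercap counter (intel-rapl:0 = CPU package) is in microjoules.

# Print "<zone type> <temperature> C" for every thermal zone.
show_temps() (
    for z in "${SYSFS_ROOT:-/sys}"/class/thermal/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        awk -v t="$(cat "$z/temp")" -v n="$(cat "$z/type")" \
            'BEGIN { printf "%s %.1f C\n", n, t / 1000 }'
    done
)

# Average watts from two energy readings (microjoules) taken N seconds apart.
rapl_watts() {
    awk -v b="$1" -v a="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (a - b) / s / 1000000 }'
}

# Sample the package energy counter twice, N seconds apart (default 5).
# Counter wrap-around between the two readings is not handled.
sample_package_watts() (
    f=${SYSFS_ROOT:-/sys}/class/powercap/intel-rapl:0/energy_uj
    secs=${1:-5}
    e1=$(cat "$f") || exit 1
    sleep "$secs"
    e2=$(cat "$f") || exit 1
    rapl_watts "$e1" "$e2" "$secs"
)
```

Running `show_temps; sample_package_watts 10` while the VMs are idle, once per driver/governor combination, gives numbers that are easier to compare than the CPU % gauge.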
 
Yep, I am using i7z to monitor CPU frequency and temperature, and with acpi-cpufreq both frequency and temperature are significantly lower, yet performance is similar.

 
Any luck on AMD? I'm getting awful single-core performance on a 5950X system.
 
Sorry, I have no AMD servers running Proxmox/Linux right now. But PVE is based on Ubuntu, and most things that work for Ubuntu also work for PVE. Search for "amd power management ubuntu", "amd cpu scaling ubuntu" or "amd battery life ubuntu". You will usually find discussions about notebooks and battery optimization, but most of it applies to servers as well.

If your results are too old, limit the search to "last year", since the Ryzen 9 series is rather new.
 
Slight correction: we are based on Debian, not Ubuntu (which is itself a Debian derivative), but yes, we base our Linux kernel on Ubuntu's.

Further, I believe NikoC actually wants the opposite of this thread, i.e., always full frequency (since it was performance, not power usage, that was reported as too low).
 
Any news about the new AMD p-state kernel driver that we can test?
We are getting pretty bad performance per watt on our AMD platform.
 
Any news about the new AMD p-state kernel driver that we can test?

The amd-pstate driver has not yet been accepted into any kernel git tree and is still under review; as far as I can tell, the last revision was posted only a few days ago: https://lore.kernel.org/lkml/20210926090605.3556134-1-ray.huang@amd.com/

We do not patch in drivers that are still under development and that, judging by the review comments, still need some changes before they are ready to go into the kernel. If you want, you can build a kernel with those patches applied yourself, though.
The earliest I see this getting released is v5.16: if they get it into shape for the next merge window, which should start in the first half of November, it would be released in a kernel version at the start of 2022. It may find its way into PVE then, but when and how cannot be said for sure yet.
We are getting pretty bad performance per watt on our AMD platform.
There are many AMD platforms; which one do you mean, and how did you determine that? We have various EPYC, Threadripper and Ryzen platforms in use here in servers and workstations and do not see significant issues, but we use the default performance CPU frequency governor, which wouldn't see much difference from the p-state changes anyway.
 
The right way to handle Intel CPU frequency states and settings is a tool called cpupower. cpupower is a utility that you get once you install linux-tools; on Ubuntu, for example, that is apt-get install -y linux-tools-$(uname -r). I have tried that on Proxmox, but it doesn't work.

[screenshot]

I have also tried to install both of the following packages manually, but they do not seem to be available.

[screenshot]

If you run the same command on Ubuntu, however, it works, as you can see below.
[screenshot]

Here is the output of the cpupower command:
[screenshot]

It shouldn't be very time-consuming for the Proxmox developers to enable it. Thanks.
 
I think the package is called linux-cpupower in Debian.

Also the solution posted by H4R0 is quite easy and works well. For servers I want to be able to script the settings, which works well with cpufrequtils. I do not see the advantage of using cpupower apart from having nice dialogs.

Enabling powersave as the default setting is open for discussion. Since frequency scaling can cause all sorts of odd behaviour (like the CPU % usage being wrong), having performance as the default is a reasonable choice. Maybe Proxmox could ask during setup which setting is preferred.
 
I am not sure if the "replace the intel-pstate CPU power management driver with the acpi-cpufreq one" part is still valid for recent CPUs. What I have done is simply set the powersave state and let the CPU do the rest.

If by "let the CPU do the rest" you mean letting the CPU do nothing but run at its lowest supported frequency, sure, but you're neutering performance, especially if you have a higher-end CPU that supports large CPU scaling factors.

https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
2.2 Powersave
-------------

The CPUfreq governor "powersave" sets the CPU statically to the
lowest frequency within the borders of scaling_min_freq and
scaling_max_freq.

Unless you use external tools to adjust the scaling based on other factors, it would be better to experiment with the governors: ondemand (scales aggressively on demand), conservative (scales lazily on demand), or the newer schedutil, which is the only one that attempts to integrate directly with the kernel scheduler.

On a fresh install of PVE 7.1-2 on an oldish Xeon E3-1230, the default seems to be ondemand, and I'd expect the same for any newer Intel processor:

Code:
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: intel_cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us.
  hardware limits: 1.60 GHz - 3.60 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 1.60 GHz and 3.60 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.60 GHz.

So this seems to be a rather moot point for all but those who want the absolute minimum power usage and are willing to drop CPU performance to the minimum in order to get it.

-=dave
 
You might be missing something here. You have to distinguish between the intel_cpufreq driver and the intel_pstate driver. Judging by your cpufreq-info output, you are using intel_cpufreq. For that driver your claim is valid: the powersave governor really does set the CPU to its lowest frequency. I was talking about the intel_pstate driver. In a nutshell, the P-state driver lets the CPU handle the scaling (more or less). Have a look here for a more detailed explanation: https://www.kernel.org/doc/html/v4.19/admin-guide/pm/intel_pstate.html

A quote from the article:
They are not generic scaling governors, but their names are the same as the names of some of those governors. Moreover, confusingly enough, they generally do not work in the same way as the generic governors they share the names with. For example, the powersave P-state selection algorithm provided by intel_pstate is not a counterpart of the generic powersave governor (roughly, it corresponds to the schedutil and ondemand governors).
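This distinction can be checked directly on a host. The intel_pstate documentation linked above describes a global status attribute, /sys/devices/system/cpu/intel_pstate/status, containing active, passive or off; in passive mode the driver registers itself as intel_cpufreq and the generic governors apply, which would explain the intel_cpufreq lines in the cpufreq-info output quoted in this thread. A minimal sketch; pstate_mode and SYSFS_ROOT are hypothetical names, and if the status file is missing the sketch simply reports "absent".

```shell
#!/bin/sh
# Hypothetical helper: report the intel_pstate operation mode and the
# scaling driver of cpu0. The status file holds "active", "passive" or "off";
# in passive mode the driver registers as "intel_cpufreq".
pstate_mode() (
    sys=${SYSFS_ROOT:-/sys}/devices/system/cpu
    drv=$(cat "$sys/cpu0/cpufreq/scaling_driver" 2>/dev/null)
    if [ -r "$sys/intel_pstate/status" ]; then
        status=$(cat "$sys/intel_pstate/status")
    else
        status=absent
    fi
    case $status in
        active)  echo "intel_pstate active: its own powersave/performance algorithms (driver: $drv)" ;;
        passive) echo "intel_pstate passive: generic cpufreq governors apply (driver: $drv)" ;;
        *)       echo "intel_pstate $status (driver: ${drv:-none})" ;;
    esac
)
```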

Here is an article comparing the different drivers and their governors and how they impact performance and power consumption. In it, CPUFreq powersave performs poorly (as you stated), while P-State powersave is within about 5% of the best score.
https://www.phoronix.com/scan.php?page=article&item=linux50-pstate-cpufreq&num=1

Most Intel Linux users will be best off with either P-State's powersave (the default on most distributions) or performance governors.

Lastly is the geometric mean of all the benchmarks conducted showing similar leading performance between P-State/CPUFreq performance and P-State powersave coming in right behind.
While this article is focused on desktop CPUs and desktop benchmarks, it still matches my experience with server applications. The performance governor gives the best performance, but since none of my servers is under constant load, I use the P-State powersave "governor" as the default.

This is what it looks like on a server very similar to yours. It has an Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz. The server was initially installed with PVE 6.x and upgraded to 7. intel_pstate with performance was the default; I changed it to powersave.

Code:
# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 3.50 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.50 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1.40 GHz.
analyzing CPU 1:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 1
  CPUs which need to have their frequency coordinated by software: 1
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 3.50 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.50 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1.30 GHz.
analyzing CPU 2:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 2
  CPUs which need to have their frequency coordinated by software: 2
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 3.50 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.50 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1.31 GHz.
analyzing CPU 3:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 3
  CPUs which need to have their frequency coordinated by software: 3
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 3.50 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.50 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1.39 GHz.
 
For a fresh Proxmox 7.1-x install, what are the criteria for determining whether the intel_pstate driver is loaded or the intel_cpufreq (i.e. acpi_cpufreq) driver?

Maybe @t.lamprecht can comment?
 
You can use cpufreq-info to see which driver is currently used.
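If cpufrequtils is not installed, the same information is available directly in sysfs. A minimal sketch that prints the driver, governor and current frequency of every CPU; show_cpufreq and SYSFS_ROOT are hypothetical names, and scaling_cur_freq is reported by the kernel in kHz.

```shell
#!/bin/sh
# Hypothetical helper: print driver, governor and current frequency per CPU,
# read straight from the cpufreq sysfs files (scaling_cur_freq is in kHz).
show_cpufreq() (
    base=${SYSFS_ROOT:-/sys}/devices/system/cpu
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || { echo "no cpufreq support found"; exit 1; }
        cpu=$(basename "$(dirname "$dir")")
        drv=$(cat "$dir/scaling_driver")
        gov=$(cat "$dir/scaling_governor")
        khz=$(cat "$dir/scaling_cur_freq")
        echo "$cpu $drv $gov $((khz / 1000)) MHz"
    done
)
```

These files are world-readable, so no root is needed to run it.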
 
Sorry, I misunderstood your question. I guess the default depends on the kernel config used to build the kernel. One could override this default with boot flags in grub, but as far as I can see, PVE does not do this.
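As far as I understand the current intel_pstate documentation (an assumption worth verifying against the kernel you actually run), part of the choice is also made at boot: an explicit intel_pstate= kernel parameter takes precedence, and otherwise the driver defaults to active mode on CPUs with hardware-managed P-states (HWP, the hwp flag in /proc/cpuinfo) and to passive mode, reported as intel_cpufreq, on CPUs without it. That would be consistent with the older E3-1230 above showing intel_cpufreq and the E3-1220 v5 showing intel_pstate. A sketch of that rule; expected_pstate_mode and PROC_ROOT are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: predict which mode intel_pstate picks at boot.
# Assumed rule (recent kernels, see the intel_pstate documentation): an
# explicit intel_pstate=... parameter wins; otherwise CPUs with the "hwp"
# flag default to active mode and CPUs without it to passive mode
# (intel_cpufreq). Unsupported CPU models may fall back to acpi-cpufreq.
expected_pstate_mode() (
    proc=${PROC_ROOT:-/proc}
    set -f  # do not glob-expand words from the command line
    for opt in $(cat "$proc/cmdline"); do
        case $opt in
            intel_pstate=*) echo "cmdline override: ${opt#intel_pstate=}"; exit 0 ;;
        esac
    done
    # -w so that hwp_notify, hwp_epp, ... alone do not count as HWP support
    if grep -qw hwp "$proc/cpuinfo"; then
        echo "default: active (CPU has HWP)"
    else
        echo "default: passive (no HWP)"
    fi
)
```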

The PVE kernel is maintained here: https://github.com/proxmox/pve-kernel

According to the documentation, they use the Ubuntu kernel and change only the default governor to performance:
- set CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
because if not set, it can give some dynamic memory or cpu frequencies
change, and vms can crash (mainly windows guest).
see http://forum.proxmox.com/threads/18...-during-process-termination?p=93273#post93273

So the driver is determined by the defaults chosen by Ubuntu, which are probably configured here:
https://kernel.ubuntu.com/~kernel-ppa/config/jammy/linux/5.13.0-19.19/amd64-config.flavour.generic
#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
CONFIG_X86_PCC_CPUFREQ=y
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_AMD_FREQ_SENSITIVITY=m
CONFIG_X86_SPEEDSTEP_CENTRINO=y
CONFIG_X86_P4_CLOCKMOD=m
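To check the same options for the kernel you are actually running, rather than Ubuntu's published config, you can read the build config that Debian-style kernel packages usually install as /boot/config-$(uname -r) (an assumption; the file may be missing on some setups). If the PVE change quoted above still applies, the default governor should show up as performance there. A minimal sketch; kernel_cpufreq_config is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: show the default governor and the x86 scaling drivers
# a kernel was built with, from its build config file.
# Usage: kernel_cpufreq_config [config-file]   (defaults to the running kernel)
kernel_cpufreq_config() (
    cfg=${1:-/boot/config-$(uname -r)}
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; exit 1; }
    default=$(sed -n 's/^CONFIG_CPU_FREQ_DEFAULT_GOV_\(.*\)=y$/\1/p' "$cfg" |
        tr '[:upper:]' '[:lower:]')
    echo "default governor: ${default:-unknown}"
    grep -E '^CONFIG_X86_(INTEL_PSTATE|ACPI_CPUFREQ)=' "$cfg" ||
        echo "neither intel_pstate nor acpi-cpufreq is configured"
)
```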

So they enable the Intel P-state driver, and since it is enabled, it will be the preferred driver for Sandy Bridge processors.
See here:
https://git.kernel.org/pub/scm/linu...6?id=8f3d9f354286745c751374f5f1fcafee6b3f3136

# x86 CPU Frequency scaling drivers
#

config X86_INTEL_PSTATE
    bool "Intel P state control"
    depends on X86
    select ACPI_PROCESSOR if ACPI
    select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
    select CPU_FREQ_GOV_PERFORMANCE
    select CPU_FREQ_GOV_SCHEDUTIL if SMP
    help
      This driver provides a P state for Intel core processors.
      The driver implements an internal governor and will become
      the scaling driver and governor for Sandy bridge processors.

      When this driver is enabled it will become the preferred
      scaling driver for Sandy bridge processors.

      If in doubt, say N.
 
