PVE 100% CPU on all KVM while VMs are idle at 0-5% CPU

Hi Fiona,
I have a positive update. It seems that this issue is not specific to Proxmox. Our CU vendor, who supplied the instructions on how to set up this virtual machine, has observed similar behaviour on VMware as well. Digging further, it seems that the issue appears after executing the last step in the series of steps below.

1. Install the real-time kernel on the VM and check that the running version is 5.14.0-162.12.1.rt21.175.el9_1.x86_64.
2. systemctl disable firewalld --now.
3. sudo setenforce 0.
4. sudo sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config.
5. sudo sed -i 's/blacklist/#blacklist/' /etc/modprobe.d/sctp*.
6. Edit /etc/tuned/realtime-virtual-host-variables.conf and set "isolated_cores=2-15".
7. tuned-adm profile realtime-virtual-host.
8. Edit /etc/default/grub and set
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.lvm.lv=rl/root processor.max_cstate=1 intel_idle.max_cstate=0 intel_pstate=disable idle=poll default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_iommu=on iommu=pt selinux=0 enforcing=0 nmi_watchdog=0 audit=0 mce=off".
9. Add the two entries below to the grub file:
GRUB_CMDLINE_LINUX_DEFAULT="${GRUB_CMDLINE_LINUX_DEFAULT:+$GRUB_CMDLINE_LINUX_DEFAULT}\$tuned_params"
GRUB_INITRD_OVERLAY="${GRUB_INITRD_OVERLAY:+$GRUB_INITRD_OVERLAY }\$tuned_initrd"
10. grub2-mkconfig -o /boot/grub2/grub.cfg.
11. reboot
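
After the reboot in step 11, it can help to confirm that the guest actually came up with the intended kernel and parameters. The following is only a minimal sketch and not part of the vendor's instructions; `has_param` and `report_params` are hypothetical helper names, and the parameters passed in the usage comment are just a subset of step 8.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the vendor instructions.

# Succeeds if parameter $2 appears as a whole word in command line $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "present: X" or "missing: X" for every parameter given after the
# command line in $1.
report_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
}

# On the rebooted guest, something like:
#   uname -r
#   report_params "$(cat /proc/cmdline)" idle=poll processor.max_cstate=1 intel_idle.max_cstate=0
```

Passing the command line as a string argument, rather than reading /proc/cmdline inside the function, keeps the helpers usable against a saved copy of the line as well.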

This is where both Proxmox and VMware show the same behaviour: the CPU utilization of the VM is reported as 100%.

They are currently trying to figure out why this is happening. If you spot anything unusual in the commands above (especially for a VM), please let me know. I would be really grateful.

Will keep you posted.


Regards,
Vikrant
 
There are a lot of modifications to the kernel command line. Just a wild guess, but maybe the C-state-related settings? Otherwise, you'll probably have to "bisect" the settings somehow to find the problematic one(s).
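
One way such a bisection could look, sketched under assumptions: take the parameters you suspect, boot with only one half of them, and keep halving whichever half still reproduces the 100% CPU reading. `split_halves` is a hypothetical helper, and which parameters count as suspects is a guess. The approach only finds a single culprit; an interaction between two parameters would need more manual testing.

```shell
#!/bin/sh
# Sketch only: split_halves is a hypothetical helper for manual bisection.

# Prints two lines: the first half and the second half of the given
# parameters. With an odd count, the first half gets the extra one.
# The body runs in a subshell so its variables do not leak.
split_halves() (
    half=$(( ($# + 1) / 2 ))
    first=""
    second=""
    i=0
    for p in "$@"; do
        i=$((i + 1))
        if [ "$i" -le "$half" ]; then
            first="${first:+$first }$p"
        else
            second="${second:+$second }$p"
        fi
    done
    printf '%s\n%s\n' "$first" "$second"
)

# Example with the power-management-related parameters from step 8:
#   split_halves processor.max_cstate=1 intel_idle.max_cstate=0 intel_pstate=disable idle=poll
# Boot once with only the first printed line added, once with only the
# second, then repeat on the half that shows the problem.
```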
 
Hi,

There are some further updates on this matter, in case anyone is curious.
The issue is caused by step 8.
If I leave out "idle=poll" from this step and set the parameters as below, the issue is resolved.

GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.lvm.lv=rl/root processor.max_cstate=1 intel_idle.max_cstate=0 intel_pstate=disable default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_iommu=on iommu=pt selinux=0 enforcing=0 nmi_watchdog=0 audit=0 mce=off"
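
As far as I understand it, idle=poll makes the guest kernel busy-wait in its idle loop instead of halting the vCPU, which would explain why the hypervisor sees the vCPUs as fully busy while the guest itself reports 0-5% load. Below is a small sketch for deriving the edited command line from the original one; `remove_param` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Sketch only: remove_param is a hypothetical helper.

# Prints command line $1 with every exact occurrence of parameter $2 removed.
# Runs in a subshell so that "set -f" and the variables stay local.
remove_param() (
    out=""
    set -f    # no filename globbing while splitting $1 on whitespace
    for word in $1; do
        [ "$word" = "$2" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
)

# Example:
#   remove_param "intel_pstate=disable idle=poll default_hugepagesz=1G" idle=poll
```

The result still has to be written back to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by grub2-mkconfig and a reboot as in steps 10 and 11.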


