PVE 100% CPU on all KVM processes while VMs are idle at 0-5% CPU

Flaxplax

New Member
Dec 14, 2023
Not sure what is happening. I did a quick upgrade and restarted my Proxmox nodes.
After that I noticed that they randomly spiked to 100%.
I can see all the kvm processes are trying to grab as much juice as possible for some reason, but when logging into the VMs to see what's up, they are doing nothing and don't use any CPU at all.

Has anyone else had a similar problem?
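A quick way to see this from the host side is to watch the QEMU process for a VM directly; a minimal check, assuming VM ID 101 and the standard qemu-server PID file location:
Code:
# show host-side CPU of the QEMU process for VM 101, even while the guest is idle inside
top -b -n 1 -p "$(cat /var/run/qemu-server/101.pid)"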
 
I have a similar issue; I updated the PVE host and VMs (Debian) yesterday.
Everything seems to work fine after a reboot, but after the scheduled VM backup to PBS, Proxmox starts reporting VM CPU usage at ~50% instead of 0-5%
- when I SSH into the VM, it reports 0-5% CPU usage as it should.

Rebooting that VM fixes the high CPU usage issue, but it happens again after the next backup.

edit:
Just noticed: my Home Assistant VM (downloaded from their site) does not suffer from this issue; CPU stays at ~1% after backup.
 
Debian VM
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=ide2;scsi0;net0
cores: 2
cpu: x86-64-v2-AES
efidisk0: Data-PVE1:vm-101-disk-0,efitype=4m,size=1M
ide2: none,media=cdrom
machine: q35
memory: 1024
meta: creation-qemu=8.0.2,ctime=1697132753
name: Managment
net0: virtio=82:57:1E:09:F8:93,bridge=vmbr0,firewall=1,tag=30
numa: 0
onboot: 1
ostype: l26
scsi0: Data-PVE1:vm-101-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=9cda2559-8f0b-4dbd-b14a-e8d3393997b7
sockets: 1
startup: order=2
vmgenid: 2b994b0c-d8dd-4749-8e3e-04e14dddd5ba

Home assistant VM
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 4
cpu: x86-64-v2-AES
efidisk0: Data-PVE1:vm-111-disk-0,size=4M
localtime: 1
machine: q35
memory: 4096
meta: creation-qemu=8.0.2,ctime=1699533837
name: HomeAssistant
net0: virtio=02:C8:75:3C:9B:57,bridge=vmbr0,tag=30
net1: virtio=DE:08:A1:06:EB:62,bridge=vmbr0,tag=70
numa: 0
onboot: 1
ostype: l26
scsi0: Data-PVE1:vm-111-disk-1,cache=writethrough,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=99b6ad16-7d62-4f25-acf4-ad0ae86f15f2
sockets: 1
tablet: 0
tags: 
vmgenid: 4ec63779-1e47-4b3c-ae55-494f000b3095
 
Hi,
as well as sharing the VM configs (qm config <ID>), please also post the output of pveversion -v.

The following command runs for 10 seconds and logs how much time each syscall takes; maybe there is a hint in there:
Code:
timeout 10 strace -c -p $(cat /var/run/qemu-server/<ID>.pid)

For both commands, replace <ID> with the actual ID of the VM.
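If you are unsure of the ID, qm list shows all guests on the node along with their PIDs:
Code:
# lists VMID, name, status, memory, bootdisk size and PID for each VM
qm list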
 
Code:
root@pve1:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1

Code:
root@pve1:~# timeout 10 strace -c -p $(cat /var/run/qemu-server/101.pid)
strace: Process 55654 attached
strace: Process 55654 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.65    0.020594          37       542           ppoll
  6.17    0.001371           0      2040           write
  0.63    0.000140           0       499           recvmsg
  0.54    0.000119           0       525           read
  0.01    0.000002           1         2           accept4
  0.00    0.000001           0         2           close
  0.00    0.000001           0        10           sendmsg
  0.00    0.000000           0         2           getsockname
  0.00    0.000000           0         4           fcntl
  0.00    0.000000           0         1           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.022228           6      3627           total

Code:
root@pve1:~# timeout 10 strace -c -p $(cat /var/run/qemu-server/111.pid)
strace: Process 4099 attached
strace: Process 4099 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.93    0.035196          31      1128         1 ppoll
 14.25    0.006519           1      4098           write
  4.17    0.001908           1       998           recvmsg
  3.77    0.001724           1      1072           read
  0.48    0.000218           6        35           io_uring_enter
  0.21    0.000097           4        20           sendmsg
  0.07    0.000032           8         4           close
  0.04    0.000020           1        15           ioctl
  0.04    0.000020           5         4           accept4
  0.02    0.000010           1         8           fcntl
  0.01    0.000004           1         4           getsockname
  0.01    0.000004           1         3           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.045752           6      7389         1 total
 
Is this while the CPU usage is spiking? It spends less than 0.1 seconds of the 10 seconds inside the syscalls, so no hint unfortunately.
 
101 was spiking to 50% according to Proxmox; when I SSH into the VM, htop reports a load average of 0.00-0.01 (this VM is supposed to be idle most of the time).
111 was at <1%.
 
Thank you for the reports. @Chris and I were able to reproduce the issue and we are investigating. Most likely it is a regression introduced by a recent attempt to fix another issue: https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=6b7c1815e1c89cb66ff48fbba6da69fe6d254630

Your Home Assistant VM was probably not affected because it isn't using iothread. Downgrading to pve-qemu-kvm=8.1.2-4 should be a workaround. If a VM is already affected, migrate it to a node that has already been downgraded, or stop and start it again.

EDIT: Running qm suspend <ID> && qm resume <ID>, or equivalently pause and resume in the UI, is another workaround for an already affected VM.
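If many VMs on a node are affected, the suspend/resume workaround can be scripted; a rough sketch (it briefly pauses each running VM, so use with care):
Code:
# pause and immediately resume every running VM on this node
for id in $(qm list | awk '$3 == "running" {print $1}'); do
    qm suspend "$id" && qm resume "$id"
done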
 
I can confirm the issue as well, especially when I back up VMs. Since it has been reproduced, I guess a fix is just a matter of time?
 
I can also confirm that the problem happens with directsync/native, so it does not seem to be bound to io_uring.

This happened during backup, but luckily the CPU is only at 50%, not 100%.
 
I think I have the same problem. Running PVE 8.1.3; versions:

Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0

on my Intel N100 NUC. In the web interface, CPU usage is around 70%, but htop shows much lower:
[screenshot attachments: web UI CPU graph at ~70% vs. much lower usage in htop]

The VMs themselves have a load that is proportionate to what htop shows, so I think the web interface is somehow broken.
Happy to post more info if it helps anyone.
 
The web interface is not broken; your VM really is creating that usage. I can prove it by the fact that the server under my desk started to scream because of exactly the condition your screenshot shows. qm suspend <vmid> && qm resume <vmid> made it quiet again.
 
Just let your package manager handle it for you:
Code:
apt install pve-qemu-kvm=8.1.2-4
I recommend a reboot afterwards.
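If you want to keep apt from pulling the affected build back in on the next upgrade, you can also hold the package until a fixed version is released (optional):
Code:
# prevent pve-qemu-kvm from being upgraded past the downgraded version
apt-mark hold pve-qemu-kvm
# later, once a fixed build is available:
# apt-mark unhold pve-qemu-kvm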
 
