PVE: 100% CPU on all kvm processes while VMs are idle at 0-5% CPU

Hi,
as well as sharing the VM configs (qm config <ID>), please also post the output of pveversion -v.

The following command takes 10 seconds and logs which syscalls take how much time, maybe there is a hint there:
Code:
timeout 10 strace -c -p $(cat /var/run/qemu-server/<ID>.pid)

For both commands, replace <ID> with the actual ID of the VM.
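If it helps, the commands above can be wrapped into a small shell helper so that all the output ends up in one file you can attach. This is only a sketch: the function names `vm_pidfile` and `collect_vm_diag` are made up for illustration, and it assumes `strace` is installed on the host and that you run it as root.

```shell
# Hypothetical helper: print the PID file path used in the strace command above.
# Rejects anything that is not a plain numeric VM ID.
vm_pidfile() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: vm_pidfile <numeric VM ID>" >&2; return 1 ;;
    esac
    printf '/var/run/qemu-server/%s.pid\n' "$1"
}

# Hypothetical helper: print the VM config, the package versions and a
# 10-second syscall summary for one VM.
collect_vm_diag() {
    pidfile=$(vm_pidfile "$1") || return 1
    echo "== qm config $1 =="
    qm config "$1"
    echo "== pveversion -v =="
    pveversion -v
    echo "== strace -c (10 seconds) =="
    # strace prints its summary on stderr, so merge it into stdout
    timeout 10 strace -c -p "$(cat "$pidfile")" 2>&1
}
```

For example, `collect_vm_diag 108 > vm-108-diag.txt` would write everything for VM 108 into one file.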
Excuse me, dear Professor Fiona, may I ask for your advice on a hardware export issue in Proxmox?
Here is my question link:
https://forum.proxmox.com/threads/h...f-all-virtual-machines-in-the-cluster.138730/
 
Hi,
8.1.2-6 does not resolve the issue whatsoever for me.
The only reprieve seems to be a full shutdown of the VM followed by a start. That lasts a few hours at most.
What exactly are the symptoms you are seeing? Please also post the VM configuration.
 
CPU usage on guest VMs in the 60-70% range, when guest usage is <10%.
Nothing special about the VMs either, default CPU type (kvm64), 1x socket, 4x core, everything else default.
SCSI controller is "VirtIO SCSI single"
 
Can you check with e.g. htop which thread is using the CPU? Is iothread enabled on the disk? Does the issue start after you do a backup? Is it also present when you downgrade with apt install pve-qemu-kvm=8.1.2-4 and stop/start the VM (to have it use the now installed QEMU binary)?
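If htop is not at hand, the first two checks can also be done from a root shell on the host. A minimal sketch, assuming the disk lines in `qm config` carry an `iothread=1` option when it is enabled; the function names are made up for illustration:

```shell
# Hypothetical helper: read `qm config <ID>` output on stdin and print the
# names of the disks that have iothread=1 set.
disks_with_iothread() {
    grep -E '^(scsi|virtio)[0-9]+:' | grep -E ',iothread=1(,|$)' | cut -d: -f1
}

# Hypothetical helper: watch the threads of a VM's kvm process in top
# (-H shows individual threads, -p limits the view to one process).
vm_threads() {
    top -H -p "$(cat "/var/run/qemu-server/$1.pid")"
}
```

Usage would be `qm config <ID> | disks_with_iothread` and `vm_threads <ID>`, again replacing <ID> with the actual ID of the VM. The thread names shown by top should hint at whether a vCPU thread or an I/O thread is the busy one.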
 
I will check the next time I observe it happening.
 
I installed the latest version of pve-qemu-kvm (8.1.2-6) and enabled the VirtIO SCSI single controller again, and now it also works correctly after a backup.
No need to restart the PVE node.
Not here, I have 8.1.2-6 installed and also VirtIO SCSI single enabled but still randomly freezing after a backup. Today it was so bad that I could only power down, nothing worked anymore.
 
Hi,
Do you mean the guest/VM froze? Is there high CPU load for the kvm process? It might be the rarer issue that the problematic commit (the one that introduced the CPU load issue) tried to address: https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=6b7c1815e1c89cb66ff48fbba6da69fe6d254630
A proper fix for that is still being worked on: https://lists.nongnu.org/archive/html/qemu-devel/2024-01/msg00117.html

You can try using qm suspend <ID> && qm resume <ID> and see if that helps.
 
Hi,

Thanks for the links. However, yesterday the whole Proxmox host was frozen and completely unresponsive. I was in a shell, but every command I tried did nothing. The GUI was completely frozen. So the qm command wouldn't have worked in this case.

I've changed my backup settings to see whether that has any effect, since it doesn't look directly related to VMs freezing. Maybe I'm wrong though.
 
Is there anything in the system logs/journal around the time the issue happened?
 
I just checked. The only thing I can find is that, out of the blue, my NFS server became unreachable:
kernel: nfs: server 10.10.250.50 not responding, still trying

After a power off and startup of Proxmox, it's able to reach the NFS server again.
But I can imagine why that happens: after the freeze, nothing in my network is able to communicate. That's because my Proxmox host runs pfSense, and the NFS server is in a separate network that is routed through pfSense. That setup itself isn't the problem, since it has always worked, but the 'freeze' of Proxmox (or of the VM, although I can't find anything weird about that specific VM) makes it impossible for network traffic to pass through pfSense. Makes sense, of course. I can still reach the admin interface of Proxmox because I made it available outside pfSense.

What I find weird, however, is that I can't kill the backup task, since nothing I try in the Proxmox shell works. So I can connect to the Proxmox GUI, the shell and probably SSH (I need to try that if it happens again), but I can't do much. Even shutdown didn't respond yesterday.

I have to add that the backup had already successfully passed the pfSense VM; it was busy with a HomeAssistant VM at the time Proxmox stopped responding.
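For anyone wanting to dig through the journal after such a freeze, here is a rough sketch of how the relevant messages could be pulled out once the host is back up. It assumes the journal is persistent (otherwise the previous boot is gone), and the function names are made up for illustration:

```shell
# Kernel messages from the previous boot, i.e. the one that froze.
# Only works if the journal is persistent (assumption).
prev_boot_kernel_log() {
    journalctl -k -b -1 --no-pager
}

# Hypothetical helper: read log text on stdin and count the
# "not responding" messages per NFS server.
count_nfs_timeouts() {
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' | sort | uniq -c
}
```

`prev_boot_kernel_log | count_nfs_timeouts` then shows how often each NFS server timed out before the power off. To look at everything around the backup window instead, `journalctl --since` and `--until` with the start and end time of the backup job narrow the output down.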
 
Hm, after installing this update I immediately had a freeze on the VM where IOthread was off. I didn't have the issue for a while but today it came back. Was a reboot needed after this update? It didn't say so after installing.
Any system-related update should be followed by a reboot, especially anything touching I/O. But to get a stable, working system: disconnect from the internet, install from the ISO, restore the VMs, and it will all work. A server is not a phone that you update every two minutes.
 
Yes. You need to shutdown+start the guest, live migrate to an updated node or use the Reboot button in the web UI. Reboot inside the guest is not enough. Otherwise, the VM will still be running with the old QEMU binary. You can use qm status <ID> --verbose | grep running-qemu to check the currently running version.
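To run that check for every running VM at once, something like the following could work. It is only a sketch: it assumes the usual `qm list` column layout (VMID, NAME, STATUS, ...), and the function names are made up for illustration.

```shell
# Hypothetical helper: read `qm list` output on stdin and print the IDs
# of the VMs whose STATUS column says "running".
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Print the QEMU version each running VM was started with, followed by
# the currently installed package version for comparison.
show_running_qemu() {
    for vmid in $(qm list | running_vmids); do
        printf '%s: ' "$vmid"
        qm status "$vmid" --verbose | grep running-qemu
    done
    printf 'installed pve-qemu-kvm: '
    dpkg-query -W -f='${Version}\n' pve-qemu-kvm
}
```

Call `show_running_qemu` as root on the node. If the reported running version does not include the package revision, two builds of the same upstream version will look identical here, so when in doubt simply stop and start the VM.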
 
Hello, unfortunately 8.1.2-6 didn't fix it for me.
Code:
agent: 1
boot: order=virtio0;net0
cores: 2
cpu: kvm64
memory: 2048
name: unifi-controller
net0: virtio=FA:83:81:15:07:9D,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=8d5d167a-b271-4d41-8ff9-88b0492f208e
sockets: 1
virtio0: data2:vm-108-disk-0,size=32G
vmgenid: 41bd8451-75f3-4e5d-b666-f08091009616

Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-9
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1

The CPU usage shown in the PVE GUI is roughly 10-11%:

[screenshot: 1707978666205.png]

while htop reports 1-3%:

[screenshot: 1707978782291.png]
 
