Proxmox 8/8.1: high CPU on host, idle in guest

neko_code

Good evening everyone!

I searched this forum for a similar issue but couldn't find a solution that helped.
My issue: the Windows 11 guest reports a nearly idle CPU, while the host shows 100-600% CPU usage for that VM. The Windows VM uses GPU passthrough. Ballooning is disabled. I checked the Windows 11 stats for IO/network usage, but it is very low, below 4% while doing nothing.
I tried disabling all software, including RDP (Moonlight).

In some cases the host shows the guest CPU usage dropping below 50%, but after a minute it climbs back to a high value.

pveversion output:
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-6
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.15.53-1-pve: 5.15.53-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx7
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.3
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
I don't understand why it lists pve-kernel-5.15, though (update: apt autoremove got rid of it).

Windows 11 version: 10.0.22000
VirtIO version: 0.1.240 (updated from 0.1.171 yesterday in the hope that the issue would go away; it didn't)

Host specs:
i7-13700, 128 GB RAM at 2666 MHz, RTX 4080. Intel's integrated GPU is passed through to another Linux VM (and there are no issues with it). I can post more specs, but I don't think the hardware is the reason.

VM specs:
Code:
args: -uuid 00000000-0000-0000-0000-000000000101 -machine hpet=off -rtc driftfix=slew -global kvm-pit.lost_tick_policy=discard -cpu 'host,-hypervisor,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_synic,hv_stimer,hv_vpindex,hv_runtime,hv_relaxed,kvm=off,hv_vendor_id=intel'
balloon: 0
bios: ovmf
boot: order=virtio0;net0
cores: 12
cpu: host,flags=+pcid
efidisk0: data-2:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1,x-vga=1
machine: pc-q35-8.1
memory: 42000
meta: creation-qemu=7.0.0,ctime=1663465040
name: win11
net0: virtio=76:03:84:99:4B:3F,bridge=vmbr0,tag=70
numa: 0
ostype: other
scsihw: virtio-scsi-single
smbios1: uuid=58741d44-5349-4bff-975c-7bc584445f37
sockets: 1
tablet: 0
tpmstate0: data-2:vm-101-disk-1,size=4M,version=v2.0
vga: none
virtio0: data-2:vm-101-disk-2,iothread=1,size=128G
vmgenid: 5cf688dd-9de3-45f4-8ac5-7d169327e336

I tried removing args completely to check whether it was the cause, but host CPU usage stayed high.

What I have tried so far, none of which helped:
- Changing the CPU type to kvm64, and trying some other specific CPU types
- The -machine no-hpet option and clock skew settings
- Changing VirtIO SCSI to VirtIO SCSI Single
- Removing the PCI device (GPU passthrough)
- Changing disk parameters (with and without iothread)
- Changing the machine type from pc-q35-7.1 to 8.1 (I can't change to a different type because Proxmox says I should use q35 for passthrough)
- Lowering the core count (I don't use NUMA because it's a single-CPU board)
- Lowering RAM for that VM
- Changing the OS type in Proxmox settings to "Other"
- Disabling trackpad tracking


One thing I did notice: the strace output shows futex with a lot of errors:
Code:
$ /etc/pve/qemu-server# strace -c -p $(cat /var/run/qemu-server/101.pid)
strace: Process 17587 attached
strace: Process 17587 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.57    2.233183          16    132816           ppoll
  7.04    0.200061           6     30691           read
  6.90    0.196116           2     80337           ioctl
  6.35    0.180434           1     99245     10854 futex
  0.95    0.026892           0     34287           write
  0.19    0.005493           0      8347           recvmsg
  0.00    0.000076           2        34           accept4
  0.00    0.000023           0       170           sendmsg
  0.00    0.000015           0        34           close
  0.00    0.000006           0        68           fcntl
  0.00    0.000004           0        34           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00    2.842303           7    386063     10854 total
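
A note on reading the table above: strace -c counts every call that returns an error code, and for futex that commonly includes benign results such as EAGAIN or ETIMEDOUT, so the errors column on its own does not prove a fault. Also, strace -p without -f attaches only to the single thread with that PID, so the vCPU threads are probably not covered by this sample. As a rough sketch, the share of failed calls can be worked out from the table with awk. The function name strace_error_rates is made up for this example, and the two rows are copied from the output above:

```shell
# Print the share of failed calls for each syscall row of an `strace -c` table.
# Rows without an errors column have only 5 fields and are skipped,
# as are the header, the separator lines and the total row.
strace_error_rates() {
  awk 'NF == 6 && $4 ~ /^[0-9]+$/ && $6 != "total" {
         printf "%s %.1f%%\n", $6, 100 * $5 / $4
       }'
}

# Rows copied from the strace output in this post.
strace_error_rates <<'EOF'
  6.35    0.180434           1     99245     10854 futex
  0.95    0.026892           0     34287           write
EOF
```

For the numbers in this post it prints "futex 10.9%", i.e. about one futex call in nine returned an error during the sample. The whole table can also be piped through the function directly.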

The host runs Debian 12 with Proxmox on top and hosts several VMs; only the Windows 11 VM behaves like this.
Screenshot 2024-01-04 at 21.07.28.png


Host grub parameters:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset intel_iommu=on iommu=pt initcall_blacklist=sysfb_init pcie_acs_override=downstream,multifunction video=simplefb:off video=vesafb:off video=efifb:off video=vesa:off disable_vga=1"

These parameters are required for GPU passthrough so that the host doesn't use my GPU, which avoids problems with the VM. (Taken from the docs, if I recall correctly.)
I have another Linux-based VM (Debian 12/6.15 kernel) with exactly the same GPU parameters (no device sharing; I have to shut down the Windows VM to use the Linux VM) and it doesn't show this issue, so I don't think it's related to GPU passthrough.

Any ideas are welcome :)
 
Did you shut down all VMs or reboot after upgrading "pve-qemu-kvm 8.1.2-5" to "pve-qemu-kvm 8.1.2-6"? This sounds like the bug in "pve-qemu-kvm 8.1.2-5" that was fixed in your "pve-qemu-kvm 8.1.2-6", but VMs keep running on the old "pve-qemu-kvm 8.1.2-5" until you restart them.
 
I had to shut down the whole node to replace the fan, so yes.
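
For anyone who cannot say for sure whether a VM was restarted after a package upgrade: on Linux, a process whose executable was replaced on disk shows " (deleted)" at the end of its /proc/PID/exe link. A small sketch built on that; the function name binary_state is made up here, and the PID file path is the one used in the strace command earlier in this thread:

```shell
# Report whether a running process still uses an executable that has since
# been replaced on disk (for example by a package upgrade).
# Reading /proc/PID/exe of another user's process requires root.
binary_state() {
  pid="$1"
  target=$(readlink "/proc/$pid/exe") || return 1
  case "$target" in
    *" (deleted)") echo "stale: $target" ;;
    *)             echo "current: $target" ;;
  esac
}

# Demonstration on the current shell; see below for the VM case.
binary_state "$$"
```

For VM 101 the call would be binary_state "$(cat /var/run/qemu-server/101.pid)". A "stale" result suggests the VM still runs the old QEMU build and needs a full stop and start to pick up the new one.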
 
It turns out it was Windows 11's fault (lol, not surprised). When I cloned it (using Clonezilla) to a barebones configuration, I had weird stuff going on: Task Manager showed around 1-4% CPU usage while the whole OS was very laggy. I had to reinstall the whole OS (currently 22h3) and it is now running smoothly with no CPU spikes. I don't know what was going on; once again, Task Manager didn't show any process with high IO/network/CPU usage.

Sorry for the inconvenience, I thought it was related to the VM host software.