Suddenly higher CPU load on all Linux VMs?!

gfngfn256 · Mar 19, 2025

Sorry for my English. I also recently updated to pve-qemu-kvm 9.2.0-2 and just found this thread from here (my post). Here are my stats:

Code:

Node CPU:  i7-13700H

qm config 100
agent: 1
bios: ovmf
boot: order=scsi0
cores: 4
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
localtime: 1
memory: 8192
meta: creation-qemu=7.2.0,ctime=1679436314
name: haos15.0
net0: virtio=02:E9:C7:F9:B3:54,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-1,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=ec2d7b15-7128-4217-8ce3-96d361d1e323
sockets: 1
tablet: 0
vmgenid: e398d458-3713-4388-89cd-4ec7c9c1511f
vmstatestorage: local-lvm

As you can see, I also have the guest agent configured, so Neobin may have pinpointed the problem area.
I tried top but can't say I found anything out of the ordinary, just python3 showing between 0.7 and 2.3% CPU usage, which may be normal.

Neobin · Mar 19, 2025

gfngfn256 said:
As you can see, I also have the guest agent configured, so Neobin may have pinpointed the problem area.

Later I will also test that VM (1) with 9.0 and without GA. That datapoint is currently missing.

I was only wondering whether it might have something to do with the GA. But although the CPU utilization decreased from ~2.25% to ~1.5% without the GA, it is still about three times higher (~1.5% vs. ~0.5%) than on 9.0 (even with the GA active).

To be clear:
I do not want to send anyone (especially the developers) down the wrong track with this GA topic!
It was just something I was curious about, so I tested it and mentioned it here as an additional data point. The result might be entirely normal and expected and have nothing to do with the actual problem at all!

HalloWelt · Mar 19, 2025

May I ask what "GA" is and how I can en/disable it?

gfngfn256 · Mar 19, 2025

Neobin said:
That datapoint is currently missing.

I totally agree that it is still too early to draw any definite conclusions, as more data is needed. Note that your VM 3 (no GA) appears unaffected by the QEMU update.

I believe these pointers are important:

The OP suggests that Windows VMs were not affected by the update (GA installed?).
Posters should check whether they have the guest agent configured and running in their VMs.
Posters can only confirm results if they fully shut down the VM and start it again (not just reboot it).

fiona · Mar 19, 2025

Is the higher CPU usage also visible from within the guest? If yes, for which processes?

fiona · Mar 19, 2025

HalloWelt said:
May I ask what "GA" is and how I can en/disable it?

That refers to the QEMU guest agent. It is enabled/disabled via the VM configuration options, and for enabling also needs to be installed inside the VM.
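For orientation: the setting is the agent line in the VM config (agent: 1 in the config posted above), which can be changed with qm set <vmid> --agent 1 (or 0). The minimal Python sketch below checks whether that option is enabled. It assumes the config text is already available as a string (for example read from /etc/pve/qemu-server/<vmid>.conf) and that the value is written as 0/1; the function name guest_agent_enabled is hypothetical. Note that the option being set does not prove the agent is actually installed and running inside the guest.

```python
def guest_agent_enabled(config_text: str) -> bool:
    """Return True if the current (non-snapshot) VM config enables the agent option."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # snapshot sections begin with "[name]"; stop at the first one
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "agent":
            continue
        for item in value.split(","):
            name, eq, setting = item.strip().partition("=")
            if not eq:
                # Bare value: "agent: 1" is shorthand for "agent: enabled=1"
                return name == "1"
            if name == "enabled":
                return setting == "1"
        return False  # agent line without an enabled flag: treated as disabled here
    return False  # no agent line at all


print(guest_agent_enabled("agent: 1\nbios: ovmf\ncores: 4\n"))
```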

gfngfn256 · Mar 19, 2025

HalloWelt said:
May I ask what "GA" is and how I can en/disable it?

GA stands for guest agent. See here for further info.

Can you check in the Proxmox web GUI, on the Summary screen of the VMs you posted in your OP, whether an IP address is listed for the VM? If it is, the GA is running; otherwise you will see "No Guest Agent configured".

gfngfn256 · Mar 19, 2025

fiona said:
Is the higher CPU usage also visible from within the guest? If yes, for which processes?

From inside the VM:

gfngfn256 said:
I tried top but can't say I found anything out of the ordinary, just python3 showing between 0.7 and 2.3% CPU usage, which may be normal.

fiona · Mar 19, 2025

It seems I can now reproduce the issue too, with a VM that has the guest agent enabled. Thanks to @Neobin for the pointer! I will investigate, but of course feel free to share new insights/information!

Neobin · Mar 19, 2025

Neobin said:
Later I will also test that VM (1) with 9.0 and without GA. That datapoint is currently missing.

I have now added that data point to my initial post (9.0.2-5 without GA: ~0.35%).

gfngfn256 said:
Note that your VM 3 (no GA) appears unaffected by the QEMU update.

The crucial difference with VM 3 is that it runs FreeBSD. That is why I included it.

gfngfn256 · Mar 19, 2025

Neobin said:
I have now added that data point to my initial post (9.0.2-5 without GA: ~0.35%).

That is a factor of 4.3x (no GA) to 4.5x (GA), so probably not GA-related, at least for this VM.
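The factors can be reproduced from the approximate idle CPU values Neobin reported for VM 1 in this thread. A small Python sketch (the dictionary layout and the function name increase_factor are just for illustration):

```python
# Approximate idle CPU usage (%) of Neobin's VM 1, as reported in this thread
usage = {
    ("9.0", "GA"): 0.5,
    ("9.0", "no GA"): 0.35,
    ("9.2", "GA"): 2.25,
    ("9.2", "no GA"): 1.5,
}


def increase_factor(agent: str) -> float:
    """How many times higher the 9.2 value is than the 9.0 value for one GA setting."""
    return usage[("9.2", agent)] / usage[("9.0", agent)]


for agent in ("no GA", "GA"):
    print(f"{agent}: {increase_factor(agent):.1f}x")
```

This prints "no GA: 4.3x" and "GA: 4.5x". Since the increase is nearly the same with and without the agent, the GA alone does not account for it.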

fiona · Mar 19, 2025

The issue seems to be caused by the following fixes/changes in the implementation of the hpet timer: https://gitlab.com/qemu-project/qemu/-/commit/f0ccf770789e48b7a73497b465fdc892d28c1339

And for me, qm set 1234 --args '-machine hpet=off' is indeed a workaround (for VM 1234). Would be nice if other people could quickly confirm this to see if it is indeed the same issue.

EDIT: In particular, the following sounds like it might explain the increased usage:

- the timer must be kept running even if not enabled, in
order to set the ISR flag, so writes to HPET_TN_CFG must
not call hpet_del_timer()
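To make the quoted change a bit more concrete, here is a deliberately simplified Python toy model. It is not QEMU's hpet.c and ignores trigger modes, counter wraparound and the main counter enable bit; the class ToyTimer and its fields are hypothetical, and "not enabled" is modelled as the guest having disabled interrupts for that timer. It only illustrates the general idea that a timer kept running on the host to maintain a status flag costs host-side work even though the guest receives no extra interrupts, which would fit the reports that nothing unusual shows up inside the guests. Whether this is the whole explanation is what the thread is trying to confirm.

```python
from dataclasses import dataclass


@dataclass
class ToyTimer:
    """Deliberately simplified model of one emulated HPET comparator (not QEMU code)."""
    period_ticks: int          # ticks between comparator matches (arbitrary here)
    interrupt_enabled: bool    # whether the guest enabled interrupts for this timer
    keep_running: bool         # False: old behaviour (timer deleted), True: per the commit
    host_wakeups: int = 0      # times the emulator had to service the timer callback
    irqs_delivered: int = 0    # interrupts actually raised in the guest
    isr_flag: bool = False     # status bit the guest could read

    def run(self, ticks: int) -> None:
        # Old behaviour: disabling the timer removed the host-side timer entirely.
        if not (self.interrupt_enabled or self.keep_running):
            return
        for _ in range(ticks // self.period_ticks):
            self.host_wakeups += 1       # host work happens on every match...
            self.isr_flag = True         # ...to keep the status flag accurate...
            if self.interrupt_enabled:
                self.irqs_delivered += 1  # ...but only enabled timers interrupt the guest


old = ToyTimer(period_ticks=100, interrupt_enabled=False, keep_running=False)
new = ToyTimer(period_ticks=100, interrupt_enabled=False, keep_running=True)
for timer in (old, new):
    timer.run(10_000)
    print(timer.host_wakeups, timer.irqs_delivered, timer.isr_flag)
```

The first line printed is "0 0 False" (old behaviour), the second "100 0 True": the same guest-visible result, but only the second variant keeps the host busy.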

gfngfn256 · Mar 19, 2025

fiona said:
Would be nice if other people could quickly confirm this

Sorry, I have just reverted to pve-qemu-kvm=9.0.2-5, which fixed the issue. If I get another chance, I may try your suggested workaround.

gfngfn256 · Mar 19, 2025

fiona said:
The issue seems to be caused by the following fixes/changes in the implementation of the hpet timer: https://gitlab.com/qemu-project/qemu/-/commit/f0ccf770789e48b7a73497b465fdc892d28c1339

And for me, qm set 1234 --args '-machine hpet=off' is indeed a workaround (for VM 1234).

Out of interest: assuming this is the issue, how long would it take to fix, update and release a new pve-qemu-kvm 9.2.x?

Ernst T. · Mar 19, 2025

fiona said:
It would be good to know whether it also helps in your case, to determine

~~I tested the command and ended up in the VM's rescue mode (because no fstab mounts were possible).~~

The CPU usage was super low, though.

Edit: Everything is OK for me now, too. The problem was that I already had an "args" entry in the config, which the command deleted. Editing the config manually did the trick!
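Ernst's problem is worth keeping in mind: qm set <vmid> --args '...' replaces the whole args value, so any existing QEMU arguments are lost. Either pass the combined string to qm set, or edit the args line by hand as Ernst did. The Python sketch below shows the merge on the config text only and does not write any file. The function name add_qemu_args is hypothetical, and it assumes the usual "key: value" layout of /etc/pve/qemu-server/<vmid>.conf, with snapshot sections starting at the first "[name]" line. As mentioned earlier in the thread, the VM needs a full shutdown and start for the change to take effect.

```python
def add_qemu_args(config_text: str, extra: str) -> str:
    """Append `extra` to the args line of the current VM config, adding the line if needed."""
    lines = config_text.splitlines()
    # The current config ends where the first snapshot section ("[name]") starts
    end = next((i for i, line in enumerate(lines) if line.startswith("[")), len(lines))
    for i in range(end):
        key, sep, value = lines[i].partition(":")
        if sep and key.strip() == "args":
            current = value.strip()
            if extra not in current:
                # Keep the existing arguments and append the new ones
                lines[i] = "args: " + " ".join(part for part in (current, extra) if part)
            return "\n".join(lines) + "\n"
    # No args line yet: add one after the last non-blank line of the current config
    insert_at = end
    while insert_at > 0 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, f"args: {extra}")
    return "\n".join(lines) + "\n"


# "-some-existing-option" is a hypothetical placeholder for arguments already present
before = "agent: 1\nargs: -some-existing-option\ncores: 4\n"
print(add_qemu_args(before, "-machine hpet=off"), end="")
```

Checking for an existing entry first keeps the function idempotent, so running it twice does not append the workaround twice.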

Thx

t.lamprecht · Mar 19, 2025

gfngfn256 said:
Out of interest: assuming this is the issue, how long would it take to fix, update and release a new pve-qemu-kvm 9.2.x?

We cannot really give any time guarantees here; it depends on what the options are and how involved the fix is.
It could be days, it could be weeks, and FWIW it could also turn out that the new behavior is correct and optimizations need to happen elsewhere (e.g. in the guest kernel).
Feedback here on workarounds suggested by devs, like the one from Fiona, definitely helps shorten the time to a more definitive fix, or at least to an answer about what it's going to be, as it confirms that the change causing the regression was correctly identified.

Pasty89 · Mar 19, 2025

I had the same "problem" after the update to QEMU 9.2.
According to the graphs, all VMs had a higher CPU load. Nothing was visible directly inside the VMs.
So exactly what others have already reported here.
The overall load (graph) on the PVE host, however, did not go up during the same period, which I find strange.

About the HPET workaround: I applied it to all my VMs and stopped/restarted them. The graphs now show the same values as before the update again. The overall load on the PVE host stayed just as low.
So the workaround does seem to "help", even though I had no actual impact apart from the higher values in the VM graphs.

Edit: I'm running Debian Bookworm 12.10 on all VMs.

JensF · Mar 19, 2025

fiona said:
For me, qm set 1234 --args '-machine hpet=off' helps as a workaround (for VM 1234). It would be good to know whether it also helps in your case, to determine whether it really is the same problem.

Seems to help for me. The idle CPU load on my old test machine is back to what it was before the update.

fiona · Mar 19, 2025

Ernst T. said:
I tested the command and ended up in the VM's rescue mode (because no fstab mounts were possible).

The CPU usage was super low, though.

What was the exact error message? What does the VM configuration look like? Which kernel is used in the guest? It sounds a bit strange that turning off the HPET timer device would cause that. As far as I know, Linux should simply fall back to other timers.

isi · Mar 19, 2025

Here too, the CPU load drops after applying the workaround. Many thanks for that.
