CPU usage spikes on host but not on guest

xfls

Mar 25, 2023
I have been trying for a few weeks to figure out what causes CPU spikes on my host server, without any success. The host is running Proxmox 7.3-6 with kernel 5.15.102-1-pve. The guest is running Ubuntu 20.04 with an Intel X520-SR2 NIC attached via PCIe passthrough and processes up to 3.5 Gbit/s of streaming traffic. During the spikes, the increased CPU usage is attributed to the kvm processes, and no other process shows raised CPU usage. I have not managed to correlate the spikes with anything happening on the host or guest. They happen at seemingly random times every few hours.

What I have tried:
  • Update host and guest with the latest updates
  • Move guest to another host (the same hardware) running only the troublesome guest. The screenshots below are from that host/guest
  • Test i440fx and q35 machine types
    • Tried PCIe passthrough option when using q35
  • Verify kernel options are applied and the needed modules load, based on https://pve.proxmox.com/wiki/PCI(e)_Passthrough
  • Disable "Use tablet for pointer"
  • Use a VirtIO NIC for a day, which removed the spikes but raised CPU usage on the host by around 50% compared to PCIe passthrough, which is not acceptable in my case
  • Move the management traffic to another NIC and blacklist the ixgbe kernel module for my Intel X520-SR2 NICs on the host
  • Move the X520-SR2 NIC to another PCIe port. Verify it is running on x8
  • Verified that no spikes are present when I run the guest on another host without traffic running through it
  • Tried kvm64, qemu64 and host CPU types
CPU usage on guest and host during a spike on the host:
https://www.dropbox.com/s/ua2s5xg4lohzf8q/guest.png?dl=0
https://www.dropbox.com/s/rsj4iiei9ki31lp/host.png?dl=0

I am adding a couple of screenshots of top on the host: one during a CPU usage spike and one in normal condition. It is very weird to me that the CPU usage of the kvm processes is basically the same in both, but the total "user" CPU usage during the spike is almost 50% higher than normal. All the other CPU usage metrics are the same in both.

Normal: https://www.dropbox.com/s/effpufv2v00g2xc/image_2023_03_24T11_57_15_376Z.png?dl=0
Spike: https://www.dropbox.com/s/41nl47rtqscnpi3/image_2023_03_24T12_40_28_781Z.png?dl=0
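
One way to narrow down where the extra "user" time comes from is to split it into guest and host-side parts. The following is a minimal sketch, not something run on the host in this thread. It assumes the standard column order of the aggregate `cpu` line in /proc/stat and, as far as I understand the Linux accounting, that guest time is also counted inside "user". The sample strings and function names are hypothetical.

```python
# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict of ticks."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels may report fewer columns; treat missing ones as 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def user_breakdown(before, after):
    """Split the 'user' delta between two samples into guest and host-side ticks."""
    d_user = after["user"] - before["user"]
    d_guest = after["guest"] - before["guest"]
    # Assumption: guest ticks are included in user ticks, so subtract them.
    return {"user_total": d_user, "guest": d_guest, "host_user": d_user - d_guest}


# Hypothetical samples for illustration only.
SAMPLE_BEFORE = "cpu  1000 0 200 5000 10 0 5 0 800 0\ncpu0 500 0 100 2500 5 0 2 0 400 0\n"
SAMPLE_AFTER = "cpu  1600 0 260 5400 10 0 6 0 1000 0\ncpu0 800 0 130 2700 5 0 3 0 500 0\n"

if __name__ == "__main__":
    print(user_breakdown(parse_cpu_line(SAMPLE_BEFORE), parse_cpu_line(SAMPLE_AFTER)))
```

On a real host the two samples would come from reading /proc/stat a few seconds apart, once in normal condition and once during a spike. If the "guest" share stays flat while "host_user" grows, the extra time is probably being spent outside the vCPUs.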

Current VM config:
Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0
cores: 12
hostpci0: 0000:2c:00.1
memory: 12288
meta: creation-qemu=7.1.0,ctime=1675065563
name: Restreamer
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-103-disk-0,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=73b0bfb0-52d0-4422-a14f-64100606e0e3
sockets: 1
tablet: 0
vmgenid: d6c192e8-2dc0-482b-b0c0-60a59cf1d47e

Any help would be appreciated.
 
I have updated the nodes to Proxmox 7.4-3, but there is no change in this behavior.
One more thing to note: during those spikes, the number of CPU ticks reported by SNMP goes over 100 multiplied by the number of cores, which is 32 (16 physical and 16 logical), i.e. over 3200. In the attached graph below it goes up to ~3500.

https://www.dropbox.com/s/d30b710cqbpw7z0/cpu-usage.png?dl=0
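
To make the arithmetic explicit, here is a small sketch of the expected ceiling and the overshoot. The core count and the ~3500 peak are the figures from this post; the function names are just illustrative, and the assumption is that a fully busy logical CPU contributes 100 to the graphed value.

```python
LOGICAL_CPUS = 32       # 16 physical + 16 logical, as stated in the post
PER_CPU_MAX = 100       # assumed contribution of one fully busy logical CPU


def cpu_ceiling(logical_cpus, per_cpu_max=PER_CPU_MAX):
    """Highest value the graph should reach if every logical CPU is 100% busy."""
    return logical_cpus * per_cpu_max


def overshoot(observed, logical_cpus):
    """Return (absolute excess over the ceiling, observed / ceiling ratio)."""
    ceiling = cpu_ceiling(logical_cpus)
    return observed - ceiling, observed / ceiling


if __name__ == "__main__":
    excess, ratio = overshoot(3500, LOGICAL_CPUS)
    print(f"ceiling={cpu_ceiling(LOGICAL_CPUS)} excess={excess} ratio={ratio:.3f}")
```

With the numbers above, the ceiling is 3200 and the observed peak is about 300 above it, roughly 9% more than should be possible if the counters are additive.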

The system load average on the host also spikes during those periods. Nothing spikes on the guest.
 
I have updated to Proxmox 8 hoping that the issue would not be present there, but it now happens even a bit more frequently.

I deployed node_exporter on the host and guest, and of all the metrics available on both instances, absolutely nothing else spikes except the CPU usage on the host. The only extra information visible now is that at the beginning and at the end of the "user" CPU usage spike there is a short "system" CPU usage spike as well.
 

Attachment: Untitled.png (22.2 KB)

It looks like my previous test, in which I saw no spikes for a day when using a VirtIO interface, was either down to luck, or the update to version 8 changed the behavior. Now the same spikes are still present when using a VirtIO interface (multiqueue).

I forgot to mention that the VM is a multicast video restreamer plus web server. I suspect that the multicast traffic is causing those spikes for some reason. My next idea is to move the multicast restreaming to a different host and see what happens with the spikes. No CPU spikes were present when the same work was done on a bare metal server.

Any ideas are welcome.
 