For a few weeks now I have been trying, without success, to figure out what causes CPU spikes on my host server. The host is running Proxmox 7.3-6 with kernel 5.15.102-1-pve. The guest is running Ubuntu 20.04 with an Intel X520-SR2 in PCIe passthrough and processes up to 3.5 Gbit/s of streaming traffic. During the spikes the extra CPU usage is attributed to the kvm processes, and no other process shows raised CPU usage. I have not managed to correlate the spikes with anything happening on the host or in the guest. They occur at seemingly random times, every few hours.
What I have tried:
- Updated the host and the guest with the latest updates
- Moved the guest to another host (same hardware) running only the troublesome guest. The screenshots below are from that host/guest
- Tested the i440fx and q35 machine types
- Tried the PCIe passthrough option when using q35
- Verified that the kernel options are applied and the required modules are loaded, following https://pve.proxmox.com/wiki/PCI(e)_Passthrough
- Disabled "Use tablet for pointer"
- Used a VirtIO NIC for a day. This removed the spikes but raised CPU usage on the host by around 50% compared with PCIe passthrough, which is not acceptable in my case
- Moved the management traffic to another NIC and blacklisted the ixgbe kernel module for my Intel X520-SR2 NICs on the host
- Moved the X520-SR2 NIC to another PCIe slot and verified it is running at x8
- Verified that no spikes are present when I run the guest on another host without traffic running through it
- Tried the kvm64, qemu64 and host CPU types
https://www.dropbox.com/s/ua2s5xg4lohzf8q/guest.png?dl=0
https://www.dropbox.com/s/rsj4iiei9ki31lp/host.png?dl=0
I am adding a couple of screenshots of top on the host: one taken during a CPU usage spike and one under normal conditions. What puzzles me is that the CPU usage of the kvm processes is essentially the same in both, yet the total "user" CPU usage during the spike is almost 50% higher than normal. All the other CPU usage metrics are the same in both.
Normal: https://www.dropbox.com/s/effpufv2v00g2xc/image_2023_03_24T11_57_15_376Z.png?dl=0
Spike: https://www.dropbox.com/s/41nl47rtqscnpi3/image_2023_03_24T12_40_28_781Z.png?dl=0
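To put numbers on the comparison above instead of relying on screenshots, the host-wide counters can be sampled directly. Below is a minimal sketch, assuming the standard Linux /proc/stat layout described in proc(5), where the time spent running guest vCPUs is reported in the "guest" column and is also included in "user". The function names and the `user_excl_guest` key are made up for illustration and do not come from any Proxmox tool.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat, per proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels print fewer columns; pad so every field is present.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight columns go into the total.
    total = sum(delta[k] for k in FIELDS[:8])
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    pct = {k: 100.0 * v / total for k, v in delta.items()}
    # Hypothetical helper value: user time that is not guest vCPU time.
    pct["user_excl_guest"] = 100.0 * (delta["user"] - delta["guest"]) / total
    return pct


def sample_host(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return cpu_percentages(first, second)
```

Calling `sample_host()` once during a spike and once in the normal state would show whether the extra "user" time appears under "guest" (vCPU execution) or under `user_excl_guest` (host userspace work, for example inside the QEMU process itself).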
Current VM config:
Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0
cores: 12
hostpci0: 0000:2c:00.1
memory: 12288
meta: creation-qemu=7.1.0,ctime=1675065563
name: Restreamer
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-103-disk-0,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=73b0bfb0-52d0-4422-a14f-64100606e0e3
sockets: 1
tablet: 0
vmgenid: d6c192e8-2dc0-482b-b0c0-60a59cf1d47e
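For anyone who wants to compare this config against another VM programmatically, here is a minimal sketch of reading it into a dictionary. It assumes the plain `key: value` layout shown above, with comma-separated `name=value` options after an optional leading value. Treating lines that start with `#` as comments and stopping at a `[` section header are assumptions about the file format, and all function names are made up for illustration.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines into a dict of raw string values."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and '#' comment lines carry no settings
        if line.startswith("["):
            break  # assumed: a '[name]' header starts a separate section
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def split_options(value):
    """Split 'first,a=b,c=d' into ('first', {'a': 'b', 'c': 'd'})."""
    positional = ""
    options = {}
    for part in value.split(","):
        name, sep, val = part.partition("=")
        if sep:
            options[name.strip()] = val.strip()
        elif not positional:
            positional = part.strip()
    return positional, options


def vcpu_count(config):
    """Total vCPUs as sockets * cores, defaulting each to 1 if absent."""
    return int(config.get("sockets", 1)) * int(config.get("cores", 1))


def passthrough_devices(config):
    """Map each hostpciN key to its (address, options) pair."""
    return {k: split_options(v)
            for k, v in config.items() if k.startswith("hostpci")}


# A few lines copied from the config posted above.
SAMPLE = """
cores: 12
hostpci0: 0000:2c:00.1
numa: 0
scsi0: local-lvm:vm-103-disk-0,discard=on,iothread=1,size=64G,ssd=1
sockets: 1
"""

cfg = parse_vm_config(SAMPLE)
print("vCPUs:", vcpu_count(cfg))
print("passthrough:", passthrough_devices(cfg))
```

Splitting on the first colon only matters here, because values such as the PCI address `0000:2c:00.1` and the disk volume `local-lvm:vm-103-disk-0` contain colons themselves.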
Any help would be appreciated.