VM uses 100 % CPU

showiproute

Hello everyone,

For a couple of months, my second PVE server has been behaving strangely: from time to time a VM consumes all of its CPU and becomes unresponsive.
When that happens, I have to stop the VM and start it again.

This usually happens on a Windows Server 2022 Standard VM which acts as a file server with several SMB shares.
On one occasion I saw the same behaviour on an Ubuntu 22.04 VM.

On my 1st PVE server such issues never occur.


My hardware:
Intel(R) Xeon(R) CPU E5-2637 v3
128 GB RAM

Kernel version: Linux 6.2.11-2-pve x86_64


Any ideas what is happening?
 
Windows VM

Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-lvm:vm-109-disk-2,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: none,media=cdrom
machine: pc-q35-7.2
memory: 16384
name: WindowsServer
net0: virtio=3E:0E:15:D6:44:5A,bridge=vmbr0,tag=10
numa: 1
onboot: 1
ostype: win11
scsi0: local-lvm:vm-109-disk-0,discard=on,size=200G,ssd=1
scsi1: RAID_Storage_4TB:vm-109-disk-0,discard=on,size=2T
scsi2: Storage_12TB:vm-109-disk-1,backup=0,discard=on,size=8T
scsi3: Storage_10TB:vm-109-disk-0,backup=0,discard=on,size=3T
scsi4: Storage_10TB:vm-109-disk-1,discard=on,size=4T
scsi5: Storage_14TB:vm-109-disk-1,backup=0,discard=on,size=10T
scsi6: Storage_10TB:vm-109-disk-3,backup=0,discard=on,size=500G
scsihw: virtio-scsi-pci
smbios1: uuid=4e7d194d-448c-4585-b349-766632e7df21
sockets: 2
tablet: 0
vga: qxl
vmgenid: 87ec90a4-46cd-41d8-989d-d1cbbb51b0f0
vmstatestorage: local-lvm


Ubuntu VM
Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: host
efidisk0: SSD_1TB:vm-107-disk-0,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=7.1.0,ctime=1671557450
name: Gitlab
net0: virtio=3E:BE:94:BE:5B:85,bridge=vmbr0,firewall=1,tag=20
numa: 1
onboot: 1
ostype: l26
scsi0: SSD_1TB:vm-107-disk-1,discard=on,format=raw,iothread=1,size=70G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=b1bce2cb-a489-4f32-9729-4af30f9afc40
sockets: 2
tpmstate0: SSD_1TB:vm-107-disk-2,size=4M,version=v2.0
vga: qxl
vmgenid: ea971713-28db-4b81-ad15-3921aa649a1f
 
The config looks good, but you could try enabling iothread on the disks of your Windows VM.

That makes sense with a lot of disks: by default a single thread handles the I/O for all disks, whereas with iothread each disk gets its own I/O thread.

It could remove some contention.
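
As a rough sketch of the iothread suggestion, the changed lines in the Windows VM config (VM 109 above) could look like the following. Everything is taken from the posted config except `iothread=1` on each disk and the controller type: as far as I know, iothread on SCSI disks only takes effect with `virtio-scsi-single`, so `scsihw` is changed as well. Treat this as an illustration, not a tested configuration.

```
scsihw: virtio-scsi-single
scsi0: local-lvm:vm-109-disk-0,discard=on,iothread=1,size=200G,ssd=1
scsi1: RAID_Storage_4TB:vm-109-disk-0,discard=on,iothread=1,size=2T
scsi2: Storage_12TB:vm-109-disk-1,backup=0,discard=on,iothread=1,size=8T
scsi3: Storage_10TB:vm-109-disk-0,backup=0,discard=on,iothread=1,size=3T
scsi4: Storage_10TB:vm-109-disk-1,discard=on,iothread=1,size=4T
scsi5: Storage_14TB:vm-109-disk-1,backup=0,discard=on,iothread=1,size=10T
scsi6: Storage_10TB:vm-109-disk-3,backup=0,discard=on,iothread=1,size=500G
```

The VM most likely needs a full stop and start (not just a reboot from inside the guest) before the change applies. Note that the Ubuntu VM above already uses `virtio-scsi-single` with `iothread=1`, so this may not be the whole story.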

Also, regarding discard: Windows has a scheduled task that runs every week (you can configure it in the Windows defrag tool). I'm not sure whether Windows launches the trim on all disks at the same time... Maybe you can try disabling the task to see whether it has an impact on the storage.
 
I had this configuration for a year or so without any problems.
I'm not sure whether this is somehow linked, but I installed the newest 6.2 PVE kernel and now use it instead of the 5.15 one.
 
