Hi,
we are running a fairly large Proxmox cluster here (23 nodes) and recently updated all nodes to Proxmox 8.0 following the official documentation. Since then we have been having severe issues with all our QEMU-driven Windows Server 2019 VMs, which we use to provide users with a connection-broker-driven RDP terminal-server cluster. All other VMs (Linux, Windows 10, Windows XP, etc.) run smoothly.
The issue we are seeing is that, all of a sudden, only the Windows Server 2019 VMs end up at 100% CPU usage (the CPU usage jumps up and down constantly, but especially as more and more users log on to a given terminal-server VM). The same is visible in the resource statistics of the Proxmox host itself: at some point the kvm process starts to consume almost all free CPU time on the host, until the VM becomes unresponsive, even on the network. Running a constant ICMP ping against such a Windows 2019 VM, we normally see round-trip times below 1 ms; as soon as the VM's CPU usage goes wild, the ping times rise to 80-100 seconds, or for periods of 20-30 seconds we get no reply at all (the VM is unresponsive). When this happens, even on the Proxmox console the mouse pointer can no longer be moved, so the VM stalls completely. After some time (20-30 seconds, sometimes minutes) the VM returns to almost normal, until the same thing happens again after a while.
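For anyone who wants to observe this themselves, a timestamped ping from another machine makes both the huge round-trip times and the reply gaps visible (the address below is a placeholder for the affected VM):

```shell
# Continuous ping against an affected Windows Server 2019 VM
# (192.0.2.10 is a placeholder address; substitute your VM's IP).
# -D prefixes each line with a Unix timestamp,
# -O reports each request that received no reply in time,
# so both the multi-second RTTs and the 20-30 s gaps show up in the log.
ping -D -O 192.0.2.10
```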
After some more investigation into the matter we found:
- We see the same issue on all our Windows Server 2019 VMs, on different underlying Proxmox host hardware.
- We can easily reproduce the issue using the JetStream2 browser benchmark suite under Google Chrome (https://browserbench.org/JetStream/). Most of the time, as soon as the "crypto-sha" test starts running, CPU usage jumps to 100% and the issue starts to manifest, until the whole VM suddenly stalls and becomes unresponsive.
- Changing the processor/CPU type does not solve the issue (not even 'host').
- Disabling memory ballooning does not solve the issue.
- Changing the machine type (i440fx vs. q35) or version does not solve the issue.
- Updating the virtio tools, or trying older versions, does not solve the issue.
- Performing a fresh Windows Server 2019 installation on an affected Proxmox 8 host runs into the same 100% CPU usage problem during the installation process itself.
- The affected VMs run smoothly under the old Proxmox 7 environment, which we verified by downgrading one node to Proxmox 7.
We have therefore used the proxmox-boot-tool command to pin the hosts running Windows Server 2019 VMs to kernel 5.15 for the time being. So while the other Proxmox 8 hosts in our cluster run smoothly on kernel 6.2 (because they only host non-Windows-2019 VMs), we keep the affected nodes on kernel 5.15 until the issue is understood and hopefully fixed.

Nevertheless, I am very curious whether others running Windows Server 2019 VMs are seeing the same issue or can reproduce it. Furthermore, it would be nice if someone from the Proxmox staff could assist in further investigation, simply because the issue seems to be kernel 6.2 related rather than QEMU or Debian Bookworm related: simply booting into kernel 5.15 appears to solve it completely, without having to downgrade to Proxmox 7.
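For anyone wanting to do the same, this is roughly how we pinned the kernel (the 5.15 version string below is only an example; use one from the list on your own host):

```shell
# Show the kernels installed on this node
proxmox-boot-tool kernel list

# Pin a 5.15 kernel so the node keeps booting it across updates
# (version string is an example taken from the list above)
proxmox-boot-tool kernel pin 5.15.108-1-pve

# Once the issue is fixed, remove the pin again
proxmox-boot-tool kernel unpin
```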
Any help would be highly appreciated. If I should provide more technical details or run further tests, please let me know; we have a perfectly reproducible case here. All I need to do is reboot into kernel 6.2 and the issue manifests itself immediately.