High CPU

Feb 17, 2025
Hello

I have an odd issue that started after my server crashed. My current setup is as follows:

256-core EPYC Milan
512 GB memory

I have this split into three VMs (Windows Server guests), and it's running the latest Proxmox version.

My server crashed last week. Not sure what happened, but I had to force shut it down. Once it came back online, one of my VMs runs terribly now. CPU immediately goes to 50-60 percent and stays there every time I start it. When I remote in, it feels very slow. Any tips or tricks to see what could be causing this poor performance from one of my VMs after the crash?
 
Proxmox is mostly Debian, so maybe look under /var/log.

These are just wild guesses, but mine would be that your VM has a memory leak, because of that Proxmox used up all its RAM, the ZFS ARC plus a too-low swappiness setting made it worse, and/or since there is no swap file on ZFS the system crashed.
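If you want to dig into what happened around the crash, something like this might help (just a sketch, assuming ZFS is in use and persistent journaling is enabled; adjust to your setup):

    # Errors from the boot before the crash (needs a persistent journal)
    journalctl -b -1 -p err

    # ZFS ARC size vs. its configured maximum
    awk '/^(size|c_max) / {printf "%s %.0f MiB\n", $1, $3/1024/1024}' /proc/spl/kstat/zfs/arcstats

    # Is there any swap at all, and how much RAM is actually free?
    swapon --show
    free -h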
 
Thanks. I tried setting up a new VM this morning and hit a similar issue. Stuck at 100 percent once it loads the setup files. [screenshot: 1000006407.jpg]
 

Attachment: 1000006406.jpg
Looks like it is related to the Windows Server 2025 ISO. 2019 works fine. Has anyone successfully loaded Server 2025 on a Proxmox VM? Not sure why it would be any different from an older server image.
 
Do you have qemu-guest-tools enabled in the VM config, installed and running in the guest OS?
Do you have the balloon driver installed and running in the guest OS?

The former is needed for QEMU to give accurate memory usage information to PVE with a Windows OS, and the latter for QEMU to be able to reclaim, for the host, memory pages no longer in use by the guest (if needed).
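A quick way to check both from the PVE host is something like this (only a sketch; 100 is a placeholder VM ID). Inside the guest, the "QEMU Guest Agent" service and the balloon service should show as running.

    # Is the agent enabled in the VM config, and how is memory/ballooning set?
    qm config 100 | grep -E 'agent|balloon|memory'

    # If the agent is installed and running in Windows, this returns without error
    qm agent 100 ping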
 
Hi. I do have the tools running (I can see the IPv4 and IPv6 of the guest VM via the Proxmox dashboard). I don't recall installing the balloon driver, but I disabled memory ballooning to see if that was what was causing it, and it didn't make a difference.
 
Stuck at 100 percent once it loads the setup files.
Stuck at what? Loading a Windows setup? RAM usage? CPU usage?

So just to get this right: you created a new bare VM, followed the Windows Server 2025 best practices, and tried to boot into the Windows Server 2025 installation?
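For reference, a bare Server 2025 test VM roughly along the lines of those best practices could be created from the CLI like this (only a sketch, not a recommendation: the VM ID 200, the local-zfs storage, the ISO filenames and the sizes are placeholders for your own values):

    qm create 200 --name ws2025-test \
      --ostype win11 --machine q35 --bios ovmf \
      --efidisk0 local-zfs:1,efitype=4m,pre-enrolled-keys=1 \
      --tpmstate0 local-zfs:1,version=v2.0 \
      --scsihw virtio-scsi-single --scsi0 local-zfs:80,iothread=1 \
      --cores 8 --sockets 1 --memory 16384 \
      --net0 virtio,bridge=vmbr0 \
      --cdrom local:iso/windows-server-2025.iso \
      --ide3 local:iso/virtio-win.iso,media=cdrom \
      --agent enabled=1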
 
I set up a fresh VM and went through the initial installation; after the reboot the CPU usage goes crazy:

[screenshot: 1739900291174.png]

Stuck on this:

[screenshot: 1739900310113.png]

Using these settings:

[screenshot: 1739900362062.png]

Thanks!
 

Attachment: 1739900352830.png
The screenshot you posted seems to indicate that the balloon was off and/or the balloon service wasn't running correctly. That may not be related to the issue; the high memory usage shown just caught my attention.

Using these settings:
Change CPU type to host and try again. Also try with CPU type "x86-64-v2-AES".

Why such a high CPU count for a testing machine?
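If it is easier, the CPU type can also be changed from the CLI, and the balloon state can be checked via the QEMU monitor (only a sketch; 100 is a placeholder VM ID):

    # Change the CPU type (needs a full stop/start of the VM, not just a reboot)
    qm set 100 --cpu host
    # or: qm set 100 --cpu x86-64-v2-AES

    # Open the QEMU monitor, then type "info balloon" to see what the driver reports
    qm monitor 100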
 
Thank you. I am starting to notice some odd problems here. I did change the CPU type per your recommendations - thanks for that. I still had the same issue, BUT once I bumped the cores down to 8, it worked. It seems like it doesn't like the high core count; as soon as I add more cores, it freezes up with high CPU. Quite an odd issue. With 8 cores it works fine. What do you think would cause that odd issue?
 
What do you think would cause that odd issue?
I would (rough example commands after the links below):
  • Try ticking NUMA in the VM CPU settings.
  • Install updated microcode [1].
  • I don't see any mention of your PVE version, running kernel, etc., so another option could be to use an updated kernel or even a previous one (maybe after the crash it booted with a different kernel? Check /var/log/apt/history.log* for when packages were installed/upgraded).
  • Check the NPS setting in the BIOS (NUMA nodes per socket) and try different settings [2].
  • Triple-check that the storage is OK at the low level, at the PVE host level, and inside the VMs.

[1] https://pve.proxmox.com/wiki/Firmware_Updates
[2] https://infohub.delltechnologies.com/fr-fr/p/numa-configuration-settings-on-amd-epyc-2nd-generation/
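The host-side parts of the above translate roughly to the following (only a sketch; 100 is a placeholder VM ID and /dev/nvme0n1 a placeholder device, adjust to your setup):

    # NUMA flag for the VM (then do a full stop/start)
    qm set 100 --numa 1

    # Microcode; on PVE 8 / Debian 12 this needs the non-free-firmware repo [1]
    apt update && apt install amd64-microcode

    # Currently running kernel and the kernels installed on the host
    uname -r
    dpkg -l | grep -E 'proxmox-kernel|pve-kernel'

    # What apt installed/upgraded around the time of the crash
    zgrep -hE 'Start-Date|Install:|Upgrade:' /var/log/apt/history.log*

    # NUMA layout the host actually exposes (reflects the NPS BIOS setting)
    lscpu | grep -i numa

    # Low-level storage checks
    smartctl -a /dev/nvme0n1    # repeat per disk
    zpool status -v             # if the VM disks are on ZFS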
 
Thank you very much for your suggestions.

- Tried enabling NUMA, rebooted the VM, no change
- Installed the latest AMD microcode, rebooted the server, no change
- Running Proxmox version 8.3.3 and kernel version 6.8.12-8-pve

You mentioned checking the storage at a low level. All devices show "Passed" for S.M.A.R.T. under Disks.

Another odd thing is that I have two other VMs on this cluster which are working fine.

[screenshots: 1739925794159.png, 1739925805868.png]
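Since SMART can show "Passed" while the pool or the I/O path still struggles, a couple of extra host-level checks might be worth running as well (only a sketch; the path and grep pattern are examples, adjust to your storage layout):

    # ZFS pool health, scrub/resilver state and per-device error counters
    zpool status -v

    # Rough fsync/seek performance of the storage backing the VMs
    pveperf /var/lib/vz

    # Kernel-level I/O errors since boot
    dmesg -T | grep -iE 'i/o error|nvme|ata.*error'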