100% Memory Spikes on Guest

kavaa

Mar 28, 2024
We're facing an issue with a Windows Server 2022 install that was migrated from VMware to Proxmox:
the memory usage spikes to 100% and then goes back down.
The VM is super slow. Opening Task Manager takes a long time.
Once it opens, we see the system.exe process at the top.
But the guest memory is around +/-

The "Guest Agent not running" message shows up from the moment the memory spikes.
The guest agent and drivers are installed.

The host is a Dell R640 with 2x Xeon Gold 6126 and 256 GB of memory.

[Screenshot]

Normal usage:

[Screenshot]

Windows Task Manager
[Screenshot]

[Screenshot]

We already ran sfc /scannow etc.; no issues there, and Windows is up to date.

Proxmox VE version is 9.0.3, but this was already present on v8.

Has anyone had this issue before, or can anyone give us some pointers?

htop on the host
(the VM is called VEEAM-BM365-01, i.e. Veeam Backup for Microsoft 365):
[Screenshot]


Hardware overview of the VM:

[Screenshot]

Video of the Spikes

https://app.screencast.com/87SDFU0q5lBqT
 
Can you post the config of the VM? Run the following command on the host: qm config 110 and post the output within [code][/code] blocks (or use the formatting buttons of the editor).

Regarding the "memory usage spikes" in the VM's summary panel: if the Ballooning Agent is not running in the VM, Proxmox VE can only take the "host view" of the memory consumption into account. So whenever you see "Memory usage" being the same as "Host memory usage", the VM didn't report back detailed memory usage info. That could be because the Ballooning Agent in a Windows VM isn't running or is too slow to respond.
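As a rough illustration of that rule, here is a simplified Python model (not Proxmox VE's actual code; the MemorySample class and its field names are hypothetical): the graph uses the guest's own report when the Ballooning Agent delivered one and falls back to the host view otherwise, so any interval without a report shows the host figure instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MemorySample:
    # Hypothetical per-interval sample; the field names are illustrative only.
    total_bytes: int                 # memory assigned to the VM
    host_used_bytes: int             # memory the host sees the VM's QEMU process using
    guest_used_bytes: Optional[int]  # balloon driver report, or None if none arrived in time


def displayed_usage(sample: MemorySample) -> Tuple[float, str]:
    """Return (usage fraction, source), preferring the guest's own report."""
    if sample.guest_used_bytes is not None:
        return sample.guest_used_bytes / sample.total_bytes, "guest"
    # No report from the Ballooning Agent: only the host view is available.
    return sample.host_used_bytes / sample.total_bytes, "host"


def fallback_intervals(samples: List[MemorySample]) -> List[int]:
    """Indices of samples where the graph had to fall back to the host view."""
    return [i for i, s in enumerate(samples) if displayed_usage(s)[1] == "host"]


GIB = 2**30
# Illustrative values only, not measurements from this VM.
samples = [
    MemorySample(16 * GIB, 16 * GIB, 6 * GIB),  # agent reported 6 GiB in use
    MemorySample(16 * GIB, 16 * GIB, None),     # agent too slow: host view is shown
    MemorySample(16 * GIB, 16 * GIB, 6 * GIB),
]
print([displayed_usage(s) for s in samples])
print(fallback_intervals(samples))
```

In this model the middle interval jumps to 100% purely because no guest report arrived, which is one way a "spike" can appear on the graph without the guest actually running out of memory.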
 
Here you go

Code:
agent: 1,freeze-fs-on-backup=0
bios: ovmf
boot: order=sata1
cores: 6
cpu: x86-64-v2-AES
efidisk0: proxmox-ssd:110/vm-110-disk-0.qcow2,efitype=4m,size=528K
machine: pc-i440fx-9.2+pve1
memory: 16384
meta: creation-qemu=9.2.0,ctime=1753523009
name: VEEAM-BM365-01
net0: virtio=00:50:56:80:38:7c,bridge=vmbr0
numa: 0
onboot: 1
ostype: win11
sata1: proxmox-ssd:110/vm-110-disk-1.qcow2,size=150G
sata2: proxmox-ssd:110/vm-110-disk-2.qcow2,size=30G
scsihw: virtio-scsi-single
smbios1: uuid=4200bc4e-d8a8-f743-5389-a7a1231e4149
sockets: 2
startup: order=4
tags: veeam-m365
vmgenid: a9815932-bf22-4361-bf3e-d87568480ab5

Regarding the ballooning: understood, but the VM is also very, very slow, and there's no obvious reason for it.
Other Windows VMs running on the same storage (TrueNAS with SSDs, served over NFS with 10 Gbit links) are fast.

I also checked the Windows services.
The Ballooning agent is running:

[Screenshot]
 
One last step is usually to switch the Windows installation over to VirtIO drivers instead of SATA, as mentioned in the post-migration steps here: https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE

For Windows to boot from a VirtIO SCSI disk, a bit of a dance is needed and explained here: https://pve.proxmox.com/wiki/Paravirtualized_Block_Drivers_for_Windows

If you end up with the "inaccessible boot device" blue screen, you can switch everything back to SATA and everything should be fine. Don't forget to update the boot order in the Options panel of the VM, though!

Since you have two disks, you could skip the dummy disk and switch the large data disk (the 150G?) to SCSI before doing the same with the smaller boot disk (30G?).
The SCSI controller is already set to what it should be.

Other disk options that you should set:
  • IO Threading -> helps to place IO handling of the virtual disks into separate threads
  • Discard -> lets the underlying storage know which parts of the disk image are no longer in use and can be freed. Since this is qcow2 on a network share, this may or may not work, but overall it's a good setting to have, especially if the VM is ever migrated to another thin-provisioned storage.
Try those things and check whether the performance improves. If it is still this bad and you still see those "spikes" in the memory usage, there are other possible causes we could check. I do think the spikes are a symptom of the VM being slow, so that the Ballooning Agent cannot respond in time.
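To check a VM config against these recommendations, a small Python sketch can parse qm config output like the one posted earlier in the thread and flag disks still on SATA/IDE, plus VirtIO SCSI/VirtIO Block disks without iothread=1 or discard=on. It assumes the "key: value" line format shown in that output and only inspects the option strings; it is not an official Proxmox tool.

```python
import re
from typing import Dict, List

# Disk keys look like "sata1", "scsi0", "virtio0", "ide2"; "efidisk0" and "scsihw" don't match.
DISK_KEY = re.compile(r"^(ide|sata|scsi|virtio)(\d+)$")
LEGACY_BUSES = {"ide", "sata"}


def parse_qm_config(text: str) -> Dict[str, str]:
    """Turn `qm config <vmid>` output ("key: value" per line) into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            config[key.strip()] = value.strip()
    return config


def disk_findings(config: Dict[str, str]) -> List[str]:
    findings = []
    for key, value in config.items():
        match = DISK_KEY.match(key)
        if not match:
            continue
        bus = match.group(1)
        # Everything after the volume is "name=value" options, e.g. "size=150G".
        opts = dict(o.split("=", 1) for o in value.split(",")[1:] if "=" in o)
        if opts.get("media") == "cdrom":
            continue  # CD-ROM drives are not relevant here
        if bus in LEGACY_BUSES:
            # IO thread is only offered for VirtIO SCSI / VirtIO Block, so suggest the move first.
            findings.append(f"{key}: on {bus.upper()}, consider VirtIO SCSI with iothread=1,discard=on")
            continue
        if opts.get("iothread") != "1":
            findings.append(f"{key}: iothread not enabled")
        if opts.get("discard") != "on":
            findings.append(f"{key}: discard not enabled")
    return findings


# Excerpt of the config posted in this thread.
posted = """\
boot: order=sata1
efidisk0: proxmox-ssd:110/vm-110-disk-0.qcow2,efitype=4m,size=528K
sata1: proxmox-ssd:110/vm-110-disk-1.qcow2,size=150G
sata2: proxmox-ssd:110/vm-110-disk-2.qcow2,size=30G
scsihw: virtio-scsi-single
"""
for finding in disk_findings(parse_qm_config(posted)):
    print(finding)
```

Run against that excerpt, both data disks are reported as still being on SATA, while the EFI disk and the scsihw line are ignored.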
 
Thanks, going to give that a go. At least I have backups with PBS :-D

The 150 GB disk is the C:\ drive in Windows.
The 30 GB disk is a scratch drive: when we need to upgrade software, we put the installers there.
 
Ah okay, well then reverse my recommendation ;-). It is just important that Windows has seen a drive actually using the VirtIO SCSI driver before you can switch the boot drive over. Otherwise you will run into the inaccessible boot device blue screen. :)
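With the disks the other way round (sata2 = 30G scratch disk, sata1 = 150G C:\ boot disk), the order could be sketched in Python like this. It only builds and prints qm command strings and runs nothing. The target slots scsi0/scsi1 and the option string are assumptions; as far as I know, qm set --delete on a disk key only detaches it (it then shows up as an unused disk), but check man qm on your version and have a PBS backup before running any of it. The VM should be shut down for each qm set step.

```python
from typing import List, Tuple

Disk = Tuple[str, str]  # (current config key, volume ID), e.g. ("sata2", "proxmox-ssd:...")


def conversion_plan(vmid: int, data_disk: Disk, boot_disk: Disk,
                    options: str = "iothread=1,discard=on") -> List[str]:
    """Data disk first so Windows loads the VirtIO SCSI driver, boot disk last."""
    data_key, data_vol = data_disk
    boot_key, boot_vol = boot_disk
    return [
        # Step 1: move the non-boot disk to VirtIO SCSI (VM shut down).
        f"qm set {vmid} --delete {data_key}",
        f"qm set {vmid} --scsi1 {data_vol},{options}",
        # Step 2: Windows must see a VirtIO SCSI disk once before the boot disk moves.
        "# start the VM, check the scratch disk appears in Windows, then shut it down",
        # Step 3: move the boot disk and point the boot order at it.
        f"qm set {vmid} --delete {boot_key}",
        f"qm set {vmid} --scsi0 {boot_vol},{options}",
        f"qm set {vmid} --boot order=scsi0",
    ]


plan = conversion_plan(
    110,
    data_disk=("sata2", "proxmox-ssd:110/vm-110-disk-2.qcow2"),
    boot_disk=("sata1", "proxmox-ssd:110/vm-110-disk-1.qcow2"),
)
print("\n".join(plan))
```

Keeping the manual Windows boot as an explicit step in the list makes it harder to skip the part that avoids the inaccessible boot device blue screen.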
 
Hi!

There are two different problems:
- Memory spikes,
- VM is slow: your CPU is downclocked (running in PowerSave mode), so you can't expect speed/performance; nothing new here. You need to change the CPU mode to "Performance Mode" in the server BIOS. (See the attached CPU graph: the cores run at 999/1000 MHz instead of 2600/3700 MHz base/turbo clock -> https://www.intel.com/content/www/u...sor-19-25m-cache-2-60-ghz/specifications.html )
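To check the downclocking from the host side without the GUI graph, a quick Python sketch can read the per-core "cpu MHz" lines from /proc/cpuinfo on the Proxmox host and compare them with the 2600 MHz base clock of the Gold 6126. The "below half of the base clock" threshold is an arbitrary assumption, and a single snapshot of a mostly idle host is not conclusive on its own, so take the reading while the VM is busy.

```python
import re
from typing import List

BASE_CLOCK_MHZ = 2600.0  # Xeon Gold 6126 base clock, per the Intel spec page linked above


def core_clocks(cpuinfo_text: str) -> List[float]:
    """Extract every per-core 'cpu MHz' value from /proc/cpuinfo-style text."""
    return [float(v) for v in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)]


def looks_downclocked(clocks: List[float], base_mhz: float = BASE_CLOCK_MHZ,
                      fraction: float = 0.5) -> bool:
    """True if more than half of the cores run below `fraction` of the base clock."""
    if not clocks:
        return False
    slow = sum(1 for mhz in clocks if mhz < base_mhz * fraction)
    return slow > len(clocks) / 2


def read_host_clocks(path: str = "/proc/cpuinfo") -> List[float]:
    """Read the live per-core clocks on a Linux host such as the Proxmox node."""
    with open(path) as f:
        return core_clocks(f.read())


# Illustrative excerpt in /proc/cpuinfo format, using the ~1000 MHz seen in the graph.
sample = "processor\t: 0\ncpu MHz\t\t: 999.998\nprocessor\t: 1\ncpu MHz\t\t: 1000.000\n"
print(core_clocks(sample), looks_downclocked(core_clocks(sample)))
```

On the host itself, looks_downclocked(read_host_clocks()) runs the same check against the live values.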
 
Yep, that's already a known thing. The datacenter this server is in requires it, for environmental / energy-saving reasons. Unless you have a good reason to run Performance mode and can prove it, you can't set it; they do checks. So the server will be moved to another datacenter ASAP.

Changing the disk type to VirtIO seems to work; I'll keep monitoring it. If not, I'll come back to this topic.
For now, performance is back to what we expected.
 