Proxmox host randomly starts incorrectly reporting 90% RAM usage in Windows VM

Jsingh

I have two Windows 10 VMs on a Proxmox 8.4 host with exactly the same specs:
CPU: 10 cores, host
RAM: 24 GB
Balloon: 8 GB
VirtIO drivers installed
ConnectX-4 Lx virtual device passthrough
Tesla P40 virtual device passthrough
Disk options:
VirtIO SCSI single: no cache, SSD emulation on
OVMF BIOS

Randomly, the Proxmox host starts reporting 90% RAM usage for one of them, even though Windows reports about 3 GB of RAM in use.
I tried updating the VirtIO drivers and removing the PCIe passthrough, but nothing has worked.
Both VMs were using the 248 driver; I updated to 285 on the problematic VM but still face the same issue.

I have noticed this happening randomly on a number of Windows VMs (10, 11, Server 2022). The only solution I've found is to remove the VM entirely and create a new one, or to do a clean OS install (I think).

My system specs:
AMD EPYC 7531
256 GB RAM
Zpool for VM storage.

Any pointers on how to sort this out would be great, since I can't do a clean install on certain important VMs.

On the problematic VM:

Code:
qm monitor 301

qm> info balloon
balloon: actual=24576 max_mem=24576 last_update=1765571446
qm>


On the good VM:
Code:
qm monitor 305
qm> info balloon
balloon: actual=24576 max_mem=24576 total_mem=24554 free_mem=18794 mem_swapped_in=4888313856 mem_swapped_out=0 major_page_faults=223759 minor_page_faults=43613791 last_update=1765597543
qm>
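
The difference between the two outputs above is that the problematic VM returns no guest statistics (total_mem, free_mem and so on), so the host has nothing from the guest to compute usage with. As a rough sketch, the two lines can be compared like this in Python. The function names are made up for illustration, and the values are assumed to be in MiB, which matches the 24 GB configured here:

```python
def parse_balloon_info(line: str) -> dict:
    """Parse one 'balloon: key=value ...' line from `info balloon` into a dict of ints."""
    prefix = "balloon:"
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError(f"not an 'info balloon' line: {line!r}")
    fields = {}
    for token in text[len(prefix):].split():
        key, _, value = token.partition("=")
        fields[key] = int(value)
    return fields


def guest_used_percent(fields: dict):
    """Return guest-reported memory usage in percent, or None if the guest sent no statistics."""
    if "total_mem" not in fields or "free_mem" not in fields:
        return None  # balloon driver is not delivering stats (the problematic case)
    return 100.0 * (fields["total_mem"] - fields["free_mem"]) / fields["total_mem"]


# The two lines pasted above (VM 301 is the problematic one, VM 305 the good one).
bad_line = "balloon: actual=24576 max_mem=24576 last_update=1765571446"
good_line = (
    "balloon: actual=24576 max_mem=24576 total_mem=24554 free_mem=18794 "
    "mem_swapped_in=4888313856 mem_swapped_out=0 major_page_faults=223759 "
    "minor_page_faults=43613791 last_update=1765597543"
)

bad_usage = guest_used_percent(parse_balloon_info(bad_line))
good_usage = guest_used_percent(parse_balloon_info(good_line))
print("VM 301:", "no guest stats" if bad_usage is None else f"{bad_usage:.1f}% used")
print("VM 305:", "no guest stats" if good_usage is None else f"{good_usage:.1f}% used")
```

A VM whose line lacks total_mem and free_mem is a candidate for the balloon statistics problem described in this thread.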

UPDATE:

SOLUTION:
I could not find this anywhere on the forums, so I am writing the solution here.

A spec change on the machine (host processor change, disabling NUMA, ...) can result in Windows throwing PerfOS errors. These can be seen in the Windows Event Viewer. The error is thrown when the balloon driver is unable to collect the physical memory utilization data. I have attached a screenshot of the issue.

The fix described at the links below worked for me.

Solution 1 (This worked for me):


A possible solution is to rebuild the performance counters via lodctr (run it in a CMD window opened with "Run as administrator"):

lodctr /R

If this produces the error message:

Error: Unable to rebuild performance counter setting from system backup store, error code is 2

run it again until it shows:

Info: Successfully rebuilt performance counter setting from system backup store
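
If this has to be done on several VMs, the "run it again until it succeeds" step could be scripted. Below is a minimal Python sketch, not a tested tool: rebuild_counters, run_lodctr and the retry limit of 3 are my own assumptions, and the success check simply looks for the English message quoted above, so it would need adjusting on a localized Windows. The demo at the bottom uses a fake runner so it does not touch the system.

```python
import subprocess

# English success text quoted above; localized Windows versions may print something else.
SUCCESS_MARKER = "Successfully rebuilt performance counter setting"


def run_lodctr() -> str:
    """Run `lodctr /R` and return its combined output (needs an elevated prompt on Windows)."""
    result = subprocess.run(["lodctr", "/R"], capture_output=True, text=True)
    return result.stdout + result.stderr


def rebuild_counters(run=run_lodctr, max_attempts: int = 3) -> int:
    """Call `run` until its output contains the success message.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if SUCCESS_MARKER in run():
            return attempt
    return 0


# Demo with a fake runner that fails once and then succeeds, like the behaviour described above.
fake_outputs = iter([
    "Error: Unable to rebuild performance counter setting from system backup store, error code is 2",
    "Info: Successfully rebuilt performance counter setting from system backup store",
])
attempts = rebuild_counters(run=lambda: next(fake_outputs))
print(f"succeeded on attempt {attempts}")
```

On an affected VM, calling rebuild_counters() with no arguments from an elevated Python session would run the real command instead of the fake one.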

Solution 2:

Another possible solution is to check the page file on the affected server (it shouldn't be fully disabled) and readjust it if needed.


Sources and reference:
https://www.admin-enclave.com/compu...-of-the-data-section-contains-the-status-code
https://social.technet.microsoft.co...0e4cf/2012-r2-numa-warning?forum=winservergen

1765599326972.png
 
A VM will use memory as it needs it. The VM knows how much it is currently using. The host only knows the peak ever allocated, because it doesn't get that memory back except via ballooning or a shutdown/"power off". I don't actually remember whether a VM restart is sufficient, as we don't overallocate RAM.

Ballooning (still not recommended on Windows for performance reasons AFAIK) takes effect when the host reaches 80% RAM usage by default.
 