ballooning with windows guest issues

Even when accounting for cache, the PVE host summary displays RAM usage based on allocated RAM instead of the used RAM reported by qemu-agent.
The attached screenshots show the RAM used within the Windows guest.
 

Attachments

  • 2022-07-28 23_30_46-Greenshot.png
  • 2022-07-28 23_28_33-Greenshot.png
You can check what the guest agent reports by running qm monitor <VMID> on the host and then typing info balloon.
The result should look like this for a Windows VM:
Code:
balloon: actual=16324 max_mem=16384 total_mem=15982 free_mem=10239 mem_swapped_in=0 mem_swapped_out=0 major_page_faults=1694 minor_page_faults=5590676 last_update=1632407624
 
A Windows VM, even an empty one, ends up consuming its MAX RAM from the host, while the VM graph reports what Task Manager reports, which is minimal.
 
For VM 3021:
balloon: actual=4096 max_mem=4096 total_mem=4095 free_mem=3234 mem_swapped_in=782090240 mem_swapped_out=0 major_page_faults=34122 minor_page_faults=3254111
 
So Windows reports 3234MB RAM free and 4095MB total, so PVE should show 4095-3234=861MB used, without knowing what the RAM is actually used for by the guest.
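The subtraction above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the info balloon line keeps the key=value layout shown in this thread and that the values are in MB as stated above; the helper names parse_balloon and guest_used_mb are made up for illustration and are not part of any Proxmox tool.

```python
def parse_balloon(line):
    """Parse a 'balloon: key=value ...' monitor line into a dict of ints."""
    _, _, rest = line.partition(":")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            fields[key] = int(value)
    return fields


def guest_used_mb(fields):
    """Used RAM as the guest sees it: total_mem minus free_mem (MB)."""
    return fields["total_mem"] - fields["free_mem"]


# Line posted above for VM 3021.
line = ("balloon: actual=4096 max_mem=4096 total_mem=4095 free_mem=3234 "
        "mem_swapped_in=782090240 mem_swapped_out=0 "
        "major_page_faults=34122 minor_page_faults=3254111")
print(guest_used_mb(parse_balloon(line)))  # 861
```

Note that this only reflects the guest's own view and says nothing about how much physical RAM the host has committed to the VM.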
 
The used RAM of each VM is correctly reported in PVE.
The problem is the total RAM usage reported in the host/node RAM usage summary, which accounts for allocated RAM instead of used RAM.
 
Yes, but PVE will show in the node's summary how much of the physical RAM is actually used. And this might be way higher than the sum of what all VMs are reporting, because Windows VMs always report wrong numbers. If a Windows VM reports 10GB used and 22GB free, but Windows is actually using 10GB for processes and 20GB for caching, then at least 30GB of physical RAM will be used by the KVM process running that Windows VM. So 10GB used RAM is wrong from the host's point of view, where the only thing that matters is how much physical RAM is used. That's why I asked about caching.

And then there is the point that, without ballooning stealing RAM from the VM, the RAM that the KVM process uses won't shrink over time. Let's say I have a 32GB RAM VM that only needs 4GB of RAM. The KVM process will be 4.5GB or something like that. Then I start a workload inside the VM and RAM usage inside the guest climbs from 4 to 20GB. The KVM process will also use more RAM and maybe will need 22GB. Now the workload inside the VM stops and the guest's RAM usage drops from 20GB to 4GB again. But the RAM usage of the KVM process will not drop; it will stay at 22GB until you restart the VM, even if the guest isn't using all that RAM.
If you want to get that physical RAM back, ballooning would have to kick in, steal the RAM from the VM and give it back to the host so it can be used for something else. And that will only happen if your minimum RAM is set lower than your maximum RAM, ballooning is enabled, and the node's total RAM usage is over 80%.
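The behaviour described in these two paragraphs can be written down as a toy model. This is just a sketch of what the post describes, not code taken from PVE or QEMU: the function names are hypothetical, virtualization overhead is ignored, and the 80% threshold is simply the figure quoted above.

```python
from itertools import accumulate


def kvm_resident_gb(guest_usage_gb):
    """Toy model: the host-side footprint follows the guest's peak usage
    and does not shrink when the guest frees memory again."""
    return list(accumulate(guest_usage_gb, max))


def ballooning_may_reclaim(min_mb, max_mb, balloon_enabled,
                           node_used_pct, threshold_pct=80.0):
    """Condition as described above: ballooning enabled, minimum RAM below
    maximum RAM, and node RAM usage above the threshold."""
    return balloon_enabled and min_mb < max_mb and node_used_pct > threshold_pct


# Guest usage goes 4GB -> 20GB -> 4GB; the modelled footprint stays at the peak.
print(kvm_resident_gb([4, 20, 4]))                        # [4, 20, 20]
print(ballooning_may_reclaim(4096, 32768, True, 85.0))    # True
print(ballooning_may_reclaim(32768, 32768, True, 85.0))   # False
```

With minimum and maximum RAM set to the same value, the model never reclaims anything, which matches the point that a restart is then the only way to get the RAM back.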
 
It is still very unclear why each VM shows one utilization while the host shows much more. I also noticed there is a spike during Windows VM startup, so according to the last comment it will never give that memory back until ballooning takes it away?
 
The problem/question is: why isn't the host RAM usage the sum of the VM RAM usage reported by qemu-agent?
Does qemu-agent in a Windows VM report wrong numbers? If cached RAM needs to be accounted for, why doesn't qemu-agent do that?

(Sorry for my English, I don't want to be aggressive but I don't have the right words :()
 
First, the VM can't know what its actual RAM usage is, because it is isolated from the host. It can only calculate with the RAM it can see, and that isn't all the RAM the VM is using. You always get virtualization overhead. The KVM process running the VM itself needs RAM, writeback caching needs RAM, and so on. All the VM's OS sees is virtual RAM, not real physical RAM. So no matter what a guest is reporting, it's never the real RAM usage.

And then there is the question what you call free or used RAM. See https://linuxatemyram.com for an example.
Referring to that example, the guest agent in a Win VM will report "available" as free RAM. The guest agent in a Linux VM will report "free" RAM as free RAM.
So Win VMs are just wrong from the viewpoint of the hypervisor which only cares how much physical RAM is either free or used. It won't care how the used RAM is used. RAM used by the guest for caching is as bad as RAM used by processes, as the host can't directly drop caches of a guest. So for the PVE host it isn't useful to know how much RAM is "available".
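To make the difference between the two kinds of "free" concrete, here is a small sketch using the earlier 32GB example from this thread (10GB for processes, 20GB for caching). It is a deliberate simplification under the assumptions stated in this post: the function names are hypothetical, and real guest accounting has more categories than processes and cache.

```python
def reported_free_gb(total_gb, process_gb, cache_gb, guest_os):
    """What the guest would report as free RAM in this simplified model."""
    if guest_os == "windows":
        # "available": cache is counted as free
        return total_gb - process_gb
    if guest_os == "linux":
        # "free": cache is counted as used
        return total_gb - process_gb - cache_gb
    raise ValueError("unknown guest_os: " + guest_os)


def host_touched_gb(process_gb, cache_gb):
    """RAM the host has to back, regardless of what the guest uses it for."""
    return process_gb + cache_gb


print(reported_free_gb(32, 10, 20, "windows"))  # 22
print(reported_free_gb(32, 10, 20, "linux"))    # 2
print(host_touched_gb(10, 20))                  # 30
```

The same guest state yields 22GB "free" in the Windows-style report and 2GB in the Linux-style one, while the host has to back at least 30GB in both cases.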

And then there is the problem that freeing up virtual RAM in a VM doesn't mean that this will also free up physical RAM. That's what I tried to explain: the KVM process still reserves the physical RAM and won't free it, even if that virtual RAM isn't in use by the guest OS anymore.

I would also like to hear why the KVM process isn't freeing up physical RAM after that virtual RAM is free again. At least this is the behavior that I've been observing for all kinds of VMs from PVE 6.X until now. No VM, whether Linux, Windows or FreeBSD, ever gave RAM back to the host without restarting the VM or using ballooning.
Maybe someone on the staff can explain it.
 
Finally an update to this: I only have some 2016 servers, which oddly aren't affected, and the rest have been upgraded to 2022 and the leaks are gone.

With 2019 the leaks would constantly reappear every couple of weeks and were terribly annoying. Looks like MS was the fix on this one!
 
So, just experienced a memory leak on a 2022 server on an AMD CPU.

I installed the most recent balloon driver *.225 and it immediately corrected the leak, even without a reboot.

@_gabriel
VM disk settings are all similar to this for the 2022 VMs: write back.
1669812795234.png
 
Are you sure that is a memory leak and not just page file caching? When ballooning helps and no reboot is required, then it sounds more like normal caching behaviour.
 
It's definitely a memory leak. If you read through this thread in detail, you'll see the full symptom report: if it is not corrected and memory pushes over 100%, the machine crashes. Windows itself will report that it is using 95% of RAM, but if you add up all the values listed in Task Manager they come to only ~40 to 50 percent. Nonetheless, it's finally getting much better than it was. I'll update if I see it recur.
 
We are back to a memory leak on Windows VMs on AMD hosts.

I have the latest virtio drivers and have updated to the QEMU 9.0 implementation, but the leak persists. There had never been a leak on this server until a recent kernel update.

CPU(s): 24 x AMD Ryzen 9 5900X 12-Core Processor (1 Socket)
Kernel Version: Linux 6.8.4-3-pve (2024-05-02T11:55Z)
Boot Mode: EFI
Manager Version: pve-manager/8.2.2/9355359cd7afbae4

I can confirm that my Intel Xeon servers are not experiencing the same memory leaks.

Thoughts?
 
All of the same symptoms as before, so no need to repost. Just review the beginning of this thread. Hoping it is resolved in the next QEMU or Proxmox update. I've shifted almost all Windows VMs off of AMD Proxmox hosts and set the remaining machines to no longer balloon as a temporary measure.
 
