Memory usage reporting has never been accurate and I still cannot find out why after many years

BloodyIron

I'm rocking PVE v8.2.7 in my cluster, which I've upgraded in-place since v2.3, and in all this time (over a decade?) I have almost never found the "Memory usage" metric shown for VMs in the webGUI to be accurate.

I mainly run Linux VMs, but even for Windows, FreeBSD, whatever, it's _never_ accurate.

I just built six new Ubuntu 24.04 VMs from scratch, and qemu-guest-agent is installed and started, but the numbers are still insanely off.

Here's just one example.

I'm looking at htop right now in one of the fresh nodes, it's doing literally nothing that isn't on by default in a fresh install.

PVE webGUI would have me believe it's using 6.45GB/8GB (RAM), except it's using ... 500MB/8GB. And yes, that's accounting for Linux kernel caching.
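
For clarity, the guest-side number I'm quoting is the usual Linux "used" figure, i.e. MemTotal minus MemAvailable from /proc/meminfo. A rough sketch of that calculation (just my own illustration, nothing Proxmox-specific):

Code:
# Run inside the guest: compute "used" memory the way free/htop do,
# so you can compare it against what the PVE webGUI shows for the same VM.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are in kiB
    return info

m = meminfo()
used_mib = (m["MemTotal"] - m["MemAvailable"]) / 1024
print(f"guest view: {used_mib:.0f} MiB used of {m['MemTotal'] / 1024:.0f} MiB")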

I genuinely do not understand how the PVE environment gets these metrics so egregiously off the mark. It's been like that for VMs I've been running and upgrading for many years, in addition to brand-spanking-new-sparkly-VMs. And I think this really warrants discussion here because I'm pretty sure this has been bug reported plenty of times before on Proxmox's Bugzilla. I'd rather not open yet another bug report that won't get this problem sorted.

Can we PLEASE get accurate "Memory usage" metrics in the webGUI already? What's it going to take?
 
The memory usage question has been discussed so many times on the forums. Simple answer: your guest is lying to you. The data comes from QEMU, and QEMU knows what memory is or has been used. It's the same logic that applies to storage, where you need to move data to the front of the virtual disk and overwrite (or trim) the unused blocks in order to get them shown as free. With RAM it's the same. Fragmentation is a big problem with any kind of memory storage, RAM or disk. Memory is not always mapped from the host to the guest in a simple 1:1, 4k-block fashion, which would be VERY inefficient; instead, larger chunks are mapped from the hypervisor to the guest, and if the guest is currently only using, say, 1 block out of a mapped chunk of 128, the guest sees 1 block used while the hypervisor sees 128. If you want to dig deeper into memory fragmentation, read about slab allocation.
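
If you want to see this effect yourself, run a toy sketch like the one below inside a guest (my own illustration, nothing from QEMU or PVE) while watching free -m in the guest next to the VM's memory usage in the PVE webGUI:

Code:
import time

chunk = bytearray(2 * 1024**3)      # allocate ~2 GiB inside the guest
for i in range(0, len(chunk), 4096):
    chunk[i] = 1                    # touch every page so the host really backs it

print("allocated; guest and host should now both show roughly 2 GiB more used")
time.sleep(60)

del chunk                           # the guest frees the memory again
print("freed; guest 'used' drops, but the pages usually stay resident in the QEMU process")
time.sleep(60)

Unless something like ballooning or free page reporting hands those pages back to the host, the hypervisor keeps counting them as used, which is exactly the kind of discrepancy you are describing.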

If you look at the memory your hypervisor actually uses for the VM, it is even larger and more complicated than the reported numbers, due to the virtualized hardware and the normal process overhead of QEMU. There is the GPU memory that adds to the total, plus the disk cache (if applicable) and buffers. And finally, there are the corner cases like ballooning and passthrough, which are special and mess up the memory accounting even more. To make it even more complicated, QEMU itself also uses a lot of libraries, which are shared and count towards the overall memory allocation, but only once, because they are shared read-only.

In the end it boils down to what you would actually do with a more correct interpretation of the memory usage inside the guest. There is no place where you can see how much memory is really used on the host for a VM, which in my experiments can be 10-25% more than the VM's maximum configured memory. That number, however, would be important to know in order to size your VMs properly. In practice you end up monitoring the swap rate of all VMs and hypervisors, and the overall memory usage on the hypervisor itself.
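
You can get a rough picture of that host-side number yourself, though. Something like the following sketch is what I look at (my own script, not an official tool; the VMID 101, node name pve1 and the pidfile path are just examples from my PVE 8 hosts, adjust as needed):

Code:
import json
import subprocess

VMID = "101"   # example VM ID, adjust
NODE = "pve1"  # example node name, adjust

# VmRSS of the kvm process is what the host really has resident for the VM,
# often more than the configured maximum once the guest has touched all its RAM.
pid = open(f"/var/run/qemu-server/{VMID}.pid").read().strip()
with open(f"/proc/{pid}/status") as f:
    rss_kib = next(int(l.split()[1]) for l in f if l.startswith("VmRSS:"))

# The 'mem'/'maxmem' fields are what the GUI graphs (values in bytes).
out = subprocess.run(
    ["pvesh", "get", f"/nodes/{NODE}/qemu/{VMID}/status/current",
     "--output-format", "json"],
    capture_output=True, text=True, check=True,
)
status = json.loads(out.stdout)

print(f"QEMU process RSS : {rss_kib / 1024:.0f} MiB")
print(f"pvesh mem/maxmem : {status['mem'] / 2**20:.0f} / {status['maxmem'] / 2**20:.0f} MiB")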
 
Is the Ballooning Device enabled in the Ubuntu VMs' configuration (under Hardware > Memory > Advanced)?
I've intentionally turned Ballooning off for all RAM on all VMs because it has caused significant performance issues.

But this problem also persisted when Ballooning was on for some of them.
 
The memory usage question has been discussed so many times on the forums. Simple answer: your guest is lying to you.

The example VMs are fresh; there should not be such a huge discrepancy. 500MB actually used vs 6.5GB reported...

Yes, I know this has been discussed to death, but clearly the problem persists, and that defeats the whole point of presenting this metric at all. And I was already accounting for disk cache; I literally mentioned that above. These VMs had only been "up" for maybe an hour or two before I saw this discrepancy; they are textbook examples of the problem.

Clearly the problem still needs an answer, not to be kicked down the road like a can.

Again, a 10-25% increase does NOT explain the huge gap between 500MB and 6.5GB... Also, these VMs have swap fully turned off (as they are going to be Kubernetes nodes).

Let's stop making excuses and start figuring out actual solutions here.
 
I've intentionally turned Ballooning off for all RAM on all VMs because it has caused significant performance issues.
There is a difference between enabling ballooning (by setting the Minimum to less than the Memory) and only adding the Ballooning Device. I do believe the latter is necessary to see what the OS inside the VM reports as used (which is different from what Proxmox detects from the outside by how much memory has been touched).
But this problem also persisted when Ballooning was on for some of them.
Then I don't know. I assumed the Ballooning Device allowed reporting the inside memory, but I now see that it is not exactly the same as free -m (total memory minus free), though it matches more closely than yours. It therefore does not bother me much. Maybe management/data-collecting/graphing tools will give you better reports (from the inside)?
 
Furthermore, memory usage at the host level is completely inaccurate too, because improving the memory display has never been a high priority.

That is especially true if you use local ZFS storage, since the ZFS ARC needs extra handling.

https://bugzilla.proxmox.com/show_bug.cgi?id=1454

I'm currently consolidating a server farm nobody dared to touch till now "because memory usage is so high already". Apparently, half of the memory is just ARC/cache.

I bet we could save some thousands of needless servers in this world if Proxmox improved its memory usage display.
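
To make that visible, here is a quick sketch (my own, only meaningful on a host with local ZFS) that shows how much of the naive "used" figure is really just the ARC:

Code:
# Run on the PVE host. The ARC is a cache that shrinks under memory pressure,
# but the usual tools (and the PVE summary) still count it as used memory.
def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0]) * 1024  # kiB -> bytes
    return info

def read_arc_size():
    # /proc/spl/kstat/zfs/arcstats exists when the ZFS module is loaded
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "size":
                return int(fields[2])
    return 0

m = read_meminfo()
arc = read_arc_size()
used = m["MemTotal"] - m["MemAvailable"]
print(f"used (naive)   : {used / 2**30:.1f} GiB")
print(f"ZFS ARC        : {arc / 2**30:.1f} GiB")
print(f"used minus ARC : {(used - arc) / 2**30:.1f} GiB")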
 
There is a difference between enabling ballooning (by setting the Minimum to less than the Memory) and only adding the Ballooning Device.

The performance problems I saw with Ballooning are so substantial that I'm never going to re-enable it, but it's certainly worth raising as a relevant aspect.
 
That is especially true if you use local ZFS storage, since the ZFS ARC needs extra handling.

In my case I don't have local storage on any of the PVE nodes; it's all over NFS on a dedicated NAS.
 
Yes, I know this has been discussed to death, but clearly the problem persists, and that defeats the whole point of presenting this metric at all.
I concur. This metric is totally useless.


I was already accounting for disk cache; I literally mentioned that above.
Yes, in your VM, but disk cache is also used by QEMU if you set the disk to any cache mode, and that is what I meant. It counts towards the actual memory usage that is not displayed anywhere, yet is still there.


I bet we could save some thousands of needless servers in this world if Proxmox improved its memory usage display.
Nice thought ;)
 
Yes, in your VM, but disk cache is also used by QEMU if you set the disk to any cache mode, and that is what I meant.

Well I'm rocking the default cache setting of no cache. I'm not a fan of losing data in-flight ;)
 
