Memory usage reporting has never been accurate and I still cannot find out why after many years

BloodyIron

I'm rocking PVE v8.2.7 in my cluster, which I've upgraded in-place since v2.3, and in all this time (over a decade?) I have almost never found the "Memory usage" metric shown for VMs in the webGUI to be accurate.

I mainly run Linux VMs, but even for Windows, FreeBSD, whatever, it's _never_ accurate.

I just built six new Ubuntu 24.04 VMs from scratch, and qemu-guest-agent is installed and started, but the numbers are still insanely off.

Here's just one example.

I'm looking at htop right now in one of the fresh nodes, it's doing literally nothing that isn't on by default in a fresh install.

PVE webGUI would have me believe it's using 6.45GB/8GB (RAM), except it's using ... 500MB/8GB. And yes, that's accounting for Linux kernel caching.
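
For clarity, the guest-side number I'm quoting is the usual Linux "used" figure, i.e. MemTotal minus MemAvailable from /proc/meminfo. A rough sketch of that calculation (just my own illustration, nothing Proxmox-specific):

Code:
# Run inside the guest: compute "used" memory the way free/htop do,
# so you can compare it against what the PVE webGUI shows for the same VM.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are in kiB
    return info

m = meminfo()
used_mib = (m["MemTotal"] - m["MemAvailable"]) / 1024
print(f"guest view: {used_mib:.0f} MiB used of {m['MemTotal'] / 1024:.0f} MiB")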

I genuinely do not understand how the PVE environment gets these metrics so egregiously off the mark. It's been like that for VMs I've been running and upgrading for many years, in addition to brand-spanking-new-sparkly-VMs. And I think this really warrants discussion here because I'm pretty sure this has been bug reported plenty of times before on Proxmox's Bugzilla. I'd rather not open yet another bug report that won't get this problem sorted.

Can we PLEASE get accurate "Memory usage" metrics in the webGUI already? What's it going to take?
 
The memory usage question has been discussed so many times on the forums. Simple answer: your guest is lying to you. The data comes from QEMU, and QEMU knows what memory is or has been used. It's the same logic that applies to storage, where you need to move data to the front of the virtual disk and overwrite (or trim) the unused blocks in order to get them shown as free. With RAM it's the same. Fragmentation is a big problem with any kind of memory storage, RAM or disk. Memory is not always mapped from the host to the guest in a simple 1:1, 4k-block fashion, which would be VERY inefficient; instead, larger chunks are mapped from the hypervisor to the guest, and if the guest is currently only using, say, 1 block out of a mapped chunk of 128, the guest sees 1 block used while the hypervisor sees 128. If you want to dig deeper into memory fragmentation, read about slab allocation.
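
If you want to see this effect yourself, run a toy sketch like the one below inside a guest (my own illustration, nothing from QEMU or PVE) while watching free -m in the guest next to the VM's memory usage in the PVE webGUI:

Code:
import time

chunk = bytearray(2 * 1024**3)      # allocate ~2 GiB inside the guest
for i in range(0, len(chunk), 4096):
    chunk[i] = 1                    # touch every page so the host really backs it

print("allocated; guest and host should now both show roughly 2 GiB more used")
time.sleep(60)

del chunk                           # the guest frees the memory again
print("freed; guest 'used' drops, but the pages usually stay resident in the QEMU process")
time.sleep(60)

Unless something like ballooning or free page reporting hands those pages back to the host, the hypervisor keeps counting them as used, which is exactly the kind of discrepancy you are describing.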

If you look at the memory your hypervisor actually uses for the VM, it is even larger and more complicated than the reported numbers, due to the virtualized hardware and the normal process overhead of QEMU. There is the GPU memory that adds to the total, plus the disk cache (if applicable) and buffers. And finally, there are the corner cases like ballooning and passthrough, which are special and mess up the memory accounting even more. To make it even more complicated, QEMU itself also uses a lot of libraries, which are shared and count towards the overall memory allocation, but only once, because they are shared read-only.

In the end it boils down to what you would actually do with a more correct interpretation of the memory usage inside the guest. There is no place where you can see how much memory is really used on the host for a VM, which in my experiments can be 10-25% more than the VM's maximum configured memory. That number, however, would be important to know in order to size your VMs properly. In practice you end up monitoring the swap rate of all VMs and hypervisors, and the overall memory usage on the hypervisor itself.
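
You can get a rough picture of that host-side number yourself, though. Something like the following sketch is what I look at (my own script, not an official tool; the VMID 101, node name pve1 and the pidfile path are just examples from my PVE 8 hosts, adjust as needed):

Code:
import json
import subprocess

VMID = "101"   # example VM ID, adjust
NODE = "pve1"  # example node name, adjust

# VmRSS of the kvm process is what the host really has resident for the VM,
# often more than the configured maximum once the guest has touched all its RAM.
pid = open(f"/var/run/qemu-server/{VMID}.pid").read().strip()
with open(f"/proc/{pid}/status") as f:
    rss_kib = next(int(l.split()[1]) for l in f if l.startswith("VmRSS:"))

# The 'mem'/'maxmem' fields are what the GUI graphs (values in bytes).
out = subprocess.run(
    ["pvesh", "get", f"/nodes/{NODE}/qemu/{VMID}/status/current",
     "--output-format", "json"],
    capture_output=True, text=True, check=True,
)
status = json.loads(out.stdout)

print(f"QEMU process RSS : {rss_kib / 1024:.0f} MiB")
print(f"pvesh mem/maxmem : {status['mem'] / 2**20:.0f} / {status['maxmem'] / 2**20:.0f} MiB")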
 
Is the Ballooning Device enabled in the Ubuntu VMs' configuration (under Hardware > Memory > Advanced)?
I've intentionally turned Ballooning off for all RAM on all VMs because it has caused significant performance issues.

But this problem also persisted when Ballooning was on for some of them.
 
The memory usage question has been discussed so many times on the forums. Simple answer: your guest is lying to you.

The example VMs are fresh; there should not be such a huge discrepancy. 500MB actually used vs 6.5GB reported...

Yes, I know this has been discussed to death, but clearly the problem persists, and that defeats the whole point of presenting this metric at all. And I was already accounting for disk cache; I literally mentioned that above. These VMs had only been "up" for maybe an hour or two before I saw this discrepancy; they are textbook examples of the problem.

Clearly the problem still needs an answer, not to be kicked down the road like a can.

Again, a 10-25% increase does NOT explain the huge gap between 500MB and 6.5GB... Also, these VMs have swap fully turned off (as they are going to be Kubernetes nodes).

Let's stop making excuses and start figuring out actual solutions here.
 
I've intentionally turned Ballooning off for all RAM on all VMs because it has caused significant performance issues.
There is a difference between enabling ballooning (by setting the Minimum to less than the Memory) and only adding the Ballooning Device. I do believe the latter is necessary to see what the OS inside the VM reports as used (which is different from what Proxmox detects from the outside by how much memory has been touched).
But this problem also persisted when Ballooning was on for some of them.
Then I don't know. I assumed the Ballooning Device allowed reporting the inside memory, but I now see that it is not exactly the same as free -m (total memory minus free), though it matches more closely than yours. It therefore does not bother me much. Maybe management/data-collecting/graphing tools will give you better reports (from the inside)?
 
Furthermore, memory usage at the host level is completely inaccurate too, because improving the memory display has never been a high priority.

That is especially true if you use local ZFS storage, since the ZFS ARC needs extra handling.

https://bugzilla.proxmox.com/show_bug.cgi?id=1454

I'm currently consolidating a server farm nobody dared to touch till now "because memory usage is so high already". Apparently, half of the memory is just ARC/cache.

I bet we could save some thousands of needless servers in this world if Proxmox improved its memory usage display.
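
To make that visible, here is a quick sketch (my own, only meaningful on a host with local ZFS) that shows how much of the naive "used" figure is really just the ARC:

Code:
# Run on the PVE host. The ARC is a cache that shrinks under memory pressure,
# but the usual tools (and the PVE summary) still count it as used memory.
def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0]) * 1024  # kiB -> bytes
    return info

def read_arc_size():
    # /proc/spl/kstat/zfs/arcstats exists when the ZFS module is loaded
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "size":
                return int(fields[2])
    return 0

m = read_meminfo()
arc = read_arc_size()
used = m["MemTotal"] - m["MemAvailable"]
print(f"used (naive)   : {used / 2**30:.1f} GiB")
print(f"ZFS ARC        : {arc / 2**30:.1f} GiB")
print(f"used minus ARC : {(used - arc) / 2**30:.1f} GiB")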
 
There is a difference between enabling ballooning (by setting the Minimum to less than the Memory) and only adding the Ballooning Device.

The performance problems I saw with Ballooning are so substantial that I'm never going to re-enable it, but it's certainly worth raising as a relevant aspect.
 
That is especially true if you use local ZFS storage, since the ZFS ARC needs extra handling.

In my case I don't have local storage on any of the PVE nodes; it's all over NFS on a dedicated NAS.
 
Yes, I know this has been discussed to death, but clearly the problem persists, and that defeats the whole point of presenting this metric at all.
I concur. This metric is totally useless.


I was already accounting for disk cache; I literally mentioned that above.
Yes, in your VM, but disk cache is also used by QEMU if you set the disk to any cache mode, and that is what I meant. It counts towards the actual memory usage that is not displayed anywhere, yet is still there.


I bet we could save some thousands of needless servers in this world if Proxmox improved its memory usage display.
Nice thought ;)
 
Yes, in your VM, but disk cache is also used by QEMU if you set the disk to any cache mode, and that is what I meant.

Well I'm rocking the default cache setting of no cache. I'm not a fan of losing data in-flight ;)
 
