Can PVE ignore the cached RAM of a VM in its usage calculation?

Since migrating all my VMs from an old Hyper-V host to PVE v8 I've had very few problems, which is great, but there is one thing I cannot seem to get sorted.

I have one VM, running Fedora Server 38, that hosts a few containers and a MariaDB server, where PVE over time alerts me to memory use being over 90%. I've tried various ways to mitigate the issue, e.g. reducing the caching of MariaDB, but the issue persists. I start the VM and for hours the memory use is around 4 GB of 10 GB, but then according to PVE it rises back up to what you see below:

[Attached screenshots: PVE memory usage for the VM and memory/swap usage inside the guest]

If the swap was under heavy use then I would be concerned, but it's not, so PVE telling me there's a problem is wrong (I use Zabbix to monitor my servers, including PVE, and that's where I get the alerts from). My choices so far have been to reboot, use the monitor to lower and then raise the max balloon size, or force a cache cleanup on the server. I know I could "adjust" the trigger on the PVE checks, but that just hides the issue, and I would rather PVE reported things like everything else does.

I never had to do this when the VM was running on Hyper-V, where RAM usage always reported the non-cached amount.
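For anyone wondering what "force a cache cleanup" means in practice: the usual approach on Linux is the kernel's drop_caches interface. A minimal sketch of that (not necessarily exactly what I run; assumes a Linux guest and root), in Python:

```python
#!/usr/bin/env python3
# Drop the kernel's clean page cache, roughly `sync; echo 3 > /proc/sys/vm/drop_caches`.
# This only releases *clean* cache, so it is purely cosmetic: it makes the "used"
# figure look smaller for a while, it does not fix anything. Must run as root.
import os

def drop_caches(mode: int = 3) -> None:
    # 1 = page cache, 2 = dentries and inodes, 3 = both
    os.sync()  # flush dirty pages first so more of the cache becomes droppable
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write(str(mode))

if __name__ == "__main__":
    drop_caches()
    print("Dropped caches; `free -m` should show a smaller buff/cache figure now.")
```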
 
Please read this thread to see why it's displayed the way it is.

If the swap was under heavy use then I would be concerned, but it's not, so PVE telling me there's a problem is wrong (I use Zabbix to monitor my servers, including PVE, and that's where I get the alerts from). My choices so far have been to reboot, use the monitor to lower and then raise the max balloon size, or force a cache cleanup on the server.
Please try to understand that this is NORMAL and nothing is wrong. You WANT to have all of your RAM used.
 
Please read this thread to see why it's displayed the way it is.


Please try to understand that this is NORMAL and nothing is wrong. You WANT to have all of your RAM used.

I'm perfectly aware that we want to use all the RAM; I was glad to see Windows finally adopt this attitude many years ago. But cached memory is not the same as memory in use: it's transient and can be claimed at any time by processes that actually need it, with the cache relinquishing it as a priority.
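To put numbers on that distinction, here is a minimal sketch (Linux guest; field names straight from /proc/meminfo) of how the "really in use" figure differs from the one that includes cache:

```python
#!/usr/bin/env python3
# Split guest memory into "really in use" vs. reclaimable cache using /proc/meminfo.

def meminfo() -> dict:
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])  # values are in kB
    return info

m = meminfo()
total = m["MemTotal"]
available = m["MemAvailable"]  # what the kernel can hand out right now
cache = m["Buffers"] + m["Cached"] + m.get("SReclaimable", 0)

print(f"Total:               {total / 1024:8.0f} MiB")
print(f"Reclaimable cache:   {cache / 1024:8.0f} MiB")
print(f"Really in use:       {(total - available) / 1024:8.0f} MiB  (MemTotal - MemAvailable)")
print(f"'Used' incl. cache:  {(total - m['MemFree']) / 1024:8.0f} MiB  (the number that looks like 90%)")
```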

Yes, it seems to be normal for PVE, but as I said, it's not in Hyper-V, nor is it in VMware, which is what I manage in my day job.

[Attached screenshots: VMware memory usage for a VM and htop output from inside that guest]
 

Do those two images correlate? VMware shows 1 GB used, the VM itself 5 GB used.
It shows 1 GB active, not used, i.e. it is excluding the cache because that is transient. This seems to be a difference in philosophy about which metric matters. The OP is arguing there is no point alerting that memory is at 90% when the bulk of it is cache; it should only alert when there is true memory exhaustion. This is the way Hyper-V and VMware do it. Reading the thread you posted, it sounds like this would be an upstream change in the guest tools?
 
Or you just monitor the guest with Zabbix, where you get RAM usage and cache reported separately, and disable that trigger on the PVE-side view.
 
Or you just monitor the guest with Zabbix,
Indeed, I don't have a horse in the race. Just articulating what I believe the OP is getting at.

For my homelab I have been wondering what I should use; it sounds like you favor Zabbix over things like Checkmk etc. Is it easy to get started with?

--edit-- Scratch that, I remember it being a ridiculous number of containers, with crappy compose examples and an insane number of mount points. So I moved on... if there is an easy-button version I would be interested.
 
It shows 1 GB active, not used, i.e. it is excluding the cache because that is transient.
The cache is already excluded from the 5 GB; the cache filled the whole RAM up to 20 GB (as seen in the htop output, including swapping), so the VMware 1 GB makes absolutely no sense.

This seems to be a difference in philosophy about which metric matters. The OP is arguing there is no point alerting that memory is at 90% when the bulk of it is cache; it should only alert when there is true memory exhaustion. This is the way Hyper-V and VMware do it.
A hypervisor should not do alerting, and PVE does not. Monitoring should always be external.

I really don't like displaying this "memory value" at all, because even the one that is shown is too low (as seen in the other threads). You have additional RAM use for GPU and other device virtualization, the KVM process overhead, etc., which can blow your VM up to 110-115% of its configured memory. On the other side, you have KSM that potentially deduplicates it again, so you're totally lost. So it can only be wrong; the question is only what kind of wrong you prefer.

IMHO: The important metric for a hypervisor is the ACTUAL used memory. That's what is relevant, not what the guest thinks it uses. This is the memory that runs out first. Imagine you have 10 Windows VMs with such "fake numbers" and think, yeah, I can add 10 more because they only use 10% of what I configured, but you get OOMs or heavy swapping by adding just 2, because you compared the wrong numbers. As a hypervisor administrator, I just want to see how much space I can spare for new VMs without overprovisioning. If you're the actual service or VM admin, I can see that you want to have your numbers, the ones you would also see inside the VM in the guest's Windows Task Manager.
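To make "actual used memory" concrete, here is a rough sketch of reading it per VM on the PVE host. It assumes the default pidfile location /run/qemu-server/<vmid>.pid (where PVE keeps the QEMU PID as far as I know) and simply reads the KVM process RSS:

```python
#!/usr/bin/env python3
# Print the host-side RSS of a VM's KVM process, i.e. what the hypervisor actually pays for.
import sys

def kvm_rss_mib(vmid: int) -> float:
    # Assumption: PVE writes the QEMU PID to /run/qemu-server/<vmid>.pid
    with open(f"/run/qemu-server/{vmid}.pid") as f:
        pid = int(f.read().strip())
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # kB -> MiB
    raise RuntimeError("VmRSS not found")

if __name__ == "__main__":
    vmid = int(sys.argv[1])
    print(f"VM {vmid}: KVM process RSS = {kvm_rss_mib(vmid):.0f} MiB "
          "(this can exceed the configured memory due to device/virtio overhead, "
          "and KSM sharing is not subtracted here)")
```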

Reading the thread you posted, it sounds like this would be an upstream change in the guest tools?
The guest tools have been around for decades, and the fact that this has not happened yet is a good indicator that it never will. People like me (and others who answered) are not interested in the "lies" (harsh, I know ;) ) that VMware and Hyper-V tell about memory usage. Just because they do this, why should other hypervisors? For Windows, you need(ed?) to install that strange service so that it reports the memory value Windows wants to see. I never understood this.
 
What's your preferred poison when it comes to monitoring tools?
Proxmox comes with an integrated and pre-prepared "outlet" for a metric server. --> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#external_metric_server

The obvious approach is to run an InfluxDB and a Grafana instance. (Off-cluster if possible.) There might be easier solutions than this; I run it just because Dashboards like this --> https://grafana.com/grafana/dashboards/19119-proxmox-ve-cluster-flux/ are both beautiful and helpful.

Most other solutions (I am using Zabbix) require modifications on each node: installing an "agent" or at least activating/configuring SNMP.


Probably all of this is overkill for just keeping an eye on limits like Temperature too high / Disk full / Sysload skyrocketing...

Have fun!
 
Do those two images correlate? VMware shows 1 GB used, the VM itself 5 GB used.
Yeah, they normally do; it's just that VMware tends to be very dynamic, often to its detriment. E.g. when rebooting a VM while VMware Tools is not running, our alerting kicks off far too often and it's quite difficult to fine-tune. I just wish it knew the difference between "dead" and "not running because of a reboot, so it will be back soon".

But back to the original topic: thanks all for the feedback. Coming from two environments where these alerts rarely happen to one where they happen all the time, I suppose I need to adjust the monitoring. Giving the VM more RAM won't help as it will simply use it all up, which I confirmed by going from 8 GB to 10 GB. I don't want to simply stop monitoring the memory usage of each VM through PVE, so I will probably change the Zabbix trigger to warn at 98% and go critical at 99%, see how that goes, and hopefully have one less alert to worry about.
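For anyone wanting to sanity-check the same number outside Zabbix, here is a rough sketch of pulling the PVE-side figure that such a trigger would compare against. The endpoint and the mem/maxmem fields are as I understand the PVE API; the host, node, VM ID and API token are placeholders:

```python
#!/usr/bin/env python3
# Rough sketch: read a VM's memory usage as PVE reports it and compare against thresholds.
import requests

PVE_HOST = "https://pve.example.com:8006"   # placeholder hostname
TOKEN = "monitor@pam!readonly=00000000-0000-0000-0000-000000000000"  # placeholder API token
NODE, VMID = "pve1", 101                    # placeholder node name and VM ID
WARN, CRIT = 98.0, 99.0                     # thresholds discussed above

resp = requests.get(
    f"{PVE_HOST}/api2/json/nodes/{NODE}/qemu/{VMID}/status/current",
    headers={"Authorization": f"PVEAPIToken={TOKEN}"},
    verify=False,  # many homelab PVE hosts use self-signed certs; prefer a CA bundle
)
resp.raise_for_status()
status = resp.json()["data"]
pct = 100.0 * status["mem"] / status["maxmem"]   # PVE's view of usage, cache included

level = "CRITICAL" if pct >= CRIT else "WARNING" if pct >= WARN else "OK"
print(f"{level}: VM {VMID} at {pct:.1f}% of {status['maxmem'] // 2**20} MiB (as PVE sees it)")
```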
 