Proxmox VE rocks, except for one thing: the memory usage graph.

Idea: what if the RAM indicator/graph were only shown when a guest agent is enabled in the VM, with the OS-specific probe for "available" memory built into the agent?
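
For a Linux guest, such a probe could be as simple as reading MemAvailable from /proc/meminfo inside the VM. A minimal sketch of what the agent-side part might look like (purely hypothetical; this is not part of the current qemu-guest-agent):

```python
#!/usr/bin/env python3
# Hypothetical sketch of an OS-specific "available memory" probe for a Linux
# guest; not part of the current qemu-guest-agent.

def read_meminfo():
    """Parse /proc/meminfo into a dict of values in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])  # values are in kB
    return info

def available_memory_report():
    mi = read_meminfo()
    total = mi["MemTotal"]
    # MemAvailable is the kernel's own estimate of memory usable by new
    # workloads without swapping; it already accounts for reclaimable cache.
    available = mi.get("MemAvailable", mi["MemFree"])
    return {
        "total_kb": total,
        "available_kb": available,
        "used_percent": round(100 * (total - available) / total, 1),
    }

if __name__ == "__main__":
    print(available_memory_report())
```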
 
We are all users, regardless of whether we pay a subscription or not. Business users are not first-class citizens and homelabbers are not second-class citizens. What some users do not offer in terms of monetary support to the Proxmox project, they make up for in volume with the patches they send and the documentation/content they create (most of Proxmox's wide reach does indeed come from the vast amount of content created for it, mostly by homelabbers).

I'm not sure about the exact numbers, but as far as I know about 90% of development is done by staff members of Proxmox Server Solutions GmbH. Now this could mean that it's simply too hard to get patches accepted. In that case, however, it shouldn't be hard to start a fork with all community patches included (OpenBSD started like that as a fork of NetBSD; X.Org and LibreOffice would be other examples). Up to now there is no such thing, so I guess there isn't actually a demand for it. But my point wasn't that typical homelabber requests are invalid because they don't pay for PVE (there are enough companies that are freeloaders too); my point was that it's not worth the effort to fix a problem which isn't a problem in a professionally run environment, since PVE is designed for use in corporate or government infrastructure. It's the same reason why I don't think it would be a good idea to invest large amounts of time in reducing write amplification on consumer SSDs, beyond low-effort changes.

I hence wholeheartedly invite you to reconsider this logic. We are all users, and we must be able to consolidate and advocate for issues with our combined interest in mind. Not only is this dichotomy of users unethical, it is also inaccurate, as many people started as homelabbers before they upgraded their use and started making money off Proxmox.

I sincerely hope they have a basic understanding of how everything works before they charge money for it. BTW: I'm a homelabber myself; I'm not even part of the virtualization team at my workplace (and they run VMware anyhow, sadly). But I've been using Linux as my daily driver on my private and professional infrastructure since I got my first computer two decades ago, so I'm not surprised that Proxmox VE works exactly the same way as any other Linux distribution I know. I don't want this to change to make life easier for non-Linux-savvy homelabbers, sorry, not sorry. There is simply no such thing as "our combined interest" in that regard; why should my wishes be less legitimate than those of any other non-customer of Proxmox Server Solutions GmbH?

I wholeheartedly agree with this definition of what free means, regardless of what is right to do. Failing to engage with and acknowledge this simple perspective seems, to me, to speak of a problem in our community, one which is manifesting in this thread.

Well, in my book it's a "problem of our community" that people don't do research before asking questions that have already been answered several times before (often enough even on the same day or week), that there are click-baity videos on how to do unsupported stuff like attaching NFS or CIFS network storage to PBS over WAN, or Reddit recommendations for using "hellish scripts" to run Docker inside LXC containers even though the Proxmox developers recommend against Docker inside LXC. In other words: I'm not a fan of the way the homelab community on Reddit handles things, and I like that this forum's community strives to handle things in a professional way. People who start with a homelab before earning their living with it profit from this "problem of our community" too, because they learn how things actually work and do them right instead of "it works somehow/mostly, no idea why".

Some people are clearly prepared to go to incredible lengths to convince everyone that since the VM is using a great part of this RAM for caching, and since not allowing it to use as much cache would affect its function, this amount of cache is "used memory" and hence "unavailable memory". We disagree. Many people disagree (https://www.linuxatemyram.com/). Please acknowledge that this is not a universal point of view.

Meh, LnxBil actually referenced that page to show that your "valid concerns" are actually not very valid.

Now that we know that it is misleading, we will start ignoring it in favour of an internal probing method, but then, what use is this RAM utilisation bar indicator? Maybe the right solution would be to remove it entirely.

As explained by @LnxBil and @BobhWasatch in the thread, it's not wrong; it's simply not what people expect, because they have no previous background in Linux. There is no shame in not knowing how Linux handles RAM and caches, but "ESX/Windows are different, thus Proxmox VE/Linux should behave like them" isn't a good argument.

I did not know what ballooning was until I did some reading. As far as I understood, it is an optional feature that requires an agent running inside the VM and is disabled by default.

Yes, because operating systems (especially Windows) don't have them installed by default. Nonetheless, it's best practice to install them, and it's something the admin needs to do anyway. The guest tools and ballooning are not a replacement for monitoring software, though. Even the ballooning function to free memory on the fly is more of a last resort and shouldn't be taken as the correct approach to dealing with memory over-assigned to VMs.

Refusing to acknowledge the quite-valid perspective of many people here is a blocker of innovation because it is preventing us from thinking of simple ways to deal with this misleading UI problem. Indeed, there might be very simple fixes but to be able to figure them out, we need to agree that the current situation can benefit from some improvement.

There is indeed a simple fix: running monitoring software inside the VM, as is best practice in any professional context. With that, the admins would see that their VM has way more memory assigned than it actually needs and could use the next maintenance window to adjust the assignment appropriately. I won't repeat the arguments for why the "misleading overview" is actually not misleading if you know how it works.

And monitoring software (together with the guest tools) would need to be installed anyway.
I also think that when someone's point of view is acknowledged, they feel more understood, which leads to more constructive dialogue. At the end of the day, you might be paying a subscription and you might think your use of Proxmox is more legitimate, but I doubt you would be prepared to spend a week diving into Proxmox's source code to fix an issue, whereas many homelabbers (hobby users) are prepared to do such a thing.

If my employer expected me to fix a problem, I would definitely do that, and I guess other professionals would too. If my boss however thinks that I should ask support instead of spending my time on it, because we pay for support, I will follow those instructions. Actually (not with Proxmox VE but with some commercial software) I did both in the past, depending on the situation and whether support could help us or not.
 
Meh, LnxBil actually referenced that page to show that your "valid concerns" are actually not very valid.

I do not have a concern. I am just trying to understand the situation for myself, and the conclusion I am reaching is that:

1. Some users understand that these gauges, when red, do not imply that the VM is short of memory.
2. Some users are misled by these gauges.

From a usability/support point of view, it is likely that newcomers will always fall into category 2 before they move to category 1. In the process of transitioning, there will be noise, and with this noise, there will be overhead.

My 2 cents would be that removing the gauges is best, since it eliminates this confusion and does not seem to subtract any functionality. Those who know not to look will not have to look, and those who are confused won't be confused.
 
Thank you for this. It is true. Yet, from a user standpoint, it would make more sense to display the buffer cache as free. Failing to acknowledge this, in my opinion, can come across (and indeed does come across) as dismissing user input.
I think this would be misleading; whilst the cache can be reallocated as and when it's needed, it isn't unutilised memory.
Perhaps what you want is a 3rd metric on the graph for available memory?
Also, with things like memory fragmentation and memory paging, all of this RAM is not necessarily available for new virtual machines. For a while I ran Proxmox without any swap, and then started getting OOMs whilst Proxmox was reporting multiple gigs as unutilised. I've fixed it now with a very small zram swap, plus a somewhat larger low-priority swap area backed by SSD, although as of yet the SSD part has never been utilised; it's happy just allocating a few tens of MB to the zram.
 
I think this would be misleading; whilst the cache can be reallocated as and when it's needed, it isn't unutilised memory.
Perhaps what you want is a 3rd metric on the graph for available memory?
Also, with things like memory fragmentation and memory paging, all of this RAM is not necessarily available for new virtual machines. For a while I ran Proxmox without any swap, and then started getting OOMs whilst Proxmox was reporting multiple gigs as unutilised. I've fixed it now with a very small zram swap, plus a somewhat larger low-priority swap area backed by SSD, although as of yet the SSD part has never been utilised; it's happy just allocating a few tens of MB to the zram.

True. Available, not free. My bad.

It still fascinates me that someone had to make a dedicated page for this (https://www.linuxatemyram.com/). Still grateful for this link.
 
I think this would be misleading; whilst the cache can be reallocated as and when it's needed, it isn't unutilised memory.
Once more, RAM that is owned by a VM is not "available" to the hypervisor. It is only "available" within the VM. Outside of the special case of KSM, memory is not shared between VMs. If you want that, use a container.

Why is this so hard for people to grasp?
 
Once more, RAM that is owned by a VM is not "available" to the hypervisor. It is only "available" within the VM. Outside of the special case of KSM, memory is not shared between VMs. If you want that, use a container.

Why is this so hard for people to grasp?
Umm, I never said VM memory was available.
 
Why is this so hard for people to grasp?
I really don't know. Maybe they're just used to the lying all other products do? It's like the ever-green management traffic lights ... never displaying red.

I think this would be misleading, whilst the cache can be reallocated as and when its needed, it isnt unutilised memory.
The problem is that the hypervisor cannot reclaim it; the guest has to reclaim it, and that is where the discrepancy between the in-VM view and the hypervisor view comes from. The guest will not free it properly (e.g. by writing zeros), because that would not make any sense for an OS on real hardware and is therefore not done. The data is just overwritten on the next write, and that's it. That's my whole point. PVE will display the actual memory used, because that is what is important to the hypervisor. If you've overcommitted the memory, you will end up with at least a slow system.

Unless the guest OS implements some cache-clearing mechanism, the memory utilization will always just be wrong. Or it could just lie to make you feel better, like other implementations do.


Once more, RAM that is owned by a VM is not "available" to the hypervisor. It is only "available" within the VM. Outside of the special case of KSM, memory is not shared between VMs. If you want that, use a container.
Also, KSM is ONLY available to QEMU, because it uses a special type of allocation to get memory that is only implemented in QEMU (via the madvise syscall). This will therefore not work with containers. VMs will scale better (in large numbers) due to KSM than containers, even if containers have a much smaller footprint. For smaller stuff, containers are much better because the whole disk-cache part is not part of the container memory, so you will see only the "actual" memory usage. This is even more true than without containers, because the cgroup memory jail ensures that you won't share data with other containers (good or bad, depends on your viewpoint), whereas your default Linux will share low-level libraries among all programs and make it very hard to actually count the memory usage due to the sharing.
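
The opt-in in question is the MADV_MERGEABLE madvise flag: only memory explicitly marked this way is scanned by the kernel's KSM thread. Below is a rough illustration of that allocation pattern, sketched in Python rather than QEMU's actual C code (it assumes Linux with KSM available and Python 3.8+, which exposes the flag in the mmap module):

```python
import mmap

# Rough sketch of the allocation pattern QEMU uses for KSM (not QEMU's real code):
# anonymous memory is mapped and then marked MADV_MERGEABLE so the kernel's
# KSM thread may deduplicate identical pages across opted-in processes.
# Assumes Linux with KSM available and Python 3.8+ (mmap.MADV_MERGEABLE).

size = 16 * 1024 * 1024          # 16 MiB of anonymous memory
region = mmap.mmap(-1, size)     # fileno=-1 -> anonymous mapping

# Without this call the region is never considered by KSM, which is why only
# processes that explicitly opt in (like QEMU) benefit from page merging.
region.madvise(mmap.MADV_MERGEABLE)

# Fill the region with identical pages so KSM has something to merge once
# /sys/kernel/mm/ksm/run is enabled on the host.
region.write(b"\x42" * size)
```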


I do not have a concern. I am just trying to understand the situation for myself, and the conclusion I am reaching is that:

1. Some users understand that these gauges, when red, do not imply that the VM is short of memory.
2. Some users are misled by these gauges.

From a usability/support point of view, it is likely that newcomers will always fall into category 2 before they move to category 1. In the process of transitioning, there will be noise, and with this noise, there will be overhead.
The problem is that it just depends on the guest OS and its settings. In some use cases it works as it should, and in others it does not. You really have to know what you're dealing with in order to extract information out of this. For example, right after starting a VM the footprint is very small, because memory is still empty, caches have not been filled, and memory is not fragmented. In this instance the graph is almost accurate (some caching has been done on boot). With time it gets more and more inaccurate, unless you do some memory cleanup, compaction, etc. Tweaks like swappiness also influence how much cache is used, so it heavily depends on the guest and its settings.

For Windows, you used to be able to change the view in PVE to reflect the Windows Task Manager view. I don't know if this still works, yet it worked in the past. I don't know why this is not implemented for other guests; I guess a lot of people - like the OP - wanted this and the QEMU people gave in. It's still wrong and all of that, but some people were able to see what they wanted to see. Most other guest operating systems will not have this, so you would end up with a multitude of different metrics displayed as if they were the same, which is even worse than having it wrong ALL the time (at least that is consistent).

The only important metric with respect to "do I have enough memory for this VM" is the swap metric. Unless you swap a lot (pages written out to swap and later read back into memory), you will have enough memory. The question "do I have too much memory for this VM" is iteratively solvable by watching the swap metric and finding the memory sweet spot. This can only be done from within the guest. The hypervisor cannot distinguish this.
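
Inside the guest, that swap activity can be read straight from /proc/vmstat. A minimal sketch of such a check (a hypothetical helper script, not an existing Proxmox or guest-agent feature):

```python
import time

def read_swap_counters():
    """Return the pswpin/pswpout counters from /proc/vmstat (units: pages)."""
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in ("pswpin", "pswpout"):
                counters[name] = int(value)
    return counters

def watch_swap_activity(interval=10):
    """Print pages swapped in/out per interval; sustained non-zero deltas mean
    the guest really is short of memory, regardless of what the gauge shows."""
    prev = read_swap_counters()
    while True:
        time.sleep(interval)
        cur = read_swap_counters()
        print({name: cur[name] - prev[name] for name in cur})
        prev = cur

if __name__ == "__main__":
    watch_swap_activity()
```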

My 2 cents would be that removing the gauges is best, since it eliminates this confusion and does not seem to subtract any functionality. Those who know not to look will not have to look, and those who are confused won't be confused.
Yet you still need to look. It's not that there is no value in the gauge, it's just that it's not the value you might think. Imagine a VM is running for a month and only shows 50% usage. You can just lower the memory because it's like a high watermark value due to the caching.
 
Yet you still need to look. It's not that there is no value in the gauge, it's just that it's not the value you might think. Imagine a VM is running for a month and only shows 50% usage. You can just lower the memory because it's like a high watermark value due to the caching.

I have not had that ever happen, even with the most underutilised VMs. The one I showed eating up 19GB was really not running anything at all.
 
I have not had that ever happen, even with the most underutilised VMs. The one I showed eating up 19GB was really not running anything at all.
The cache has to be filled by something; it does not magically fill itself up, and "not running anything at all" does not matter. I thought we had explained this in great detail already. The disk cache will fill up the remaining memory. Every file you ever read goes through the buffer cache (unless it's e.g. an O_DIRECT operation), so every read you ever did ended up in your main memory and will only be evicted when there is not enough space. In short: Unless you have more RAM than disk space, your memory will always be fully used by the buffer cache. That is exactly what https://www.linuxatemyram.com/ describes.
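
This is easy to see from inside any Linux guest. A small sketch of the demonstration (point it at any large file that has not been read recently; the path is up to you):

```python
import sys

def cached_kb():
    """Return the 'Cached' value from /proc/meminfo in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Cached:"):
                return int(line.split()[1])

def show_cache_growth(path):
    before = cached_kb()
    # A plain buffered read: every page read here lands in the page cache
    # (unless the file were opened with O_DIRECT, which normal tools don't do).
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass
    after = cached_kb()
    print(f"Cached before: {before} kB, after: {after} kB (+{after - before} kB)")

if __name__ == "__main__":
    show_cache_growth(sys.argv[1])
```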
 
Unless you have more RAM than disk space, your memory will always be fully used by the buffer cache. That is exactly what https://www.linuxatemyram.com/ describes.

I agree, and since this is almost never the case (that the RAM is larger than the disk), we are reaching the consensus that the current RAM gauge is not very useful as it stands today. More accurately, its usefulness relies heavily on the OS inside the VM, and making it more useful requires additional information from an OS-specific agent inside the VM. Additionally, it is fair to say that the RAM gauge can be confusing or misleading for new users (when the OS uses memory for caching and the gauge turns red). I also think we agree that since Proxmox has permeated hobbyist circles (i.e. homelabbers and others wanting to experiment with servers) more than any other virtualization platform, the user experience and point of view of amateurs should be taken into account - not as the central point of focus, of course, but also not disregarded because "they are unable to understand something" you think they should be able to understand.

I think we also agree that even though unused or available memory inside the VM cannot be used by other VMs, the RAM gauge was originally intended to give the Proxmox administrator some visibility into whether the VM might need more RAM (to ensure that processes are not OOM-killed). This is the central use case of this gauge, so if it is no longer achieving it, it is not very useful. Also, if the information it gives can only be interpreted in light of which OS is inside the VM, then again, it is not very useful.

There are a lot of great points that have been made in this post, but again, it is leading us to the same conclusion in my opinion.

We may not agree about what should be done about the RAM gauge, but I think being able to agree on some things is very productive and opens the ground for community solutions that do not cost Proxmox or you guys anything. I am one of those people who like to fix things out of interest and the desire to learn. So there might be a missed opportunity when people oppose the diagnosis because they think they may not like the proposed solution. That is not very productive, in my opinion.
 
I'm not going to go down the rabbit hole of arguments for/against changes to RAM reporting in PVE, but maybe the graph bar in PVE could be graphically changed to something similar to htop's inside a VM:

[attached screenshot: htop-style memory bar with a yellow cache segment]

Where the yellow-bar section designates available RAM being used as cache, so the administrator "gets an idea" of what the VM is doing with the RAM.
This would at least be most useful to stop numerous (duplicate) posts on these forums concerning HV/VM RAM consumption stats!
(The above would obviously be VM OS-dependent).
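
For a Linux guest, the three segments such a bar needs can be derived from /proc/meminfo inside the VM (how that data would get back to PVE is a separate question). A rough sketch of the arithmetic (hypothetical; htop's own formula differs slightly, e.g. around shared memory):

```python
def memory_bar_segments():
    """Rough htop-style split of guest memory into used / cache / free,
    as percentages of MemTotal. Hypothetical sketch only; htop's exact
    accounting differs slightly (e.g. shared memory handling)."""
    mi = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            mi[key] = int(rest.split()[0])  # values are in kB

    total = mi["MemTotal"]
    free = mi["MemFree"]
    cache = mi["Buffers"] + mi["Cached"] + mi.get("SReclaimable", 0)
    used = total - free - cache  # the "green" part of the bar

    def pct(value):
        return round(100 * value / total, 1)

    return {"used_pct": pct(used), "cache_pct": pct(cache), "free_pct": pct(free)}

if __name__ == "__main__":
    print(memory_bar_segments())
```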
 
I'm not going to go down the rabbit hole of arguments for/against changes to RAM reporting in PVE, but maybe the graph bar in PVE could be graphically changed to something similar to htop's inside a VM:

[attached screenshot: htop-style memory bar with a yellow cache segment]

Where the yellow-bar section designates available RAM being used as cache, so the administrator "gets an idea" of what the VM is doing with the RAM.
This would at least be most useful to stop numerous (duplicate) posts on these forums concerning HV/VM RAM consumption stats!
(The above would obviously be VM OS-dependent).
Yes, this would be better, yet the hypervisor does not know the actual usage the way htop does, and it does not take everything into account (e.g. ARC and hugepages), so it would still not be a jack-of-all-trades view, yet better than before. You would need to have this view for every guest OS and, where it is not available, fall back to the current view. I think this is the main reason there is no better display: it would not be a general solution, only one that is harder to maintain.
 
Adding cache to the node screen is a good idea, but what's the point for guests? If you allocate e.g. 8 gigs of memory to a guest, then you should assume the full 8 gigs is not available for anything else; don't overcommit RAM.
 
@BobhWasatch - Very good point and thanks for the clarification! Makes PERFECT sense from the Physical Host's point of view.
HOWEVER, from the VM's point of view, or more specifically from the POV of the person tasked with managing the VM, I do think it would be useful to "break out" the memory usage into buffer/cache and "real" usage.