High VM Memory Usage On Newer PVE Versions?

phil2987

I manage a decent-sized Proxmox private cloud for a client, consisting of two sites:

Site A (Prod):
PVE 7.4-3
250 VMs (Windows 10, 6GB RAM allocated)
3x Dell R630 w/768GB RAM each

Site B (DR):
PVE 8.3.4 (latest as of Mar 2025)
0 VMs
3x Dell R630 w/768GB RAM each

Recently, we migrated all VMs from Site A to Site B for a DR exercise using Proxmox Backup Server, and it worked very well (aside from some slowness with PBS, but that's another thread). The issue is:

At Site A, Host #1 has around 110 VMs on it and is using around 369GB RAM. Looking at the VM status, I see that most of the VMs are using less than half of the allocated 6GB RAM, so I assume that the VirtIO balloon driver is releasing the RAM back to PVE.

After the VMs from Site A Host 1 were copied to Site B Host 1, the memory usage doubled to over 680GB used at Site B, and then the swap filled up. Looking at the status of the individual VMs, I again see very low RAM usage from the VMs themselves, but significantly increased host memory usage.

Why would the same number of VMs on the same hardware use double the RAM on PVE 8.3.4 vs. the older 7.4-3? The only thing I can think of is that I might need to update the VirtIO drivers in each VM... does this make sense?
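
For reference, this is one way to spot-check the balloon state from the CLI (a rough sketch; VMID 101 is just an example):

root@vhost1:~# qm config 101 | grep -E 'memory|balloon'   # confirm ballooning isn't disabled (balloon: 0)
root@vhost1:~# qm monitor 101
qm> info balloon                                          # balloon size as reported by the guest driver
qm> quit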
 
I again see very low RAM usage from the VMs themselves, but significantly increased host memory usage.
What underlying storage are you using?

If it's ZFS, you might need to limit the ZFS ARC memory.
That's a common thing; there are loads of threads about this topic.
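
If it is local ZFS, a minimal sketch of capping the ARC (the 8GiB value is only an example, size it for your host):

root@vhost1:~# echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf   # 8 GiB cap at module load
root@vhost1:~# update-initramfs -u -k all                                             # so the limit survives reboots
root@vhost1:~# echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max               # apply immediately, no reboot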
 
The storage is on a separate dedicated server running TrueNAS. The Proxmox hosts have no local storage (other than the boot drives).
 
From one of the hosts:

root@vhost1:~# cat /sys/module/zfs/parameters/zfs_arc_max
0
root@vhost1:~# cat /sys/module/zfs/parameters/zfs_arc_min
0
 
What is the state of ksm on both? Is ksmtuned configured the same on both?
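
Something like this on both hosts would show whether KSM is even active and how much it is actually deduplicating (just the stock kernel's KSM counters in sysfs):

root@vhost1:~# systemctl status ksmtuned                # is the tuning daemon running?
root@vhost1:~# cat /sys/kernel/mm/ksm/run               # 1 = KSM scanning is enabled
root@vhost1:~# cat /sys/kernel/mm/ksm/pages_sharing     # shared pages; multiply by page size (usually 4K) for saved RAM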

EDIT: This is on the "breaking changes" list from the Proxmox 7 -> 8 upgrade notes:
  • Kernels based on 6.2 have a degraded Kernel Samepage Merging (KSM) performance on multi-socket NUMA systems.
    • Depending on the workload this can result in a significant amount of memory that is not deduplicated anymore.
    • This issue went unnoticed for a few kernel releases, making a clean backport of the fixes made for 6.5 hard to do without some general fall-out.
    Until we either find a targeted fix for our kernel, or change the default kernel to a 6.5 based kernel (planned for 2023'Q4), the current recommendation is to keep your multi-socket NUMA systems that rely on KSM on Proxmox VE 7 with its 5.15 based kernel.
I don't know whatever happened with this (I don't personally use the Proxmox kernel, partially for reasons just like this - my PVE 8.3.4 kernel is Debian's stock 6.1), but it does read like what you're seeing.
 
Crap....ok...thank you so much for that information -- it does sound exactly like what's happening.

I think I may have to add a new host to the cluster based on Bookworm and play Musical Chairs with the VMs, eventually re-installing the entire cluster using Bookworm.

Thanks again; this is definitely the issue.
 
I think I may have to add a new host to the cluster based on Bookworm and play Musical Chairs with the VMs, eventually re-installing the entire cluster using Bookworm.
I'm aware of that documentation. I ran into it myself.

Check this thread where I first questioned and then worked around this limitation. It's actually not too hard to convince PVE to stop demanding its own kernel. It will be slightly more complicated for you since I suspect you will need ZFS, but that is available in Debian's contrib repository.
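
Roughly what it boiled down to on my node -- treat the package names as a sketch from memory and try it on a non-production box first:

root@vhost1:~# apt install linux-image-amd64 linux-headers-amd64   # Debian's stock kernel metapackages
root@vhost1:~# apt install zfs-dkms zfsutils-linux                 # ZFS from Debian contrib, only if you actually need it
# then make sure the Debian kernel is the GRUB default before touching the proxmox-kernel packages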
 
EDIT: This is on the "breaking changes" list from the Proxmox 7 -> 8 upgrade notes:
I find it unlikely this is relevant to the OP's issue. His site A is running on 7.4-3 smoothly (I don't know which kernel), his site B is running 8.3.4, so kernel 6.8.x, and as the linked notes state, this KSM issue is not relevant as of kernel 6.5.
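
For what it's worth, a quick check on a node at each site would remove the guesswork about which kernels are actually in play:

root@vhost1:~# uname -r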
 
Is there any documentation that the issue was fixed in 6.5?

I thought about it, and about installing it on top of Debian, but using the non-PVE kernel is a little bit "hacky". I can't risk instability with this many VMs and an SLA in place. I think I will need to simply add another host so I have more resources, and I can't go back to 7.4.
 
Is there any documentation that the issue was fixed in 6.5?
The already linked release note clearly states:
  • This issue went unnoticed for a few kernel releases, making a clean backport of the fixes made for 6.5 hard to do without some general fall-out.
so they did not backport it to the previous kernel (6.2) - but it was inherently fixed (by Linux) in the 6.5 kernel. A line later, you will see the same thing:
or change the default kernel to a 6.5 based kernel (planned for 2023'Q4),
So from kernel 6.5.x it is fixed (inherently); we are now on 6.8.x.

I didn't search extensively on general Linux kernel changes, but found this note in the Red Hat docs, chapter 7, KSM:
Note

Starting in Red Hat Enterprise Linux 6.5, KSM is NUMA aware. This allows it to take NUMA locality into account while coalescing pages, thus preventing performance drops related to pages being moved to a remote node. Red Hat recommends avoiding cross-node memory merging when KSM is in use. If KSM is in use, change the /sys/kernel/mm/ksm/merge_across_nodes tunable to 0 to avoid merging pages across NUMA nodes. Kernel memory accounting statistics can eventually contradict each other after large amounts of cross-node merging. As such, numad can become confused after the KSM daemon merges large amounts of memory. If your system has a large amount of free memory, you may achieve higher performance by turning off and disabling the KSM daemon. Refer to the Red Hat Enterprise Linux Performance Tuning Guide for more information on NUMA.

I imagine this is the fix, alluded to in the above Proxmox docs.
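
If anyone wants to try that tunable on a multi-socket PVE node, the knob is in sysfs; note the kernel only lets you change it while no pages are merged (a sketch, not something I have tested on 8.x myself):

root@vhost1:~# echo 2 > /sys/kernel/mm/ksm/run                  # stop KSM and unmerge all shared pages first
root@vhost1:~# echo 0 > /sys/kernel/mm/ksm/merge_across_nodes   # only merge pages within the same NUMA node
root@vhost1:~# echo 1 > /sys/kernel/mm/ksm/run                  # start KSM again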
 
I appreciate everyone's input. Ultimately, here is the scenario: I have an SLA with a client, and in the old environment, 3 hosts were enough to sustain a host failure -- there was enough RAM with 3x768GB to keep all VMs up if a host failed. In the new 8.3.4 environment, because memory usage is now almost double, I am in trouble if a host fails. For this reason, I am building another host with 768GB RAM, to be delivered to the datacenter tomorrow, which will allow me to sustain a host failure. Once I connect this host, I will run some tests with KSM and report back.

VA1DER, I appreciate the knowledge share, but the fact that you are basing it on a single host with a few VMs makes it inapplicable to this scenario. It could very well be that I am the only person in the world running 250 production VMs on Proxmox, but I find it hard to believe. The facts are:

On PVE 7.4-3, 110 VMs (Win 10, 6GB allocated) on a single host with 768GB RAM used around 380GB RAM.

On PVE 8.3.4, with the same specs, THE SAME 110 VMs on a single host with 768GB RAM use almost 700GB RAM, and the host starts swapping.

There is no local ZFS -- all VMs are stored on TrueNAS via NFS over 10GbE, backed by NVMe drives.

I have to add a new host to the 8.3.4 cluster in order to be able to sustain a possible host failure. It is unfortunate, but it is what it is. Once the host is in place I will report back, but as it stands right now, I can't risk playing around with random settings on hosts because if something breaks, I am screwed. Also, installing PVE on top of Debian and keeping the old kernel is risky.

I've been doing this for a LONG time and I am still stupid -- I always assume that the newer version of something is better than the old. I should have just stayed with 7.4-x on the new cluster as well, but it's too late. I will add another host to the cluster tomorrow.
 
I don't see any reference to it being fixed -- just people arguing about whether it has been fixed or not. I am attaching a screenshot of a direct comparison between the same 94 VMs currently running in my environment on one of my hosts. The old host is 7.4-3 and the new host is 8.3.4. With the same number of identical VMs running, the memory usage in the new environment is more than 2x.

Simply put, there is NO WAY that this has been fixed. The VMs on the old host initially used 500+GB when first booted, then "settled in" overnight. I powered them on last night specifically to take the screenshot. The VMs in the new environment have been up for about a week. Notice the KSM and SWAP values on both.

I am contemplating scheduling downtime and rebuilding the new cluster with 7.4-x.

[Screenshot: old_vs_new_mem_usage.png]
 
KSM only merges after RAM is more than 80% used.

The 1st host filled more than 80% of its RAM during first boot, so KSM merged a lot of RAM.

The other host didn't fully fill its RAM during first boot, so KSM only merged a small amount.

It's expected for PVE (and OSes in general) to use all available RAM.
If you need more free RAM, the KSM 80% threshold can be adjusted.
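
For example, the 80% figure comes from the KSM_THRES_COEF setting in /etc/ksmtuned.conf (default 20, i.e. start merging when less than 20% of RAM is free). Uncomment and adjust it, e.g.:

KSM_THRES_COEF=30          # start merging once free RAM drops below 30% instead of 20%

then restart the daemon:

root@vhost1:~# systemctl restart ksmtuned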
 
Strange how it is fine on all 3 hosts in the old environment, but OK, I will migrate 20 more VMs to the new host and report back.
 
I migrated 20 more VMs to this host... I really hope KSM kicks in soon. FYI, I never saw this high memory usage or swap usage in the 7.4 environment.

[Screenshot: new_mem_usage.png]
 
The KSM problem was with the 6.2 kernel on multi-socket systems; it was fixed in late 2023 with kernel 6.5 => topic

Here with PVE 8.1 and the 6.8 kernel; will update this weekend.
Single-socket node, running a single Windows VM, after 4h uptime:
[Screenshot: 1742664723879.png]
 
So, after two hours here's what I noticed:

1. /etc/ksmtuned.conf is the same on both hosts -- everything is commented out. No changes from default.
2. KSM kicked in after RAM was 80% full and slowly started lowering the amount of used RAM. HOWEVER, it seems that once the RAM usage dropped to 80%, KSM completely stopped. Also, swap usage remains high. This is different behavior than with 7.4. After 2 hours, I see the below. KSM has stopped trying to lower the memory below 80%.

[Screenshot: new_mem_usage2.png]
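
For reference, ksmtuned's own debug log should show why it parks at the 80% line (these two settings are already in the stock ksmtuned.conf, just commented out):

# in /etc/ksmtuned.conf, uncomment:
LOGFILE=/var/log/ksmtuned
DEBUG=1
root@vhost1:~# systemctl restart ksmtuned
root@vhost1:~# tail -f /var/log/ksmtuned     # logs each decision ksmtuned makes about starting/stopping KSM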
 
Below is a direct comparison between what is currently running in the old environment versus the new. This is 114 VMs (Windows 10, 6GB allocated), taken a few minutes ago:

[Screenshot: old_vs_new_mem_usage2.png]