Host memory exhaustion: Shmem ~= AnonPages, no ZFS

krtek

New Member
Apr 9, 2026
Brno
Hello,
I am currently trying to solve an OOM/memory-exhaustion problem on 2 of our Proxmox hosts. The symptom is extensive shared and buff/cache memory usage, which none of our other hosts show. The 2 problematic nodes are in a 3-node cluster running pve-manager/9.0.3/025864202ebb6109 (kernel: 6.14.8-2-pve) on 2 sockets with Intel Xeon Gold 6330N.
The problem here is obvious:
Bash:
free -h
               total        used        free      shared  buff/cache   available
Mem:           377Gi       371Gi       5,1Gi       182Gi       185Gi       5,4Gi
Swap:          6,1Gi       6,1Gi        36Ki
but the total sum of all memory assigned to VMs is only around 200 GiB. All our other nodes in different clusters show memory consumption roughly equal to the sum of the memory assigned to their VMs.
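For cross-checking, the configured VM memory sum on a node can be obtained with something like this (a sketch; it assumes the stock `qm list` output layout, where column 4 is the configured memory in MB):

```shell
# Sum the configured memory of all VMs on this node (sketch; assumes the
# default `qm list` columns: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID)
qm list | awk 'NR > 1 { sum += $4 } END { printf "total assigned: %.1f GiB\n", sum / 1024 }'
```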

For reference, here is the same output from a machine with the same HW configuration and a VM memory sum of 238 GiB.
Bash:
free -h
               total        used        free      shared  buff/cache   available
Mem:           377Gi       247Gi       128Gi       5,5Gi       8,8Gi       129Gi
Swap:          6,1Gi          0B       6,1Gi
All our nodes are installed via Ansible with the same configuration and run the PVE version and kernel mentioned above. This is not a ZFS problem: all nodes have zfs_arc_max set to 17179869184 and arcstats is virtually zero. I will provide data for only one of the bad nodes, since the other is in virtually the same situation.

Here are a few selected metrics from /proc/meminfo:

Metric           Bad node      Good node     Unit
MemTotal         395389696     395390644     kB
MemFree          5426088       134788944     kB
MemAvailable     5761788       136034968     kB
Cached           193723388     8878728       kB
SwapCached       79876         0             kB
Unevictable      206348        288816        kB
SwapTotal        6422524       6422524       kB
SwapFree         36            6422524       kB
AnonPages        191609476     247222708     kB
Mapped           217564        250364        kB
Shmem            191610732     5751292       kB
Slab             1979440       1422216       kB
SReclaimable     598088        222448        kB
KernelStack      29552         23424         kB
PageTables       406380        507012        kB
CommitLimit      204117372     204117844     kB
Committed_AS     423805608     261909520     kB
Percpu           390656        427840        kB
AnonHugePages    171546624     245547008     kB
ShmemHugePages   0             0             kB
ShmemPmdMapped   0             0             kB
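For anyone reproducing the comparison, the relevant fields can be pulled on each node with e.g. (the field list here is illustrative, not necessarily how I collected the table):

```shell
# Select the meminfo fields compared above (field list is illustrative)
grep -E '^(MemTotal|MemFree|MemAvailable|Cached|Shmem|AnonPages|AnonHugePages|Committed_AS):' /proc/meminfo
```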

So on the bad host Shmem (191610732 kB) and AnonPages (191609476 kB) are virtually identical in size. On the good host AnonPages is even higher, but Shmem stays small.

I did some more investigative work and found the following:
  • /dev/shm is almost empty
  • ipcs -m shows nothing useful
  • RssShmem / Pss_Shmem for processes is small
  • kvm processes mostly show:
    • high Pss_Anon
    • high Anonymous
    • high AnonHugePages
    • almost zero Pss_Shmem
  • ruled out CPU problems, since the machines with the same CPU run fine
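For completeness, these are the kinds of checks that can be used to hunt for hidden shmem consumers that ls and ipcs miss (a sketch; lsof being installed and standard tmpfs mount points are assumptions):

```shell
# Show tmpfs usage (Shmem covers tmpfs, /dev/shm, SysV SHM and memfd segments)
df -h -t tmpfs

# Unlinked-but-open files on /dev/shm still count toward Shmem even though
# ls and ipcs show nothing (+L1 = link count less than 1, i.e. deleted)
lsof -a +L1 /dev/shm 2>/dev/null || true

# memfd-backed segments appear as "memfd:" entries in process maps
grep -l 'memfd:' /proc/[0-9]*/maps 2>/dev/null | head
```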
Example of a large KVM process on the bad host (I don't think there is anything wrong with it, since the good host runs similar VMs with almost identical numbers).
Bash:
PID 3075114:
Rss:            50242852 kB
Pss:            50231151 kB
Pss_Anon:       50228792 kB
Pss_File:           2315 kB
Pss_Shmem:            44 kB
Anonymous:      50228792 kB
AnonHugePages:  35942400 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB

We tried turning off ballooning on the biggest VMs on the problematic hosts. There was around 80 GiB of free memory after we restarted those VMs, but it dropped back to almost nothing again. The workload is very mixed: haproxy, dns, traefik, k8s (the biggest VMs memory-wise), appcloud, Keycloak, ldap, pfSense. I suspected the workload to be the problem, but restarting the biggest VMs freed only 80 GiB, not the ~170 GiB there should be. I assume all VMs on this node contribute their own share to Shmem.
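To attribute Shmem per VM rather than per host, something like this can be looped over all KVM PIDs (a sketch; that PVE's QEMU processes match the name "kvm" and that the kernel exposes smaps_rollup are assumptions about the environment):

```shell
# Print anonymous vs shmem PSS for every kvm/QEMU process (sketch; assumes
# the QEMU processes are named "kvm" and smaps_rollup is available)
for pid in $(pgrep -x kvm); do
  printf 'PID %s: ' "$pid"
  awk '/^Pss_Anon:|^Pss_Shmem:/ { printf "%s %s kB  ", $1, $2 }
       END { print "" }' "/proc/$pid/smaps_rollup" 2>/dev/null
done
```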

The only difference from the other clusters is that the bad nodes' VMs have their disks on local SSD storage with the VirtIO SCSI controller, not on network storage.

I have been trying to solve this for a few weeks now and have not found this particular problem described anywhere. The only lead is that Shmem and AnonPages are almost the same size, and I suspect the VM memory is somehow cached/duplicated there, but I don't know why or how.
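One way to test the duplication suspicion is to check whether the QEMU processes were started with shared/memfd memory backing, since memfd-backed guest RAM is accounted as Shmem. A sketch (the memory-backend object names are QEMU's; whether PVE passes them on these particular nodes is exactly what this would reveal):

```shell
# Look for memory-backend options on each kvm process command line (sketch);
# memfd/share=on guest RAM shows up in /proc/meminfo as Shmem
for pid in $(pgrep -x kvm); do
  printf 'PID %s: ' "$pid"
  tr '\0' ' ' < "/proc/$pid/cmdline" | grep -o 'memory-backend-[a-z]*' | sort -u | tr '\n' ' '
  echo
done
```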

I would very much appreciate any insight or help with this problem. I can provide more outputs if needed.