Suspected memory leak in Proxmox CE 9.0.6

solevi

Hello,
I am facing memory exhaustion with two out of three Proxmox nodes in the same HA cluster.
I did a lot of searching but could not find a case matching mine exactly, so below is the information that was requested in similar threads.
To rule out the first suspect, I am not using ZFS.

arcstat shows:
Bash:
root@ovh-px-01:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
16:27:44     0       0     0       0     0      0    0   2.8K   2.0G  -1.2G

arc_summary -s arc shows:
Bash:
root@ovh-px-01:~# arc_summary -s arc

------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Sep 03 16:32:01 2025
Linux 6.14.11-1-pve                                           2.3.4-pve1
Machine: ovh-px-01 (x86_64)                                   2.3.4-pve1

ARC status:
        Total memory size:                                      62.4 GiB
        Min target size:                                3.1 %    2.0 GiB
        Max target size:                               19.2 %   12.0 GiB
        Target size (adaptive):                       < 0.1 %    2.0 GiB
        Current size:                                 < 0.1 %    2.8 KiB
        Free memory size:                                      860.2 MiB
        Available memory size:                         -1441892224 Bytes

ARC structural breakdown (current size):                         2.8 KiB
        Compressed size:                                0.0 %    0 Bytes
        Overhead size:                                  0.0 %    0 Bytes
        Bonus size:                                     0.0 %    0 Bytes
        Dnode size:                                     0.0 %    0 Bytes
        Dbuf size:                                      0.0 %    0 Bytes
        Header size:                                  100.0 %    2.8 KiB
        L2 header size:                                 0.0 %    0 Bytes
        ABD chunk waste size:                           0.0 %    0 Bytes

ARC types breakdown (compressed + overhead):                     0 Bytes
        Data size:                                        n/a    0 Bytes
        Metadata size:                                    n/a    0 Bytes

ARC states breakdown (compressed + overhead):                    0 Bytes
        Anonymous data size:                              n/a    0 Bytes
        Anonymous metadata size:                          n/a    0 Bytes
        MFU data target:                               37.5 %    0 Bytes
        MFU data size:                                    n/a    0 Bytes
        MFU evictable data size:                          n/a    0 Bytes
        MFU ghost data size:                                     0 Bytes
        MFU metadata target:                           12.5 %    0 Bytes
        MFU metadata size:                                n/a    0 Bytes
        MFU evictable metadata size:                      n/a    0 Bytes
        MFU ghost metadata size:                                 0 Bytes
        MRU data target:                               37.5 %    0 Bytes
        MRU data size:                                    n/a    0 Bytes
        MRU evictable data size:                          n/a    0 Bytes
        MRU ghost data size:                                     0 Bytes
        MRU metadata target:                           12.5 %    0 Bytes
        MRU metadata size:                                n/a    0 Bytes
        MRU evictable metadata size:                      n/a    0 Bytes
        MRU ghost metadata size:                                 0 Bytes
        Uncached data size:                               n/a    0 Bytes
        Uncached metadata size:                           n/a    0 Bytes

ARC hash breakdown:
        Elements:                                                      0
        Collisions:                                                    0
        Chain max:                                                     0
        Chains:                                                        0

ARC misc:
        Uncompressed size:                                n/a    0 Bytes
        Memory throttles:                                              0
        Memory direct reclaims:                                        0
        Memory indirect reclaims:                                      0
        Deleted:                                                       0
        Mutex misses:                                                  0
        Eviction skips:                                                0
        Eviction skips due to L2 writes:                               0
        L2 cached evictions:                                     0 Bytes
        L2 eligible evictions:                                   0 Bytes
        L2 eligible MFU evictions:                        n/a    0 Bytes
        L2 eligible MRU evictions:                        n/a    0 Bytes
        L2 ineligible evictions:                                 0 Bytes

free -h shows:
Bash:
root@ovh-px-01:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        61Gi       789Mi        44Mi       534Mi       652Mi
Swap:          2.0Gi       2.0Gi       1.5Mi

top -co%MEM shows: 1756902645620.png

And the summary page is as follows:
1756902863747.png

Here is the second node affected:
1756903234983.png

And this is the status page from the not-affected node:
1756903486596.png

echo 1 > /proc/sys/vm/drop_caches does virtually nothing.

There are very few things that are running on this cluster:
1756903720785.png

Total memory usage of all VMs is barely 8 GB on all nodes combined.
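One way to check whether the missing memory belongs to any process at all is to add up the resident set size (RSS) of everything `ps` can see and compare it with the `used` column from `free`. The helper below is only a sketch: the function name `sum_rss_kb` is made up for illustration, and RSS counts shared pages once per process, so the total tends to overestimate rather than underestimate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum a column of RSS values (in kB) read from stdin.
sum_rss_kb() {
    awk '{ total += $1 } END { printf "%d\n", total }'
}

# Usage on a live node (prints the total RSS of all processes in kB):
#   ps -eo rss= | sum_rss_kb
```

If that total is only a few GiB while `free` reports 61Gi used, the memory is most likely being held inside the kernel rather than by QEMU or any other userspace process.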

Please help me identify what is causing the memory leak.
 
I forgot to mention that this problem started after upgrading from 8.4. I had no such problems with version 8.4 (same machines, upgraded with apt dist-upgrade).
 
Could you post the contents of /proc/meminfo when your system is in this state?
 
Hello Fabian,
Thanks for looking in.
While waiting for someone to have a look, I forced Proxmox to boot the 6.8.12-13-pve kernel; no difference so far. The info below is for this kernel version.

Currently, it's not exactly a failed state, but getting there:
1756973496969.png

At the moment, meminfo shows the following:
Bash:
root@ovh-px-01:~# cat /proc/meminfo
MemTotal:       65478560 kB
MemFree:        20683936 kB
MemAvailable:   31841344 kB
Buffers:         6711916 kB
Cached:          4882856 kB
SwapCached:       121212 kB
Active:          9744204 kB
Inactive:       10914048 kB
Active(anon):    8916648 kB
Inactive(anon):   235116 kB
Active(file):     827556 kB
Inactive(file): 10678932 kB
Unevictable:      193868 kB
Mlocked:          191992 kB
SwapTotal:       2097144 kB
SwapFree:        1932280 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:              1488 kB
Writeback:             0 kB
AnonPages:       9255132 kB
Mapped:           847068 kB
Shmem:             74932 kB
KReclaimable:     377880 kB
Slab:             644484 kB
SReclaimable:     377880 kB
SUnreclaim:       266604 kB
KernelStack:       12480 kB
PageTables:        37848 kB
SecPageTables:        32 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34836424 kB
Committed_AS:   15641944 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      160720 kB
VmallocChunk:          0 kB
Percpu:            17424 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2103296 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      384284 kB
DirectMap2M:    13926400 kB
DirectMap1G:    53477376 kB
 
Could you recheck once the usage has risen further? In particular, it would be interesting to see whether Inactive(file): 10678932 kB is the culprit.
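To follow how these values move while usage climbs, the relevant fields can be pulled out of /proc/meminfo as a single line and logged at intervals. This is a rough sketch, not anything suggested in the thread itself: the function name `meminfo_fields`, the choice of fields, and the log path in the usage comment are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print selected /proc/meminfo fields (values in kB)
# as one "Key=value" line, in the order they appear in the file.
meminfo_fields() {
    local file="${1:-/proc/meminfo}"
    awk '
        $1 == "MemFree:" || $1 == "MemAvailable:" || $1 == "Buffers:" ||
        $1 == "Inactive(file):" || $1 == "AnonPages:" || $1 == "SUnreclaim:" {
            key = $1
            sub(/:$/, "", key)            # strip the trailing colon
            printf "%s%s=%s", sep, key, $2
            sep = " "
        }
        END { print "" }
    ' "$file"
}

# Usage: append a timestamped sample every 10 minutes (log path is an example):
#   while sleep 600; do echo "$(date -Is) $(meminfo_fields)"; done >> /root/meminfo-trend.log
```

Comparing a few such lines over time should show whether Inactive(file) grows along with the overall usage or shrinks while free memory still disappears.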
 
Just as an intermediate update prior to failing:
1757067021588.png

Bash:
root@ovh-px-01:~# cat /proc/meminfo
MemTotal:       65478560 kB
MemFree:          848188 kB
MemAvailable:    5652664 kB
Buffers:         4249688 kB
Cached:          1112844 kB
SwapCached:            0 kB
Active:          8214680 kB
Inactive:        4956192 kB
Active(anon):    7895268 kB
Inactive(anon):      220 kB
Active(file):     319412 kB
Inactive(file):  4955972 kB
Unevictable:      178816 kB
Mlocked:          176940 kB
SwapTotal:       2097144 kB
SwapFree:        2094840 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               272 kB
Writeback:           132 kB
AnonPages:       7987256 kB
Mapped:           264400 kB
Shmem:             73824 kB
KReclaimable:     245912 kB
Slab:             478536 kB
SReclaimable:     245912 kB
SUnreclaim:       232624 kB
KernelStack:        9632 kB
PageTables:        27032 kB
SecPageTables:        32 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34836424 kB
Committed_AS:   12188436 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      156736 kB
VmallocChunk:          0 kB
Percpu:            14544 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2101248 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      384284 kB
DirectMap2M:    13926400 kB
DirectMap1G:    53477376 kB

Is it me, or does the memory just vanish?
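A quick way to put a number on "vanished" is to add up the meminfo categories that normally explain where RAM went and subtract that from MemTotal. The sketch below does this; the function name `meminfo_unaccounted_kb` and the exact list of categories are my own assumptions, and some categories overlap slightly (for example, kernel stacks can also show up under VmallocUsed), so the result is an estimate only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate how many kB of RAM are not attributed to
# any of the usual /proc/meminfo categories.
meminfo_unaccounted_kb() {
    local file="${1:-/proc/meminfo}"
    awk '
        { key = $1; sub(/:$/, "", key); kb[key] = $2 }
        END {
            accounted  = kb["MemFree"] + kb["Buffers"] + kb["Cached"] + kb["SwapCached"]
            accounted += kb["AnonPages"] + kb["Slab"] + kb["KernelStack"] + kb["PageTables"]
            accounted += kb["Percpu"] + kb["VmallocUsed"] + kb["Hugetlb"]
            printf "%d\n", kb["MemTotal"] - accounted
        }
    ' "$file"
}

# Usage on a live node:
#   meminfo_unaccounted_kb
```

Applied to the dump above, this leaves 50594104 kB (roughly 48 GiB) that none of these counters claim. That would point toward pages allocated directly by the kernel or a driver, which show up in neither the slab nor the page cache figures.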
 
Hi,
I'm in touch with support about the same issue on PVE 8.4.12.
It started with kernel 6.8.12-12-pve, so kernel 6.8.12-11-pve is safe to use.

Best

Knut
Hi Knut,

Thanks for the hint.
I've downloaded the deb package of the 6.8.12-11-pve kernel from the bookworm repo, installed it, and set it as the next boot entry.
Let's see what will happen then.
 
Node is exhausted again:

1757094245505.png

Bash:
root@ovh-px-01:~# free -m
               total        used        free      shared  buff/cache   available
Mem:           63943       63063         829          52         733         879
Swap:           2047        2047           0
root@ovh-px-01:~# cat /proc/meminfo
MemTotal:       65478560 kB
MemFree:          818276 kB
MemAvailable:     878976 kB
Buffers:          384556 kB
Cached:           299660 kB
SwapCached:        73816 kB
Active:          3594524 kB
Inactive:        1682872 kB
Active(anon):    3519280 kB
Inactive(anon):  1141496 kB
Active(file):      75244 kB
Inactive(file):   541376 kB
Unevictable:      202620 kB
Mlocked:          200744 kB
SwapTotal:       2097144 kB
SwapFree:           1700 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               188 kB
Writeback:            60 kB
AnonPages:       4727376 kB
Mapped:           190176 kB
Shmem:             54272 kB
KReclaimable:      75888 kB
Slab:             283736 kB
SReclaimable:      75888 kB
SUnreclaim:       207848 kB
KernelStack:        9744 kB
PageTables:        24668 kB
SecPageTables:        60 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34836424 kB
Committed_AS:   10823028 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      139300 kB
VmallocChunk:          0 kB
Percpu:            12288 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1355776 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      384284 kB
DirectMap2M:    13926400 kB
DirectMap1G:    53477376 kB

I will now reboot into the 6.8.12-11-pve kernel.
 
And for the sake of completeness, here is the second failing node:

1757154328725.png

Bash:
root@ovh-px-03:~# cat /proc/meminfo
MemTotal:       65445840 kB
MemFree:         1374924 kB
MemAvailable:    1454492 kB
Buffers:           45752 kB
Cached:           636216 kB
SwapCached:        76580 kB
Active:          1164456 kB
Inactive:        1098648 kB
Active(anon):     685076 kB
Inactive(anon):   947904 kB
Active(file):     479380 kB
Inactive(file):   150744 kB
Unevictable:      163952 kB
Mlocked:          162076 kB
SwapTotal:       2097144 kB
SwapFree:          13060 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               188 kB
Writeback:             0 kB
AnonPages:       1679808 kB
Mapped:           240540 kB
Shmem:             41140 kB
KReclaimable:      86396 kB
Slab:             332188 kB
SReclaimable:      86396 kB
SUnreclaim:       245792 kB
KernelStack:       12864 kB
PageTables:        26488 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34820064 kB
Committed_AS:    8919016 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      142472 kB
VmallocChunk:          0 kB
Percpu:            20640 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      478496 kB
DirectMap2M:    14848000 kB
DirectMap1G:    52428800 kB