Memory leak

efg

Hi!
Some servers in my cluster are showing unexplained RAM consumption. For example, one host runs two KVM guests with 120GB of dedicated memory each, plus a 16GB ZFS ARC, yet the host reports 468GB of memory used.

468 - 120 - 120 - 16 = 212
What could the other 212GB be used for?
Bash:
root@proxmox-ef03:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       468Gi        35Gi        72Mi       1.9Gi        34Gi
Swap:             0B          0B          0B



root@proxmox-ef03:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
15:24:12     0       0     0       0     0      0    0    26M    16G    20G


root@proxmox-ef03:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       152 stage02      running    122880           100.00 2637810
       158 stage01      running    122880           100.00 3687733


root@proxmox-ef03:~# top -b -o +%MEM | head -n 30
top - 15:16:56 up 83 days,  4:32,  2 users,  load average: 1.23, 1.56, 1.86
Tasks: 1183 total,   1 running, 1182 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.8 us,  0.0 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515614.5 total,  36603.4 free, 479850.0 used,   1938.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  35764.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2637810 root      20   0  125.8g  85.8g   4608 S  18.8  17.0     4w+2d kvm
3687733 root      20   0  125.3g  54.1g   7168 S  68.8  10.7     75,38 kvm
3633944 root      20   0   25.6g  12.8g   4096 S   0.0   2.5   0:38.75 kvm
 471478 root      rt   0  679104 269148  52892 S   0.0   0.1     20,41 corosync
   1914 root      20   0  247452 237056 235520 S   0.0   0.0  27:53.43 ladvd
   1922 root      20   0 6627252 178176   8192 S   0.0   0.0     12,44 soc
   3559 www-data  20   0  362356 152064  21504 S   0.0   0.0   1:33.39 pveproxy
2040950 www-data  20   0  371156 142308  10240 S   0.0   0.0   0:04.22 pveprox+
2662636 www-data  20   0  371156 140772   8704 S   0.0   0.0   0:01.34 pveprox+
2378955 www-data  20   0  371208 140260   8704 S   0.0   0.0   0:02.81 pveprox+
1741387 root      20   0  369760 138860   7680 S   0.0   0.0   0:03.93 pvedaem+
1741639 root      20   0  369760 137836   7168 S   0.0   0.0   0:03.56 pvedaem+
1763939 root      20   0  369760 137324   6656 S   0.0   0.0   0:03.23 pvedaem+
   3500 root      20   0  360940 135168   6144 S   0.0   0.0   0:59.73 pvedaem+
   3713 root      20   0  343616 110716   3072 S   0.0   0.0   7:41.19 pvesche+
   3552 root      20   0  348104 107800   3456 S   0.0   0.0  17:56.64 pve-ha-+
   3619 root      20   0  347556 107116   3072 S   0.0   0.0  11:41.90 pve-ha-+
   3485 root      20   0  291096 102700   7680 S   0.0   0.0     13,34 pvestatd
   3469 root      20   0  286388  93292   2048 S   0.0   0.0 195:00.70 pve-fir+
 684254 root       0 -20   85884  78052   3584 S   0.0   0.0   0:16.15 atop
   3613 www-data  20   0   80788  63488  13312 S   0.0   0.0   0:57.92 spicepr+
  87244 root      20   0  126580  56832   5120 S   0.0   0.0   3:05.68 puppet
 687090 www-data  20   0   81032  53428   3584 S   0.0   0.0   0:01.33 spicepr+
 
The guests aren't even using their full 120GB yet: RES is 85.8g and 54.1g.
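
For reference, a quick way to see where memory that is not attributed to any process might be sitting is to look at the kernel-side counters in /proc/meminfo. This is only a minimal sketch and nothing in it is specific to this setup:
Bash:
# kernel-side consumers that never show up as process RSS:
# slab caches, vmalloc allocations, page tables and huge pages
grep -E 'Slab|SReclaimable|SUnreclaim|VmallocUsed|PageTables|HugePages_Total' /proc/meminfo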
Have you tried `echo 3 > /proc/sys/vm/drop_caches` yet? According to `free` and `top` the memory doesn't seem to be used for caches, but I've seen it help in some cases anyway.
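
If it helps, a minimal sketch of that sequence; the sync beforehand just makes sure dirty pages are written out before the caches are dropped:
Bash:
# flush dirty pages, then drop page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
# re-check how much memory is still reported as used
free -h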
 
Could you check slabtop -s c?
It should show something like this:
Code:
 Active / Total Objects (% used)    : 2770699 / 2863715 (96.8%)
 Active / Total Slabs (% used)      : 75422 / 75422 (100.0%)
 Active / Total Caches (% used)     : 359 / 431 (83.3%)
 Active / Total Size (% used)       : 1192885.05K / 1232720.11K (96.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.43K / 16.00K
And it should be sorted by `cache size` in descending order.
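
For a non-interactive snapshot that is easy to paste into the thread, slabtop can also be run once in batch mode (a small sketch; -o prints a single iteration and exits, -s c sorts by cache size):
Bash:
# one-shot slabtop output, largest caches first
slabtop -o -s c | head -n 30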
 
One more thing, which kernel are you running currently?
uname -a

Do you use a Samba share for anything and do you copy things over to it regularly?
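
If you are not sure, here is a quick sketch to check whether anything on the host is mounted via SMB/CIFS at all (standard util-linux and kernel interfaces, nothing Proxmox-specific):
Bash:
# list CIFS/SMB mounts and their negotiated options (vers=, cache=, ...)
findmnt -t cifs
# check whether the cifs kernel module is loaded
lsmod | grep -w cifs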
 
I had to reboot the original server, but I found another server with a similar problem.

Bash:
root@proxmox-sd07:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       181Gi        21Gi        72Mi       304Gi       321Gi
Swap:             0B          0B          0B

root@proxmox-sd07:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
12:54:13     0       0     0       0     0      0    0    31G    32G   192G

root@proxmox-sd07:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 dev01       running    32000           1000.00 167997
       161 prod01       running    4096              32.00 867249

root@proxmox-sd07:~# top -b -o +%MEM | head -n 20
top - 12:54:41 up 26 days, 22:34,  1 user,  load average: 1.26, 1.02, 1.14
Tasks: 1153 total,   1 running, 1152 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.4 us,  2.4 sy,  0.0 ni, 92.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515620.2 total,  22039.4 free, 185709.4 used, 311349.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 329910.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 167997 root      20   0   34.9g  31.4g  15304 S 175.0   6.2     4w+5d kvm
 867249 root      20   0 6487672   3.9g   7680 S   0.0   0.8     11,39 kvm
   9783 root      rt   0  661276 257788  53760 S   0.0   0.0      7,24 corosync
  11869 www-data  20   0  362312 158208  28160 S   0.0   0.0   0:39.96 pveproxy
   8325 root      20   0  155352 145920 144384 S   0.0   0.0   9:12.80 ladvd
 969516 www-data  20   0  371148 141860  10240 S   0.0   0.0   0:03.27 pveprox+
 969517 www-data  20   0  371116 141860  10240 S   0.0   0.0   0:03.29 pveprox+
 969518 www-data  20   0  371120 139812   8192 S   0.0   0.0   0:03.45 pveprox+
1562062 root      20   0  369784 137968   7168 S   0.0   0.0   0:17.39 pvedaem+
 162911 root      20   0  369788 136944   6144 S   0.0   0.0   0:12.84 pvedaem+
 717722 root      20   0  369744 134896   5120 S   0.0   0.0   0:01.25 pvedaem+
  10630 root      20   0  360976 132844   3072 S   0.0   0.0   0:23.76 pvedaem+
  11882 root      20   0  343780 110784   3072 S   0.0   0.0   3:19.21 pvesche+

Bash:
root@proxmox-sd07:~# slabtop -s c --sort=c

 Active / Total Objects (% used)    : 115467831 / 133883598 (86.2%)
 Active / Total Slabs (% used)      : 5087007 / 5087007 (100.0%)
 Active / Total Caches (% used)     : 360 / 424 (84.9%)
 Active / Total Size (% used)       : 106013824.38K / 108973367.38K (97.3%)
 Minimum / Average / Maximum Object : 0.01K / 0.81K / 16.25K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
43715606 43715606 100%    2.00K 2734545       16  87505440K kmalloc-rnd-11-2k
43716512 43716447  99%    0.25K 1366141       32  10929128K skbuff_head_cache
7194992 6222337  86%    0.57K 128482       56   4111424K radix_tree_node
11077812 4035750  36%    0.23K 325818       34   2606544K arc_buf_hdr_t_full
 49908  47063  94%   16.00K  24954        2    798528K zio_buf_comb_16384
6192537 3011253  48%    0.10K 158783       39    635132K abd_t
4281927 4281927 100%    0.10K 109793       39    439172K buffer_head
8444032 2446511  28%    0.03K  65969      128    263876K kmalloc-rnd-12-32
240896 240595  99%    1.00K   7528       32    240896K kmalloc-rnd-04-1k
439872 253507  57%    0.50K  13746       32    219936K kmalloc-rnd-14-512
663424 663364  99%    0.25K  20732       32    165856K kmalloc-rnd-14-256
2381504 2381504 100%    0.06K  37211       64    148844K dmaengine-unmap-2
1312384 514999  39%    0.06K  20506       64     82024K kmalloc-rnd-12-64
 44388  37172  83%    1.16K   1644       27     52608K ext4_inode_cache
  6528   6519  99%    8.00K   1632        4     52224K kmalloc-rnd-14-8k
250194 187399  74%    0.19K   5957       42     47656K dentry
 51357  49323  96%    0.62K   1007       51     32224K inode_cache
 46138  45278  98%    0.70K   1003       46     32096K proc_inode_cache
  7776   6587  84%    4.00K    972        8     31104K zfs_btree_leaf_cache
  1934   1896  98%   11.81K    967        2     30944K task_struct
 69044  55541  80%    0.38K   1684       41     26944K dmu_buf_impl_t
174608 170098  97%    0.14K   3118       56     24944K kernfs_node_cache
269024 264071  98%    0.07K   4804       56     19216K vmap_area
 16232  16077  99%    1.00K    509       32     16288K kmalloc-rnd-15-1k
 15609  15600  99%    0.96K    473       33     15136K dnode_t
 24480  24314  99%    0.50K    765       32     12240K kmalloc-rnd-02-512
190528 190528 100%    0.06K   2977       64     11908K kmalloc-rnd-05-64
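
The two caches at the top (kmalloc-rnd-11-2k at roughly 87GB and skbuff_head_cache at roughly 11GB) together account for a large share of the memory that is not attributable to the guests or the ARC on this host. To see whether they keep growing, the raw counters in /proc/slabinfo can be sampled every few minutes; a rough sketch (objects x object size is only an approximation of the cache size, but enough to spot a trend):
Bash:
# approximate per-cache size (num_objs * objsize) from /proc/slabinfo, largest first
awk 'NR > 2 { printf "%12.0f KiB  %s\n", $3 * $4 / 1024, $1 }' /proc/slabinfo | sort -rn | head -n 10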
 
Bash:
root@proxmox-sd07:~# uname -a
Linux proxmox-sd07 6.8.8-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-1 (2024-06-10T11:42Z) x86_64 GNU/Linux

I have a CIFS storage connected in the PVE cluster; it is sometimes used for backing up virtual machines.

Bash:
cifs: VEEAM01
        path /mnt/pve/VEEAM01
        server veeam01.XXXXX.lan
        share Proxmox_Backup
        content iso,images,vztmpl,backup
        prune-backups keep-all=1
        username proxmox_backup
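
Even when it is only used occasionally, PVE typically keeps an enabled CIFS storage mounted on every node. A quick sketch to confirm that on this host (pvesm and findmnt are standard tools here; the mount path simply follows the config above):
Bash:
# storage status as PVE sees it
pvesm status --storage VEEAM01
# is the share actually mounted right now, and with which SMB options?
findmnt /mnt/pve/VEEAM01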
 
This could be related to the following issue then: https://forum.proxmox.com/threads/k...umes-more-memory-than-6-5.147603/#post-682388

Code:
root@proxmox-sd07:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       181Gi        21Gi        72Mi       304Gi       321Gi
Swap:             0B          0B          0B
Here most of the memory is used for buffers/caches, see the 304Gi in that column, and there are still 321Gi available.
In your first post, on the other hand, there was hardly any free or available memory.
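
To make that distinction visible directly, the relevant counters can also be read from /proc/meminfo (a minimal sketch; MemAvailable is the kernel's estimate of memory it can still hand out, Buffers/Cached is reclaimable page cache):
Bash:
# reclaimable cache vs. the kernel's estimate of still-available memory
grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo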
 
I checked: the CIFS storage is not used for dumps. All backups go to PBS. I guess that's not my case.
 
I guess that's not my case
I would not be so sure: if the memory bug outlined in the post mira linked is SMB-related, then AFAIK your CIFS storage, whatever it is used for, also uses some form of SMB to connect, so you would probably be subject to that bug as well.
 
I would not be so sure ... you would probably be subject to that bug as well.
I am basing this on the following:
The linked post says the memory leaks during an SMB dump, but I see the leak without any such activity.
It also says the problem appeared after switching to kernel 6.8, whereas my problem started when I upgraded from PVE 7.5 to PVE 8.1.4 (Linux kernel 6.5.13-1-pve).
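
For completeness, a small sketch to double-check which kernels are installed and which one is actually booted; proxmox-boot-tool should be present on a standard PVE 8 install, and the dpkg query works either way:
Bash:
# currently running kernel
uname -r
# kernels known to the boot tool (on systems managed by proxmox-boot-tool)
proxmox-boot-tool kernel list
# installed kernel packages
dpkg -l | grep -E 'proxmox-kernel|pve-kernel'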
 
