Memory leak

efg

Nov 24, 2020
Hi!
Some servers in my cluster show unexplained RAM consumption. For example, one host runs two KVM guests with 120GB of dedicated memory each, plus a 16GB ZFS ARC, but the host itself reports 468GB used.

468 - 120 - 120 - 16 = 212
What could the other 212GB be used for?
Bash:
root@proxmox-ef03:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       468Gi        35Gi        72Mi       1.9Gi        34Gi
Swap:             0B          0B          0B



root@proxmox-ef03:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
15:24:12     0       0     0       0     0      0    0    26M    16G    20G


root@proxmox-ef03:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       152 stage02      running    122880           100.00 2637810
       158 stage01      running    122880           100.00 3687733


root@proxmox-ef03:~# top -b -o +%MEM | head -n 30
top - 15:16:56 up 83 days,  4:32,  2 users,  load average: 1.23, 1.56, 1.86
Tasks: 1183 total,   1 running, 1182 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.8 us,  0.0 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515614.5 total,  36603.4 free, 479850.0 used,   1938.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  35764.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2637810 root      20   0  125.8g  85.8g   4608 S  18.8  17.0     4w+2d kvm
3687733 root      20   0  125.3g  54.1g   7168 S  68.8  10.7     75,38 kvm
3633944 root      20   0   25.6g  12.8g   4096 S   0.0   2.5   0:38.75 kvm
 471478 root      rt   0  679104 269148  52892 S   0.0   0.1     20,41 corosync
   1914 root      20   0  247452 237056 235520 S   0.0   0.0  27:53.43 ladvd
   1922 root      20   0 6627252 178176   8192 S   0.0   0.0     12,44 soc
   3559 www-data  20   0  362356 152064  21504 S   0.0   0.0   1:33.39 pveproxy
2040950 www-data  20   0  371156 142308  10240 S   0.0   0.0   0:04.22 pveprox+
2662636 www-data  20   0  371156 140772   8704 S   0.0   0.0   0:01.34 pveprox+
2378955 www-data  20   0  371208 140260   8704 S   0.0   0.0   0:02.81 pveprox+
1741387 root      20   0  369760 138860   7680 S   0.0   0.0   0:03.93 pvedaem+
1741639 root      20   0  369760 137836   7168 S   0.0   0.0   0:03.56 pvedaem+
1763939 root      20   0  369760 137324   6656 S   0.0   0.0   0:03.23 pvedaem+
   3500 root      20   0  360940 135168   6144 S   0.0   0.0   0:59.73 pvedaem+
   3713 root      20   0  343616 110716   3072 S   0.0   0.0   7:41.19 pvesche+
   3552 root      20   0  348104 107800   3456 S   0.0   0.0  17:56.64 pve-ha-+
   3619 root      20   0  347556 107116   3072 S   0.0   0.0  11:41.90 pve-ha-+
   3485 root      20   0  291096 102700   7680 S   0.0   0.0     13,34 pvestatd
   3469 root      20   0  286388  93292   2048 S   0.0   0.0 195:00.70 pve-fir+
 684254 root       0 -20   85884  78052   3584 S   0.0   0.0   0:16.15 atop
   3613 www-data  20   0   80788  63488  13312 S   0.0   0.0   0:57.92 spicepr+
  87244 root      20   0  126580  56832   5120 S   0.0   0.0   3:05.68 puppet
 687090 www-data  20   0   81032  53428   3584 S   0.0   0.0   0:01.33 spicepr+
 
The guests aren't even using their full 120GB yet: RES is 85.8g and 54.1g.
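For what it's worth, kernel-side allocations that `free` counts as "used" but no process owns usually show up in /proc/meminfo. A minimal sketch (standard fields, values in kB) to break them down:

```shell
# Break down kernel-side memory consumers (values are kB).
# SUnreclaim is slab memory the kernel cannot drop on its own --
# the usual suspect when RAM goes "missing" like this.
awk '/^(Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|VmallocUsed|Percpu):/ {printf "%-14s %12s kB\n", $1, $2}' /proc/meminfo
```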
Have you tried `echo 3 > /proc/sys/vm/drop_caches` yet? According to `free` and `top` the memory doesn't seem to be used for caches, but I've seen it help in some cases anyway.
 
Could you check `slabtop -s c`?
It should show something like this:
Code:
 Active / Total Objects (% used)    : 2770699 / 2863715 (96.8%)
 Active / Total Slabs (% used)      : 75422 / 75422 (100.0%)
 Active / Total Caches (% used)     : 359 / 431 (83.3%)
 Active / Total Size (% used)       : 1192885.05K / 1232720.11K (96.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.43K / 16.00K
And it should be sorted by `cache size` in descending order.
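For reference, `slabtop -s c` is essentially computing num_objs × objsize per cache from /proc/slabinfo. A small helper reproducing that sort (a sketch assuming the standard slabinfo 2.1 column layout; not from the thread):

```shell
# Top slab caches by total size (num_objs * objsize), the same ordering
# slabtop -s c uses. /proc/slabinfo is root-only; a file path can be
# passed instead, e.g. a saved copy, for testing.
slab_by_size() {
    awk 'NR > 2 { printf "%12.0f kB  %s\n", $3 * $4 / 1024, $1 }' \
        "${1:-/proc/slabinfo}" | sort -rn | head -n 10
}
```

As root, just run `slab_by_size`.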
 
One more thing: which kernel are you currently running?
uname -a

Do you use a Samba share for anything and do you copy things over to it regularly?
 
I had to reboot the original server, but I found another server with a similar problem.

Bash:
root@proxmox-sd07:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       181Gi        21Gi        72Mi       304Gi       321Gi
Swap:             0B          0B          0B

root@proxmox-sd07:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
12:54:13     0       0     0       0     0      0    0    31G    32G   192G

root@proxmox-sd07:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 dev01       running    32000           1000.00 167997
       161 prod01       running    4096              32.00 867249

root@proxmox-sd07:~# top -b -o +%MEM | head -n 20
top - 12:54:41 up 26 days, 22:34,  1 user,  load average: 1.26, 1.02, 1.14
Tasks: 1153 total,   1 running, 1152 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.4 us,  2.4 sy,  0.0 ni, 92.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515620.2 total,  22039.4 free, 185709.4 used, 311349.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 329910.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 167997 root      20   0   34.9g  31.4g  15304 S 175.0   6.2     4w+5d kvm
 867249 root      20   0 6487672   3.9g   7680 S   0.0   0.8     11,39 kvm
   9783 root      rt   0  661276 257788  53760 S   0.0   0.0      7,24 corosync
  11869 www-data  20   0  362312 158208  28160 S   0.0   0.0   0:39.96 pveproxy
   8325 root      20   0  155352 145920 144384 S   0.0   0.0   9:12.80 ladvd
 969516 www-data  20   0  371148 141860  10240 S   0.0   0.0   0:03.27 pveprox+
 969517 www-data  20   0  371116 141860  10240 S   0.0   0.0   0:03.29 pveprox+
 969518 www-data  20   0  371120 139812   8192 S   0.0   0.0   0:03.45 pveprox+
1562062 root      20   0  369784 137968   7168 S   0.0   0.0   0:17.39 pvedaem+
 162911 root      20   0  369788 136944   6144 S   0.0   0.0   0:12.84 pvedaem+
 717722 root      20   0  369744 134896   5120 S   0.0   0.0   0:01.25 pvedaem+
  10630 root      20   0  360976 132844   3072 S   0.0   0.0   0:23.76 pvedaem+
  11882 root      20   0  343780 110784   3072 S   0.0   0.0   3:19.21 pvesche+

Bash:
root@proxmox-sd07:~# slabtop -s c --sort=c

 Active / Total Objects (% used)    : 115467831 / 133883598 (86.2%)
 Active / Total Slabs (% used)      : 5087007 / 5087007 (100.0%)
 Active / Total Caches (% used)     : 360 / 424 (84.9%)
 Active / Total Size (% used)       : 106013824.38K / 108973367.38K (97.3%)
 Minimum / Average / Maximum Object : 0.01K / 0.81K / 16.25K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
43715606 43715606 100%    2.00K 2734545       16  87505440K kmalloc-rnd-11-2k
43716512 43716447  99%    0.25K 1366141       32  10929128K skbuff_head_cache
7194992 6222337  86%    0.57K 128482       56   4111424K radix_tree_node
11077812 4035750  36%    0.23K 325818       34   2606544K arc_buf_hdr_t_full
 49908  47063  94%   16.00K  24954        2    798528K zio_buf_comb_16384
6192537 3011253  48%    0.10K 158783       39    635132K abd_t
4281927 4281927 100%    0.10K 109793       39    439172K buffer_head
8444032 2446511  28%    0.03K  65969      128    263876K kmalloc-rnd-12-32
240896 240595  99%    1.00K   7528       32    240896K kmalloc-rnd-04-1k
439872 253507  57%    0.50K  13746       32    219936K kmalloc-rnd-14-512
663424 663364  99%    0.25K  20732       32    165856K kmalloc-rnd-14-256
2381504 2381504 100%    0.06K  37211       64    148844K dmaengine-unmap-2
1312384 514999  39%    0.06K  20506       64     82024K kmalloc-rnd-12-64
 44388  37172  83%    1.16K   1644       27     52608K ext4_inode_cache
  6528   6519  99%    8.00K   1632        4     52224K kmalloc-rnd-14-8k
250194 187399  74%    0.19K   5957       42     47656K dentry
 51357  49323  96%    0.62K   1007       51     32224K inode_cache
 46138  45278  98%    0.70K   1003       46     32096K proc_inode_cache
  7776   6587  84%    4.00K    972        8     31104K zfs_btree_leaf_cache
  1934   1896  98%   11.81K    967        2     30944K task_struct
 69044  55541  80%    0.38K   1684       41     26944K dmu_buf_impl_t
174608 170098  97%    0.14K   3118       56     24944K kernfs_node_cache
269024 264071  98%    0.07K   4804       56     19216K vmap_area
 16232  16077  99%    1.00K    509       32     16288K kmalloc-rnd-15-1k
 15609  15600  99%    0.96K    473       33     15136K dnode_t
 24480  24314  99%    0.50K    765       32     12240K kmalloc-rnd-02-512
190528 190528 100%    0.06K   2977       64     11908K kmalloc-rnd-05-64
 
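One way to tell a warm cache from a genuine leak (a general diagnostic sketch, assuming the same slabinfo layout as above) is to snapshot the per-cache totals twice and compare: a leaking cache, like the ~87GB kmalloc-rnd-11-2k here, only ever grows.

```shell
# Snapshot "cache-name total-bytes" pairs; diff two snapshots taken a few
# minutes apart to see which caches keep growing. /proc/slabinfo needs
# root; a file argument can be supplied instead for testing.
slab_snap() {
    awk 'NR > 2 { print $1, $3 * $4 }' "${1:-/proc/slabinfo}"
}
# Example (as root):
#   slab_snap > /tmp/slab.1; sleep 300; slab_snap > /tmp/slab.2
#   diff /tmp/slab.1 /tmp/slab.2
```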
Bash:
root@proxmox-sd07:~# uname -a
Linux proxmox-sd07 6.8.8-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-1 (2024-06-10T11:42Z) x86_64 GNU/Linux

I have a CIFS storage connected in the PVE cluster; it is sometimes used for backing up virtual machines.

Bash:
cifs: VEEAM01
        path /mnt/pve/VEEAM01
        server veeam01.XXXXX.lan
        share Proxmox_Backup
        content iso,images,vztmpl,backup
        prune-backups keep-all=1
        username proxmox_backup
 
This could be related to the following issue then: https://forum.proxmox.com/threads/k...umes-more-memory-than-6-5.147603/#post-682388

Code:
root@proxmox-sd07:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       181Gi        21Gi        72Mi       304Gi       321Gi
Swap:             0B          0B          0B
Here most of the memory is used for buffers/caches (see the 304Gi in that column), and 321Gi is still available.
In your first post, on the other hand, there was hardly any free or available memory.
 
I checked: the CIFS storage is not used for dumps; all backups go to PBS. I guess that's not my case.
 
I guess that's not my case
I would not be so sure. If the memory bug outlined in the post mira linked is SMB-related, then AFAIK your CIFS storage, whatever it's used for, also uses some form of SMB to connect, so you would probably be subject to the same bug.
 
I would not be so sure. If the memory bug outlined in the post mira linked is SMB-related, then AFAIK your CIFS storage, whatever it's used for, also uses some form of SMB to connect, so you would probably be subject to the same bug.
I am basing this on the following:
The post says the memory leaks during an SMB dump; in my case it leaks without any such activity.
Also, it says the problem appeared after switching to kernel 6.8, but my problem started when I upgraded from PVE 7.5 to PVE 8.1.4 (Linux kernel 6.5.13-1-pve).