Hi all. I need help with a memory leak. Memory usage gradually increases over several days until it reaches nearly 100% of RAM and fills the swap, at which point the OOM killer starts killing processes. Eventually the system becomes unresponsive and I have to force a restart. This happens even with all VMs shut down and no containers running. I do not use ZFS.
Some background: this is on a server that I ran for several years with Arch Linux. It has an AMD Ryzen 7 5700G CPU and 32 GB of RAM. I also have a GPU installed, but I'm not currently using it. Note that Proxmox only reports 27.3 GB of available RAM; I've read this might have to do with sharing memory with the GPU. I have 7 HDDs, 1 SSD (unused, old Arch install), and 1 NVMe drive (Proxmox install). All drives, including the NVMe that Proxmox is installed on, are encrypted with LUKS and unlocked during boot via SSH. I do not use RAID; instead I use a combination of mergerfs and SnapRAID. Because of this configuration, I first installed Debian and then installed Proxmox on top of it, following the instructions in the official wiki.
I have 4 VMs set up, but I have them turned off while trying to figure this out. With them on, the same issue occurs, only faster. I updated the system yesterday and am running kernel version 6.8.8-1-pve:
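For reference, the 27.3 figure lines up with the kernel's MemTotal when read as GiB. A quick sketch of the arithmetic, assuming exactly 32 GiB is installed (the 28622864 kB value is the MemTotal line from /proc/meminfo on this host; the helper name is only illustrative):

```python
# MemTotal as reported by /proc/meminfo on this host ("kB" there means KiB).
MEM_TOTAL_KIB = 28_622_864
# Assumption: the installed modules add up to exactly 32 GiB.
INSTALLED_GIB = 32


def kib_to_gib(kib: int) -> float:
    """Convert a KiB count to GiB."""
    return kib / (1024 * 1024)


visible_gib = kib_to_gib(MEM_TOTAL_KIB)
reserved_gib = INSTALLED_GIB - visible_gib

print(f"visible to the kernel:        {visible_gib:.1f} GiB")
print(f"held back before Linux boots: {reserved_gib:.1f} GiB")
```

Under that assumption about 4.7 GiB never reaches the kernel. A frame buffer carved out for integrated graphics is one possible cause, but that is a guess. MemTotal is identical in both meminfo snapshots in this post, so this reservation is fixed at boot and is separate from the growth being reported.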
Code:
root@europa:/# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
amd64-microcode: 3.20230808.1.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.4-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
The system has currently been up for a little less than 24 hours and reports 10.4G memory usage.
Again, I'm not using ZFS, but I limited the ARC anyway:
Code:
root@europa:/# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=512
options zfs zfs_arc_max=1024
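One thing that may be worth double-checking here: to my knowledge the OpenZFS module parameters zfs_arc_min and zfs_arc_max are given in bytes, so 512 and 1024 would not mean 512 MiB and 1 GiB. The arcstat output further down shows a target size (c) of 873M, which does not match either value. A minimal sketch of the conversion, assuming the intent was a 512 MiB minimum and a 1 GiB maximum (the helper name is made up for illustration):

```python
MIB = 1024 * 1024  # bytes per MiB


def arc_option_lines(min_mib: int, max_mib: int) -> list[str]:
    """Build modprobe option lines with the ARC limits expressed in bytes."""
    if min_mib > max_mib:
        raise ValueError("ARC minimum must not exceed ARC maximum")
    return [
        f"options zfs zfs_arc_min={min_mib * MIB}",
        f"options zfs zfs_arc_max={max_mib * MIB}",
    ]


# Assumed intent: 512 MiB minimum, 1024 MiB (1 GiB) maximum.
for line in arc_option_lines(512, 1024):
    print(line)
```

That gives zfs_arc_min=536870912 and zfs_arc_max=1073741824. With an ARC size of 3.8K this is unlikely to be related to the leak itself.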
Arcstat:
Code:
root@europa:/# arcstat
time read ddread ddh% dmread dmh% pread ph% size c avail
21:52:55 0 0 0 0 0 0 0 3.8K 873M 24G
Shortly after boot, this is what 'free -m', top, and meminfo showed:
Code:
root@europa:/# free -m
               total        used        free      shared  buff/cache   available
Mem:           27952        1906       24749          44        1760       26045
Swap:            975           0         975
Code:
top - 21:51:52 up 4 min, 2 users, load average: 1.05, 1.00, 0.45
Tasks: 390 total, 1 running, 389 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 27952.0 total, 24751.5 free, 1904.9 used, 1760.2 buff/cache
MiB Swap: 976.0 total, 976.0 free, 0.0 used. 26047.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3743 www-data 20 0 384224 156092 11008 S 0.0 0.5 0:00.27 pveproxy worker
3744 www-data 20 0 381368 155140 12672 S 0.0 0.5 0:00.16 pveproxy worker
3742 www-data 20 0 377196 149380 10880 S 0.0 0.5 0:00.09 pveproxy worker
3734 root 20 0 375836 146160 9216 S 0.0 0.5 0:00.06 pvedaemon worke
3735 root 20 0 375776 146160 9216 S 0.0 0.5 0:00.04 pvedaemon worke
3733 root 20 0 367700 141552 5376 S 0.0 0.5 0:00.01 pvedaemon worke
3741 www-data 20 0 368536 140420 3200 S 0.0 0.5 0:00.00 pveproxy
3732 root 20 0 367288 139244 3328 S 0.0 0.5 0:00.00 pvedaemon
3755 root 20 0 348784 115880 3072 S 0.0 0.4 0:00.01 pvescheduler
3740 root 20 0 353056 113496 3712 S 0.0 0.4 0:00.01 pve-ha-crm
3749 root 20 0 352548 112776 3584 S 0.0 0.4 0:00.02 pve-ha-lrm
3715 root 20 0 292932 104788 6912 S 0.7 0.4 0:00.27 pvestatd
3708 root 20 0 291380 100300 3968 S 0.7 0.4 0:00.36 pve-firewall
3403 root 20 0 520788 56124 48156 S 0.0 0.2 0:00.17 pmxcfs
3748 www-data 20 0 81304 55168 4480 S 0.0 0.2 0:00.00 spiceproxy work
3747 www-data 20 0 80788 53628 3200 S 0.0 0.2 0:00.00 spiceproxy
1259 root 20 0 41252 16388 15492 S 0.0 0.1 0:00.16 systemd-journal
1 root 20 0 168812 12744 9032 S 0.0 0.0 0:01.03 systemd
4448 brian 20 0 19768 11264 8832 S 0.0 0.0 0:00.11 systemd
4426 root 20 0 17800 11008 9472 S 0.0 0.0 0:00.00 sshd
3354 root 20 0 15412 9216 7936 S 0.0 0.0 0:00.00 sshd
3169 root 20 0 49764 7936 6912 S 0.0 0.0 0:00.04 systemd-logind
3679 postfix 20 0 43048 7040 6400 S 0.0 0.0 0:00.00 pickup
3680 postfix 20 0 43092 7040 6400 S 0.0 0.0 0:00.00 qmgr
4468 brian 20 0 18060 6860 4992 S 0.0 0.0 0:00.00 sshd
1283 root 20 0 27504 6652 4604 S 0.0 0.0 0:00.13 systemd-udevd
3161 root 20 0 12176 6400 4608 S 0.0 0.0 0:00.04 smartd
4449 brian 20 0 169872 5924 1792 S 0.0 0.0 0:00.00 (sd-pam)
4496 root 20 0 11652 5376 3200 R 0.3 0.0 0:00.03 top
3176 root 20 0 101392 5248 4480 S 0.0 0.0 0:00.00 zed
4472 root 20 0 10008 4852 4352 S 0.0 0.0 0:00.02 sudo
3678 root 20 0 42656 4632 4096 S 0.0 0.0 0:00.00 master
4469 brian 20 0 7968 4608 3328 S 0.0 0.0 0:00.00 bash
3150 message+ 20 0 9120 4480 4096 S 0.0 0.0 0:00.02 dbus-daemon
4474 root 20 0 8976 4096 3712 S 0.0 0.0 0:00.00 su
3136 _rpc 20 0 7876 3712 3328 S 0.0 0.0 0:00.00 rpcbind
3155 root 20 0 278156 3712 3456 S 0.0 0.0 0:00.00 pve-lxc-syscall
4475 root 20 0 7196 3712 3200 S 0.0 0.0 0:00.00 bash
1615 root 10 -10 1211700 3680 1536 S 0.0 0.0 0:00.00 mergerfs
3697 root 20 0 6128 3584 3328 S 0.0 0.0 0:00.00 proxmox-firewal
3328 root 20 0 8828 3456 3200 S 0.0 0.0 0:00.00 lxc-monitord
3390 root 20 0 317764 3280 2304 S 0.3 0.0 0:00.02 rrdcached
3380 _chrony 20 0 18860 2960 2432 S 0.0 0.0 0:00.00 chronyd
3149 root 20 0 5204 2564 2176 S 0.0 0.0 0:00.00 nfsdcld
Code:
root@europa:/# cat /proc/meminfo
MemTotal: 28622864 kB
MemFree: 25345616 kB
MemAvailable: 26673200 kB
Buffers: 421564 kB
Cached: 1184592 kB
SwapCached: 0 kB
Active: 1160564 kB
Inactive: 1409916 kB
Active(anon): 1009600 kB
Inactive(anon): 0 kB
Active(file): 150964 kB
Inactive(file): 1409916 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 999420 kB
SwapFree: 999420 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 96 kB
Writeback: 0 kB
AnonPages: 964452 kB
Mapped: 95484 kB
Shmem: 45204 kB
KReclaimable: 197216 kB
Slab: 335536 kB
SReclaimable: 197216 kB
SUnreclaim: 138320 kB
KernelStack: 7168 kB
PageTables: 9096 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15310852 kB
Committed_AS: 4014176 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 101684 kB
VmallocChunk: 0 kB
Percpu: 26624 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
Unaccepted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 325464 kB
DirectMap2M: 4808704 kB
DirectMap1G: 25165824 kB
After roughly 21 hours of uptime, the same commands show:
Code:
root@europa:/# free -m
               total        used        free      shared  buff/cache   available
Mem:           27952       10561       15992          47        1866       17390
Swap:            975           0         975
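To put a number on the growth, here is a rough sketch that compares the "used" column of the two free -m outputs against the uptimes reported by top (4 min and 20:57). It assumes the growth is linear, which two samples cannot confirm:

```python
from datetime import timedelta

# Values copied from the free -m and top outputs in this post (MiB).
boot_used_mib, boot_uptime = 1906, timedelta(minutes=4)
later_used_mib, later_uptime = 10561, timedelta(hours=20, minutes=57)
later_available_mib = 17390


def growth_mib_per_hour(used_a: int, up_a: timedelta,
                        used_b: int, up_b: timedelta) -> float:
    """Average growth of 'used' memory between two samples, in MiB/hour."""
    hours = (up_b - up_a).total_seconds() / 3600
    return (used_b - used_a) / hours


rate = growth_mib_per_hour(boot_used_mib, boot_uptime,
                           later_used_mib, later_uptime)
# Assumption: growth stays linear and only 'available' memory is consumed.
hours_left = later_available_mib / rate

print(f"growth: about {rate:.0f} MiB/hour")
print(f"time until 'available' is exhausted: about {hours_left:.0f} hours")
```

This works out to roughly 400 MiB per hour with everything idle, which fits the observation that the host runs out of memory after a few days.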
Code:
top - 18:44:28 up 20:57, 2 users, load average: 0.18, 0.18, 0.13
Tasks: 366 total, 1 running, 365 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.1 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 27952.0 total, 16037.8 free, 10515.7 used, 1866.1 buff/cache
MiB Swap: 976.0 total, 976.0 free, 0.0 used. 17436.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3741 www-data 20 0 368592 165380 28160 S 0.0 0.6 0:01.91 pveproxy
19525 www-data 20 0 381212 154916 12672 S 0.0 0.5 0:01.26 pveproxy worker
19524 www-data 20 0 381360 154276 11776 S 0.0 0.5 0:01.44 pveproxy worker
19523 www-data 20 0 377356 151204 12672 S 0.0 0.5 0:01.29 pveproxy worker
3735 root 20 0 376420 148572 11136 S 0.0 0.5 0:01.67 pvedaemon worke
3733 root 20 0 376228 148268 10956 S 0.0 0.5 0:01.44 pvedaemon worke
3734 root 20 0 376412 147420 10112 S 0.0 0.5 0:01.44 pvedaemon worke
3732 root 20 0 367288 139244 3328 S 0.0 0.5 0:01.27 pvedaemon
3755 root 20 0 348784 115880 3072 S 0.0 0.4 0:05.34 pvescheduler
3740 root 20 0 353056 115032 5248 S 0.0 0.4 0:07.76 pve-ha-crm
3749 root 20 0 352548 113672 4480 S 0.0 0.4 0:11.58 pve-ha-lrm
3715 root 20 0 292932 106964 9088 S 0.0 0.4 1:54.19 pvestatd
3708 root 20 0 291380 101708 5376 S 0.0 0.4 2:35.20 pve-firewall
3403 root 20 0 526992 63428 51620 S 0.0 0.2 0:37.85 pmxcfs
3747 www-data 20 0 80788 63360 13056 S 0.0 0.2 0:01.46 spiceproxy
19515 www-data 20 0 81200 55260 4608 S 0.0 0.2 0:01.26 spiceproxy work
1259 root 20 0 49588 16644 15748 S 0.0 0.1 0:00.34 systemd-journal
1 root 20 0 168964 12744 9032 S 0.0 0.0 0:01.35 systemd
152265 brian 20 0 19764 11264 8832 S 0.0 0.0 0:00.10 systemd
152241 root 20 0 17704 11008 9472 S 0.0 0.0 0:00.00 sshd
3354 root 20 0 15412 9216 7936 S 0.0 0.0 0:00.00 sshd
3169 root 20 0 49968 7936 6912 S 0.0 0.0 0:00.14 systemd-logind
3680 postfix 20 0 43092 7040 6400 S 0.0 0.0 0:00.04 qmgr
145883 postfix 20 0 43048 6912 6272 S 0.0 0.0 0:00.00 pickup
152286 brian 20 0 17964 6660 4864 S 0.0 0.0 0:00.00 sshd
1283 root 20 0 27504 6652 4604 S 0.0 0.0 0:00.21 systemd-udevd
3161 root 20 0 12176 6400 4608 S 0.0 0.0 0:00.15 smartd
152266 brian 20 0 170024 5984 1792 S 0.0 0.0 0:00.00 (sd-pam)
3176 root 20 0 101392 5248 4480 S 0.0 0.0 0:00.00 zed
152316 root 20 0 11648 5248 3072 R 0.7 0.0 0:00.05 top
152291 root 20 0 10008 4852 4352 S 0.0 0.0 0:00.03 sudo
152287 brian 20 0 7968 4736 3456 S 0.0 0.0 0:00.00 bash
3678 root 20 0 42656 4632 4096 S 0.0 0.0 0:00.24 master
3150 message+ 20 0 9120 4480 4096 S 0.0 0.0 0:00.05 dbus-daemon
152313 root 20 0 8976 4096 3712 S 0.0 0.0 0:00.00 su
152314 root 20 0 7196 3840 3328 S 0.0 0.0 0:00.00 bash
3136 _rpc 20 0 7876 3712 3328 S 0.0 0.0 0:00.12 rpcbind
3155 root 20 0 278156 3712 3456 S 0.0 0.0 0:00.00 pve-lxc-syscall
1615 root 10 -10 1211700 3680 1536 S 0.0 0.0 0:00.07 mergerfs
3390 root 20 0 727372 3664 2688 S 0.0 0.0 0:25.87 rrdcached
3697 root 20 0 6128 3584 3328 S 0.0 0.0 0:00.83 proxmox-firewal
Code:
root@europa:/# cat /proc/meminfo
MemTotal: 28622864 kB
MemFree: 16412800 kB
MemAvailable: 17845136 kB
Buffers: 507348 kB
Cached: 1196104 kB
SwapCached: 0 kB
Active: 1175248 kB
Inactive: 1502532 kB
Active(anon): 1022664 kB
Inactive(anon): 0 kB
Active(file): 152584 kB
Inactive(file): 1502532 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 999420 kB
SwapFree: 999420 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 96 kB
Writeback: 0 kB
AnonPages: 974352 kB
Mapped: 111184 kB
Shmem: 48324 kB
KReclaimable: 207732 kB
Slab: 446444 kB
SReclaimable: 207732 kB
SUnreclaim: 238712 kB
KernelStack: 6976 kB
PageTables: 31884 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15310852 kB
Committed_AS: 4042060 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 102568 kB
VmallocChunk: 0 kB
Percpu: 27776 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
Unaccepted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 325464 kB
DirectMap2M: 4808704 kB
DirectMap1G: 25165824 kB
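To make the two meminfo snapshots easier to compare, here is a rough sketch that subtracts the main tracked consumers from MemTotal - MemFree. The choice of fields is my own heuristic, not an official formula, and some counters may overlap slightly, so treat the result as approximate. The values are copied from the two outputs above:

```python
# Fields treated as "accounted for" (a heuristic choice, not an official list).
ACCOUNTED_FIELDS = ("AnonPages", "Buffers", "Cached", "SwapCached", "Slab",
                    "KernelStack", "PageTables", "Percpu", "VmallocUsed")

BOOT = """MemTotal: 28622864 kB
MemFree: 25345616 kB
Buffers: 421564 kB
Cached: 1184592 kB
SwapCached: 0 kB
AnonPages: 964452 kB
Slab: 335536 kB
KernelStack: 7168 kB
PageTables: 9096 kB
VmallocUsed: 101684 kB
Percpu: 26624 kB"""

NOW = """MemTotal: 28622864 kB
MemFree: 16412800 kB
Buffers: 507348 kB
Cached: 1196104 kB
SwapCached: 0 kB
AnonPages: 974352 kB
Slab: 446444 kB
KernelStack: 6976 kB
PageTables: 31884 kB
VmallocUsed: 102568 kB
Percpu: 27776 kB"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse 'Name: value kB' lines into a {name: KiB} mapping."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def unaccounted_kib(info: dict[str, int]) -> int:
    """Memory in use that none of the listed counters explains."""
    in_use = info["MemTotal"] - info["MemFree"]
    return in_use - sum(info.get(field, 0) for field in ACCOUNTED_FIELDS)


boot_gap = unaccounted_kib(parse_meminfo(BOOT))
now_gap = unaccounted_kib(parse_meminfo(NOW))
print(f"unaccounted at boot: {boot_gap} kB")
print(f"unaccounted now:     {now_gap} kB")
print(f"growth:              {(now_gap - boot_gap) / 1024 / 1024:.1f} GiB")
```

By this estimate the unexplained portion grows from about 0.2 GiB to about 8.5 GiB, while AnonPages, Cached, and Slab barely move. That would suggest the memory is being taken inside the kernel (for example by a driver or filesystem module allocating pages directly) rather than by any process visible in top, although the heuristic alone cannot say which component is responsible.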
I've also attached the output of /proc/slabinfo, taken shortly after boot and again now. I'm not seeing much difference between the two.
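For anyone who wants to compare the two slabinfo attachments mechanically rather than by eye, a sketch like the following could rank caches by growth. It assumes the usual slabinfo 2.1 column order (name, active_objs, num_objs, objsize, ...) and estimates each cache as num_objs * objsize. The sample lines are hypothetical placeholders, not data from this host:

```python
def parse_slabinfo(text: str) -> dict[str, int]:
    """Map each slab cache name to an estimated size in bytes."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the version banner, and the column header.
        if not line or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        num_objs, objsize = int(fields[2]), int(fields[3])
        sizes[fields[0]] = num_objs * objsize
    return sizes


def top_growth(before: dict[str, int], after: dict[str, int],
               limit: int = 10) -> list[tuple[str, int]]:
    """Return the caches that grew the most, largest first."""
    deltas = {name: size - before.get(name, 0) for name, size in after.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical sample data, only to show the expected input shape.
BEFORE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
kmalloc-4k 1000 1000 4096 8 8 : tunables 0 0 0 : slabdata 125 125 0
dentry 50000 50000 192 21 1 : tunables 0 0 0 : slabdata 2381 2381 0
"""
AFTER = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
kmalloc-4k 3000 3000 4096 8 8 : tunables 0 0 0 : slabdata 375 375 0
dentry 52000 52000 192 21 1 : tunables 0 0 0 : slabdata 2477 2477 0
"""

for name, delta in top_growth(parse_slabinfo(BEFORE), parse_slabinfo(AFTER)):
    print(f"{name:20s} {delta / 1024:10.0f} KiB")
```

In practice the two strings would be replaced with the contents of the attached files. The Slab line in meminfo only grows from 335536 kB to 446444 kB between the two snapshots, so a slab comparison is not expected to account for the several GiB that went missing.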
Finally, this is what the tmpfs mounts show. There was no change:
Code:
root@europa:/# df -k `grep tmpfs /proc/mounts | awk '{print $2}'`
Filesystem 1K-blocks Used Available Use% Mounted on
udev 14281072 0 14281072 0% /dev
tmpfs 2862288 1524 2860764 1% /run
tmpfs 14311432 43680 14267752 1% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 2862284 0 2862284 0% /run/user/1000
Any help would be greatly appreciated.