Memory Usage Gradually Increases Until System Crash

jabberwocky

New Member
Jun 21, 2024
Hi all. I need help with a memory leak I'm experiencing. What I'm seeing is memory usage gradually increasing over several days until it reaches nearly 100% of RAM, fills up swap, and then the oom-killer begins killing processes. Eventually the system becomes unresponsive and I have to force-restart it. This happens with all VMs shut down and no containers running. I do not use ZFS.

Some background: this is a server I've been running for several years, previously with Arch Linux. It's running an AMD Ryzen 7 5700G CPU with 32 GB of RAM. I also have a GPU installed, but I'm not currently using it. Note that Proxmox only reports 27.3 GB of available RAM for some reason; I've read it might have to do with sharing memory with the GPU. I have 7 HDDs, 1x SSD (not used, old Arch install), and 1x NVMe (Proxmox install). All drives, including the NVMe that Proxmox is installed to, are encrypted with LUKS and unlocked during boot via SSH. I don't use RAID; instead I use a combination of mergerfs and SnapRAID. Due to this configuration, I first installed Debian and then installed Proxmox on top of it, following the instructions in the official wiki.

I have 4 VMs set up, but I keep them turned off while trying to figure this out. While they're on, the same issue occurs, only faster. I updated the system yesterday and am running kernel 6.8.8-1:

Code:
root@europa:/# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
amd64-microcode: 3.20230808.1.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.4-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

Currently the system has been up for a little less than 24 hours and reports 10.4 GB of memory usage:
(screenshot attached)

Again, I'm not using ZFS, but I limited the ARC anyway:
Code:
root@europa:/# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=512
options zfs zfs_arc_max=1024
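
A side note on those limits: if I'm reading the ZFS module parameter docs right, `zfs_arc_min` and `zfs_arc_max` are specified in bytes, so values of 512 and 1024 are far below anything the module will honor and are most likely being ignored (not that ARC looks like the culprit here). A 512 MiB / 1 GiB cap would look something like:

```shell
# /etc/modprobe.d/zfs.conf -- zfs_arc_min/zfs_arc_max are in bytes
options zfs zfs_arc_min=536870912    # 512 MiB
options zfs zfs_arc_max=1073741824   # 1 GiB
```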

Arcstat:
Code:
root@europa:/# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
21:52:55     0       0     0       0     0      0    0   3.8K   873M    24G

Shortly after boot, this is what 'free -m', top, and meminfo showed:
Code:
root@europa:/# free -m
               total        used        free      shared  buff/cache   available
Mem:           27952        1906       24749          44        1760       26045
Swap:            975           0         975

Code:
top - 21:51:52 up 4 min,  2 users,  load average: 1.05, 1.00, 0.45
Tasks: 390 total,   1 running, 389 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  27952.0 total,  24751.5 free,   1904.9 used,   1760.2 buff/cache
MiB Swap:    976.0 total,    976.0 free,      0.0 used.  26047.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3743 www-data  20   0  384224 156092  11008 S   0.0   0.5   0:00.27 pveproxy worker
   3744 www-data  20   0  381368 155140  12672 S   0.0   0.5   0:00.16 pveproxy worker
   3742 www-data  20   0  377196 149380  10880 S   0.0   0.5   0:00.09 pveproxy worker
   3734 root      20   0  375836 146160   9216 S   0.0   0.5   0:00.06 pvedaemon worke
   3735 root      20   0  375776 146160   9216 S   0.0   0.5   0:00.04 pvedaemon worke
   3733 root      20   0  367700 141552   5376 S   0.0   0.5   0:00.01 pvedaemon worke
   3741 www-data  20   0  368536 140420   3200 S   0.0   0.5   0:00.00 pveproxy
   3732 root      20   0  367288 139244   3328 S   0.0   0.5   0:00.00 pvedaemon
   3755 root      20   0  348784 115880   3072 S   0.0   0.4   0:00.01 pvescheduler
   3740 root      20   0  353056 113496   3712 S   0.0   0.4   0:00.01 pve-ha-crm
   3749 root      20   0  352548 112776   3584 S   0.0   0.4   0:00.02 pve-ha-lrm
   3715 root      20   0  292932 104788   6912 S   0.7   0.4   0:00.27 pvestatd
   3708 root      20   0  291380 100300   3968 S   0.7   0.4   0:00.36 pve-firewall
   3403 root      20   0  520788  56124  48156 S   0.0   0.2   0:00.17 pmxcfs
   3748 www-data  20   0   81304  55168   4480 S   0.0   0.2   0:00.00 spiceproxy work
   3747 www-data  20   0   80788  53628   3200 S   0.0   0.2   0:00.00 spiceproxy
   1259 root      20   0   41252  16388  15492 S   0.0   0.1   0:00.16 systemd-journal
      1 root      20   0  168812  12744   9032 S   0.0   0.0   0:01.03 systemd
   4448 brian     20   0   19768  11264   8832 S   0.0   0.0   0:00.11 systemd
   4426 root      20   0   17800  11008   9472 S   0.0   0.0   0:00.00 sshd
   3354 root      20   0   15412   9216   7936 S   0.0   0.0   0:00.00 sshd
   3169 root      20   0   49764   7936   6912 S   0.0   0.0   0:00.04 systemd-logind
   3679 postfix   20   0   43048   7040   6400 S   0.0   0.0   0:00.00 pickup
   3680 postfix   20   0   43092   7040   6400 S   0.0   0.0   0:00.00 qmgr
   4468 brian     20   0   18060   6860   4992 S   0.0   0.0   0:00.00 sshd
   1283 root      20   0   27504   6652   4604 S   0.0   0.0   0:00.13 systemd-udevd
   3161 root      20   0   12176   6400   4608 S   0.0   0.0   0:00.04 smartd
   4449 brian     20   0  169872   5924   1792 S   0.0   0.0   0:00.00 (sd-pam)
   4496 root      20   0   11652   5376   3200 R   0.3   0.0   0:00.03 top
   3176 root      20   0  101392   5248   4480 S   0.0   0.0   0:00.00 zed
   4472 root      20   0   10008   4852   4352 S   0.0   0.0   0:00.02 sudo
   3678 root      20   0   42656   4632   4096 S   0.0   0.0   0:00.00 master
   4469 brian     20   0    7968   4608   3328 S   0.0   0.0   0:00.00 bash
   3150 message+  20   0    9120   4480   4096 S   0.0   0.0   0:00.02 dbus-daemon
   4474 root      20   0    8976   4096   3712 S   0.0   0.0   0:00.00 su
   3136 _rpc      20   0    7876   3712   3328 S   0.0   0.0   0:00.00 rpcbind
   3155 root      20   0  278156   3712   3456 S   0.0   0.0   0:00.00 pve-lxc-syscall
   4475 root      20   0    7196   3712   3200 S   0.0   0.0   0:00.00 bash
   1615 root      10 -10 1211700   3680   1536 S   0.0   0.0   0:00.00 mergerfs
   3697 root      20   0    6128   3584   3328 S   0.0   0.0   0:00.00 proxmox-firewal
   3328 root      20   0    8828   3456   3200 S   0.0   0.0   0:00.00 lxc-monitord
   3390 root      20   0  317764   3280   2304 S   0.3   0.0   0:00.02 rrdcached
   3380 _chrony   20   0   18860   2960   2432 S   0.0   0.0   0:00.00 chronyd
   3149 root      20   0    5204   2564   2176 S   0.0   0.0   0:00.00 nfsdcld

Code:
root@europa:/# cat /proc/meminfo
MemTotal:       28622864 kB
MemFree:        25345616 kB
MemAvailable:   26673200 kB
Buffers:          421564 kB
Cached:          1184592 kB
SwapCached:            0 kB
Active:          1160564 kB
Inactive:        1409916 kB
Active(anon):    1009600 kB
Inactive(anon):        0 kB
Active(file):     150964 kB
Inactive(file):  1409916 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        999420 kB
SwapFree:         999420 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:        964452 kB
Mapped:            95484 kB
Shmem:             45204 kB
KReclaimable:     197216 kB
Slab:             335536 kB
SReclaimable:     197216 kB
SUnreclaim:       138320 kB
KernelStack:        7168 kB
PageTables:         9096 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15310852 kB
Committed_AS:    4014176 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      101684 kB
VmallocChunk:          0 kB
Percpu:            26624 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      325464 kB
DirectMap2M:     4808704 kB
DirectMap1G:    25165824 kB

Currently these show:
Code:
root@europa:/# free -m
               total        used        free      shared  buff/cache   available
Mem:           27952       10561       15992          47        1866       17390
Swap:            975           0         975

Code:
top - 18:44:28 up 20:57,  2 users,  load average: 0.18, 0.18, 0.13
Tasks: 366 total,   1 running, 365 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  27952.0 total,  16037.8 free,  10515.7 used,   1866.1 buff/cache
MiB Swap:    976.0 total,    976.0 free,      0.0 used.  17436.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3741 www-data  20   0  368592 165380  28160 S   0.0   0.6   0:01.91 pveproxy
  19525 www-data  20   0  381212 154916  12672 S   0.0   0.5   0:01.26 pveproxy worker
  19524 www-data  20   0  381360 154276  11776 S   0.0   0.5   0:01.44 pveproxy worker
  19523 www-data  20   0  377356 151204  12672 S   0.0   0.5   0:01.29 pveproxy worker
   3735 root      20   0  376420 148572  11136 S   0.0   0.5   0:01.67 pvedaemon worke
   3733 root      20   0  376228 148268  10956 S   0.0   0.5   0:01.44 pvedaemon worke
   3734 root      20   0  376412 147420  10112 S   0.0   0.5   0:01.44 pvedaemon worke
   3732 root      20   0  367288 139244   3328 S   0.0   0.5   0:01.27 pvedaemon
   3755 root      20   0  348784 115880   3072 S   0.0   0.4   0:05.34 pvescheduler
   3740 root      20   0  353056 115032   5248 S   0.0   0.4   0:07.76 pve-ha-crm
   3749 root      20   0  352548 113672   4480 S   0.0   0.4   0:11.58 pve-ha-lrm
   3715 root      20   0  292932 106964   9088 S   0.0   0.4   1:54.19 pvestatd
   3708 root      20   0  291380 101708   5376 S   0.0   0.4   2:35.20 pve-firewall
   3403 root      20   0  526992  63428  51620 S   0.0   0.2   0:37.85 pmxcfs
   3747 www-data  20   0   80788  63360  13056 S   0.0   0.2   0:01.46 spiceproxy
  19515 www-data  20   0   81200  55260   4608 S   0.0   0.2   0:01.26 spiceproxy work
   1259 root      20   0   49588  16644  15748 S   0.0   0.1   0:00.34 systemd-journal
      1 root      20   0  168964  12744   9032 S   0.0   0.0   0:01.35 systemd
 152265 brian     20   0   19764  11264   8832 S   0.0   0.0   0:00.10 systemd
 152241 root      20   0   17704  11008   9472 S   0.0   0.0   0:00.00 sshd
   3354 root      20   0   15412   9216   7936 S   0.0   0.0   0:00.00 sshd
   3169 root      20   0   49968   7936   6912 S   0.0   0.0   0:00.14 systemd-logind
   3680 postfix   20   0   43092   7040   6400 S   0.0   0.0   0:00.04 qmgr
 145883 postfix   20   0   43048   6912   6272 S   0.0   0.0   0:00.00 pickup
 152286 brian     20   0   17964   6660   4864 S   0.0   0.0   0:00.00 sshd
   1283 root      20   0   27504   6652   4604 S   0.0   0.0   0:00.21 systemd-udevd
   3161 root      20   0   12176   6400   4608 S   0.0   0.0   0:00.15 smartd
 152266 brian     20   0  170024   5984   1792 S   0.0   0.0   0:00.00 (sd-pam)
   3176 root      20   0  101392   5248   4480 S   0.0   0.0   0:00.00 zed
 152316 root      20   0   11648   5248   3072 R   0.7   0.0   0:00.05 top
 152291 root      20   0   10008   4852   4352 S   0.0   0.0   0:00.03 sudo
 152287 brian     20   0    7968   4736   3456 S   0.0   0.0   0:00.00 bash
   3678 root      20   0   42656   4632   4096 S   0.0   0.0   0:00.24 master
   3150 message+  20   0    9120   4480   4096 S   0.0   0.0   0:00.05 dbus-daemon
 152313 root      20   0    8976   4096   3712 S   0.0   0.0   0:00.00 su
 152314 root      20   0    7196   3840   3328 S   0.0   0.0   0:00.00 bash
   3136 _rpc      20   0    7876   3712   3328 S   0.0   0.0   0:00.12 rpcbind
   3155 root      20   0  278156   3712   3456 S   0.0   0.0   0:00.00 pve-lxc-syscall
   1615 root      10 -10 1211700   3680   1536 S   0.0   0.0   0:00.07 mergerfs
   3390 root      20   0  727372   3664   2688 S   0.0   0.0   0:25.87 rrdcached
   3697 root      20   0    6128   3584   3328 S   0.0   0.0   0:00.83 proxmox-firewal

Code:
root@europa:/# cat /proc/meminfo
MemTotal:       28622864 kB
MemFree:        16412800 kB
MemAvailable:   17845136 kB
Buffers:          507348 kB
Cached:          1196104 kB
SwapCached:            0 kB
Active:          1175248 kB
Inactive:        1502532 kB
Active(anon):    1022664 kB
Inactive(anon):        0 kB
Active(file):     152584 kB
Inactive(file):  1502532 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        999420 kB
SwapFree:         999420 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:        974352 kB
Mapped:           111184 kB
Shmem:             48324 kB
KReclaimable:     207732 kB
Slab:             446444 kB
SReclaimable:     207732 kB
SUnreclaim:       238712 kB
KernelStack:        6976 kB
PageTables:        31884 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15310852 kB
Committed_AS:    4042060 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      102568 kB
VmallocChunk:          0 kB
Percpu:            27776 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      325464 kB
DirectMap2M:     4808704 kB
DirectMap1G:    25165824 kB
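
Comparing the two /proc/meminfo dumps above, nearly all of the growth is invisible to meminfo's own categories: MemFree dropped by about 8.5 GiB, yet anon pages, page cache, and slab together only grew by a couple hundred MiB. A quick sanity check on the posted numbers (kB values taken straight from the two dumps):

```shell
# Differences between the "after boot" and "current" /proc/meminfo dumps (kB)
awk 'BEGIN {
    printf "MemFree dropped:  %.1f GiB\n", (25345616 - 16412800) / 1048576
    printf "SUnreclaim grew:  %.1f MiB\n", (238712 - 138320) / 1024
    printf "AnonPages grew:   %.1f MiB\n", (974352 - 964452) / 1024
}'
```

That pattern (large unaccounted loss, small slab growth) usually points at direct page allocations in the kernel or a driver rather than a leaking userspace process or slab cache.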

I also attached the output of /proc/slabinfo, both shortly after booting the system and what it shows currently. I'm not seeing much difference between the two.

Finally, this is what the tmpfs mounts show. There was no change:
Code:
root@europa:/# df -k `grep tmpfs /proc/mounts | awk '{print $2}'`
Filesystem     1K-blocks  Used Available Use% Mounted on
udev            14281072     0  14281072   0% /dev
tmpfs            2862288  1524   2860764   1% /run
tmpfs           14311432 43680  14267752   1% /dev/shm
tmpfs               5120     0      5120   0% /run/lock
tmpfs            2862284     0   2862284   0% /run/user/1000

Any help would be greatly appreciated.
 

Attachments

  • slabinfo_24hrs.txt (45.2 KB)
  • slabinfo_afterBoot.txt (45.2 KB)
Two immediate things pop to mind:

1. Do you use NFS?

Asking because I remember seeing someone else recently reporting a memory leak similar to this, and I think they had identified it as being something to do with NFS.

2. Have you tried dropping back to the earlier (more stable) 6.5 series Linux kernels?

Proxmox recently updated from Linux kernel 6.5 to 6.8. It's caused a bunch of weird and strange errors. It wouldn't be surprising if there's a memory leak among them as well.

If you're wondering how to drop back to the older 6.5 kernels, then the instructions here under "Kernel 6.8" might help:

https://pve.proxmox.com/wiki/Roadmap#Known_Issues_&_Breaking_Changes
 
Two immediate things pop to mind:

1. Do you use NFS?


2. Have you tried dropping back to the earlier (more stable) 6.5 series Linux kernels?
I do use NFS to share several bind mounts to the VMs. I haven't come across anything mentioning NFS as a cause yet in my searches; I'll do some more digging and see if I can find something.

I've definitely considered rolling the kernel version back, but I didn't know which version to try or how to do it. Thank you for the link; I'll give it a shot tonight and see what happens.

When did the RAM problems start?
The problems started right after installing it for the first time, probably 2 weeks ago now. I didn't use Debian at all before installing Proxmox, so I don't know whether the issue existed before Proxmox or not.


Currently it's up to about 15 GB of RAM usage after about 33 hours of uptime.
 
After reading through that thread, it definitely sounds like what I'm experiencing. However, I was already updated to kernel 6.8.8-1, which I believe includes that fix, so maybe it's a similar or related bug?

I decided the best thing to try is rolling the kernel back to 6.5; I'm currently on 6.5.13-5. I went ahead and started my VMs and I'll let you know what happens.

Sincerely, thanks for the help.
 
Hi all. I've been using Proxmox since version 5, and each version has had its bugs. But in version 8 a strange phenomenon happens: Proxmox increases memory consumption every day. I have an HP 360 G9 server with 256 GB of RAM and a few virtual machines with a total of 125 GB allocated (if each VM consumed all of its allocated memory). It's been up for 21 days, during which time memory usage has reached 185 GB. I should mention that I don't use ZFS or NFS, just SMB. With Proxmox 7 I didn't encounter this situation. Now I'm using Proxmox 8.2.2, and today I updated to 8.2.4.
 
Unfortunately it seems the problem persists after rolling back the kernel. Memory has been steadily increasing since the reboot and was at about 80% when I last checked last night. The 4 VMs I have running have 19 GB of RAM assigned to them combined, and I currently have ballooning turned off. One VM was using a little over 7 GB of RAM, while the others were under 4 GB combined.

At some point during the night the system ran out of memory and the oom-killer killed one of the VMs.

Relevant system log:

Code:
Jun 23 05:24:01 europa kernel: server invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Jun 23 05:24:01 europa kernel: CPU: 14 PID: 3673 Comm: server Tainted: P           O       6.5.13-5-pve #1
Jun 23 05:24:01 europa kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C91/MAG B550 TOMAHAWK (MS-7C91), BIOS A.G0 03/12/2024
Jun 23 05:24:01 europa kernel: Call Trace:
Jun 23 05:24:01 europa kernel:  <TASK>
Jun 23 05:24:01 europa kernel:  dump_stack_lvl+0x48/0x70
Jun 23 05:24:01 europa kernel:  dump_stack+0x10/0x20
Jun 23 05:24:01 europa kernel:  dump_header+0x4f/0x260
Jun 23 05:24:01 europa kernel:  oom_kill_process+0x10d/0x1c0
Jun 23 05:24:01 europa kernel:  out_of_memory+0x270/0x560
Jun 23 05:24:01 europa kernel:  __alloc_pages+0x113c/0x12f0
Jun 23 05:24:01 europa kernel:  alloc_pages+0x90/0x160
Jun 23 05:24:01 europa kernel:  folio_alloc+0x1d/0x60
Jun 23 05:24:01 europa kernel:  filemap_alloc_folio+0xfd/0x110
Jun 23 05:24:01 europa kernel:  __filemap_get_folio+0xd8/0x230
Jun 23 05:24:01 europa kernel:  filemap_fault+0x584/0x9f0
Jun 23 05:24:01 europa kernel:  __do_fault+0x39/0x150
Jun 23 05:24:01 europa kernel:  do_fault+0x266/0x3e0
Jun 23 05:24:01 europa kernel:  __handle_mm_fault+0x6cc/0xc30
Jun 23 05:24:01 europa kernel:  handle_mm_fault+0x164/0x360
Jun 23 05:24:01 europa kernel:  do_user_addr_fault+0x212/0x6a0
Jun 23 05:24:01 europa kernel:  exc_page_fault+0x83/0x1b0
Jun 23 05:24:01 europa kernel:  asm_exc_page_fault+0x27/0x30
Jun 23 05:24:01 europa kernel: RIP: 0033:0x7e831c2898d0
Jun 23 05:24:01 europa kernel: Code: Unable to access opcode bytes at 0x7e831c2898a6.
Jun 23 05:24:01 europa kernel: RSP: 002b:00007e8313ff8678 EFLAGS: 00010206
Jun 23 05:24:01 europa kernel: RAX: 00007e8318c4b000 RBX: 00007e830c40e510 RCX: 0000000000040400
Jun 23 05:24:01 europa kernel: RDX: 00007e8301fe0000 RSI: 0000000000000000 RDI: 000000000000001c
Jun 23 05:24:01 europa kernel: RBP: 0000000000000000 R08: 000000000000001e R09: 0000000000000000
Jun 23 05:24:01 europa kernel: R10: 0000000000000011 R11: 0000000000000293 R12: 00007e831c6c44c8
Jun 23 05:24:01 europa kernel: R13: 00007e831c6c4020 R14: 000000000000001c R15: 0000000000000001
Jun 23 05:24:01 europa kernel:  </TASK>
Jun 23 05:24:01 europa kernel: Mem-Info:
Jun 23 05:24:01 europa kernel: active_anon:826045 inactive_anon:2375372 isolated_anon:0
 active_file:3692 inactive_file:4106 isolated_file:0
 unevictable:0 dirty:35 writeback:0
 slab_reclaimable:34307 slab_unreclaimable:72751
 mapped:6926 shmem:6926 pagetables:18804
 sec_pagetables:78 bounce:0
 kernel_misc_reclaimable:0
 free:44698 free_pcp:0 free_cma:0
Jun 23 05:24:01 europa kernel: Node 0 active_anon:5219360kB inactive_anon:7586308kB active_file:13728kB inactive_file:18192kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:27704kB dirty:140kB writeback:0kB shmem:27704kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 11857920kB writeback_tmp:0kB kernel_stack:7840kB pagetables:75216kB sec_pagetables:312kB all_unreclaimable? no
Jun 23 05:24:01 europa kernel: Node 0 DMA free:11264kB boost:0kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 23 05:24:01 europa kernel: lowmem_reserve[]: 0 3315 27869 27869 27869
Jun 23 05:24:01 europa kernel: Node 0 DMA32 free:106084kB boost:0kB min:8032kB low:11424kB high:14816kB reserved_highatomic:0KB active_anon:21000kB inactive_anon:73104kB active_file:0kB inactive_file:232kB unevictable:0kB writepending:12kB present:3574616kB managed:3455488kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 23 05:24:01 europa kernel: lowmem_reserve[]: 0 0 24553 24553 24553
Jun 23 05:24:01 europa kernel: Node 0 Normal free:61444kB boost:0kB min:59508kB low:84648kB high:109788kB reserved_highatomic:2048KB active_anon:5198300kB inactive_anon:7513264kB active_file:12672kB inactive_file:16168kB unevictable:0kB writepending:128kB present:25660416kB managed:25143180kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 23 05:24:01 europa kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 23 05:24:01 europa kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Jun 23 05:24:01 europa kernel: Node 0 DMA32: 55*4kB (UME) 39*8kB (UME) 32*16kB (UME) 41*32kB (UME) 58*64kB (UME) 53*128kB (UME) 44*256kB (ME) 27*512kB (UME) 17*1024kB (ME) 23*2048kB (UME) 1*4096kB (M) = 106548kB
Jun 23 05:24:01 europa kernel: Node 0 Normal: 4063*4kB (ME) 2092*8kB (UME) 793*16kB (UME) 377*32kB (UME) 53*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61132kB
Jun 23 05:24:01 europa kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 23 05:24:01 europa kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 23 05:24:01 europa kernel: 16260 total pagecache pages
Jun 23 05:24:01 europa kernel: 1323 pages in swap cache
Jun 23 05:24:01 europa kernel: Free swap  = 92kB
Jun 23 05:24:01 europa kernel: Total swap = 999420kB
Jun 23 05:24:01 europa kernel: 7312757 pages RAM
Jun 23 05:24:01 europa kernel: 0 pages HighMem/MovableOnly
Jun 23 05:24:01 europa kernel: 159250 pages reserved
Jun 23 05:24:01 europa kernel: 0 pages hwpoisoned
Jun 23 05:24:01 europa kernel: Tasks state (memory values in pages):
Jun 23 05:24:01 europa kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 23 05:24:01 europa kernel: [   1224]     0  1224    14444       97   118784      160          -250 systemd-journal
Jun 23 05:24:01 europa kernel: [   1249]     0  1249     6835      220    81920      320         -1000 systemd-udevd
Jun 23 05:24:01 europa kernel: [   1585]     0  1585   303819     5948   245760     1952             0 mergerfs
Jun 23 05:24:01 europa kernel: [   3096]     0  3096      615      128    45056        0             0 bpfilter_umh
Jun 23 05:24:01 europa kernel: [   3116]   103  3116     1969      128    53248       96             0 rpcbind
Jun 23 05:24:01 europa kernel: [   3126]     0  3126     1208       96    49152       32             0 blkmapd
Jun 23 05:24:01 europa kernel: [   3128]     0  3128      711       66    49152       32             0 rpc.idmapd
Jun 23 05:24:01 europa kernel: [   3129]     0  3129     1301      161    49152        0             0 nfsdcld
Jun 23 05:24:01 europa kernel: [   3186]   100  3186     2307      192    61440       96          -900 dbus-daemon
Jun 23 05:24:01 europa kernel: [   3192]     0  3192    69539      128    81920       64             0 pve-lxc-syscall
Jun 23 05:24:01 europa kernel: [   3194]     0  3194     3044      320    61440      224             0 smartd
Jun 23 05:24:01 europa kernel: [   3195]     0  3195     1327       64    49152        0             0 qmeventd
Jun 23 05:24:01 europa kernel: [   3200]     0  3200    12486      288   102400       96             0 systemd-logind
Jun 23 05:24:01 europa kernel: [   3202]     0  3202      583       96    40960        0         -1000 watchdog-mux
Jun 23 05:24:01 europa kernel: [   3206]     0  3206    25348       64    77824      224             0 zed
Jun 23 05:24:01 europa kernel: [   3209]     0  3209    38191       64    61440       32         -1000 lxcfs
Jun 23 05:24:01 europa kernel: [   3308]     0  3308     2207       96    53248       64             0 lxc-monitord
Jun 23 05:24:01 europa kernel: [   3320]     0  3320     1610      164    57344       96             0 rpc.mountd
Jun 23 05:24:01 europa kernel: [   3321]   105  3321     1133       73    53248       64             0 rpc.statd
Jun 23 05:24:01 europa kernel: [   3330]     0  3330     1468       96    49152        0             0 agetty
Jun 23 05:24:01 europa kernel: [   3335]     0  3335     3853      256    69632      160         -1000 sshd
Jun 23 05:24:01 europa kernel: [   3361]   102  3361     4715      166    65536        0             0 chronyd
Jun 23 05:24:01 europa kernel: [   3362]   102  3362     2633      142    65536        0             0 chronyd
Jun 23 05:24:01 europa kernel: [   3370]     0  3370   181843      210   167936       64             0 rrdcached
Jun 23 05:24:01 europa kernel: [   3381]     0  3381   129162     8759   364544      256             0 pmxcfs
Jun 23 05:24:01 europa kernel: [   3658]     0  3658    10664      261    77824       96             0 master
Jun 23 05:24:01 europa kernel: [   3660]   104  3660    10773      192    81920       96             0 qmgr
Jun 23 05:24:01 europa kernel: [   3676]     0  3676     1652       64    57344        0             0 cron
Jun 23 05:24:01 europa kernel: [   3677]     0  3677     1532      256    53248        0             0 proxmox-firewal
Jun 23 05:24:01 europa kernel: [   3688]     0  3688    72841    21293   299008     3072             0 pve-firewall
Jun 23 05:24:01 europa kernel: [   3692]     0  3692    73248    21524   327680     3424             0 pvestatd
Jun 23 05:24:01 europa kernel: [   3712]     0  3712    91805    25288   409600     8896             0 pvedaemon
Jun 23 05:24:01 europa kernel: [   3713]     0  3713    94154    26402   462848     8128             0 pvedaemon worke
Jun 23 05:24:01 europa kernel: [   3714]     0  3714    94134    26423   462848     8128             0 pvedaemon worke
Jun 23 05:24:01 europa kernel: [   3715]     0  3715   101072    33390   528384     8064             0 pvedaemon worke
Jun 23 05:24:01 europa kernel: [   3720]     0  3720    88258    23879   356352     4000             0 pve-ha-crm
Jun 23 05:24:01 europa kernel: [   3721]    33  3721    92143    30747   462848     3744             0 pveproxy
Jun 23 05:24:01 europa kernel: [   3727]    33  3727    20194    10368   204800     2400             0 spiceproxy
Jun 23 05:24:01 europa kernel: [   3729]     0  3729    88138    16803   360448    10880             0 pve-ha-lrm
Jun 23 05:24:01 europa kernel: [   3734]     0  3734    87189    21028   356352     7328             0 pvescheduler
Jun 23 05:24:01 europa kernel: [   3875]     0  3875   506770   153390  2510848    46112             0 kvm
Jun 23 05:24:01 europa kernel: [   3998]     0  3998   767325   291889  3567616    40384             0 kvm
Jun 23 05:24:01 europa kernel: [   4123]     0  4123  2413845   469469  5574656    82176             0 kvm
Jun 23 05:24:01 europa kernel: [   4230]     0  4230  2397280  2077249 17805312    28320             0 kvm
Jun 23 05:24:01 europa kernel: [ 214682]    33 214682    20318    10585   196608     2208             0 spiceproxy work
Jun 23 05:24:01 europa kernel: [ 214684]    33 214684    92213    30906   438272     3552             0 pveproxy worker
Jun 23 05:24:01 europa kernel: [ 214685]    33 214685    92213    30874   438272     3552             0 pveproxy worker
Jun 23 05:24:01 europa kernel: [ 214686]    33 214686    92213    30970   438272     3552             0 pveproxy worker
Jun 23 05:24:01 europa kernel: [ 214687]     0 214687    19796       96    53248       32             0 pvefw-logger
Jun 23 05:24:01 europa kernel: [ 242293]   104 242293    10764      288    77824        0             0 pickup
Jun 23 05:24:01 europa kernel: [ 253591]     0 253591    89036    21189   368640     7103             0 pvescheduler
Jun 23 05:24:01 europa kernel: [ 253592]     0 253592    87189    21093   356352     7199             0 pvescheduler
Jun 23 05:24:01 europa kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=pve-cluster.service,mems_allowed=0,global_oom,task_memcg=/qemu.slice/101.scope,task=kvm,pid=4230,uid=0
Jun 23 05:24:01 europa kernel: Out of memory: Killed process 4230 (kvm) total-vm:9589120kB, anon-rss:8308612kB, file-rss:384kB, shmem-rss:0kB, UID:0 pgtables:17388kB oom_score_adj:0
Jun 23 05:24:01 europa kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 23 05:24:01 europa kernel: tap101i0 (unregistering): left allmulticast mode
Jun 23 05:24:01 europa kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 23 05:24:01 europa systemd[1]: 101.scope: A process of this unit has been killed by the OOM killer.
Jun 23 05:24:01 europa systemd[1]: 101.scope: Failed with result 'oom-kill'.
Jun 23 05:24:01 europa systemd[1]: 101.scope: Consumed 1h 34min 56.138s CPU time.
Jun 23 05:24:02 europa qmeventd[253600]: Starting cleanup for 101
Jun 23 05:24:02 europa kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Jun 23 05:24:02 europa kernel: vmbr0: port 5(fwpr101p0) entered disabled state
Jun 23 05:24:02 europa kernel: fwln101i0 (unregistering): left allmulticast mode
Jun 23 05:24:02 europa kernel: fwln101i0 (unregistering): left promiscuous mode
Jun 23 05:24:02 europa kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Jun 23 05:24:02 europa kernel: fwpr101p0 (unregistering): left allmulticast mode
Jun 23 05:24:02 europa kernel: fwpr101p0 (unregistering): left promiscuous mode
Jun 23 05:24:02 europa kernel: vmbr0: port 5(fwpr101p0) entered disabled state
Jun 23 05:24:02 europa qmeventd[253600]: Finished cleanup for 101

In the last 2 hours or so, RAM usage on the host has increased by about 1.5 GB.

1719154994777.png

At this point I'm considering re-installing Proxmox completely. Not sure what else to try, any suggestions?
 
Damn, that sucks.

When the system starts getting towards high memory usage, do the process or processes taking up too much RAM show up easily in htop? i.e. when sorting by resident memory size or similar

Your KVM processes would be at the top of a sorted list, but just under those should be the next biggest consumers. Unless the leak is in the kernel, or something else is going on.
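For a quick non-interactive snapshot of the same sorted view, something like this should work (plain procps `ps`, nothing Proxmox-specific assumed):

```shell
# Show the 10 processes with the largest resident set size (RSS).
# RSS is reported in KiB in the RSS column of `ps aux` output.
ps aux --sort=-rss | head -n 11
```

Running that a few times over the day makes it easy to spot which line, if any, is the one growing.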

Oh, what does arcstat show at that point too, just in case it's the ZFS ARC somehow going out of control?
 
At this point I'm considering re-installing Proxmox completely
Maybe it would be better (if possible on your setup) to try installing Proxmox bare-metal directly (not on top of Debian).

You may also want to consider one of the older versions of PVE. ISOs are available from here.
 
Here is the current status:
1719168573943.png

Here is what htop shows. I've never been able to find top/htop showing a process consuming the RAM, so I'm assuming it's kernel-related.
1719168635855.png
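If no userspace process accounts for the usage, one rough way to check for a kernel-side leak (just reading standard `/proc` counters, nothing Proxmox-specific) is to watch the slab and related kernel allocations over time:

```shell
# Kernel-side memory counters; a leak in a kernel module typically
# shows up as SUnreclaim (or one of the others) growing steadily
grep -E '^(MemTotal|MemAvailable|Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|VmallocUsed):' /proc/meminfo
```

If `SUnreclaim` keeps climbing while no process's RSS does, that points at a kernel module rather than userspace.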

Arcstat:
Code:
root@europa:/home/brian# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
14:51:17     0       0     0       0     0      0    0   3.8K   873M   2.9G


Maybe it would be better (if possible on your setup) to try installing Proxmox bare-metal directly (not on top of Debian).

You may also want to consider one of the older versions of PVE. ISOs are available from here.

I think this might be my next best step.
 
Ouch, yeah that system is having memory problems of some sort. Looks like swap is just about to get filled and have things go badly. :(

With your VMs, what's the combined maximum total amount of memory they're assigned?

That arcstat output is useful too. Looks like it's not a case of ZFS ARC going out of control.

Oh, with the htop output, if you press 'H' (capital H) it'll generally collapse the duplicate lines of user processes and let you see more of what's running.
 
Ouch, yeah that system is having memory problems of some sort. Looks like swap is just about to get filled and have things go badly. :(

With your VMs, what's the combined maximum total amount of memory they're assigned?

That arcstat output is useful too. Looks like it's not a case of ZFS ARC going out of control.

Oh, with the htop output, if you press 'H' (capital H) it'll generally collapse the duplicate lines of user processes and let you see more of what's running.

I never knew that about htop, thanks for the tip. 2x VMs had 8GB assigned, 1x had 2GB, and 1x had 1GB, so 19GB total. What I noticed early on though is that even after shutting down the VMs, the memory usage might drop some but it would always keep increasing back up to where it was until it begins swapping and killing. It didn't seem dependent on the VMs at all, it just happens faster with the VMs running, if that makes sense.

All that being said, I just finished re-installing Proxmox directly to the drive (not on top of Debian). I went ahead and rolled the kernel version back to 6.5 as well. I did go ahead and install some packages (nfs-server, vim, htop, ufw, fail2ban), but I'm not going to do anything else for the time being, just wait and see what happens.

Before rolling back the kernel I did notice an increase of about 0.5 GB in memory usage over an hour of sitting idle. Right now, it's been up for 10 minutes and RAM has increased 0.1 GB, but this might be normal. I'll report back tomorrow and let you know where it stands.
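To put hard numbers on that kind of drift, a throwaway logging snippet run periodically (e.g. from cron) works; the log path here is an arbitrary choice:

```shell
# Append one timestamped sample of used memory (MiB) to a log file;
# schedule this every few minutes to chart the leak over time.
# /tmp/memlog.txt is an arbitrary path -- adjust to taste.
log=/tmp/memlog.txt
used=$(free -m | awk '/^Mem:/ {print $3}')
echo "$(date '+%F %T') ${used} MiB used" >> "$log"
```

A day's worth of samples makes "gradually increasing" unambiguous when comparing kernels.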

Thanks for your continued help.
 
Good news, I think I've narrowed down the cause of the memory leak. Even though this is a headless server, I have the tower connected to a monitor for troubleshooting/installs. Last time I set it up, before installing Proxmox, I installed an old GPU I had lying around and connected the HDMI cable to the GPU. It occurred to me after my last post that it was still connected to the GPU, so I shut the server down, moved the HDMI cable to the motherboard (onboard graphics), and started it back up. I left it sitting idle overnight and checked it this morning. The RAM usage was steady at 1.41 GB and never increased.

After I saw that, I shut it down again and moved the HDMI cable back to the GPU. After about 4 hours of sitting idle, the RAM usage had increased to 3.2 GB.

I don't understand why the GPU would cause a memory leak, but I guess the kernel didn't like my old-ass GPU. It was an Nvidia GTX 960.

I'm going to call this one fixed. Now I get to finish setting up Proxmox and restore my VMs.
 
Guessing you didn't specifically install the Nvidia driver for the GPU then?

So that'd probably mean it was a leak caused by the nouveau kernel module.
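If nouveau does turn out to be the culprit and the host itself doesn't need the card (or it's destined for VM passthrough anyway), the usual workaround is to blacklist the module. A sketch, assuming a standard Debian/Proxmox modprobe setup, after confirming with `lsmod | grep nouveau` that it's actually loaded:

```
# /etc/modprobe.d/blacklist-nouveau.conf -- prevent the nouveau
# kernel module from loading at boot
blacklist nouveau
```

After creating the file, rebuild the initramfs with `update-initramfs -u` and reboot.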
 
This is still (even more) perplexing, because if I understood correctly, the GPU card (GTX 960) was still connected to the motherboard even when the physical monitor wasn't connected to it via HDMI. I would imagine the leak caused by the card should therefore still be active. Maybe you disabled it in the BIOS or something? IDK.
 
You understood correctly, I just moved the cable. I didn't change anything in the bios.

I don't really get it either. I guess since the cable was connected and it was outputting to the monitor, it was "in use" and eating memory, but when it's not outputting a signal it's essentially powered off? Maybe the BIOS disables the GPU if nothing is attached to it? I'm also wondering if I'll encounter a similar issue when I pass it through to a VM.

But it definitely fixed it. It's been almost 3 hours since I powered it up last and it's still only consuming 1.4 GB.
 
