Proxmox Mail Gateway on LXC memory usage

Jul 6, 2020
The OOM killer kills the ClamAV daemon (clamd) on the Proxmox Mail Gateway.

Bash:
root@antispam1:~# dmesg -T | egrep -i 'killed process'
[Mon Jun 15 11:02:06 2020] Memory cgroup out of memory: Killed process 195913 (clamd) total-vm:1939452kB, anon-rss:1387168kB, file-rss:0kB, shmem-rss:0kB, UID:100108 pgtables:3328kB oom_score_adj:0
[Mon Jun 15 12:37:25 2020] Memory cgroup out of memory: Killed process 3915135 (clamd) total-vm:1652460kB, anon-rss:1224264kB, file-rss:0kB, shmem-rss:0kB, UID:100108 pgtables:2680kB oom_score_adj:0
[Mon Jun 15 13:16:24 2020] Memory cgroup out of memory: Killed process 780253 (clamd) total-vm:1504944kB, anon-rss:1204088kB, file-rss:0kB, shmem-rss:0kB, UID:100108 pgtables:2592kB oom_score_adj:0

The LXC container has 8 GB of RAM, which is twice the recommended amount. My problem is that Proxmox VE insists there is no memory shortage:

[image.png: Proxmox VE memory usage graph for the container, showing no shortage]

What causes the Proxmox Mail Gateway to use so much memory that the OOM killer is invoked? And why does Proxmox VE not properly reflect the memory demand of the LXC container in its monitoring?
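
The dmesg lines above say "Memory cgroup out of memory", which points at the container's own cgroup limit being hit rather than the host running out of RAM. A quick way to cross-check this from the PVE host is to read the container's memory cgroup directly; this is only a sketch, assuming cgroup v1 as used by PVE 6.x and a hypothetical CTID of 108:

Bash:
# Configured memory limit of the container's cgroup (should match the 8 GB)
cat /sys/fs/cgroup/memory/lxc/108/memory.limit_in_bytes

# Peak usage recorded for the cgroup and how often the limit was hit
cat /sys/fs/cgroup/memory/lxc/108/memory.max_usage_in_bytes
cat /sys/fs/cgroup/memory/lxc/108/memory.failcnt

# OOM events seen by this cgroup
cat /sys/fs/cgroup/memory/lxc/108/memory.oom_control

If memory.max_usage_in_bytes sits at the limit while the graph stays around 4 GB, the spike fell between two sampling points of the graph.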
 
Anything relevant in the logs? Usually PMG is quite happy with 2.5 GB (1.4 GB of that used by clamd) - so this seems odd if the container indeed has 8 GB of RAM.

clamd is chosen by the OOM killer because it is the single process using the most memory.
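
For context, the kernel ranks candidate victims by an OOM "badness" score that is mostly driven by resident memory, so the biggest consumer in the cgroup gets picked. A small sketch (standard Linux/systemd mechanics, not anything PMG-specific) to inspect the score and, if you really want, bias the killer away from clamd - note this only changes which process gets killed, it does not remove the memory pressure:

Bash:
# Current OOM ranking of the running clamd (higher score = more likely victim)
pid=$(pidof clamd)
cat /proc/$pid/oom_score /proc/$pid/oom_score_adj

# Optional: systemd drop-in so clamd starts with a lower oom_score_adj
mkdir -p /etc/systemd/system/clamav-daemon.service.d
cat > /etc/systemd/system/clamav-daemon.service.d/oom.conf <<'EOF'
[Service]
OOMScoreAdjust=-500
EOF
systemctl daemon-reload
systemctl restart clamav-daemon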
 
Define "relevant" ;). Here is the syslog:


Code:
Jun 15 10:58:45 antispam1 pmgmirror[430]: starting cluster syncronization
Jun 15 10:58:45 antispam1 pmgmirror[430]: cluster syncronization finished  (0 errors, 0.14 seconds (files 0.11, database 0.03, config 0.00))
Jun 15 10:59:11 antispam1 clamd[357]: LibClamAV Warning: cli_tnef: file truncated, returning CLEAN
Jun 15 11:00:03 antispam1 systemd[1]: Starting Hourly Proxmox Mail Gateway activities...
Jun 15 11:00:03 antispam1 systemd[1]: Started Session 24446 of user root.
Jun 15 11:00:03 antispam1 systemd[1]: session-24446.scope: Succeeded.
Jun 15 11:00:03 antispam1 systemd[1]: Started Session 24447 of user root.
Jun 15 11:00:03 antispam1 systemd[1]: session-24447.scope: Succeeded.
Jun 15 11:00:03 antispam1 systemd[1]: Reloading Proxmox Mail Gateway Policy Daemon.
Jun 15 11:00:03 antispam1 systemd[1]: Reloaded Proxmox Mail Gateway Policy Daemon.
Jun 15 11:00:06 antispam1 systemd[1]: pmg-hourly.service: Succeeded.
Jun 15 11:00:06 antispam1 systemd[1]: Started Hourly Proxmox Mail Gateway activities.
Jun 15 11:00:37 antispam1 freshclam[351]: Received signal: wake up
Jun 15 11:00:37 antispam1 freshclam[351]: ClamAV update process started at Mon Jun 15 11:00:37 2020
Jun 15 11:00:37 antispam1 freshclam[351]: Received signal: wake up
Jun 15 11:00:37 antispam1 freshclam[351]: ClamAV update process started at Mon Jun 15 11:00:37 2020
Jun 15 11:00:37 antispam1 freshclam[351]: WARNING: Your ClamAV installation is OUTDATED!
Jun 15 11:00:37 antispam1 freshclam[351]: WARNING: Local version: 0.102.2 Recommended version: 0.102.3
Jun 15 11:00:37 antispam1 freshclam[351]: DON'T PANIC! Read https://www.clamav.net/documents/upgrading-clamav
Jun 15 11:00:37 antispam1 freshclam[351]: Your ClamAV installation is OUTDATED!
Jun 15 11:00:37 antispam1 freshclam[351]: daily.cvd database is up to date (version: 25843, sigs: 2618912, f-level: 63, builder: raynman)
Jun 15 11:00:37 antispam1 freshclam[351]: main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
Jun 15 11:00:37 antispam1 freshclam[351]: bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
Jun 15 11:00:37 antispam1 freshclam[351]: Local version: 0.102.2 Recommended version: 0.102.3
Jun 15 11:00:37 antispam1 freshclam[351]: safebrowsing.cvd database is up to date (version: 49191, sigs: 2213119, f-level: 63, builder: google)
Jun 15 11:00:37 antispam1 freshclam[351]: DON'T PANIC! Read https://www.clamav.net/documents/upgrading-clamav
Jun 15 11:00:37 antispam1 freshclam[351]: daily.cvd database is up to date (version: 25843, sigs: 2618912, f-level: 63, builder: raynman)
Jun 15 11:00:45 antispam1 pmgmirror[430]: starting cluster syncronization
Jun 15 11:00:45 antispam1 pmgmirror[430]: cluster syncronization finished  (0 errors, 0.15 seconds (files 0.11, database 0.03, config 0.00))
Jun 15 11:01:54 antispam1 pmgdaemon[484]: successful auth for user 'support@pmg'
Jun 15 11:02:03 antispam1 systemd[1]: Started Session 24448 of user root.
Jun 15 11:02:03 antispam1 systemd[1]: session-24448.scope: Succeeded.
Jun 15 11:02:03 antispam1 systemd[1]: Started Session 24449 of user root.
Jun 15 11:02:03 antispam1 systemd[1]: session-24449.scope: Succeeded.
Jun 15 11:02:05 antispam1 systemd[1]: clamav-daemon.service: Main process exited, code=killed, status=9/KILL
Jun 15 11:02:05 antispam1 systemd[1]: clamav-daemon.service: Failed with result 'signal'.
Jun 15 11:02:45 antispam1 pmgmirror[430]: starting cluster syncronization

Here is the current memory usage (the LXC container now has 12 GB of memory):
Code:
top - 13:59:38 up 20 days, 23:17,  1 user,  load average: 3.97, 4.66, 4.70
Tasks: 139 total,   2 running, 137 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  12288.0 total,   2203.0 free,   5118.2 used,   4966.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   7169.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    339 clamav    20   0 1978296   1.5g   9096 S   0.7  12.6 374:15.73 clamd
    479 root      20   0 1544376   1.4g  11704 S   0.0  11.4   1:46.52 pmgdaemon worke
    475 root      20   0  686692 575792  11540 S   0.0   4.6   1:12.56 pmgdaemon worke
     56 root      20   0  457228 382604 242788 S   0.0   3.0  31:50.85 systemd-journal
 954248 root      20   0  341716 262152  12212 S   0.0   2.1   0:04.95 pmg-smtp-filter
 954241 root      20   0  340804 261408  12240 S   8.6   2.1   0:06.14 pmg-smtp-filter
 954307 root      20   0  340232 260688  12352 S   0.0   2.1   0:03.41 pmg-smtp-filter
 954314 root      20   0  340200 260596  12208 S   0.0   2.1   0:03.41 pmg-smtp-filter
    433 root      20   0  327760 246996  10776 S   0.0   2.0   6:22.07 pmg-smtp-filter
 263571 www-data  20   0  285632 202352  12216 S   0.0   1.6   0:24.13 pmgproxy worker
 267742 www-data  20   0  241132 159252  12252 S   0.0   1.3   0:19.80 pmgproxy worker
    477 root      20   0  200616 118780  11340 S   0.0   0.9   0:57.65 pmgdaemon worke
    563 www-data  20   0  177048 111544  17552 S   0.0   0.9   0:21.92 pmgproxy
 824202 www-data  20   0  189376 108212  11904 S   0.0   0.9   0:02.85 pmgproxy worker
 
Hmm - a pmgdaemon worker using 1.4 GB is odd (at least I haven't seen anything like it until now) - is there anything specific you do with the REST API?
How is the memory usage if you restart the pmgdaemon service (does it always climb back to 1.4 GB)?
 
Bash:
root@antispam1:~# ps aux | grep pmgdaemon
root         474  0.0  0.7 176728 95156 ?        Ss   Jun15   0:22 pmgdaemon
root         475  0.0  4.5 686692 575956 ?       S    Jun15   1:18 pmgdaemon worker
root         477  0.0  0.9 200616 118780 ?       S    Jun15   1:02 pmgdaemon worker
root         479  0.0 11.4 1544376 1439520 ?     S    Jun15   1:51 pmgdaemon worker
root@antispam1:~# systemctl restart pmgdaemon
root@antispam1:~# ps aux | grep pmgdaemon
root     1499289  0.0  0.7 176740 96940 ?        Ss   16:07   0:00 pmgdaemon
root     1499290  0.0  0.7 177008 97576 ?        S    16:07   0:00 pmgdaemon worker
root     1499291  0.0  0.7 177008 97576 ?        S    16:07   0:00 pmgdaemon worker
root     1499292  0.0  0.7 177008 97576 ?        S    16:07   0:00 pmgdaemon worker
We don't do anything with the REST API. Restarting the daemon does free the memory. I'll keep an eye on pmgdaemon over the next few days. Any thoughts on why Proxmox VE does not pick up the memory demand of the LXC container?
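
For reference, this is roughly how I plan to watch it - a minimal sketch; the script path, log file and cron schedule are just examples:

Bash:
#!/bin/sh
# /usr/local/sbin/pmgdaemon-memlog.sh (example path): append a timestamped
# snapshot of pmgdaemon memory usage, run e.g. every 10 minutes via /etc/cron.d:
#   */10 * * * * root /usr/local/sbin/pmgdaemon-memlog.sh
{
    date '+%Y-%m-%d %H:%M:%S'
    ps aux | grep '[p]mgdaemon'
    echo
} >> /var/log/pmgdaemon-mem.log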
 
Code:
root@antispam1:~# ps aux | grep pmgdaemon
root     1499289  0.0  0.8 176796 107320 ?       Ss   Jul08   0:05 pmgdaemon
root     1687392  0.0  0.8 189548 107652 ?       S    Jul09   0:10 pmgdaemon worker
root     1687393  0.0  0.8 189384 107752 ?       S    Jul09   0:07 pmgdaemon worker
root     1687394  0.0  0.8 178024 102696 ?       S    Jul09   0:08 pmgdaemon worker
 
So the memory usage of pmgdaemon remains sane - did you get another OOM kill?
See the graph in the first post. PVE says the RAM usage of the LXC container is 4 GB; why does the OOM killer get invoked then?
It can happen that something inside the container starts using much more memory in a very short timeframe, so that the spike does not get picked up by the graph (the usage is only sampled at intervals)...

If the OOM kills happen again, a bit more context from the logs would be helpful (i.e. more than just the OOM lines, and also the journal from inside the container).
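
Something along these lines would capture enough context next time (the timestamps below are placeholders - use the time of the actual OOM event):

Bash:
# Inside the container: full journal around the kill, not only the OOM lines
journalctl --since "2020-06-15 10:55" --until "2020-06-15 11:05"

# On the PVE host: the complete OOM report - the lines before "Killed process"
# list every task in the cgroup together with its RSS at that moment
dmesg -T | grep -i -B 40 'killed process'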
 
"It can happen that something inside the container starts using much more memory in a very short timeframe, so that the spike does not get picked up by the graph..."
And what would that be? A big email? "It can happen that something..." does not help me build trust in this product.
"A bit more context from the logs would be helpful (i.e. more than just the OOM lines, and also the journal from inside the container)."
I posted a complete log; the Postfix log messages are not relevant, nor suitable for posting online.
 
