[SOLVED] PMG eats CPU and pmg-smtp-filter child is just spamming

facyber

New Member
Sep 9, 2020
23
5
3
Hello,

In the pas few days out of nowhere PMG CPU just started to be overwhelmed, 100% most of the times. First I thought it was low on resources as I searched through the forum and somebody mentioned to check that. Server had 2 CPU and 4GB of RAM, so I double it. And after that it was fine for a short time, then it started again. I was again searching through forum before posting in order to find if somebody had similar issue.

What I noticed is that service pmg-smtp-filter is just spamming itself, child processes. At this moment systemctl status pmg-smtp-filter have 17 childs and htop says those are the ones that eats all the memory. If I restart it and check status again I see the following:

Code:
May 20 09:50:05 pmg-server systemd[1]: Starting Proxmox SMTP Filter Daemon...
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Process Backgrounded
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: 2021/05/20-09:50:06 main (type Net::Server::PreFork) starting! pid(17063)
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Binding to TCP port 10023 on host 127.0.0.1 with IPv4
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Binding to TCP port 10024 on host 127.0.0.1 with IPv4
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Group Not Defined.  Defaulting to EGID '0'
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: User Not Defined.  Defaulting to EUID '0'
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Setting up serialization via flock
May 20 09:50:06 pmg-server pmg-smtp-filter[17063]: Filter daemon (re)started (max. 40 processes)
May 20 09:50:06 pmg-server systemd[1]: Started Proxmox SMTP Filter Daemon

I tried to search in syslog for issues so I did cat /var/log/syslog | grep WARNING and find couple of interesting messages but not sure what to do with them or if they are related to the issue:

Code:
pmgpolicy[13441]: WARNING: Pid_file created by this same process. Doing nothing.
pmg-smtp-filter[5916]: WARNING: Use of uninitialized value $score in concatenation (.) or string at /usr/share/perl5/PMG/RuleDB/Spam.pm line 432.
pmg-smtp-filter[5916]: WARNING: Use of uninitialized value in sprintf at /usr/bin/pmg-smtp-filter line 210.
pmg-smtp-filter[5756]: WARNING: rules: failed to run PYZOR_CHECK test, skipping:#012#011(refcnt: fd -1 < 0#012)

I also checked postgresql logs:

2021-05-19 13:04:59.724 CEST [30795] LOG: received fast shutdown request
2021-05-19 13:04:59.760 CEST [30795] LOG: aborting any active transactions
2021-05-19 13:04:59.760 CEST [1836] root@Proxmox_ruledb FATAL: terminating connection due to administrator command
2021-05-19 13:04:59.788 CEST [30795] LOG: background worker "logical replication launcher" (PID 30802) exited with exit code 1
2021-05-19 13:04:59.827 CEST [30797] LOG: shutting down
2021-05-19 13:05:00.099 CEST [30795] LOG: database system is shut down
2021-05-19 13:07:20.793 CEST [580] LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-05-19 13:07:20.810 CEST [580] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-05-19 13:07:21.114 CEST [596] LOG: database system was shut down at 2021-05-19 13:05:00 CEST
2021-05-19 13:07:21.162 CEST [580] LOG: database system is ready to accept connections
2021-05-19 13:07:21.713 CEST [648] [unknown]@[unknown] LOG: incomplete startup packet
2021-05-19 13:07:24.909 CEST [679] root@Proxmox_ruledb ERROR: constraint "localstat_time_key" of relation "localstat" does not exist
2021-05-19 13:07:24.909 CEST [679] root@Proxmox_ruledb STATEMENT: ALTER TABLE LocalStat DROP CONSTRAINT localstat_time_key
2021-05-19 17:04:29.700 CEST [580] LOG: received fast shutdown request
2021-05-19 17:04:29.713 CEST [580] LOG: aborting any active transactions
2021-05-19 17:04:29.719 CEST [580] LOG: background worker "logical replication launcher" (PID 627) exited with exit code 1
2021-05-19 17:04:29.719 CEST [622] LOG: shutting down
2021-05-19 17:04:29.914 CEST [580] LOG: database system is shut down
2021-05-19 17:06:44.639 CEST [598] LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-05-19 17:06:44.651 CEST [598] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-05-19 17:06:45.094 CEST [662] LOG: database system was shut down at 2021-05-19 17:04:29 CEST
2021-05-19 17:06:45.145 CEST [598] LOG: database system is ready to accept connections
2021-05-19 17:06:45.693 CEST [677] [unknown]@[unknown] LOG: incomplete startup packet
2021-05-19 17:06:50.338 CEST [701] root@Proxmox_ruledb ERROR: constraint "localstat_time_key" of relation "localstat" does not exist
2021-05-19 17:06:50.338 CEST [701] root@Proxmox_ruledb STATEMENT: ALTER TABLE LocalStat DROP CONSTRAINT localstat_time_key
2021-05-19 17:10:07.017 CEST [598] LOG: received fast shutdown request
2021-05-19 17:10:07.029 CEST [598] LOG: aborting any active transactions
2021-05-19 17:10:07.038 CEST [598] LOG: background worker "logical replication launcher" (PID 668) exited with exit code 1
2021-05-19 17:10:07.038 CEST [663] LOG: shutting down
2021-05-19 17:10:07.447 CEST [598] LOG: database system is shut down
2021-05-19 17:10:24.756 CEST [593] LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-05-19 17:10:24.849 CEST [593] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-05-19 17:10:24.973 CEST [611] LOG: database system was shut down at 2021-05-19 17:10:07 CEST
2021-05-19 17:10:25.123 CEST [593] LOG: database system is ready to accept connections
2021-05-19 17:10:25.676 CEST [669] [unknown]@[unknown] LOG: incomplete startup packet
2021-05-19 17:10:28.498 CEST [698] root@Proxmox_ruledb ERROR: constraint "localstat_time_key" of relation "localstat" does not exist
2021-05-19 17:10:28.498 CEST [698] root@Proxmox_ruledb STATEMENT: ALTER TABLE LocalStat DROP CONSTRAINT localstat_time_key
2021-05-19 17:17:42.813 CEST [11799] root@Proxmox_ruledb ERROR: constraint "localstat_time_key" of relation "localstat" does not exist
2021-05-19 17:17:42.813 CEST [11799] root@Proxmox_ruledb STATEMENT: ALTER TABLE LocalStat DROP CONSTRAINT localstat_time_key
2021-05-19 17:18:50.364 CEST [13235] root@Proxmox_ruledb ERROR: constraint "localstat_time_key" of relation "localstat" does not exist
2021-05-19 17:18:50.364 CEST [13235] root@Proxmox_ruledb STATEMENT: ALTER TABLE LocalStat DROP CONSTRAINT localstat_time_key

I then found some of errors from the logs are fixed in some of the previous version of the PMG, so I upgraded to the latest PMG 6.4-4 and issue again was solved for a short time, few hourse maybe. This morning when I came it was again for hourse 100% CPU and again pmg-smtp-filter.

Here is results of pmgversion -v

Code:
proxmox-mailgateway: 6.4-1 (API: 6.4-4/1c8a73c7, running kernel: 5.4.73-1-pve)
pmg-api: 6.4-4
pmg-gui: 2.4-2
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
clamav-daemon: 0.103.2+dfsg-0+deb10u1
libarchive-perl: 3.3.3-1
libjs-extjs: 6.0.1-10
libjs-framework7: 4.4.7-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-http-server-perl: 3.1-1
libxdgmime-perl: 0.01-5
lvm2: 2.03.02-3
pmg-docs: 6.4-1
pmg-log-tracker: 2.1.9-1
postgresql-11: 11.12-0+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-spamassassin: 3.4.6-1
proxmox-widget-toolkit: 2.5-1
pve-firmware: 3.2-2
pve-xtermjs: 4.7.0-3
zfsutils-linux: 2.0.4-pve1

EDIT:
I also did this morning pmgconfig sync --restart 1 and systemctl restart postfix and for now it looks stable, but that could be temporerally only. And forgot to mention that we didn't made any specific changes that should cause this in the past days.


If there is something else I can check or provide please do tell me.
 
Last edited:
pmg-smtp-filter[5916]: WARNING: Use of uninitialized value $score in concatenation (.) or string at /usr/share/perl5/PMG/RuleDB/Spam.pm line 432.
pmg-smtp-filter[5916]: WARNING: Use of uninitialized value in sprintf at /usr/bin/pmg-smtp-filter line 210.
this would incidate a problem with SpamAssassin and its rules:
* do you have any modifications to the PMG configuraiton (especially custom SA rules or scores)?
* what's the status of the rule-updates? (GUI->Configuration->Spam Detector->Status?
* do you have enabled pyzor - and if yes how?
 
Hi Stoiko,

1. and 3. Yes I do, but I was trying to go 100% with the recommendations in a term to not change original files, but always to use templates.

[% IF pmg.spam.rbl_checks %]
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
[% END %]

# Hashcash - perform hashcash verification.
#
loadplugin Mail::SpamAssassin::Plugin::Hashcash

[% IF pmg.spam.rbl_checks %]
loadplugin Mail::SpamAssassin::Plugin::SPF
[% END %]

# always load dkim to improve accuracy
loadplugin Mail::SpamAssassin::Plugin::DKIM

loadplugin Mail::SpamAssassin::Plugin::Pyzor
use_pyzor 1


loadplugin Mail::SpamAssassin::Plugin::DCC
dcc_path /usr/local/bin/dccproc
dcc_home /var/dcc
dcc_dccifd_path /var/dcc/dccifd
dcc_body_max 999999
dcc_fuz1_max 999999
dcc_fuz2_max 999999
use_dcc 1
dcc_timeout 10

In /etc/pmg/templates/v320.pre.in I enabled Rule2XSBody.

whitelist_bounce_relays $OUR_RIPE_IP_RANGE

# Additional score for DNSEval
ifplugin Mail::SpamAssassin::Plugin::DNSEval
header RCVD_IN_BRBL eval:check_rbl('brbl-lastexternal','bb.barracudacentral.org.')
describe RCVD_IN_BRBL Received via a relay in Barracuda BRBL
tflags RCVD_IN_BRBL net
score RCVD_IN_BRBL 5.0
endif #ifplugin Mail::SpamAssassin::Plugin::DNSEval

# Custom rules that block mostly spam adds
body NO_SMARTWATCH_AD /While all her friends were positive that Mary had a sixth sens/i
score NO_SMARTWATCH_AD 20
describe NO_SMARTWATCH_AD Blocks annoying junks that promote smartwatchs

body NO_SMARTWATCH_AD_2 /Mary had a sixth sens/i
score NO_SMARTWATCH_AD_2 20
describe NO_SMARTWATCH_AD_2 Blocks annoying junks that promote smartwatchs

body NO_SMARTWATCH_AD_3 /of dogs eating cat food but not cats eating dog food/i
score NO_SMARTWATCH_AD_3 20
describe NO_SMARTWATCH_AD_3 Blocks annoying junks that promote smartwatchs

2) We also have some custom scores that we imported from the previous mail protection solution. It worked so far.

score ANY_BOUNCE_MESSAGE 1.000
score BAYES_00 -1.500
score BAYES_05 -0.100
score BAYES_20 -0.010
score BAYES_40 -0.010
score BAYES_50 2.500
score BAYES_95 5.000
score BAYES_99 7.500
score BAYES_999 10.000
score BOUNCE_MESSAGE 1.000
score CRBOUNCE_MESSAGE 1.000
score FROM_STARTS_WITH_NUMS 1.500
score FRT_OPPORTUN2 1.000
score KAM_BODY_URIBL_PCCC 1.000
score KAM_DOMAIN 2.000
score KAM_LINKBAIT 0.500
score KAM_STOCKTIP 1.500
score MSGID_MULTIPLE_AT 0.000
score RCVD_IN_BL_SPAMCOP_NET 5.000
score RCVD_IN_DNSWL_HI -100.000
score RCVD_IN_DNSWL_LOW -2.000
score RCVD_IN_DNSWL_MED -5.000
score RCVD_IN_PBL 2.500
score RDNS_NONE 0.500
score SPF_SOFTFAIL 0.500
score UNPARSEABLE_RELAY 0.500
score UNWANTED_LANGUAGE_BODY 1.500
score URIBL_BLACK 4.500
score URIBL_JP_SURBL 4.500
score URIBL_PH_SURBL 4.500
score URIBL_SBL 4.500
score URIBL_WS_SURBL 4.500
score VBOUNCE_MESSAGE 1.000
 
Hm - one suggestion would be to lint your spamassassin config:
Code:
spamassassin -D all --lint  2>&1 |less
and check the output for potential problems with the config
At least that's where I'd start.
 
Hmm going through it I see only something like this that is interesting but I doubt that is the issue, as there is a lot of these:

Code:
May 20 11:30:09.075 [25603] dbg: config: warning: score set for non-existent rule SCHAALIT_URI_114
May 20 11:30:09.075 [25603] dbg: config: warning: score set for non-existent rule DKIMDOMAIN_IN_DWL_UNKNOWN
May 20 11:30:09.076 [25603] dbg: config: warning: score set for non-existent rule KAM_BODY_URIBL_PCCC
May 20 11:30:09.116 [25603] dbg: config: warning: no description set for SCHAALIT_URI_234
May 20 11:30:09.116 [25603] dbg: config: warning: no description set for SCHAALIT_URI_237
May 20 11:30:09.116 [25603] dbg: config: warning: no description set for SCHAALIT_URI_235

If I do additional grep on "fail" for example I get this:

Code:
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: Digest::SHA1 ('require' failed)
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: GeoIP2::Database::Reader ('require' failed)
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: IP::Country::DB_File ('require' failed)
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: Net::Patricia ('require' failed)
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: BSD::Resource ('require' failed)
May 20 11:42:33.505 [26538] dbg: diag: [...] optional module not installed: IO::String ('require' failed)

Now I checked again syslog and it seems I forgot to include some logs in original post:

pmg-smtp-filter[24859]: WARNING: internal error, python traceback seen in response: Traceback (most recent call last):\n File "/usr/bin/pyzor", line 408, in <module>\n main()\n File "/usr/bin/pyzor", line 152, in main\n if not dispatch(client, servers, config):\n File "/usr/bin/pyzor", line 237, in check\n for digested in get_input_handler(style):\n File "/usr/bin/pyzor", line 174, in _get_input_msg\n msg = email.message_from_file(sys.stdin)\n File "/usr/lib/python3.7/email/__init__.py", line 54, in message_from_file\n return Parser(*args, **kws).parse(fp)\n File "/usr/lib/python3.7/email/parser.py", line 54, in parse\n data = fp.read(8192)\n File "/usr/lib/python3.7/codecs.py", line 322, in decode\n (result, consumed) = self._buffer_decode(data, self.errors, final)\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1318: invalid start byte at /usr/share/perl5/Mail/SpamAssassin/Plugin/Pyzor.pm line 308.
pmg-smtp-filter[5895]: WARNING: cannot unlink /tmp/.spamassassin5895xolmHctmp: No such file or directory at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 3180
pmg-smtp-filter[5895]: WARNING: cannot unlink /tmp/.spamassassin5895eJV5cItmp: No such file or directory at /usr/share/perl5/Mail/SpamAssassin/Message.pm line 699.
pmg-smtp-filter[5902]: WARNING: cannot unlink /tmp/.spamassassin5902aK4wcJtmp: No such file or directory at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 3180
pmg-smtp-filter[5902]: WARNING: cannot unlink /tmp/.spamassassin5902FJmsrdtmp: No such file or directory at /usr/share/perl5/Mail/SpamAssassin/Message.pm line 699.
pmg-smtp-filter[26986]: WARNING: Use of uninitialized value $ip in pattern match (m//) at /usr/share/perl5/PMG/RuleCache.pm line 245.

Monitoring graphs still looks okay for the past hour, ever sync I sync pmg config and restarted postfix (but I did restarted postfix only yesterday and there were no changes).
 
I forgot to mention also that the main issue is that mails are late, when I check tracking center, I see this for almost all mails as they are in queued/deferred status.

delivery temporarily suspended: conversation with 127.0.0.1[127.0.0.1] timed out while receiving the initial server greeting
 
May 20 11:30:09.075 [25603] dbg: config: warning: score set for non-existent rule DKIMDOMAIN_IN_DWL_UNKNOWN
These messages are not uncommon - and I would not worry too much about them..
May 20 11:30:09.116 [25603] dbg: config: warning: no description set for SCHAALIT_URI_234
* This is not in the shipped rulesets (and not in any of our update-channels (standard spamassassin-updates and KAM ruleset updates)
* IIRC there was a ruleset with that name, but its not maintained anymore? - you might consider removing these rules...

Now I checked again syslog and it seems I forgot to include some logs in original post:
Yes this is the most likely cause of the issues you're experiencing:
* pyzor seems to have a problem on your machine (or rather with one particular mail, which contains a non utf-8 byte
** a quick search yields this: https://github.com/SpamExperts/pyzor/issues/64 - but since the last release of pyzor happened around 7 years ago I would not expect a fixed version soon.
* pyzor erroring out causes SpamAssassin to get wrong data, which in turn causes pmg-smtp-filter to exit due to an error
This also explains the longer wait-times for mails

I would disable pyzor, at least until the problematic mail is processed.

* Does pyzor have a significant benefit in your environment (during my last tests it did not help too much)? - If not I'd keep it disabled

I hope this helps!
 
I disabled pyzor soon after my last report and it does seem better again, at least for now. I will then keep watching it for a few days.

I also found some people complained about same issue but yes, those were years ago.
* pyzor seems to have a problem on your machine (or rather with one particular mail, which contains a non utf-8 byte
Hah wasn't aware of that, will try to search for it if there some maybe included in some dev or test mails.

* Does pyzor have a significant benefit in your environment (during my last tests it did not help too much)? - If not I'd keep it disabled

To be honest, I am not sure it does provide big extra security, as I tried to make PMG similar to our old setup (with PMG improvements of course), like ClamAV Unofficial, DCC, pyzor, etc. Will watch then and on Monday I will mark this thread as solved if disabling Pyzor indeed helped.

Thanks for the help.
 
  • Like
Reactions: Stoiko Ivanov
Hi Stoikov,

Just a small final update. Ever since Pyzor is disabled, no more high CPU spikes so indeed it was causing the issue. And indeed I don't see much bigger security enabling it, I guess other filters are good enough. Thanks for the support.

For everyone else just in case you need to disable it also:

sudo apt purge pyzor

nano /etc/pmg/templates/init.pre.in

Then remove the lines

Code:
loadplugin Mail::SpamAssassin::Plugin::Pyzor
use_pyzor 0

Then sync config and restart pmg-smtp-filter

Code:
pmgconfig sync --restart 1
systemctl restart pmg-smtp-filter
 
  • Like
Reactions: Stoiko Ivanov
In our case we had 2 years of all-good, then suddenly at 9pm last night pmg-smtp-filter had 20+ children, and the server actual CPU was overutilised by 1,000% and beyond (ie. 2 cores had load of 30-40). Multiple nodes, same condition on each.

Tripled CPU from 2 cores to 6 per each, doubled RAM from 8GB to 16GB, just in case as these were low-ish despite relatively small mail volumes. Then went down the pyzor rabbit hole but no evidence of existence.

Applied updates and rebooted, same to apply the resource increases. Load low, then started to rise again, then back to similar condition to before despite updates and resource increases.

From there, removed all template files in /etc/pmg/templates/ and then synced config back in, then restarted pmg-smtp-filter via systemctl.

So far, there is only 1x instance of pmg-smtp-filter on each machine at 100% CPU for that process, and server loads remain OK.

If it occurs again, we will keep seeking the cause and report back with updates. Hope the above helps to some degree.
 
Same problem now returned on both nodes after some hours following 7.3 update to latest minor/build, then major up to 8.0.

EDIT: Changelog says the below. So via Admin > Configuration > Spam Detector > Use Bayesian filter > is now OFF rather than ON. Load OK.

SpamAssassin's naive-Bayesian-style classifier and the auto-whitelisting plugin are now disabled by default.

Screenshot 2023-10-05 at 2.51.32 pm.png
Screenshot 2023-10-05 at 2.52.27 pm.png
 
Last edited:
Same problem now returned on both nodes after some hours following 7.3 update to latest minor/build, then major up to 8.0.

EDIT: Changelog says the below. So via Admin > Configuration > Spam Detector > Use Bayesian filter > is now OFF rather than ON. Load OK.
Not sure if bayes in SpamAssassin causes that load (would not remember running into such a thing) - is there anything in the journal for the timeframe?
(please consider opening a new thread instead of replying to one 2 years old - you can mention me with @Stoiko Ivanov )
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!