Hi there,
Weird one, forked from this other thread about a similar issue. @Stoiko Ivanov
After a while, many pmg-smtp-filter instances are running, each taking about half a core, so the machine becomes CPU-overloaded fairly quickly.
We updated the 7.x branch to the latest (on the same final sub-major) and rebooted, then added more CPU and RAM and rebooted again, but after 30-90 minutes the same thing happened.
We checked for custom template files, removed one, then resynced the configs and rebooted. The same thing came back. We then updated to 8.0 and Bookworm and thought all was good, but the same thing is back.
Now, with the raised core count, average usage is 60% of 6 cores, which is too much for the current load (2 cores were fine for 2 years until about 9pm last night). When we logged in today to investigate a mail flow issue report, mail processing time was at 1,500 seconds. It is now down to almost 100 seconds, which is good.
We were trying to do smart things with sa-learn and so on, so that end users could forward spam that was not flagged or quarantined as spam to the system, for it to learn that as a spam sample. (Side note: is this doable and if so, how?) I suspect something from that may be at play.
Nothing is jumping out in the logs. Perhaps we should reboot again then get the timestamp when it begins to flare? Or should it be more obvious?
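To pin down that timestamp, something like the following could run in the background after the reboot. It is only a rough sketch, not a PMG tool: the names `summarize`, `sample` and `watch` and the 30-second interval are made up for illustration, and it assumes the workers show up in `ps` with "pmg-smtp-filter" in their command line. Note that the %CPU column from `ps` is averaged over each process's lifetime, so the instance count is probably the more useful signal.

```python
import subprocess
import time
from datetime import datetime

# Assumption: the filter workers contain this string in their command line.
PROCESS_NAME = "pmg-smtp-filter"


def summarize(ps_output, name=PROCESS_NAME):
    """Return (instance count, summed %CPU) for ps lines whose command contains name."""
    count, cpu = 0, 0.0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # "<pcpu> <full command line>"
        if len(parts) < 2 or name not in parts[1]:
            continue
        try:
            cpu += float(parts[0])
        except ValueError:
            continue  # skip lines that do not start with a number
        count += 1
    return count, cpu


def sample():
    """Run ps once; the trailing '=' on each column suppresses the header row."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu=,args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def watch(interval=30):
    """Print one timestamped line per interval until interrupted with Ctrl-C."""
    while True:
        count, cpu = summarize(sample())
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} instances={count} total_pcpu={cpu:.1f}", flush=True)
        time.sleep(interval)
```

Calling `watch()` and redirecting the output to a file would give a timeline that can be lined up against the mail log around the moment the instance count starts to climb.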
Images in the previous thread.
Thanks!