BAYES Training

dennis_81

Sep 27, 2021
Hello all,

We see on our PMG that more and more obvious spam mails are being delivered to the recipients.
Checking the logs, I see that these are getting a score of (-1.9) through BAYES_00.

If I understand this correctly, BAYES should learn which mails are wanted and which are not. But by which parameters does the PMG recognize that? Is there an automatic training?

Can I do anything to influence this and is it possible to reset the rating?

Thank you and best regards
Dennis
 
If I understand this correctly, BAYES should learn which mails are wanted and which are not. But by which parameters does the PMG recognize that? Is there an automatic training?
Bayes learning in PMG uses SpamAssassin's auto-learning feature - see https://cwiki.apache.org/confluence/display/SPAMASSASSIN/BayesFaq (and linked pages)

This can (and quite often does) lead to wrong scoring over time

Can I do anything to influence this and is it possible to reset the rating?
Yes - simply disable the feature in GUI->Configuration->Spam Detector

(you can re-enable it afterwards and it will start fresh)
I would suggest checking how detection performs with Bayes turned off - most users are quite happy with the detection rates without it

I hope this helps!
 
Sorry for reviving this topic - but we have the same problem here. Bayes lets clear spam mail through every day - but without Bayes the detection rate goes down significantly.

Since you can train Bayes - is there a way to create a spam-learning mailbox that users can forward their spam to?
 
We also see this behaviour on our MailCleaner instances. NiceBayes lets obvious spam pass through, but deactivating it results in even more spam.

A spam-learning mailbox would be a great idea, something like this:
- fetch mails automatically from this mailbox
- sa-learn --spam /path/to/spam/folder
 
We also see this behaviour on our MailCleaner instances. NiceBayes lets obvious spam pass through, but deactivating it results in even more spam.

A spam-learning mailbox would be a great idea, something like this:
- fetch mails automatically from this mailbox
- sa-learn --spam /path/to/spam/folder
I also asked about that and hoped the new version would bring it.
Meanwhile I help myself by analyzing clear spam mails and hardening detection with custom scores.
 
Sorry for reviving this topic - but we have the same problem here. Bayes lets clear spam mail through every day - but without Bayes the detection rate goes down significantly.

Since you can train Bayes - is there a way to create a spam-learning mailbox that users can forward their spam to?
I am getting into this as well.
I suggest making a spam & ham directory somewhere on your PMG. For example,
/root/bayes/spam/ → phishing, scams, junk
/root/bayes/ham/ → legitimate business mail

Start saving .eml files of both spam and legitimate mails, then copy them via SCP/SFTP into their respective folders.
Once you have a good sample size (by default SpamAssassin only applies Bayes after it has learned at least 200 spam and 200 ham messages), run the commands:
sa-learn --spam /root/bayes/spam/
sa-learn --ham /root/bayes/ham/
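
The two commands above could be wrapped in a small script, for example to skip a folder that is still empty. This is only a minimal sketch: the directory paths are the ones from this post, the helper names (`count_messages`, `build_commands`, `train`) are made up for illustration, and it assumes `sa-learn` is on the PATH of the user who owns the Bayes database.

```python
import subprocess
from pathlib import Path
from typing import List

# Directory layout taken from the post above; adjust to your setup.
SPAM_DIR = Path("/root/bayes/spam")
HAM_DIR = Path("/root/bayes/ham")


def count_messages(folder: Path) -> int:
    """Count regular files directly inside folder (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file())


def build_commands(spam_dir: Path, ham_dir: Path) -> List[List[str]]:
    """Return one sa-learn invocation per folder that contains files."""
    commands = []
    if count_messages(spam_dir) > 0:
        commands.append(["sa-learn", "--spam", str(spam_dir)])
    if count_messages(ham_dir) > 0:
        commands.append(["sa-learn", "--ham", str(ham_dir)])
    return commands


def train(spam_dir: Path = SPAM_DIR, ham_dir: Path = HAM_DIR) -> None:
    """Run the sa-learn commands one after another."""
    for cmd in build_commands(spam_dir, ham_dir):
        # check=True raises CalledProcessError if sa-learn exits non-zero.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train()
```

Building the command list separately from running it makes it easy to print the commands first and check them before anything touches the Bayes database.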

To verify that bayes is learning you can run the command:
sa-learn --dump magic

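Look for the "non-token data" lines:
  • nspam (number of spam messages learned)
  • nham (number of ham messages learned)
  • ntokens (number of tokens in the database)
If these numbers grow → training is working.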
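
To compare the counters before and after a training run without reading the output by eye, something like the following could work. It is a rough sketch that assumes the usual `sa-learn --dump magic` layout, where each counter sits in the third column of a line ending in `non-token data: <name>` (for example `nspam`, `nham`, `ntokens`); the function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as:
#   0.000          0        812          0  non-token data: nspam
# Group 1 is the third column (the value), group 2 is the counter name.
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text: str) -> dict:
    """Map each 'non-token data' name to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def read_counters() -> dict:
    """Run sa-learn --dump magic and parse its output."""
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_dump_magic(result.stdout)


if __name__ == "__main__":
    counters = read_counters()
    for key in ("nspam", "nham", "ntokens"):
        print(f"{key}: {counters.get(key)}")
```

Running it once before and once after `sa-learn --spam ...` should show whether `nspam` actually went up; messages that were already learned are skipped and do not change the count.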

If your Bayes has learned poorly and it's blocking legitimate mail, you can reset it (this wipes the entire Bayes database):
sa-learn --clear

You can also run this from cron if you want to automate the task:
0 3 * * * sa-learn --spam /root/bayes/spam/ && sa-learn --ham /root/bayes/ham/
You'd need to write some kind of Python or Bash script that connects to a designated mailbox of your choosing, downloads its "Junk/Spam" folder, and dumps the .eml files into the respective directory on your PMG.

I have personally used ImapFetch (https://github.com/ansemjo/imapfetch) to download mailboxes and their folders.

I haven't automated it myself yet; so far I have only begun manually putting .eml files into the folders.
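
For anyone who wants to try the download step, here is an untested-in-production sketch using only Python's standard `imaplib`. The host, account, password, and folder name are placeholders, and `save_message` / `fetch_folder` are hypothetical helper names. It assumes the spam sits unmodified in the IMAP folder (moved there, not forwarded inline), since forwarding rewrites the headers Bayes would learn from.

```python
import hashlib
import imaplib
from pathlib import Path

# Placeholders for illustration only; replace with your own mailbox details.
IMAP_HOST = "imap.example.com"
IMAP_USER = "spam-reports@example.com"
IMAP_PASSWORD = "change-me"
IMAP_FOLDER = "Junk"
DEST_DIR = Path("/root/bayes/spam")


def save_message(raw: bytes, dest_dir: Path) -> Path:
    """Write one raw message to dest_dir as <sha256-of-content>.eml.

    Naming by content hash means a message fetched twice maps to the
    same file instead of piling up duplicates.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / (hashlib.sha256(raw).hexdigest() + ".eml")
    if not path.exists():
        path.write_bytes(raw)
    return path


def fetch_folder(host: str, user: str, password: str, folder: str, dest_dir: Path) -> int:
    """Download every message in folder and return how many were fetched."""
    fetched = 0
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True opens the folder without changing message flags.
        status, _ = conn.select(folder, readonly=True)
        if status != "OK":
            return 0
        status, data = conn.search(None, "ALL")
        if status != "OK":
            return 0
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status == "OK" and parts and isinstance(parts[0], tuple):
                save_message(parts[0][1], dest_dir)
                fetched += 1
    return fetched


if __name__ == "__main__":
    count = fetch_folder(IMAP_HOST, IMAP_USER, IMAP_PASSWORD, IMAP_FOLDER, DEST_DIR)
    print(f"fetched {count} message(s) into {DEST_DIR}")
```

Run before the nightly `sa-learn --spam /root/bayes/spam/` cron entry, this would keep the spam folder topped up; a second instance pointed at a known-good folder and `/root/bayes/ham` would cover the ham side.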

Hope this helps someone.