teach spam

learjet3204 · Apr 21, 2020

So when a spam e-mail gets through to a mailbox, how do you teach the mail gateway that it was a spam message?

oguz · Apr 22, 2020

hi,

each email has a spam score assigned.
you can check this score in the logs and add a custom rule.

check the documentation for more information on how you can do it[0]

[0]: https://pmg.proxmox.com/pmg-docs/pmg-admin-guide.html#pmgconfig_spamdetector
[1]: https://pmg.proxmox.com/pmg-docs/pmg-admin-guide.html#chapter_mailfilter

heutger · May 10, 2020

Or you use sa-learn via command line on PMG to train mail as spam or ham to improve the bayes filter.
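
For reference, a minimal sketch of what that training step could look like when scripted. It assumes `sa-learn` is on the PATH and is run as a user that can write to the bayes database PMG uses. The helper names and the message path are hypothetical; `--spam` and `--ham` are the standard sa-learn options for the two training directions.

```python
import subprocess

# Map a training direction to the corresponding sa-learn option.
SA_LEARN_FLAGS = {"spam": "--spam", "ham": "--ham"}


def build_sa_learn_argv(kind, message_path):
    """Return the argv list for training one message file as spam or ham."""
    if kind not in SA_LEARN_FLAGS:
        raise ValueError("kind must be 'spam' or 'ham', got %r" % (kind,))
    return ["sa-learn", SA_LEARN_FLAGS[kind], message_path]


def train(kind, message_path):
    """Run sa-learn on one message; raises CalledProcessError on failure."""
    return subprocess.run(
        build_sa_learn_argv(kind, message_path),
        check=True,
        capture_output=True,
        text=True,
    )
```

Something like `train("spam", "/tmp/missed-spam.eml")` (path made up) would then feed one saved message to the filter. Mislabelled messages degrade the database, so it is worth checking each file before training on it.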

CRCinAU · May 11, 2020

@oguz - Can you confirm, when a spam message is held in the Quarantine, if the "Delete" button is pressed, does this run the message through sa-learn or the spamassassin reporting function before it deletes it?

I've been tweaking my setup - but I'm getting dozens of copies of the same spam message coming through - and I'm hoping that the system will learn this if it's deleted from quarantine often enough.... Otherwise, I'm probably better off approving it to my own mailbox, then feeding it back via my scripts to be reported / learned from...

Stoiko Ivanov · May 11, 2020

CRCinAU said:
Can you confirm, when a spam message is held in the Quarantine, if the "Delete" button is pressed, does this run the message through sa-learn or the spamassassin reporting function before it deletes it?

The messages are not run through sa-learn when they are deleted - PMG relies on the autolearn feature from SpamAssassin.

CRCinAU said:
Otherwise, I'm probably better off approving it to my own mailbox, then feeding it back via my scripts to be reported / learned from...

If you want to fine-tune your bayes filtering then that would be the way to go.

CRCinAU · May 11, 2020

I wonder if a good feature for the roadmap would be a 'Report' button to go along with the existing ones that would feed the message via `spamassassin -r` - then delete it?

Stoiko Ivanov · May 11, 2020

CRCinAU said:
I wonder if a good feature for the roadmap would be a 'Report' button to go along with the existing ones that would feed the message via `spamassassin -r` - then delete it?

The question regarding more integration of bayes filtering does pop up every now and then.
Currently it's not on the roadmap - mostly for 2 reasons:
* In our experience, in most setups bayes causes more misclassifications than it helps with accuracy (usually due to mistakes when training the filter) - so far I don't think I have seen one mail where bayes filtering caught a spam message which would otherwise have gone through
* It would help to have some reliable numbers about the effectiveness of a well-trained network (over a longer period of time) - but so far no one has provided statistics that speak in favor of bayes filtering

You could open an enhancement request over at https://bugzilla.proxmox.com - maybe more users would join in and maybe somebody could provide some numbers

CRCinAU · May 11, 2020

Yep - I hear you - it's hard to make a solid case for or against...

I don't have my old manual config of SA from my old filtering since I switched to PMG - but that had close to 10 years of filtering history... As such, it's kinda hard to give solid figures either way...

I believe that it used to catch more - however my current setup only sees ~150-300 messages a day across 3 domains - which means I can't make a change and get an idea of effects in a day or so...

I've only had PMG installed for about a week, but so far, these are the monthly stats:

Then this is just today:

Stoiko Ivanov · May 11, 2020

CRCinAU said:
I don't have my old manual config of SA from my old filtering since I switched to PMG - but that had close to 10 years of filtering history... As such, it's kinda hard to give solid figures either way...

depends on where you kept your bayes db in the old system - it could be as easy as enabling bayes filtering, copying it over (see https://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html - the backup+restore option should work when run as root on PMG) and rebooting

that way you could compare the filtering without your db and with it

heutger · May 11, 2020

Stoiko Ivanov said:
depends on where you kept your bayes db in the old system - it could be as easy as enabling bayes filtering, copying it over (see https://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html - the backup+restore option should work when run as root on PMG) and rebooting

that way you could compare the filtering without your db and with it

Would be great to see the results. On my private PMG installation, I have bayes trained and the filter works much better. I was able to reduce the milter-reject (and in future before-queue reject) threshold to a spam level of 5 without any false positives. So yes, I believe a well-trained bayes filter will help very much. I also recently tried importing "foreign spam" and got much worse spam scores.

CRCinAU · May 12, 2020

I used to use a hard 550 reject level of 5 too... I'm looking at what it would take to do this - but I'm not getting my hopes up...

EDIT: Ah! One of my systems was still using it... A shared mysql db that was used by multiple systems. I've done a --backup and --restore onto PMG... Let's see what the next few days bring...

EDIT2: Reviewing my config on the older mail server, it looks like I was using this custom config for BAYES scoring.. I'm not going to add these yet - as it might change the test somewhat....

Code:

bayes_auto_learn                1
bayes_auto_learn_on_error       1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 6.0

## Basic settings
required_hits                   5.0
report_safe                     0

## Bayesian scoring
score BAYES_00                  -3.0
score BAYES_05                  -2.0
score BAYES_20                  -1.5
score BAYES_40                  -1.0
score BAYES_50                  0.8
score BAYES_60                  1.5
score BAYES_80                  2.0
score BAYES_90                  3.0
score BAYES_99                  4.0

CRCinAU · May 13, 2020

So with my imported bayes db, today's stats show:

As such, it seems to have really firmed up the middle ground compared to before the import...

At this stage, I haven't customised any of the weighting in the PMG setup.

heutger · May 13, 2020

CRCinAU said:
So with my imported bayes db, today's stats show:
View attachment 17138

As such, it seems to have really firmed up the middle ground compared to before the import...

At this stage, I haven't customised any of the weighting in the PMG setup.

So finally bayes scores are working well for you, as they do for me. There isn't much data about that, because in its usual deployment PMG is run by people who won't ever enter the shell to make adjustments, and so won't train the bayes database either. As mentioned, it takes a very long time to get enough spam auto-learned to activate bayes. The most interesting mails are the ones that are not 100% spam or 100% ham: the 100% spam is usually caught by other rules or blacklists, so it's the borderline mails where bayes scores give better results. I just checked my installation without training, which I set up about two years ago, and it still hasn't learned enough spam from autolearn; the count is currently 140.

CRCinAU · May 13, 2020

From the backup of my old bayes db, I can gather the following:

Code:

v       3       db_version # this must be the first line!!!
v       1840    num_spam
v       11744   num_nonspam

As such, it's not really surprising that the detection is better now - as it fills in a history of over 13,000 messages...

heutger · May 13, 2020

CRCinAU said:
From the backup of my old bayes db, I can gather the following:

Code:

v       3       db_version # this must be the first line!!!
v       1840    num_spam
v       11744   num_nonspam

As such, it's not really surprising that the detection is better now - as it fills in a history of over 13,000 messages...

That's it, and it's exactly my experience as well. It's a hard and time-consuming job if you have no mail server behind PMG with scripting to use as a "learn base", e.g. if you manually import all spam messages on PMG and train them there. These are the values of my private installation, which I trained:

0.000 0 3 0 non-token data: bayes db version
0.000 0 450 0 non-token data: nspam
0.000 0 17503 0 non-token data: nham

And these are the counts from "waiting" on autolearning. After two years of running in a company environment, no bayes scores are being used, as the critical volume of spam and ham hasn't been reached yet:

0.000 0 3 0 non-token data: bayes db version
0.000 0 140 0 non-token data: nspam
0.000 0 394566 0 non-token data: nham

So @Stoiko Ivanov, as you can see, it really makes sense to provide an option in PMG for learning spam and ham.
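
Counters like the ones above (the `sa-learn --dump magic` format) can be checked mechanically. A rough sketch, assuming the SpamAssassin defaults of 200 learned spam and 200 learned ham (`bayes_min_spam_num`, `bayes_min_ham_num`) before bayes scores are applied; whether a given PMG install overrides these is not something this thread confirms. The function names are made up for illustration.

```python
import re

# SpamAssassin defaults (bayes_min_spam_num / bayes_min_ham_num).
# Assumption: the installation being checked does not override them.
MIN_SPAM = 200
MIN_HAM = 200

# Matches lines such as "0.000 0 450 0 non-token data: nspam";
# the counter value sits in the third numeric column.
MAGIC_LINE = re.compile(
    r"\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$"
)


def parse_dump_magic(text):
    """Return {label: value} for every 'non-token data' line in the dump."""
    counts = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_has_enough_data(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True once both learned-spam and learned-ham counts reach the minimums."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


trained = parse_dump_magic(
    "0.000 0 3 0 non-token data: bayes db version\n"
    "0.000 0 450 0 non-token data: nspam\n"
    "0.000 0 17503 0 non-token data: nham\n"
)
autolearn_only = parse_dump_magic(
    "0.000 0 3 0 non-token data: bayes db version\n"
    "0.000 0 140 0 non-token data: nspam\n"
    "0.000 0 394566 0 non-token data: nham\n"
)
```

With the two sets of numbers posted above, the manually trained database passes the check, while the autolearn-only one does not: its 140 learned spam messages are below the assumed minimum of 200, despite the large ham count.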
