teach spam

learjet3204

When a spam e-mail gets through to a mailbox, how do you teach the mail gateway that it was a spam message?
 
@oguz - Can you confirm: when a spam message is held in the Quarantine and the "Delete" button is pressed, does this run the message through sa-learn or the spamassassin reporting function before it deletes it?

I've been tweaking my setup, but dozens of copies of the same spam message keep coming through, and I'm hoping the system will learn this if it's deleted from quarantine often enough... Otherwise, I'm probably better off approving it to my own mailbox, then feeding it back via my scripts to be reported / learned from...
 
Can you confirm: when a spam message is held in the Quarantine and the "Delete" button is pressed, does this run the message through sa-learn or the spamassassin reporting function before it deletes it?

Messages are not run through sa-learn when you delete them - PMG relies on the autolearn feature from SpamAssassin.

Otherwise, I'm probably better off approving it to my own mailbox, then feeding it back via my scripts to be reported / learned from...
If you want to fine-tune your bayes filtering, then that would be the way to go.
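
For anyone wanting to script the "feed it back" step, here is a minimal sketch in Python. It only wraps the standard `sa-learn --spam` / `sa-learn --ham` calls, which accept a message file or a directory of message files. The function names are mine and purely illustrative; nothing here is part of PMG itself, and it assumes bayes is enabled and that you run it as a user who can write to the bayes db.

```python
import subprocess


def build_learn_command(kind, path):
    """Return the sa-learn command line for training on a file or directory."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def learn(kind, path):
    """Run sa-learn on the given path and return its exit code."""
    return subprocess.run(build_learn_command(kind, path), check=False).returncode
```

Usage would be something like `learn("spam", "/path/to/missed-spam")`, where the directory (a made-up path here) holds the messages you approved to your own mailbox and then saved as individual files.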
 
I wonder if a good feature for the roadmap would be a 'Report' button, to go along with the existing buttons, that would feed the message via `spamassassin -r` and then delete it?
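
To make the idea concrete, this is roughly what such a "report, then delete" action could do for a single message file. It is only a sketch of the suggestion, not how PMG's quarantine works: `report_then_delete` is a made-up name, and it assumes the message is available as a plain file. `spamassassin -r` reads the message from standard input.

```python
import os
import subprocess


def report_then_delete(path, run=subprocess.run):
    """Pipe one message file to `spamassassin -r`; delete it only on success."""
    with open(path, "rb") as msg:
        result = run(["spamassassin", "-r"], stdin=msg, check=False)
    if result.returncode == 0:
        os.remove(path)
        return True
    # Keep the message if reporting failed, so nothing is lost.
    return False
```

The `run` parameter is only there so the function can be tried out with a stand-in instead of a real SpamAssassin install.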
 
The question of more integration for bayes filtering does pop up every now and then.
Currently it's not on the roadmap - mostly for 2 reasons:
* In our experience, in most setups bayes causes more misclassifications than it improves accuracy (usually due to mistakes when training the filter) - so far I don't think I have seen a single mail where bayes filtering caught a spam message which would otherwise have gone through
* It would help to have some reliable numbers about the effectiveness of a well-trained network (over a longer period of time) - but until now no one could provide statistics that speak in favor of bayes filtering

You could open an enhancement request over at https://bugzilla.proxmox.com - maybe more users would join in and maybe somebody could provide some numbers
 
Yep - I hear you - it's hard to make a solid case for or against...

I don't have my old manual SA config from my old filtering setup since I switched to PMG - but that had close to 10 years of filtering history... As such, it's kinda hard to give solid figures either way...

I believe that it used to catch more - however my current setup only sees ~150-300 messages a day across 3 domains - which means I can't make a change and get an idea of effects in a day or so...

I've only had PMG installed for about a week, but so far, these are the monthly stats:
[Attachment: 1589186163991.png]

And these are just today's:
[Attachment: 1589186195803.png]
 
I don't have my old manual config of SA from my old filtering since I switched to PMG - but that had close to 10 years of filtering history... As such, its kinda hard to give solid figures either way...
depends on where you kept your bayes db in the old system - it could be as easy as enabling bayes filtering, copying it over (see https://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html - the backup+restore option should work when run as root on PMG) and rebooting

that way you could compare the filtering without your db and with it
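
A rough sketch of the copy-over step, using the `--backup` and `--restore` options from the sa-learn documentation linked above. The helper names and the dump file path are placeholders I made up; enabling bayes filtering on PMG and the reboot are not covered here.

```python
import subprocess


def backup_bayes(dump_path, run=subprocess.run):
    """On the old server: write the bayes database to a text dump."""
    with open(dump_path, "wb") as out:
        # sa-learn --backup prints the dump to standard output.
        return run(["sa-learn", "--backup"], stdout=out, check=False).returncode


def restore_bayes(dump_path, run=subprocess.run):
    """On PMG, as root: load the dump into the local bayes database."""
    return run(["sa-learn", "--restore", dump_path], check=False).returncode
```

You would call `backup_bayes("bayes-dump.txt")` on the old system, copy the file across, and call `restore_bayes("bayes-dump.txt")` on PMG. Keeping a dump of the untouched PMG database first makes it easy to go back for the with/without comparison.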
 
Would be great to see the results. On my private PMG installation, I have bayes trained and the filter works much better. I was able to lower the milter reject threshold (in future, the before-queue reject) to a spam level of 5 without any false positives. So yes, I believe a well-trained bayes filter will help very much. I also recently tried importing "foreign spam" and got much worse spam scores.
 
I used to use a hard 550 reject level of 5 too... I'm looking at what it would take to do this - but I'm not getting my hopes up...

EDIT: Ah! One of my systems was still using it... A shared MySQL db that was used by multiple systems. I've done a --backup and --restore onto PMG... Let's see what the next few days bring...

EDIT2: Reviewing my config on the older mail server, it looks like I was using this custom config for BAYES scoring. I'm not going to add these yet, as it might change the test somewhat...
Code:
bayes_auto_learn                1
bayes_auto_learn_on_error       1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 6.0

## Basic settings
required_hits                   5.0
report_safe                     0

## Bayesian scoring
score BAYES_00                  -3.0
score BAYES_05                  -2.0
score BAYES_20                  -1.5
score BAYES_40                  -1.0
score BAYES_50                  0.8
score BAYES_60                  1.5
score BAYES_80                  2.0
score BAYES_90                  3.0
score BAYES_99                  4.0
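
To show why these weights matter for borderline mail, here is a small arithmetic sketch in Python. It just adds the BAYES score from the config above to whatever the other rules contributed and compares the total with `required_hits`. The example messages and their non-bayes scores are invented for illustration.

```python
REQUIRED_HITS = 5.0

# Scores copied from the custom config above.
BAYES_SCORES = {
    "BAYES_00": -3.0, "BAYES_05": -2.0, "BAYES_20": -1.5,
    "BAYES_40": -1.0, "BAYES_50": 0.8, "BAYES_60": 1.5,
    "BAYES_80": 2.0, "BAYES_90": 3.0, "BAYES_99": 4.0,
}


def total_score(other_rules_score, bayes_rule):
    """Add the bayes rule's score to the score from all other rules."""
    return other_rules_score + BAYES_SCORES[bayes_rule]


def is_spam(other_rules_score, bayes_rule):
    """A message is tagged as spam once the total reaches required_hits."""
    return total_score(other_rules_score, bayes_rule) >= REQUIRED_HITS


# A hypothetical message scoring 2.5 from other rules is pushed over the line
# by BAYES_99, while one scoring 5.5 is pulled back under it by BAYES_00.
print(is_spam(2.5, "BAYES_99"), is_spam(5.5, "BAYES_00"))
```

With weights this strong, a mistrained database can flip results in either direction, which is the risk mentioned earlier in the thread.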
 
So with my imported bayes db, today's stats show:
[Attachment: 1589349697469.png]

As such, it seems to have really firmed up the middle ground compared to before the import...

At this stage, I haven't customised any of the weighting in the PMG setup.
 
So bayes scores are finally working as well for you as they do for me. Sure, there is not much data about this, because in a typical PMG deployment the admins never enter the shell to make any adjustments, so they never train the bayes database either. And as mentioned, it takes a very long time for autolearn to collect enough spam to get bayes activated. The most interesting mails are the ones that are neither 100% spam nor 100% ham: the clear spam is usually caught by other rules or blacklists anyway, so it is the borderline messages where bayes scores improve the results. I just checked my untrained installation, which I set up about two years ago: it still hasn't learned enough spam from autolearn, and the count is currently 140.
 
From the backup of my old bayes db, I can gather the following:
Code:
v       3       db_version # this must be the first line!!!
v       1840    num_spam
v       11744   num_nonspam

As such, it's not really surprising that the detection is better now - as it fills in a history of over 13,000 messages...
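
If you want to pull those counts out of a backup file yourself, something like the following works on the header rows shown above. It is a quick sketch that only assumes the `v <value> <name>` layout visible in my dump; `parse_backup_header` is just a name I picked.

```python
HEADER = """\
v       3       db_version # this must be the first line!!!
v       1840    num_spam
v       11744   num_nonspam
"""


def parse_backup_header(text):
    """Collect the 'v <value> <name>' rows from a bayes backup into a dict."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Anything after the name (such as the trailing comment) is ignored.
        if len(fields) >= 3 and fields[0] == "v":
            counts[fields[2]] = int(fields[1])
    return counts


counts = parse_backup_header(HEADER)
print(counts["num_spam"] + counts["num_nonspam"])  # 13584 messages learned
```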
 
That's it, and it matches my experience exactly. It's a hard and time-consuming job without a mail server behind PMG and some scripting to use it as a "learn base", e.g. if you manually import all spam messages on PMG and train them there. These are the values from my private installation, which I trained:

0.000 0 3 0 non-token data: bayes db version
0.000 0 450 0 non-token data: nspam
0.000 0 17503 0 non-token data: nham

And these are the counts from just "waiting" on autolearning. After two years of running in a company environment, no bayes scores are being used, because the critical volume of both spam and ham hasn't been reached yet:

0.000 0 3 0 non-token data: bayes db version
0.000 0 140 0 non-token data: nspam
0.000 0 394566 0 non-token data: nham
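
The two dumps can be checked against SpamAssassin's activation threshold with a few lines of Python. This is a sketch that assumes the stock defaults of 200 for both `bayes_min_spam_num` and `bayes_min_ham_num` (I have not checked whether PMG overrides them), and the function names are my own.

```python
MIN_SPAM = 200  # SpamAssassin default for bayes_min_spam_num
MIN_HAM = 200   # SpamAssassin default for bayes_min_ham_num

TRAINED = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 450 0 non-token data: nspam
0.000 0 17503 0 non-token data: nham
"""

AUTOLEARN_ONLY = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 140 0 non-token data: nspam
0.000 0 394566 0 non-token data: nham
"""


def parse_dump_magic(text):
    """Map each 'non-token data' name to its value (the third column)."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[4:6] == ["non-token", "data:"]:
            values[" ".join(fields[6:])] = int(fields[2])
    return values


def bayes_active(values):
    """Bayes only scores mail once both minimum counts are reached."""
    return values.get("nspam", 0) >= MIN_SPAM and values.get("nham", 0) >= MIN_HAM


print(bayes_active(parse_dump_magic(TRAINED)))         # True
print(bayes_active(parse_dump_magic(AUTOLEARN_ONLY)))  # False: 140 spam < 200
```

So the autolearn-only box has plenty of ham but is still 60 spam messages short after two years.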

So @Stoiko Ivanov, as you can see, it really makes sense to add an option to PMG for learning spam and ham.
 
