Bayes and Spam Marking - Question

thebiggeek · Aug 5, 2020

I have spam email coming in from one sender. It is evidently spam, and while my rules blocked the first few emails, the following emails from the same sender are getting through. The sending server is the same, and the subject and body are the same. Here are the two SA scores:

SA score=2/5 time=2.729 bayes=0.36 autolearn=no autolearn_force=no hits=BAYES_40(-0.001),DCC_CHECK(1.1),HTML_MESSAGE(0.001),KAM_DMARC_STATUS(0.01),KAM_LAZY_DOMAIN_SECURITY(1),RCVD_IN_DNSWL_NONE(-0.0001),RCVD_IN_SORBS_PROBLEMS(0.5),SPF_HELO_NONE(0.001),SPF_NONE(0.001)

SA score=10/5 time=2.767 bayes=0.00 autolearn=no autolearn_force=no hits=BAYES_00(-1.9),DCC_REPUT_70_89(0.1),HTML_MESSAGE(0.001),KAM_DMARC_STATUS(0.01),KAM_LAZY_DOMAIN_SECURITY(1),KAM_LIST3(11),SPF_HELO_NONE(0.001),SPF_NONE(0.001)

Does anyone have input on how to fix something like this?

Stoiko Ivanov · Aug 5, 2020

thebiggeek said:
The sending server is the same, and the subject and body are the same. Here are the two SA scores

The body must differ somewhere between the two mails; see the hits. The one rule that changes the result drastically is KAM_LIST3, which adds 11 points to the second mail but not to the first.
From the rule's description:

Code:

Mailing List Purveyor Spam

Judging from the rule itself, it matches on both the headers and the body of an e-mail: the subject needs to mention contacts, qualified leads and the like, and the body also needs to indicate these things.
(You can check for yourself by searching for KAM_LIST3 in /usr/share/spamassassin-extra/KAM.cf.)

I hope this explains it.
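The comparison of hits described above can also be done mechanically. Here is a minimal sketch in Python, assuming the `hits=` part of each score line has been copied out by hand; the helper names `parse_hits` and `diff_hits` are made up for illustration and are not part of SpamAssassin or PMG.

```python
import re

# Hits copied from the two score lines in the first post.
FIRST_HITS = ("BAYES_40(-0.001),DCC_CHECK(1.1),HTML_MESSAGE(0.001),"
              "KAM_DMARC_STATUS(0.01),KAM_LAZY_DOMAIN_SECURITY(1),"
              "RCVD_IN_DNSWL_NONE(-0.0001),RCVD_IN_SORBS_PROBLEMS(0.5),"
              "SPF_HELO_NONE(0.001),SPF_NONE(0.001)")
SECOND_HITS = ("BAYES_00(-1.9),DCC_REPUT_70_89(0.1),HTML_MESSAGE(0.001),"
               "KAM_DMARC_STATUS(0.01),KAM_LAZY_DOMAIN_SECURITY(1),"
               "KAM_LIST3(11),SPF_HELO_NONE(0.001),SPF_NONE(0.001)")

# One hit looks like RULE_NAME(score), e.g. KAM_LIST3(11) or BAYES_00(-1.9).
HIT_RE = re.compile(r"([A-Z0-9_]+)\((-?\d+(?:\.\d+)?)\)")


def parse_hits(hits):
    """Turn 'RULE(score),RULE(score)' into a {rule: score} dict."""
    return {name: float(score) for name, score in HIT_RE.findall(hits)}


def diff_hits(first, second):
    """Return the rules that fired for only one of the two mails."""
    only_first = {r: s for r, s in first.items() if r not in second}
    only_second = {r: s for r, s in second.items() if r not in first}
    return only_first, only_second


only_first, only_second = diff_hits(parse_hits(FIRST_HITS),
                                    parse_hits(SECOND_HITS))
print("only in first mail: ", only_first)
print("only in second mail:", only_second)
```

Rules that show up on only one side (KAM_LIST3 among them) are the ones worth looking up in the rule files.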

thebiggeek · Aug 7, 2020

Thanks. I am now getting a lot of negative scores from AWL and BAYES and am working on improving those. I went through this - thoughtcrimes. I have no visibility of the mails themselves; they looked similar. As a case in point, I have some custom rules that give a high score to emails we want to block, based on certain keywords, since India-related spam filters don't exist. Those rules do trigger, but the BAYES and AWL scores pull the total down. Let's look at a few headers:

SA score=1/5 time=2.692 bayes=0.00 autolearn=no autolearn_force=no hits=AWL(-0.792),BAYES_00(-1.9),DCC_REPUT_70_89(0.1),DKIM_INVALID(0.1),DKIM_SIGNED(0.1),HEADER_FROM_DIFFERENT_DOMAINS(0.001),HTML_MESSAGE(0.001),HTTPS_HTTP_MISMATCH(0.1),KAM_DMARC_STATUS(0.01),RCVD_IN_DNSWL_NONE(-0.0001),RCVD_IN_MSPIKE_H3(0.001),RCVD_IN_MSPIKE_WL(0.001),S3BMS_HEADER_15(4),SPF_HELO_NONE(0.001),SPF_PASS(-0.001)

Here, S3BMS_HEADER_15 has triggered and given it 4 points, but the -0.792 from AWL and the -1.9 from BAYES are hurting a bit.

Similarly

SA score=3/5 time=2.820 bayes=0.00 autolearn=no autolearn_force=no hits=AWL(-3.341),BAYES_00(-1.9),DKIM_SIGNED(0.1),DKIM_VALID(-0.1),DKIM_VALID_AU(-0.1),DKIM_VALID_EF(-0.1),HTML_IMAGE_RATIO_02(0.001),HTML_MESSAGE(0.001),KAM_NUMSUBJECT(0.5),MAILING_LIST_MULTI(-1),RCVD_IN_RP_RNBL(1.31),RCVD_IN_SORBS_PROBLEMS(0.5),S3BMS_BODY_45(4),S3BMS_HEADER_19(4),SPF_HELO_NONE(0.001),SPF_PASS(-0.001)

In the last one, two rules from our own lists triggered, giving it 8 points, and RCVD_IN_RP_RNBL added another 1.31 points, but AWL (-3.341) and Bayes (-1.9) together pulled the score back down. Is there something I can do to avoid this?
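To put a number on how much AWL and Bayes cost in that last header, the hits can simply be summed with and without them. This is only a what-if sketch in Python: the hits string is copied from the score line above, the threshold of 5 is taken from the `score=3/5` part, and `total_score` is a hypothetical helper, not something SpamAssassin or PMG provides.

```python
import re

# Hits copied from the "score=3/5" line above.
HITS = ("AWL(-3.341),BAYES_00(-1.9),DKIM_SIGNED(0.1),DKIM_VALID(-0.1),"
        "DKIM_VALID_AU(-0.1),DKIM_VALID_EF(-0.1),HTML_IMAGE_RATIO_02(0.001),"
        "HTML_MESSAGE(0.001),KAM_NUMSUBJECT(0.5),MAILING_LIST_MULTI(-1),"
        "RCVD_IN_RP_RNBL(1.31),RCVD_IN_SORBS_PROBLEMS(0.5),S3BMS_BODY_45(4),"
        "S3BMS_HEADER_19(4),SPF_HELO_NONE(0.001),SPF_PASS(-0.001)")

REQUIRED = 5.0  # assumed spam threshold, read off the "/5" in the log line

HIT_RE = re.compile(r"([A-Z0-9_]+)\((-?\d+(?:\.\d+)?)\)")


def parse_hits(hits):
    """Turn 'RULE(score),RULE(score)' into a {rule: score} dict."""
    return {name: float(score) for name, score in HIT_RE.findall(hits)}


def total_score(hits, ignore_prefixes=()):
    """Sum all rule scores, skipping rules whose name starts with a prefix."""
    prefixes = tuple(ignore_prefixes)
    return sum(score for rule, score in hits.items()
               if not rule.startswith(prefixes))


hits = parse_hits(HITS)
as_logged = total_score(hits)
without_awl_bayes = total_score(hits, ignore_prefixes=("AWL", "BAYES_"))

print(f"sum of all hits:       {as_logged:.3f}")
print(f"without AWL and BAYES: {without_awl_bayes:.3f}")
print("over threshold without them:", without_awl_bayes >= REQUIRED)
```

The sum of all hits comes out just under 4, which suggests the logged `score=3` is the total with the fraction dropped. This shows the arithmetic only; whether switching AWL or Bayes off is a good idea overall is a separate question.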

ittk · Aug 7, 2020

I see the same effect since the day the BAYES_xx rules (automatically) started to count towards the SA score. It took several months before the BAYES_xx rules became active. I am thinking about deactivating both AWL and Bayes... or playing around with each option.

thebiggeek · Aug 7, 2020

For now I have disabled AWL. I am not sure about BAYES, as that is an integral part too, but as for the BAYES_XXX rules, let's see. How is it set on your public systems: do you have Bayes and AWL enabled?

ittk · Aug 7, 2020

I am using the PMG defaults, which means both are on. But currently I am not sure whether they are counterproductive...

thebiggeek · Aug 7, 2020

Noted, would love to hear what @heutger thinks about AWL and BAYES

ittk · Aug 7, 2020

For me it's only important to see whether it works for me or not. I have never used sa-learn to train Bayes, which means the system's autolearn is not actually doing a good job for my site. Maybe just giving BAYES_00 a score of 0 and turning off AWL would also improve the results, who knows.

thebiggeek · Aug 7, 2020

So how are you training Bayes if not with sa-learn? Do you have a hook into all mailboxes?

ittk · Aug 7, 2020

Nothing. Bayes automatically became active, along with the BAYES_XX SA rules, after some months of incoming email flow. It is only the rather heavily weighted BAYES_00 rule, which subtracts 1.9 points. That seems too much for my site's results... So do you use sa-learn (manually)?

thebiggeek · Aug 7, 2020

No, I don't use sa-learn manually, as I don't have access to the mailboxes.

heutger · Aug 28, 2020

My five cents on that: I have AWL and Bayes enabled. Both work fine for me; however, Bayes with autolearn isn't working on either of my installations. My private installation certainly gets far too few mails to ever reach the required volume of spam (ham is no problem). My commercial test installation didn't reach the volume either, and our business is solely online, so we get a lot of mail; we are no ISP, of course, but it's a reasonable volume. (Recently my colleagues did a production installation and also purchased a license for it. I would recommend everyone do so to support the product and its future development; you also get commercial support on certain plans, which may be required for commercial setups.)

However, although my blacklists work well up front, some spam still comes through, and as the commercial production setup is a fresh one, I can see the difference between my earlier AWL learning and the current state, so I would argue that AWL and Bayes are both usable. I'm unsure about TxRep, which is the successor of AWL, but as you still seem to have the options for AWL, I believe PMG is still using AWL and didn't switch to TxRep (@Stoiko Ivanov are there plans to switch to TxRep at any time in the future?).

Stoiko Ivanov · Aug 28, 2020

heutger said:
, I believe, PMG is still using AWL and didn't switch to TxRep

This is correct

heutger said:
are there plans to switch to TxRep anytime in future?

Currently not. In our experience from our various support channels (which might be biased, because people only seek support when something is not working well), AWL is not always helpful: we've seen quite a few installations where it caused false negatives or false positives.

From a quick look at the wiki page, https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TxRep, it would also need training, just like Bayes (which is currently not available in PMG).

heutger · Aug 28, 2020

Thanks for the response. Training is always required in some form, but it is also the only way, besides blacklists and some settings, to improve the filter over time.
