How to train for spam

Here's another good example of BAYES catching stuff that other rules did not:

Code:
X-SPAM-LEVEL: Spam detection results:  7
    BAYES_99                  3.5 Bayes spam probability is 99 to 100%
    BAYES_999                 0.2 Bayes spam probability is 99.9 to 100%
    HTML_MESSAGE            0.001 HTML included in message
    KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
    KHOP_HELO_FCRDNS        0.398 Relay HELO differs from its IP's reverse DNS
    RCVD_IN_RP_RNBL          1.31 Relay in RNBL, https://senderscore.org/blacklistlookup/
    RDNS_DYNAMIC            0.982 Delivered to internal network by host with dynamic-looking rDNS
    SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
    SPF_SOFTFAIL            0.665 SPF: sender does not match SPF record (softfail)

Right now, I have my hard reject level set at 9 - but as I get more confidence in results, I'm aiming to bring that number down.
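
To see how much of that total actually came from Bayes, the per-rule scores can be summed. This is a minimal sketch, assuming each report line has the rule name in the first column and the score in the second; the function names are made up for illustration.

```shell
#!/bin/sh
# Sum the per-rule scores of a report read from stdin.
# Expects lines of the form "RULE_NAME  SCORE  description".
sum_scores() {
    awk '$2 ~ /^-?[0-9.]+$/ { total += $2 } END { printf "%.3f\n", total }'
}

# Same sum, but leaving out every rule whose name starts with BAYES_.
sum_scores_without_bayes() {
    grep -v '^[[:space:]]*BAYES_' | sum_scores
}

report='
    BAYES_99                  3.5 Bayes spam probability is 99 to 100%
    BAYES_999                 0.2 Bayes spam probability is 99.9 to 100%
    HTML_MESSAGE            0.001 HTML included in message
    KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
    KHOP_HELO_FCRDNS        0.398 Relay HELO differs from its IP reverse DNS
    RCVD_IN_RP_RNBL          1.31 Relay in RNBL
    RDNS_DYNAMIC            0.982 Delivered to internal network by host with dynamic-looking rDNS
    SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
    SPF_SOFTFAIL            0.665 SPF: sender does not match SPF record (softfail)
'

printf '%s\n' "$report" | sum_scores                 # prints 7.067
printf '%s\n' "$report" | sum_scores_without_bayes   # prints 3.367
```

So the rules add up to 7.067 (presumably what the header shows as 7), and the two BAYES rules account for 3.7 of that. Without them this message would have scored only about 3.4.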
 
I got a bit sick of having no real way to feed missed spam back into the Bayes filter, so I hacked something together.

My path structure is /vmail/$user/ for email. I use a Spam folder which ends up at /vmail/$user/.Spam/cur and /vmail/$user/.Spam/new

Set up an SSH public / private key to allow you to log into your PMG from your mail server. I won't cover this here - the instructions are a google search away if required.
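
For anyone who wants a starting point anyway, here is a rough sketch of the key-generation half. It assumes OpenSSH's ssh-keygen is installed on the mail server; the function name and the key comment are made up for illustration.

```shell
#!/bin/sh
# make_report_key KEYFILE: create a passphrase-less ed25519 key pair.
# No passphrase, because a cron job has nobody around to type one.
make_report_key() {
    keyfile=$1
    # Refuse to touch an existing key rather than let ssh-keygen prompt.
    if [ -e "$keyfile" ]; then
        echo "refusing to overwrite $keyfile" >&2
        return 1
    fi
    ssh-keygen -q -t ed25519 -N '' -C 'spam-report' -f "$keyfile"
}
```

Run it as the user that will run the cron job, for example `make_report_key ~/.ssh/id_ed25519`, then copy the contents of the matching `.pub` file to the PMG. If you pick a non-default key path, the ssh call in the reporting script would need a matching `-i` option.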

On your mail server end, add the following script to your cron:

Bash:
#!/bin/bash
# Feed every message in each user's Spam folder to sa-learn on the PMG,
# then delete the message from the folder.
MAILFILTER=10.1.1.94
TMPFILE=$(mktemp /tmp/tempmail.XXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

for i in /vmail/*/.Spam/cur/* /vmail/*/.Spam/new/*; do
        [ -f "$i" ] || continue
        STATUS=$(file -b "$i")
        # Messages may be stored compressed, so unpack to a temp file first.
        case "$STATUS" in
                *gzip*)        gunzip -c "$i" > "$TMPFILE" ;;
                *bzip2*)       bzip2 -d -c "$i" > "$TMPFILE" ;;
                *"SMTP mail"*) cp "$i" "$TMPFILE" ;;
                *)             echo "Skipping $i: unrecognised type ($STATUS)" >&2; continue ;;
        esac

        if ! ssh "root@$MAILFILTER" report < "$TMPFILE"; then
                echo "Error running sa-learn. Aborting." >&2
                exit 1
        fi
        rm -f "$i"
done
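
As an illustration of the cron side, a crontab entry could look like the following. The script path and the 30-minute interval are assumptions for the example, not something fixed by the setup above; the entry goes into the crontab of a user that can read /vmail and has the SSH key.

```text
# m    h  dom mon dow  command
*/30   *  *   *   *    /usr/local/sbin/report-spam
```

If a run could take longer than the interval, wrapping the command as `flock -n /run/lock/report-spam.lock /usr/local/sbin/report-spam` (flock from util-linux, lock path again just an example) would keep two runs from overlapping.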

On the PMG end, add `command="/root/bin/spam-reporter"` to your public key in /root/.ssh/authorized_keys.
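
For illustration, the resulting authorized_keys line might look something like this. The key type, the key body and the trailing comment are placeholders, and the extra `no-*` options are an addition of this example (they are standard OpenSSH authorized_keys options that further restrict what the key can do), not part of the original setup.

```text
command="/root/bin/spam-reporter",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 <public key body> spam-report
```

With the forced command in place, whatever the client asks to run is not executed; it is only handed to the script in `SSH_ORIGINAL_COMMAND`.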

Put this on your PMG as /root/bin/spam-reporter:
Bash:
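#!/bin/sh
# Forced command for the spam-report key: only "report" and "revoke" are allowed.
# The message to learn arrives on stdin.
case "$SSH_ORIGINAL_COMMAND" in
        report)
                sa-learn --spam
                ;;
        revoke)
                sa-learn --ham
                ;;
        *)
                # Fail loudly so the sending side does not treat this as success.
                echo "Wwwwhat?" >&2
                exit 1
                ;;
esac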
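
The mail server script only ever sends `report`, but the `revoke` branch can be used the same way to teach a false positive as ham. A minimal sketch of a small sender function for both actions follows; the function name and the overridable `SSH` variable are made up for illustration, and the message is assumed to be an uncompressed file.

```shell
#!/bin/sh
MAILFILTER=${MAILFILTER:-10.1.1.94}
SSH=${SSH:-ssh}   # overridable, so the function can be tried without a real PMG

# send_to_pmg ACTION FILE: pipe one message to the PMG wrapper script.
# ACTION must be "report" (learn as spam) or "revoke" (learn as ham).
send_to_pmg() {
    action=$1
    msg=$2
    case "$action" in
        report|revoke) ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    if [ ! -f "$msg" ]; then
        echo "no such file: $msg" >&2
        return 1
    fi
    # The remote "command" is never executed as such; the forced command
    # on the PMG only sees it in SSH_ORIGINAL_COMMAND.
    "$SSH" "root@$MAILFILTER" "$action" < "$msg"
}
```

Usage would be along the lines of `send_to_pmg revoke /path/to/message`, with the path pointing at whichever message was wrongly flagged. Compressed messages would need unpacking first, the same way the cron script does it.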

What will happen is that your mail server pipes each message in the Spam folders to sa-learn on the PMG system and then deletes it from the folder, which should allow your Bayes filter to learn your spam a little better.
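
One way to check that the learning is actually landing is to look at the Bayes counters on the PMG before and after a run. The sketch below parses the output of `sa-learn --dump magic`, assuming its usual layout with the count in the third column; the function name is made up, and it also assumes that sa-learn run as root uses the same Bayes database as the filter.

```shell
#!/bin/sh
# bayes_counts: read `sa-learn --dump magic` output on stdin and print
# how many spam and ham messages the Bayes database has learned.
bayes_counts() {
    awk '/non-token data: nspam$/ { print "nspam=" $3 }
         /non-token data: nham$/  { print "nham=" $3 }'
}
```

On the PMG this would be used as `sa-learn --dump magic | bayes_counts`. Keep in mind that with SpamAssassin's default settings Bayes only starts scoring once at least 200 spam and 200 ham messages have been learned, so the ham count matters as well.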


I'm trying to set this up for my Zimbra mail server. Would the concept be the same, given that Zimbra stores emails differently?
 

You would need to adapt the mail server side script (not the PMG side) to reflect the correct paths / formats of your message store - whatever that may be...

The concept itself should still work however...
 