A few days ago I finally got fed up with spam, so I decided to install a spam filter on my server. After reading around a bit, I settled on Spambayes, which (apart from being written in Python) looks like a very solid and well-maintained project.
I don't use Outlook (I run Linux both on my server and on my desktops), so using the Spambayes Outlook plugin was not an option. Since I'm using Maildir as the storage format for both SMTP and IMAP, I initially tried the Spambayes IMAP filter. Unfortunately, the filter is still in the early stages of development, and IMAP behavior varies significantly among server implementations. The main problems I had with the IMAP filter were that it marked all new messages as read after processing them (apparently because my IMAP server lacks support for an obscure IMAP command), and that it crashed frequently.
So after a few hours of monitoring the IMAP filter's activity, I decided to change my approach. Reading around a bit, I discovered that the venerable procmail (which I used a lot until five or six years ago) now natively supports Maildirs.
A .qmail forward file, a .procmailrc recipe and a cron job later, I had a flawless Spambayes setup. In the past 3 days, Spambayes has worked admirably with minimal training, intercepting 99% of all spam and generating zero false positives. Definitely recommended.
My .qmail file simply passes everything along to procmail for delivery:
| preline /usr/bin/procmail
My .procmailrc recipe looks like this:
PATH=$HOME/bin:/usr/bin:/bin:/usr/local/bin:.
MAILDIR=$HOME/Maildir
DEFAULT=$MAILDIR/
LOGFILE=$HOME/procmail.log
LOCKFILE=$HOME/.lockmail
:0 fw
| /usr/bin/sb_filter.py
:0
* ^X-SpamBayes-Classification: spam
.INBOX.spambayes.spam/
:0
* ^X-SpamBayes-Classification: unsure
.INBOX.spambayes.unsure/
Note the trailing slash in the DEFAULT setting (and in the two delivery targets): it tells procmail that the destination is a Maildir. The rest is fairly self-explanatory, except perhaps for the folder namespace, which is the one my IMAP server uses by default:
- the first directive instructs procmail to feed the message to sb_filter.py
- sb_filter processes the message and adds the X-SpamBayes-Classification header with two values, the first one marking the message as either spam/ham/unsure, the second one displaying the exact numeric spam rating (from 0 to 1)
- the second and third directives match the header on its first value for spam and unsure, and deliver the message to the appropriate Maildir
- if a message is not matched by the second or third directive, it "falls off" the chain and gets delivered to DEFAULT, which in this case is my inbox
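The routing logic described above can be sketched in Python. This is only an illustration of what the procmail recipes decide, not part of the actual setup: the function name route is made up for the example, and the exact header layout ("label; score") is an assumption based on the two values described above.

```python
from email import message_from_string

# Folder names copied from the .procmailrc; the empty string stands for
# DEFAULT (the inbox). This mapping is illustrative only.
FOLDERS = {
    "spam": ".INBOX.spambayes.spam/",
    "unsure": ".INBOX.spambayes.unsure/",
}


def route(raw_message):
    """Return the Maildir folder (relative to MAILDIR) for a filtered message."""
    msg = message_from_string(raw_message)
    header = msg.get("X-SpamBayes-Classification", "")
    # Assumed layout: the label comes first, then the numeric score,
    # separated by a semicolon, e.g. "spam; 0.99".
    label = header.split(";", 1)[0].strip().lower()
    # Anything that is not spam or unsure "falls off" to DEFAULT.
    return FOLDERS.get(label, "")


sample = "X-SpamBayes-Classification: spam; 0.99\nSubject: test\n\nbody\n"
print(route(sample))
```

A message with no classification header, or one classified as ham, maps to the empty string here, which mirrors delivery to DEFAULT.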
To train the filter, I run a cron job every half hour that looks into two Maildir folders for spam and ham messages (the following lines are, of course, a single crontab line):
0,30 * * * * /usr/bin/sb_mboxtrain.py -d /home/ludo/.spambayes_hammie.db
-g /home/ludo/Maildir/.INBOX.spambayes.train_ham/
-s /home/ludo/Maildir/.INBOX.spambayes.train_spam/
-n >/dev/null 2>&1
In other words, every half hour cron runs sb_mboxtrain, telling it to use the .spambayes_hammie.db database (previously created with sb_filter.py -n), to fetch ham messages from the .INBOX.spambayes.train_ham Maildir, and to fetch spam messages from the .INBOX.spambayes.train_spam Maildir. The -n flag makes it train on messages in each Maildir's new directory as well as in cur.
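Since the crontab entry has to live on one line, it can help to assemble it from its parts. The following is a small sketch that only builds and prints the line. It does not run sb_mboxtrain, and the HOME and MAILDIR variables are just conveniences for the example.

```python
import shlex

# Paths as used in the crontab entry above.
HOME = "/home/ludo"
MAILDIR = HOME + "/Maildir"

argv = [
    "/usr/bin/sb_mboxtrain.py",
    "-d", HOME + "/.spambayes_hammie.db",             # database file
    "-g", MAILDIR + "/.INBOX.spambayes.train_ham/",   # ham (good) messages
    "-s", MAILDIR + "/.INBOX.spambayes.train_spam/",  # spam messages
    "-n",
]

# Schedule first, then the command, then the output redirection.
crontab_line = "0,30 * * * * " + shlex.join(argv) + " >/dev/null 2>&1"
print(crontab_line)
```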
The Maildir directories where spam and unsure messages get delivered, and where you deposit messages to train Spambayes, can be created either from your mail client or with the maildirmake command-line utility, which ships with qmail and Courier.
The last thing you need before running this setup is a .spambayesrc file in your home directory. Mine contains the following lines:
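If neither a mail client nor maildirmake is handy, Python's standard mailbox module can create the same cur/new/tmp layout. This is a sketch under a few assumptions: make_maildirs is a hypothetical helper, the demo runs against a throwaway temporary directory rather than a real ~/Maildir, and your IMAP server may need extra bookkeeping (folder subscriptions, for instance) that this does not handle.

```python
import mailbox
import os
import tempfile

# The four folders used in this setup.
FOLDERS = [
    ".INBOX.spambayes.spam",
    ".INBOX.spambayes.unsure",
    ".INBOX.spambayes.train_ham",
    ".INBOX.spambayes.train_spam",
]


def make_maildirs(root, names=FOLDERS):
    """Create each Maildir under root (which must already exist) if missing."""
    created = []
    for name in names:
        path = os.path.join(root, name)
        # create=True makes the directory plus its cur/, new/ and tmp/ subdirs.
        mailbox.Maildir(path, create=True)
        created.append(path)
    return created


# Demo against a temporary directory; use your real Maildir path instead.
demo_root = tempfile.mkdtemp()
for path in make_maildirs(demo_root):
    print(path, sorted(os.listdir(path)))
```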
[Storage]
persistent_use_database = True
persistent_storage_file = ~/.spambayes_hammie.db
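One thing worth double-checking is that the database path in .spambayesrc and the one passed to sb_mboxtrain with -d point to the same file. A quick way to inspect the settings is Python's standard configparser. This is only a sanity-check sketch: storage_settings is a made-up helper, it reads an inline copy of the file, and it says nothing about how Spambayes itself parses its options.

```python
import configparser
import os

# Inline copy of the .spambayesrc shown above.
SAMPLE = """\
[Storage]
persistent_use_database = True
persistent_storage_file = ~/.spambayes_hammie.db
"""


def storage_settings(text):
    """Return (use_database, expanded_db_path) from .spambayesrc-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    use_db = parser.getboolean("Storage", "persistent_use_database")
    # Expand "~" so the path can be compared with the absolute one in crontab.
    db_path = os.path.expanduser(parser.get("Storage", "persistent_storage_file"))
    return use_db, db_path


use_db, db_path = storage_settings(SAMPLE)
print(use_db, db_path)
```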
That's all: efficient and reliable spam protection in five minutes or so.