Using Spam Assassin to filter out spam.

NOTE: Spam Assassin processing is NOT ENABLED by default. If you want to try it, you'll need to do some setup yourself. To get started, keep reading...

Summary


Quick Start

If you just want to enable SpamAssassin and don't care about the details of how it works, do this:

Log in to a Unix machine, such as one of the elements machines. Make a $HOME/.spamassassin directory.
	mkdir $HOME/.spamassassin
Give the mailserver user permission to create and maintain its auto-whitelist database.
	fs setacl $HOME/.spamassassin mailserver rlidw
Create a file called $HOME/.spamassassin/user_prefs that sets at least required_hits.
	cat >> $HOME/.spamassassin/user_prefs
	required_hits           8
	<hit control-D to end>
This causes any incoming message with a score of 8 or more to have its subject changed to include the word SPAM. You can then match that subject with your favorite filter, such as procmail, to redirect the message to a separate mailbox.

How Spam Assassin Works

Spam Assassin (SA) looks at an individual mail message, applies a series of tests, and generates a "score". These tests look at many different aspects of the message. Most involve looking for particular patterns of characters, words, and/or phrases that are common in spam. Each test has its own "weight", and the final score is the sum of the weights of all the tests that matched the message. This score reflects a "best guess" likelihood that a particular message is spam.

The higher the score, the more likely it is that the message really is spam. Typical scores range from 0 (or less; negative scores are possible) up to 15 or more. The default cutoff for declaring a message to be spam is a score of 5 or above. This cutoff is considered "fairly aggressive", meaning there's a real risk that a non-spam message will be mistakenly marked as spam. On the other hand, setting the cutoff higher means that more spam will get through undetected.

I've found a score of 8 works well.
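
To make the arithmetic concrete, here is a minimal Python sketch of the sum-and-compare rule described above. The test names and weights are made up for illustration and are not Spam Assassin's real rules or scores; only the logic (add the weights of matched tests, compare to the cutoff) follows the description.

```python
from decimal import Decimal

# Hypothetical weights for illustration only; real Spam Assassin rules
# and their scores differ.
WEIGHTS = {
    "EXAMPLE_FREE_OFFER": Decimal("2.5"),
    "EXAMPLE_BIG_FONT": Decimal("1.9"),
    "EXAMPLE_CLICK_HERE": Decimal("1.8"),
    "EXAMPLE_KNOWN_SENDER": Decimal("-1.0"),  # negative weights lower the score
}


def score(matched_tests):
    """Sum the weights of every test that matched the message."""
    return sum((WEIGHTS[name] for name in matched_tests), Decimal("0"))


def is_spam(total, required_hits):
    """A message is flagged when its score reaches the cutoff."""
    return total >= required_hits


matched = ["EXAMPLE_FREE_OFFER", "EXAMPLE_BIG_FONT", "EXAMPLE_CLICK_HERE"]
total = score(matched)

print(total)              # 6.2
print(is_spam(total, 5))  # True: flagged at the default cutoff of 5
print(is_spam(total, 8))  # False: not flagged at a cutoff of 8
```

The same message is flagged at one cutoff and passed at the other, which is the trade-off between false positives and missed spam.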

IMPORTANT NOTE: Spam Assassin is not 100% reliable.
It usually does a pretty good job, but it is HIGHLY RECOMMENDED that you DO NOT just discard mail that Spam Assassin has flagged as spam. Rather, save such messages to a separate file or folder where they can be reviewed once a week or so to check for messages that aren't really spam. Likewise, don't be surprised when an occasional spam message sneaks through undetected.

By default, after Spam Assassin processes an incoming message, it puts a mini-report into the headers that looks something like this:
	X-Spam-Status: No, hits=3.3 required=7.0
		tests=CLICK_BELOW,CLICK_HERE_LINK
		version=2.21
	X-Spam-Level: ***
This lists the tests that returned positive. In this case, the CLICK_BELOW and CLICK_HERE_LINK tests matched this message. Their scores add up to the 3.3 listed. Because 3.3 is less than the 7.0 required, the message is not flagged as spam. The X-Spam-Level header shows the same score as a row of asterisks, one per whole point.
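
If you want to inspect these headers in a script, the following Python sketch pulls the fields out of an X-Spam-Status value. It assumes the exact layout shown above (hits=, required=, tests=); other Spam Assassin versions may label the fields differently, and parse_spam_status is just an illustrative helper name.

```python
import re

# The header value from the example above, including the folded lines.
SAMPLE = (
    "No, hits=3.3 required=7.0\n"
    "\ttests=CLICK_BELOW,CLICK_HERE_LINK\n"
    "\tversion=2.21"
)


def parse_spam_status(value):
    """Parse an X-Spam-Status value in the layout shown above."""
    # Long headers are folded across lines; collapse the whitespace first.
    flat = " ".join(value.split())
    verdict, _, rest = flat.partition(",")
    # A long tests= list may be folded after a comma; rejoin it.
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(item.split("=", 1) for item in rest.split() if "=" in item)
    tests = fields.get("tests", "")
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": tests.split(",") if tests else [],
    }


status = parse_spam_status(SAMPLE)
print(status["is_spam"])                   # False
print(status["hits"], status["required"])  # 3.3 7.0
print(status["tests"])                     # ['CLICK_BELOW', 'CLICK_HERE_LINK']
```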

Here is a case where the message scored above the required threshold. The subject line is prefixed with "*****SPAM*****", and the headers look like this:
	X-Spam-Status: Yes, hits=8.5 required=8.0
	  tests=BIG_FONT,EXCUSE_14,FREE_MONEY,GREAT_OFFER,
	  HTML_FONT_COLOR_BLUE,HTML_FONT_COLOR_CYAN,
	  HTML_FONT_COLOR_GRAY,HTML_FONT_COLOR_GREEN,
	  HTML_FONT_COLOR_RED,HTML_FONT_COLOR_YELLOW,
	  MIME_LONG_LINE_QP,OFFER_EXPIRE,SPAM_PHRASE_03_05,
	  SUB_FREE_OFFER,WEB_BUGS,WHILE_SUPPLIES
	  version=2.43
	X-Spam-Flag: YES
	X-Spam-Level: ********
	X-Spam-Checker-Version: SpamAssassin 2.43 (1.115.2.20-2002-10-15-exp)
	X-Spam-Prev-Content-Type: multipart/alternative;
	  boundary="---=_NEXT_f426d36af6"
The following report is added to the body of the message itself:
	SPAM: -------------------- Start SpamAssassin results ----------------------
	SPAM: This mail is probably spam.  The original message has been altered
	SPAM: so you can recognise or block similar unwanted mail in future.
	SPAM: See http://spamassassin.org/tag/ for more details.
	SPAM: 
	SPAM: Content analysis details:   (8.50 hits, 8 required)
	SPAM: SUB_FREE_OFFER     (0.3 points)  Subject starts with "Free"
	SPAM: FREE_MONEY         (-0.1 points) BODY: Free money!
	SPAM: WHILE_SUPPLIES     (0.3 points)  BODY: While Supplies Last
	SPAM: EXCUSE_14          (2.0 points)  BODY: Tells you how to stop further spam
	SPAM: GREAT_OFFER        (0.2 points)  BODY: Trying to offer you something
	SPAM: OFFER_EXPIRE       (0.1 points)  BODY: Offer Expires
	SPAM: WEB_BUGS           (2.0 points)  BODY: Image tag with an ID code to identify you
	SPAM: SPAM_PHRASE_03_05  (1.1 points)  BODY: Spam phrases score is 03 to 05 (medium)
	SPAM:                    [score: 3]
	SPAM: HTML_FONT_COLOR_YELLOW (0.4 points)  BODY: HTML font color is yellow
	SPAM: HTML_FONT_COLOR_CYAN (0.4 points)  BODY: HTML font color is cyan
	SPAM: HTML_FONT_COLOR_GREEN (0.4 points)  BODY: HTML font color is green
	SPAM: BIG_FONT           (0.3 points)  BODY: FONT Size +2 and up or 3 and up
	SPAM: HTML_FONT_COLOR_GRAY (0.3 points)  BODY: HTML font color is gray
	SPAM: HTML_FONT_COLOR_RED (0.3 points)  BODY: HTML font color is red
	SPAM: HTML_FONT_COLOR_BLUE (0.2 points)  BODY: HTML font color is blue
	SPAM: MIME_LONG_LINE_QP  (0.3 points)  RAW: Quoted-printable line longer than 76 characters
	SPAM: 
	SPAM: -------------------- End of SpamAssassin results ---------------------
As you can see, there are several things you can filter on. One tactic is to move anything with *****SPAM***** in the subject to a separate folder.
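
As a cross-check, the per-test points in the report should add up to the total in the X-Spam-Status header. This Python sketch parses the report lines copied from the example above and sums them. The regular expression is an assumption based only on the layout of this one report.

```python
import re
from decimal import Decimal

REPORT = """\
SPAM: Content analysis details:   (8.50 hits, 8 required)
SPAM: SUB_FREE_OFFER     (0.3 points)  Subject starts with "Free"
SPAM: FREE_MONEY         (-0.1 points) BODY: Free money!
SPAM: WHILE_SUPPLIES     (0.3 points)  BODY: While Supplies Last
SPAM: EXCUSE_14          (2.0 points)  BODY: Tells you how to stop further spam
SPAM: GREAT_OFFER        (0.2 points)  BODY: Trying to offer you something
SPAM: OFFER_EXPIRE       (0.1 points)  BODY: Offer Expires
SPAM: WEB_BUGS           (2.0 points)  BODY: Image tag with an ID code to identify you
SPAM: SPAM_PHRASE_03_05  (1.1 points)  BODY: Spam phrases score is 03 to 05 (medium)
SPAM:                    [score: 3]
SPAM: HTML_FONT_COLOR_YELLOW (0.4 points)  BODY: HTML font color is yellow
SPAM: HTML_FONT_COLOR_CYAN (0.4 points)  BODY: HTML font color is cyan
SPAM: HTML_FONT_COLOR_GREEN (0.4 points)  BODY: HTML font color is green
SPAM: BIG_FONT           (0.3 points)  BODY: FONT Size +2 and up or 3 and up
SPAM: HTML_FONT_COLOR_GRAY (0.3 points)  BODY: HTML font color is gray
SPAM: HTML_FONT_COLOR_RED (0.3 points)  BODY: HTML font color is red
SPAM: HTML_FONT_COLOR_BLUE (0.2 points)  BODY: HTML font color is blue
SPAM: MIME_LONG_LINE_QP  (0.3 points)  RAW: Quoted-printable line longer than 76 characters
"""

# Matches lines of the form "SPAM: TEST_NAME (x.y points)"; the summary
# line and the "[score: 3]" continuation line do not match.
RULE_LINE = re.compile(r"^SPAM: (\w+)\s+\((-?\d+\.\d+) points\)", re.MULTILINE)

points = {name: Decimal(value) for name, value in RULE_LINE.findall(REPORT)}
total = sum(points.values(), Decimal("0"))

print(len(points))  # 16
print(total)        # 8.5
```

The 16 tests and the 8.5 total agree with the tests= list and hits=8.5 in the headers.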

Filtering Messages Processed by Spam Assassin

Now that Spam Assassin has marked what it believes is spam, you need to filter those marked messages. The steps differ a bit depending on whether you use Unix or Windows. However, the Unix steps filter mail on the server, so they work no matter which email client you use, on Unix or Windows.

Unix

On Unix, a good tool to use is procmail. There are websites dedicated to procmail tutorials, so I will only show some basics. This is an excerpt from a $HOME/.procmailrc that I use.
	MAILDIR=$HOME/mail      #you'd better make sure it exists
	
	:0 c
	backup

	:0
	* ^Subject:.*\*SPAM\*
	spam
The first line tells procmail what prefix to use for folder paths. This directory must exist and must be writable by the mailserver user.
	mkdir $HOME/mail
	fs setacl $HOME/mail mailserver write
In procmail, 'recipes' start with :0, so there are two recipes in this example.
The first recipe is there in case I add something to my procmailrc that breaks filtering. It copies (that's what the 'c' is for) every message to $HOME/mail/backup. I go and clear things out of there every few weeks.
The second recipe has a regular expression that matches any message whose Subject line includes *SPAM*, which is part of the subject change that Spam Assassin makes. If it matches, procmail saves that message to $HOME/mail/spam and processes no further recipes (there is no 'c' flag).
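
To see which subjects the Subject condition catches before trusting it with real mail, you can try the same pattern in Python. This is only an approximation: procmail's regular expression dialect is not identical to Python's, although for a pattern this simple the two should agree. procmail ignores case by default, so the sketch sets IGNORECASE. The function name and sample messages are made up for illustration.

```python
import re

# Same pattern as the recipe condition, without the leading "* ".
SUBJECT_RULE = re.compile(r"^Subject:.*\*SPAM\*", re.IGNORECASE | re.MULTILINE)


def goes_to_spam_folder(headers):
    """Return True if the recipe condition would match these headers."""
    return SUBJECT_RULE.search(headers) is not None


tagged = "From: someone@example.com\nSubject: *****SPAM***** Free money!\n"
clean = "From: someone@example.com\nSubject: Lunch on Friday?\n"
mention = "From: someone@example.com\nSubject: How do I stop SPAM?\n"

print(goes_to_spam_folder(tagged))   # True
print(goes_to_spam_folder(clean))    # False
print(goes_to_spam_folder(mention))  # False: no asterisks around SPAM
```

Because the pattern requires the literal asterisks, a message that merely mentions spam in its subject is left alone.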
Another idea one might try is matching the X-Spam-Level header. Something like this might work:
	:0
	* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
	extreme-spam
This example saves anything that scores a 15 or higher into $HOME/mail/extreme-spam.
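
Counting backslash-asterisk pairs by hand is error-prone, so here is a small Python sketch that builds the condition line for any threshold and checks it against sample headers. It relies on X-Spam-Level carrying one asterisk per whole point, as in the header examples earlier. The helper names are hypothetical, and Python's regular expressions only approximate procmail's.

```python
import re


def spam_level_condition(min_score):
    """Build a procmail condition line matching min_score or more asterisks."""
    return "* ^X-Spam-Level: " + r"\*" * min_score


def matches(condition, header_line):
    """Check a header line against a condition built above."""
    pattern = condition[2:]  # drop the leading "* " that marks a condition
    return re.search(pattern, header_line, re.IGNORECASE) is not None


print(spam_level_condition(3))  # * ^X-Spam-Level: \*\*\*

extreme = spam_level_condition(15)
print(matches(extreme, "X-Spam-Level: " + "*" * 17))  # True
print(matches(extreme, "X-Spam-Level: " + "*" * 15))  # True
print(matches(extreme, "X-Spam-Level: " + "*" * 14))  # False
```

The pattern is not anchored at the end of the line, which is why 15 escaped asterisks match a level of 15 or more rather than exactly 15.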

Windows

The step of creating a $HOME/.spamassassin/user_prefs file, described at the beginning of this page, is still needed. The sorting and filtering, however, can be done in your mail client if you don't want to mess with Unix and procmail. See Russ if you need help with Windows client configuration.
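
Whatever client you use, its filter rule boils down to a check on one of the headers Spam Assassin adds. As a rough sketch of that logic, the Python below routes a message on the X-Spam-Flag header, which in the examples above appears only on the flagged message. The folder names and the choose_folder helper are hypothetical and not tied to any particular mail client.

```python
from email import message_from_string


def choose_folder(raw_message):
    """Pick a folder name based on the X-Spam-Flag header."""
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    if flag.strip().upper() == "YES":
        return "spam"   # hypothetical folder for review, not deletion
    return "inbox"      # hypothetical default folder


flagged = (
    "Subject: *****SPAM***** Free money!\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "Act now!\n"
)
normal = "Subject: Lunch on Friday?\n\nSee you at noon.\n"

print(choose_folder(flagged))  # spam
print(choose_folder(normal))   # inbox
```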

Relevant Links