
3   Using SpamSieve With Your E-Mail Client

Before you can use SpamSieve, you must give it some examples of messages you consider to be spam, and ones which you do not. You do this by selecting some messages and then telling SpamSieve whether they are spam or good. SpamSieve collects information from the messages it’s trained with into its corpus, which it uses to predict whether subsequent messages are spam.

For the details of how to train SpamSieve, find the section below that corresponds to your e-mail program. For now, what’s important is that you will train SpamSieve with both good messages and spam messages. Don’t worry; it learns quickly!

First, you’ll train SpamSieve with a batch of messages to get it started recognizing your mail. The Corpus section at the bottom of the Statistics window shows how many good and spam messages SpamSieve has been trained with, and what percentage of them are spam. After the initial training, SpamSieve will automatically train itself, and you’ll only need to train it to correct mistakes.

For the initial training, use as many messages as you have on hand, subject to two requirements:

Do not use more than 1,000 messages.
Using up to 1,000 recent messages in the initial training lets SpamSieve start out with a high level of accuracy. In general, the more messages you train SpamSieve with, the better its accuracy will be. However, using more than 1,000 messages initially would “fill up” SpamSieve’s corpus with older messages, making it slower and less effective at adapting to new kinds of spam that you’ll receive in the future.
The messages should be approximately 65% spam.
This means 650 spams out of 1,000 messages, 325 out of 500, or 195 out of 300. For example, if you have 500 good messages but only 400 saved spam messages, don’t train SpamSieve with those 900 messages. This would unbalance the corpus, making SpamSieve inaccurate and slower to learn. Instead, train it with the 400 spams and about 215 representative good messages (400 is 65% of roughly 615 messages). This will get it started at a higher level of accuracy. As new messages arrive, SpamSieve will automatically learn from them, keeping its corpus balanced, and its accuracy will improve.
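
If you'd like to check the arithmetic for your own mailbox, the two requirements above can be worked out with a short Python sketch. This is only an illustration and not part of SpamSieve; the function name `initial_training_counts` and its parameters are made up for this example. It assumes you know roughly how many good and spam messages you have saved.

```python
def initial_training_counts(good_available, spam_available,
                            max_total=1000, spam_fraction=0.65):
    """Suggest how many (good, spam) messages to use for initial training.

    Hypothetical helper, not a SpamSieve feature. It applies the two
    requirements from the text: at most max_total messages, and
    approximately spam_fraction of them spam.
    """
    if good_available < 0 or spam_available < 0:
        raise ValueError("message counts must be non-negative")

    # Largest batch each supply allows while keeping the target ratio.
    total_by_spam = spam_available / spam_fraction
    total_by_good = good_available / (1 - spam_fraction)

    # The batch is limited by the cap and by whichever type runs out first.
    total = min(max_total, total_by_spam, total_by_good)

    spam = min(spam_available, round(total * spam_fraction))
    good = min(good_available, round(total * (1 - spam_fraction)))
    return good, spam


if __name__ == "__main__":
    # The example from the text: 500 good messages, 400 saved spams.
    good, spam = initial_training_counts(500, 400)
    print(f"Train with {spam} spams and about {good} good messages.")
```

With 500 good messages and 400 spams on hand, the sketch suggests using all 400 spams and about 215 good messages. With plenty of both, it stops at 650 spams and 350 good messages. The result is only approximate, so there's no need to match it exactly.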

Accuracy will improve with time, but if you’ve used at least 100 or so messages in the initial training, SpamSieve should immediately start moving some of the incoming spam messages to your spam folder. If you don’t see results right away, check the setup in your mail program. After a few hundred messages of each type are in the corpus, SpamSieve should be catching most of your spam.

If SpamSieve marks a good message as spam, you should tell SpamSieve that it is a good message. This lets SpamSieve know that it made a mistake, and also adds the message to the corpus to improve future accuracy. Likewise, if SpamSieve marks a spam message as good, you should tell SpamSieve that it is a spam message (even if you think the message might confuse SpamSieve). If you do not correct SpamSieve when it makes mistakes, its accuracy will deteriorate over time. Also, the sooner you correct SpamSieve, the better; by promptly correcting SpamSieve, you ensure that it’s always acting on accurate information.

If you make a mistake and tell SpamSieve that a message is spam when it is actually good (or vice versa), simply correct yourself as you would correct SpamSieve. That is, if the message is good, train it as good; if it is spam, train it as spam. SpamSieve will “undo” the previous, incorrect training.
