SpamSieve Initial Accuracy Questions

Hey Michael,

I’ve been using SpamSieve for a few days now in Apple Mail 2.1.1. I set everything up according to the instructions and have been training it as well. Since then, only 64 emails have been received across both of my email accounts. SpamSieve has been catching spam and moving it into my spam mailbox. However, there have been a number of false positives, which I corrected by manually marking them as good.

SpamSieve’s accuracy is at 82.8%, which seems pretty low. Is an accuracy like this common when not much email has been received yet?

I notice that after the initial training, when SpamSieve is left running on its own, the corpus becomes unbalanced (65% spam dropping to 62-64%). Worried, I would try to find more spam to mark manually in order to balance it out.

Any suggestions?

Thanks!


Filtered Mail:

53 Good Messages
11 Spam Messages (17%)
2 Spam Messages Per Day

SpamSieve Accuracy:

9 False Positives
2 False Negatives (18%)
82.8% Correct

Corpus:

311 Good Messages
561 Spam Messages (64%)
47,653 Total Words

Rules:

869 Blocklist Rules
447 Whitelist Rules

Showing Statistics Since:

3/1/07 2:04 AM
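
For anyone checking the numbers, the percentages in the statistics above can be reproduced from the raw counts. This is only a rough sketch: the variable names are made up for illustration, and the denominators are inferred from the posted figures rather than taken from SpamSieve's documentation. In particular, the 18% next to the false negatives matches 2/11, but 11 is both the spam count and the total number of mistakes here, so these numbers alone can't show which one SpamSieve divides by.

```python
# Counts copied from the statistics posted above.
good_filtered = 53
spam_filtered = 11
false_positives = 9
false_negatives = 2
corpus_good = 311
corpus_spam = 561


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


total_filtered = good_filtered + spam_filtered  # 64 messages filtered
mistakes = false_positives + false_negatives    # 11 messages misclassified

# Assumption: accuracy = correctly classified messages / all filtered messages.
accuracy = percent(total_filtered - mistakes, total_filtered)

# Assumption: spam share = spam messages / all filtered messages.
spam_share = percent(spam_filtered, total_filtered)

# Assumption: false-negative rate = false negatives / spam messages
# (11 is also the mistake count, so the denominator is ambiguous here).
false_negative_rate = percent(false_negatives, spam_filtered)

# Assumption: corpus balance = spam messages / all messages in the corpus.
corpus_spam_share = percent(corpus_spam, corpus_good + corpus_spam)

print(f"Accuracy: {accuracy:.1f}%")
print(f"Spam share of filtered mail: {spam_share:.0f}%")
print(f"False-negative rate: {false_negative_rate:.0f}%")
print(f"Spam share of corpus: {corpus_spam_share:.0f}%")
```

With only 64 filtered messages, each mistake moves the accuracy figure by about 1.6 percentage points, so the number swings a lot this early on.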

Hmm, normally most of SpamSieve’s mistakes are false negatives rather than false positives. And with the number of messages you’ve trained it with, the accuracy should be at least 97%. Did you train it with a representative mix of good messages? How many good and spam messages did you use in the initial training? Please send SpamSieve’s log file to me via e-mail so that I can see what’s happening here.

The corpus balance is nothing to worry about. Anywhere in the 60-70% spam range should be OK.

After the initial training, I recommend only training SpamSieve with messages that it didn’t classify correctly. SpamSieve will automatically maintain the proper balance.

Hey Michael,

Just sent an email with the information requested.

Thanks again!

Phong

I don’t think there’s anything to worry about. Most of the false positives look like marketing and mass-mailing type messages. As SpamSieve sees more of these messages, it will learn to recognize them as good.