Spam Message percent at 72% versus ideal 65% - now what?



User Name
03-30-2007, 04:32 PM
So, when I look at my stats I see that I have 1030 Spam Messages or 72%. I read that ideal is 65%.

So, do I just leave it, or pick some messages in my saved mail boxes and train them as Good Messages to try to balance it out? As it is, I have 395 Good Messages listed.

Corpus
395 Good Messages
1030 Spam Messages (72%)
56357 Total Words

Michael Tsai
03-30-2007, 07:46 PM
> So, when I look at my stats I see that I have 1030 Spam Messages or 72%. I read that ideal is 65%.
>
> So, do I just leave it, or pick some messages in my saved mail boxes and train them as Good Messages to try to balance it out?

If you are starting from scratch, you should train SpamSieve with a 65% ratio of messages. In your case, you already have a corpus, so just leave it as-is. As you correct SpamSieve’s mistakes and it automatically learns from incoming messages, it will keep the corpus properly balanced.
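
For anyone curious about the arithmetic behind these percentages, here is a minimal sketch using the corpus numbers from the first post. The function names are hypothetical and not part of SpamSieve, and the calculation is only illustrative: the advice above is still to leave an existing corpus as-is rather than retrain to hit the target.

```python
import math


def spam_percent(good: int, spam: int) -> float:
    """Return spam as a percentage of all messages in the corpus."""
    total = good + spam
    if total == 0:
        raise ValueError("corpus is empty")
    return 100.0 * spam / total


def good_needed_for_target(good: int, spam: int, target_percent: float) -> int:
    """Return how many extra good messages would bring spam to target_percent or below.

    Hypothetical helper for illustration only. Returns 0 if the corpus
    is already at or below the target.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # spam / (good_total + spam) <= target/100
    #   =>  good_total >= spam * (100 - target) / target
    required_good = math.ceil(spam * (100 - target_percent) / target_percent)
    return max(0, required_good - good)


# Corpus figures quoted in the thread: 395 good, 1030 spam.
current = spam_percent(395, 1030)
extra = good_needed_for_target(395, 1030, 65)
print(f"spam share: {current:.1f}%, extra good messages for 65%: {extra}")
```

With 395 good and 1030 spam messages the spam share rounds to the 72% shown in the statistics.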

User Name
03-30-2007, 07:57 PM
Thanks for the reply.

BTW, in a False Negative - does that not mean it flagged something spam that was not spam? If so, my statistics are incorrect, as it claims only 4 False Negatives (67%) and only 2 False Positives, which is not true. I had 2 emails today alone that were deleted as spam but were in fact just fine, and that normally do not get filtered out as spam. And I have had many that got through over time but should not have (a False Positive).

And yes, I DO train by using the scripts to Train Good ones that get flagged as spam and Train Bad ones that slip through. So I would think the Statistics would show this. The counts of 2 and 4 are simply not correct; it seems SpamSieve could only know about its errors from my training it when it made them, which I have done many more than 6 times.

Michael Tsai
03-30-2007, 08:20 PM
> BTW, in a False Negative - does that not mean it flagged something spam that was not spam?


No. A false negative is a spam message that it didn’t catch. A false positive is a good message that it put in the spam folder.



> And yes, I DO train by using the scripts to Train Good ones that get flagged as spam and Train Bad ones that slip through. So, I would think the Statistics would show this as 2 and 4 are simply not correct and it seems it could only know 2 or 4 based on my training it when it made errors which I have done many more than 6 times.

Barring any problems due to file corruption, the statistics that SpamSieve reports are correct. However, keep in mind that a message can only be a mistake (i.e. a false negative or a false positive) if SpamSieve examined the message and classified it incorrectly. If you have your Mail/Entourage rules set up such that SpamSieve is only asked to examine some of the incoming messages, then it is not a mistake if a spam message that SpamSieve wasn’t asked to examine gets through. You may want to look at SpamSieve’s log (http://c-command.com/spamsieve/manual-ah/open-log) to see which messages SpamSieve processed and what it thought (if anything) about the messages that you think were mistakes.
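
To make the definitions concrete, here is a small sketch of how a message would be counted under the rules described above. It is not SpamSieve's code; the `outcome` function and the sample data are assumptions made up for illustration.

```python
from collections import Counter


def outcome(examined: bool, predicted_spam: bool, actually_spam: bool) -> str:
    """Label one message using the definitions from this thread.

    Hypothetical helper, not SpamSieve's API. A message the filter never
    examined cannot count as a mistake either way.
    """
    if not examined:
        return "not examined"
    if predicted_spam and not actually_spam:
        return "false positive"   # good message put in the spam folder
    if not predicted_spam and actually_spam:
        return "false negative"   # spam message that was not caught
    return "correct"


# Made-up sample: (examined, predicted_spam, actually_spam)
messages = [
    (True, True, True),     # spam, caught
    (True, True, False),    # good mail filed as spam
    (True, False, True),    # spam that slipped through the filter
    (False, False, True),   # spam that bypassed the filter via a mail rule
]

counts = Counter(outcome(*m) for m in messages)
print(counts)
```

In this sample, two spam messages reach the inbox, but only one of them counts as a false negative, because the other was never handed to the filter. That is one way the reported counts can be lower than the number of messages you corrected by hand.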