Statistics not updating

First off, thanks for a wonderful product. I have used this for many years now, and I am extremely happy with SpamSieve!

I had a problem with some VIAGRA spam getting through. I saw in my whitelist that somehow some emails with VIAGRA in the email address were in there. I deleted those addresses from the whitelist.

I then decided to reset the statistics so I could see how my change would affect SpamSieve. (As an aside, my change seems to have fixed the Viagra issue.)

I had to train three new spam messages (not related to VIAGRA), but when I check the statistics, it reports 100% accuracy:

Filtered Mail
119 Good Messages
424 Spam Messages (78%)
297 Spam Messages Per Day

SpamSieve Accuracy
0 False Positives
0 False Negatives
100.0% Correct

Corpus
3,817 Good Messages
12,075 Spam Messages (76%)
711,640 Total Words

Rules
18,114 Blocklist Rules
27,977 Whitelist Rules

Showing Statistics Since
2/15/10 10:53 AM

My question is: why did it not count the three messages I manually marked as spam, using the TRAIN SPAM script in Entourage, as false negatives?

-Rodney

Something doesn’t sound right. Normally, if SpamSieve thinks a message is good, it will add the sender to the whitelist. If the message turns out to be spam, you would train it as spam, and SpamSieve would disable (uncheck) the whitelist rule. So there should be no need to edit the whitelist yourself. If you had trained the message as spam, the whitelist rule should already have been disabled; deleting it would not help anything and would in fact be counterproductive (since SpamSieve would lose the record of that being a bad rule). If you hadn’t trained the message as spam, the proper thing to do is to train it as spam now.

There are two possibilities:

  1. SpamSieve’s statistics database is damaged, so that it can’t properly cross-reference the incoming messages with the ones that you trained. If this is the case, there would be “Predicted: Good” entries for those three messages in the log. You can reset the statistics database by holding down the Command and Option keys when you launch SpamSieve. That will fix the problem for future messages.
  2. A setup problem in your mail program meant that SpamSieve never examined the messages when they arrived. Since SpamSieve didn’t judge them incorrectly, they’re not false negatives. If this is the case, there will be no “Predicted” entries for those three messages in the log.
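The two cases above come down to one check per trained message: was there a “Predicted” entry in the log, and what did it say? Here is a minimal sketch of that decision. The function name `diagnose` and its string argument are hypothetical, used only for illustration. This is not SpamSieve code and it does not parse the real log format.

```python
from typing import Optional


def diagnose(predicted_entry: Optional[str]) -> str:
    """Classify one message that was trained as spam but not counted
    as a false negative.

    predicted_entry is the text of the "Predicted" log entry found for
    the message, or None if the log has no such entry. How the entry is
    located in the log is left out on purpose (assumption: you look it
    up by hand).
    """
    if predicted_entry is None:
        # Case 2: SpamSieve never examined the message when it arrived.
        return "setup problem in the mail program"
    if predicted_entry == "Predicted: Good":
        # Case 1: SpamSieve judged it good, yet the later training was
        # not matched to that prediction.
        return "statistics database may be damaged"
    # Any other prediction (for example, spam) was not a mistake,
    # so no false negative is expected.
    return "not a false negative"


if __name__ == "__main__":
    print(diagnose("Predicted: Good"))
    print(diagnose(None))
```

Running the check on each of the three trained messages should point to the same case if there is a single underlying cause.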

Thanks for the quick reply!

SpamSieve’s statistics database is damaged, so that it can’t properly cross-reference the incoming messages with the ones that you trained. If this is the case, there would be “Predicted: Good” entries for those three messages in the log. You can reset the statistics database by holding down the Command and Option keys when you launch SpamSieve. That will fix the problem for future messages.

I think this is right, as in the log I do see the “Predicted: Good” entries. I just reset the history part, leaving the corpus alone. I see now my statistics are reset. I am hopeful this corrects the issue.

Regarding your statement about the whitelist: I did train them as spam, but they kept coming in as false negatives, and when I looked in the log, I saw a message like this (NOT AN ACTUAL LOG MESSAGE):

matched rule <From (address) Is Equal to “VIAGRA@xxx.com”> in SpamSieve whitelist

So that is why I manually deleted them from the whitelist. I do notice that some items in the whitelist have checkmarks and some do not. I take it from what you are telling me that the ones without checkmarks are the ones that have been manually trained as spam?

-Rodney

OK, next time, if there’s a problem, please compare the log entries with the messages that you trained. When you train a message as spam, if there is an enabled whitelist rule that matches, SpamSieve will disable it and note this in the log.
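To illustrate why disabling a whitelist rule is different from deleting it, here is a toy model of the behavior described in this thread. Everything in it is hypothetical: the class names, the method names, and the log text are made up for the sketch. It also assumes that an existing disabled rule stops the sender from being whitelisted again, which is an inference from the explanation above rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WhitelistRule:
    address: str
    enabled: bool = True  # corresponds to the checkmark in the rule list


@dataclass
class Whitelist:
    rules: Dict[str, WhitelistRule] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def note_good_prediction(self, sender: str) -> None:
        # A message judged good adds its sender, but only if no rule
        # exists yet. A disabled rule stays disabled (assumption).
        if sender not in self.rules:
            self.rules[sender] = WhitelistRule(sender)

    def matches(self, sender: str) -> bool:
        rule = self.rules.get(sender)
        return rule is not None and rule.enabled

    def train_as_spam(self, sender: str) -> None:
        rule = self.rules.get(sender)
        if rule is not None and rule.enabled:
            rule.enabled = False  # uncheck rather than delete
            self.log.append(f"disabled whitelist rule for {sender}")


if __name__ == "__main__":
    wl = Whitelist()
    wl.note_good_prediction("sender@example.com")
    wl.train_as_spam("sender@example.com")
    print(wl.matches("sender@example.com"))  # rule is kept but disabled
    print(wl.log)
```

In this model, deleting the rule by hand removes the record that the address was bad, so a later good prediction would create a fresh, enabled rule for the same sender.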

Right.

Michael:

Looks like the statistics are now working fine:

Filtered Mail
50 Good Messages
337 Spam Messages (87%)
142 Spam Messages Per Day

SpamSieve Accuracy
0 False Positives
8 False Negatives
97.9% Correct

Corpus
3,823 Good Messages
12,084 Spam Messages (76%)
711,908 Total Words

Rules
18,127 Blocklist Rules
27,996 Whitelist Rules

Showing Statistics Since
2/15/10 10:53 AM

After one day, is 97.9% a good percentage?

-Rodney

Great!

It’s a bit low. Please send in a report.
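As a rough check on the arithmetic, the figures in the statistics above can be reproduced with a short sketch. It assumes that “Correct” is the share of filtered messages that were neither false positives nor false negatives, and that the spam percentage is spam divided by all filtered messages. Both formulas are assumptions that happen to match the numbers posted in this thread, and the function names are hypothetical.

```python
def percent_correct(good: int, spam: int,
                    false_positives: int, false_negatives: int) -> float:
    """Share of filtered messages classified correctly, as a percentage."""
    total = good + spam
    if total == 0:
        # With no filtered mail, the statistics in this thread
        # still show 100.0% Correct.
        return 100.0
    mistakes = false_positives + false_negatives
    return 100.0 * (total - mistakes) / total


def percent_spam(good: int, spam: int) -> float:
    """Share of filtered messages that are spam, as a percentage."""
    total = good + spam
    return 100.0 * spam / total if total else 0.0


if __name__ == "__main__":
    # 50 good, 337 spam, 0 false positives, 8 false negatives
    print(round(percent_correct(50, 337, 0, 8), 1))
    print(round(percent_spam(50, 337)))
```

With 50 good messages, 337 spam messages, and 8 false negatives, that is 379 of 387 messages classified correctly.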

Thanks for sending the log file. I think it will help to reset SpamSieve’s corpus, and then re-train it with a smaller number of recent messages.

Okay, so I reset the corpus, trained it with good messages and spam messages, and then reset the history. Here is what I now have:

Filtered Mail
0 Good Messages
0 Spam Messages
0 Spam Messages Per Day

SpamSieve Accuracy
0 False Positives
0 False Negatives
100.0% Correct

Corpus
131 Good Messages
395 Spam Messages (75%)
78,678 Total Words

Rules
18,653 Blocklist Rules
28,146 Whitelist Rules

Showing Statistics Since
2/15/10 10:53 AM

So, is it right that I didn’t change the rules? Just want to make sure I did this correctly.

I have to thank you again for your great product, and your great support!

-Rodney

This doesn’t look good. These numbers should not have decreased after resetting the corpus. Either the statistics database is still damaged, or your setup in the mail program is no longer correct, or you haven’t received any mail since the 15th.

Right.

Isn’t that due to my also resetting the history right after resetting the corpus?
Here are the latest statistics (as you can see, I also changed the set date to this morning):

Filtered Mail
6 Good Messages
87 Spam Messages (94%)
509 Spam Messages Per Day

SpamSieve Accuracy
0 False Positives
0 False Negatives
100.0% Correct

Corpus
137 Good Messages
399 Spam Messages (74%)
79,481 Total Words

Rules
18,653 Blocklist Rules
28,146 Whitelist Rules

Showing Statistics Since
2/20/10 10:38 AM

Sorry, I missed that you had reset the history a second time. The statistics look fine now.