Once or twice a year I open my corpus, sort by Last Used date, and delete the oldest entries. My reasoning is that if a word has not been encountered in over a year, it’s not really needed.
Is there a better way to keep the corpus current?
Also, here are current stats:
Filtered Mail
167,106 Good Messages
6,607 Spam Messages (4%)
7 Spam Messages Per Day
SpamSieve Accuracy
185 False Positives
116 False Negatives (39%)
99.8% Correct
Corpus
8,221 Good Messages
7,622 Spam Messages (48%)
65,498 Total Words
Rules
5,046 Blocklist Rules
9,107 Whitelist Rules
Showing Statistics Since
10/15/06 10:34 AM
Given these statistics, is there any reason to adjust anything? The training instructions say to use 1000 messages maximum and suggest 65% spam. Is there a recommended way to bring my corpus into compliance with those guidelines, or do they only apply to initial training?
Should I just leave things as they are?
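To make the question concrete, here is a rough sketch of the arithmetic behind it. It assumes the 65% guideline means spam as a share of all messages in the corpus, and that I would keep every spam message and only trim good ones. The variable names are my own and have nothing to do with SpamSieve itself.

```python
# Figures copied from the statistics above.
corpus_good = 8221
corpus_spam = 7622
filtered_good = 167106
filtered_spam = 6607
false_positives = 185
false_negatives = 116

# Assumption: the guideline is "spam should be 65% of corpus messages".
TARGET_SPAM_SHARE = 0.65

# Current spam share of the corpus.
corpus_total = corpus_good + corpus_spam
spam_share = corpus_spam / corpus_total

# Good messages I could keep if all spam stays and spam must be 65%.
good_to_keep = round(corpus_spam * (1 - TARGET_SPAM_SHARE) / TARGET_SPAM_SHARE)
good_to_remove = corpus_good - good_to_keep

# How the accuracy figures appear to be derived.
errors = false_positives + false_negatives
accuracy = 1 - errors / (filtered_good + filtered_spam)
false_negative_share = false_negatives / errors

print(f"Corpus spam share: {spam_share:.1%}")
print(f"Good messages to remove for 65% spam: {good_to_remove}")
print(f"Corpus size after trimming: {corpus_spam + good_to_keep}")
print(f"Accuracy: {accuracy:.1%}")
print(f"False negatives as share of errors: {false_negative_share:.0%}")
```

By this estimate I would have to drop roughly half of my good messages to reach 65% spam, and the corpus would still be far above 1000 messages, which is why I am unsure whether the guidelines are meant for an established corpus at all.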