For a while now, my SpamSieve installation hasn’t been catching watch spam very well. Previously I was using PowerMail, and next to none of it was caught. Now, after migrating to a new Mac this past week, I’ve reset the corpus, set up SpamSieve to work with Mail instead, and started training.
Now, after about 280 messages trained at 71% spam (I’m collecting more good messages to bring that ratio down into the 60s), SpamSieve is catching a lot more of this stupid watch spam, but several still get through. My spam-catching strategy is set right in the middle, but I thought I’d ask for opinions here before moving it toward “aggressive.”
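As a rough guide to how many more good messages that would take, here is a minimal sketch. It assumes the corpus counts from the statistics further down (163 spam, 65 good) and that the spam count stays fixed while good messages are added. The function name `good_needed` is just a hypothetical helper for this post.

```python
import math

def good_needed(spam: int, good: int, target_spam_share: float) -> int:
    """Extra good messages needed so that spam / (spam + good) <= target_spam_share."""
    # Total good messages required for the target share, rounded up.
    required_good = math.ceil(spam / target_spam_share - spam)
    return max(0, required_good - good)

spam, good = 163, 65  # corpus counts from the statistics in this post
print(f"current spam share: {100 * spam / (spam + good):.1f}%")
for target in (0.69, 0.65, 0.60):
    print(f"target {target:.0%}: {good_needed(spam, good, target)} more good messages")
```

With these counts, reaching 65% spam would take 23 more good messages, and 60% would take 44.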
Here are the entries from the log for the latest spam about watches:
Predicted: Good (27)
Subject: beautiful watches
From: zenith@earthlink.net
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Reason: P(spam)=0.130[0.500], bias=0.000, S:watches(0.999), rolex(0.999), rolex(0.999), run(0.002), replica(0.998), sell(0.002), replica(0.998), sell(0.002), run(0.002), cases(0.002), browse(0.002), cases(0.002), bands(0.998), bands(0.998), 1000(0.005)
Date: 2009-08-04 05:30:18 -0500

Trained: Good (Auto)
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Actions: added rule <From (address) Is Equal to “zenith@earthlink.net”> to SpamSieve whitelist, added rule <From (name) Is Equal to “Chanel Watches”> to SpamSieve whitelist, added to Good corpus (65)
Date: 2009-08-04 05:30:18 -0500

Trained: Spam (Manual)
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Actions: disabled rule <From (address) Is Equal to “zenith@earthlink.net”> in SpamSieve whitelist, disabled rule <From (name) Is Equal to “Chanel Watches”> in SpamSieve whitelist, added rule <From (address) Is Equal to “zenith@earthlink.net”> to SpamSieve blocklist, added rule <From (name) Is Equal to “Chanel Watches”> to SpamSieve blocklist, added to Spam corpus (163), removed from Good corpus (64)
Date: 2009-08-04 05:30:45 -0500

Mistake: False Negative
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Classifier: Bayesian
Score: 27
Date: 2009-08-04 05:30:51 -0500
The word “watch” and forms of it in the corpus look like this:
Word Spam Good Total Prob. Last Used
watch 14 6 20 0.482 8/4/09
watchers 0 1 1 0.005 8/2/09
powerwatchers.com 0 1 1 0.005 8/2/09
powerwatch 0 1 1 0.005 8/2/09
MI:forums.powerwatchers.com 0 1 1 0.005 8/2/09
watchshop 2 0 2 0.998 8/2/09
watches 17 0 17 1.000 8/4/09
U:watch 1 0 1 0.995 8/2/09
U:proudwatches 1 0 1 0.995 8/2/09
U:logoswatches 1 0 1 0.995 8/2/09
U:graandwatches 2 0 2 0.998 8/2/09
S:watches 8 0 8 0.999 8/4/09
S:watch? 2 0 2 0.998 8/2/09
S:watch! 1 0 1 0.995 8/2/09
S:watch 2 0 2 0.998 8/4/09
rep1icawatches 2 0 2 0.998 8/2/09
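For what it's worth, the 0.482 for “watch” looks odd next to raw counts of 14 spam vs. 6 good, but it matches what you get if each count is first divided by the size of its corpus. The sketch below assumes that per-corpus normalization and the corpus sizes from the statistics in this post (163 spam, 65 good). I don't know that this is SpamSieve's actual formula, and it doesn't reproduce the capped values such as 0.005 or 0.998, which presumably involve some smoothing.

```python
def token_spam_probability(spam_count: int, good_count: int,
                           spam_msgs: int, good_msgs: int) -> float:
    """Spam probability for a token, normalizing counts by corpus size.

    Assumption: a simple frequency ratio, not confirmed as SpamSieve's method.
    Requires at least one occurrence of the token in either corpus.
    """
    spam_freq = spam_count / spam_msgs   # occurrences per spam message
    good_freq = good_count / good_msgs   # occurrences per good message
    return spam_freq / (spam_freq + good_freq)

# "watch": 14 spam, 6 good; corpus: 163 spam messages, 65 good messages
p = token_spam_probability(14, 6, 163, 65)
print(f"normalized: {p:.3f}")        # 0.482, as shown in the corpus window
print(f"raw ratio:  {14 / 20:.3f}")  # 0.700 if corpus sizes were ignored
```

So with a corpus this lopsided, a word that shows up in a handful of good messages ends up close to neutral even when most of its occurrences are in spam.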
Stats since 1/1/2008 are:
Filtered Mail
7,837 Good Messages
39,755 Spam Messages (84%)
68 Spam Messages Per Day

SpamSieve Accuracy
8 False Positives
611 False Negatives (99%)
98.7% Correct

Corpus
65 Good Messages
163 Spam Messages (71%)
14,071 Total Words

Rules
3,457 Blocklist Rules
10,495 Whitelist Rules

Showing Statistics Since
1/1/08 12:01 AM
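In case the percentages are confusing, here is a quick check of how they appear to be derived from the counts. The reading of “(99%)” as the false negatives' share of all mistakes is my assumption; it is simply the interpretation that matches the numbers.

```python
good_msgs = 7837
spam_msgs = 39755
false_pos = 8
false_neg = 611

total = good_msgs + spam_msgs   # all filtered messages
errors = false_pos + false_neg  # all misclassified messages

spam_share = 100 * spam_msgs / total        # share of filtered mail that was spam
fn_share = 100 * false_neg / errors         # assumed meaning of "(99%)"
accuracy = 100 * (total - errors) / total   # "Correct" figure
missed_spam = 100 * false_neg / spam_msgs   # share of spam that got through

print(f"spam share:  {spam_share:.0f}%")
print(f"FN share:    {fn_share:.0f}%")
print(f"accuracy:    {accuracy:.1f}%")
print(f"missed spam: {missed_spam:.1f}%")
```

These work out to 84%, 99%, and 98.7%, matching the window, and about 1.5% of all spam getting through since 1/1/08.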
Anything else needed?