Over the past few weeks SpamSieve has been letting more spam through than it used to. Before, I might get one spam a day in my inbox, but now I’m seeing six or seven, sometimes more.
I dug through my log and noticed something strange: SpamSieve sometimes seems not to register it when I train a false negative (spam that got through) as spam. I’ve reproduced an example from my log below.
Note the entries for 2008-03-06 with the subject “Re: wonyun”. The log shows SpamSieve predicting it as good and auto-training it as good, but there’s no entry showing that I then trained it as spam. I train every spam that gets through, and I specifically remember training this one because the subject was so weird. But SpamSieve has no record of it, so the sender’s address stayed on the whitelist, and later spam from that address (for example “80% discount. Code #EJ72” on 2008-03-08) got through. I looked at my whitelist and saw a number of very suspicious addresses on it, which makes me think SpamSieve has missed more of the false negatives I’ve trained manually.
This seems to have started when I upgraded to 2.6.6. I’m reverting to 2.6.5 to see whether the problem persists.
=====================================================================
Predicted: Spam (95)
Subject: Designer Footwear from Gucci Prada Chanel & More, buy direct, forget department store prices
From: b6-liner123@acclamation.com
Identifier: mVTrvXQGPjhd/miUyLUAow==
Reason: P(spam)=1.000[0.999], bias=0.000, S:store(0.999), ^bg-fffff3(0.999), S:Chanel(0.999), S:Prada(0.999), S:Gucci(0.999), S:Designer(0.998), F:bernard(0.002), CT:FOOTWEAR6-parta4endd(0.995), ^fb-FOOTWEAR6-parta4endd(0.995), S:direct(0.995), S:Footwear(0.995), ^iw-410(0.995), ^file-name-FOOTWEAR6-parta4endd-gif(0.995), ^ih-327(0.995), ^i5-202d3a69(0.995)
Date: 2008-02-21 05:46:22 -0500
=====================================================================
Predicted: Good (27)
Subject: Re: wonyun
From: -liner123@acclamation.com
Identifier: DqHCJGmC8OOCUSG8bhzJFg==
Reason: P(spam)=0.487[0.492], bias=0.000, F:alston(0.005), R:^201^213(0.995), medication(0.875), R:^3(0.268), to:list-manager@(0.302), MI:^bad-host(0.675), F:sally(0.328), darkness(0.348), helps(0.369), R:^201(0.617), yourself(0.600)
Date: 2008-03-06 08:50:26 -0500
=====================================================================
Trained: Good (Auto)
Subject: Re: wonyun
Identifier: DqHCJGmC8OOCUSG8bhzJFg==
Actions: added rule <From (address) Is Equal to "-liner123@acclamation.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to “alston sally”> to SpamSieve whitelist, added to Good corpus (1941)
Date: 2008-03-06 08:50:26 -0500
=====================================================================
Predicted: Good (1)
Subject: 80% discount. Code #EJ72
From: -liner123@acclamation.com
Identifier: Svz5dI0JZJZQ44pT8mRvFg==
Reason: (
"-liner123@acclamation.com"
) matched rule <From (address) Is Equal to "-liner123@acclamation.com"> in SpamSieve whitelist
Date: 2008-03-08 13:02:51 -0500
=====================================================================
Trained: Good (Auto)
Subject: 80% discount. Code #EJ72
Identifier: Svz5dI0JZJZQ44pT8mRvFg==
Actions: added to Good corpus (1976)
Date: 2008-03-08 13:02:51 -0500
=====================================================================
Trained: Spam (Manual)
Subject: 80% discount. Code #EJ72
Identifier: Svz5dI0JZJZQ44pT8mRvFg==
Actions: disabled rule <From (address) Is Equal to "-liner123@acclamation.com"> in SpamSieve whitelist, added rule <From (address) Is Equal to "-liner123@acclamation.com"> to SpamSieve blocklist, added to Spam corpus (3224), removed from Good corpus (1975)
Date: 2008-03-08 13:08:44 -0500
=====================================================================
Mistake: False Negative
Subject: 80% discount. Code #EJ72
Identifier: Svz5dI0JZJZQ44pT8mRvFg==
Classifier: Whitelist
Score: 1
Date: 2008-03-08 13:08:49 -0500