A few a day starting to get through.. see example

User_Name · January 29, 2008, 5:49pm

I have noticed that recently 1 or 2 (of the hundreds I get a day) are slipping through. I cannot see why SS is accepting them as good in the first place. I would make a manual rule if I could figure out what to home in on, as there are some common themes in the ones that get through. Here are some of the “reasons” it gives for allowing them…

**Reason: P(spam)=0.536[0.505], bias=0.311, S:Seen(0.998), S:t.v(0.998), tue(0.037), R:^216(0.072), 0600(0.075), F:George(0.078), ^i-semi(0.129), 2008(0.168), MI:solanatrade.com(0.796), jan(0.258), to:@myemailaddress.com(0.698), R:^solanatrade^com(0.694)**

**Reason: P(spam)=0.872[0.605], bias=0.311, R:^71^172(0.998), S:Better(0.998), 0800(0.028), tue(0.040), ^i-semi(0.129), R:^71(0.861), 2008(0.171), MI:solanatrade.com(0.798), jan(0.259), to:@myemailaddress.com(0.698), R:^solanatrade^com(0.693), S:they(0.334), S:say!(0.334)**

As you can see, there are some common themes, BUT what is interesting is that some of these themes also run through other messages that are correctly labeled as spam.

Is it possible something in my Corpus is making these come through as good emails?

User_Name · January 29, 2008, 5:56pm

also, thinking that a lot of spam seems to have this solanatrade attribute, I see several mentions in the Corpus, for example one is this:

| Word | Spam | Good | Total | Prob |
| --- | --- | --- | --- | --- |
| R:^solanatrade^com | 1499 | 332 | 1831 | .693 |

How/why would there be any Good count for this word? It always seems to be a spam-related word, just like all its iterations: MI:R:solanatrade.com, RP@solanatrade.com, R:^mail^solanatrade^com.

Can't I just edit the corpus so that every one of these iterations always counts as Spam and never as Good, i.e. set the Good number to 0?

Michael_Tsai · January 29, 2008, 5:57pm

Yes, in fact that’s exactly what’s happening. The messages contain words that have, in the past, mostly appeared in your good messages. How many good and spam messages are in your corpus?

User_Name · January 29, 2008, 6:19pm

1247 Good, 2489 Spam = 67%

Could you not just edit the Corpus and make every iteration with solanatrade “0” under the Good column? In my example above, make 332 become 0?
I also notice that the ones getting through contain a URL in the message body like this http: / /google.co.uk/search?hl=en&q=inurl:rneskimo.com%2BVPXL%2BMade%2BEasy&btnI=oscilloscope - so I wonder if you could make a rule in the Blocklist that matches the beginning of that URL http: / /google.co.uk/
Also, what about clearing out of the corpus just those words it is hitting on? For example, I am not sure why there are so many positive hits on Bigger or Better, as neither is a common word in emails I write or receive.

Michael_Tsai · January 29, 2008, 8:16pm

Either (a) you received some good messages with that word, or (b) you received some spam messages with that word, SpamSieve thought they were good, and you failed to correct it by training the messages as spam.

You could. However, if the corpus is messed up because of the incorrect training described above, it’s probably going to be too difficult to fix it by editing individual words.
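To make the effect of such an edit concrete, here is a minimal sketch using the counts quoted in this thread. It assumes a Graham-style estimate, where each word's probability is the ratio of its per-class frequencies; that is an assumption, not SpamSieve's documented formula, although it does reproduce the .693 shown for `R:^solanatrade^com`. The function name `word_spam_probability` is hypothetical, and treating the corpus message counts as the denominators is also an assumption.

```python
def word_spam_probability(spam_count, good_count, total_spam, total_good):
    """Assumed Graham-style estimate: spam frequency over the sum of both frequencies."""
    spam_freq = spam_count / total_spam  # how often the word occurs per spam message
    good_freq = good_count / total_good  # how often the word occurs per good message
    return spam_freq / (spam_freq + good_freq)


# Counts quoted in the thread: 1499 Spam / 332 Good for the word,
# 2489 spam and 1247 good messages in the corpus.
current = word_spam_probability(1499, 332, 2489, 1247)

# Same word after manually setting its Good count to 0.
edited = word_spam_probability(1499, 0, 2489, 1247)

print(round(current, 3))  # 0.693
print(edited)             # 1.0
```

Under this assumed model, zeroing the Good count pushes that one word to the maximum, but every other word in the message keeps its existing numbers, which is why editing individual words may not be enough if the training itself was off.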

Yes, you could do that. Just make a rule for “Body Contains.”
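As a rough illustration of what a “Body Contains” style rule does, here is a small sketch. It is not SpamSieve's implementation; the function name `body_contains` and the choice of `google.co.uk/search` as the match text are assumptions for the example. Matching on the host and path, rather than the full `http://` prefix, avoids depending on how the scheme is written in the message.

```python
# Hypothetical match text taken from the URL quoted in this thread.
BLOCKED_SUBSTRING = "google.co.uk/search"


def body_contains(body, needle=BLOCKED_SUBSTRING):
    """Return True if the message body contains the needle, ignoring case."""
    return needle.lower() in body.lower()


spam_body = (
    "Better they say! "
    "http://google.co.uk/search?hl=en&q=inurl:rneskimo.com%2BVPXL%2BMade%2BEasy&btnI=oscilloscope"
)
good_body = "Lunch on Tuesday? I found the place on google.com."

print(body_contains(spam_body))  # True
print(body_contains(good_body))  # False
```

A rule this broad would also catch legitimate messages that link to a Google UK search, so a narrower variant might additionally require the `btnI=` parameter seen in the quoted URL.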

If there are a lot of words that you think do not have the right numbers in the corpus, it would probably be better to reset the corpus and re-train SpamSieve.

User_Name · January 30, 2008, 8:11am

Did this last night and it already caught 3 this morning! Likely the 3 that would have gotten through had I not.

Want to avoid this like the plague… I spent a lot of time training it correctly, and throwing out the baby with the bath water for the 2-3 a day that slip through is not the answer for me. A better answer is to see what can be adjusted, or to add rules to the blocklist.