Reset Corpus, Retrained, But...

Back in June I posted about (relatively) poor performance in catching Spam.

It was obvious from the stats I posted then that I needed to reset my corpus and retrain:

Filtered Mail
8,759 Good Messages
19,105 Spam Messages (69%)
37 Spam Messages Per Day

SpamSieve Accuracy
82 False Positives
467 False Negatives (85%)
98.0% Correct

Corpus
3,150 Good Messages
5,101 Spam Messages (62%)
278,600 Total Words

Rules
12,949 Blocklist Rules
10,072 Whitelist Rules

Showing Statistics Since
1/1/08 9:40 PM

So I did, and now, five weeks later, here are my new stats:

Filtered Mail
551 Good Messages
1,571 Spam Messages (74%)
45 Spam Messages Per Day

SpamSieve Accuracy
4 False Positives
32 False Negatives (89%)
98.3% Correct

Corpus
351 Good Messages
664 Spam Messages (65%)
100,952 Total Words

Rules
1,777 Blocklist Rules
2,083 Whitelist Rules

Showing Statistics Since
6/10/09 2:35 PM

I moved from 98% to 98.3% correct. Michael stated I should be at or above 99%.
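For anyone checking the arithmetic, here is a minimal sketch of how the “Correct” figure seems to be derived from the numbers above. The formula (messages filtered minus errors, divided by messages filtered) is an assumption, not something taken from the SpamSieve documentation, though it does reproduce both percentages posted here. The function names are made up for illustration.

```python
import math


def percent_correct(good, spam, false_pos, false_neg):
    """Assumed formula: share of filtered messages classified correctly."""
    total = good + spam
    errors = false_pos + false_neg
    return 100.0 * (total - errors) / total


def max_errors_for(target_percent, total):
    """Largest number of mistakes that still meets target_percent."""
    return math.floor(total * (100.0 - target_percent) / 100.0)


# Figures copied from the two statistics listings in this post.
before = percent_correct(good=8759, spam=19105, false_pos=82, false_neg=467)
after = percent_correct(good=551, spam=1571, false_pos=4, false_neg=32)

print(f"before reset: {before:.1f}%")  # before reset: 98.0%
print(f"after reset:  {after:.1f}%")   # after reset:  98.3%
print("mistakes allowed at 99%:", max_errors_for(99.0, 551 + 1571))  # 21
```

Under that assumption, hitting 99% over the same 2,122 messages would have meant at most 21 mistakes in total, versus the 36 (4 + 32) actually recorded.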

Is there something else I need to do?

Also, I use Entourage 2008. I thought I set up SpamSieve correctly many years ago in an earlier version of Entourage (X, I think) using the instructions provided, but I still get a fair amount of mail, maybe 1 or two an hour, that goes into my Entourage Junk folder. It gets marked as junk instead of spam. I have retrained the items in the Junk folder as spam using the AppleScripts, but even then I still see a few similar “junk” mails ending up there. Is that supposed to happen, or do I need to tweak something?

There may be a setup problem. Please send in a report.

The statistics say 32 spam messages in the last five weeks. That doesn’t sound like “1 or two an hour,” so it definitely sounds like something is wrong.

What do you mean by that? Spam messages in Entourage are supposed to be marked as “Junk”.

Report
I found the log file and will send it in after posting this.

The stats say 32 spam messages but what does that mean? I get dozens of spam messages every day that go into my Deleted Items folder (I designated that as my spam folder) and I really do get one or two per hour that end up in the Junk folder.

You wrote: [Spam messages in Entourage are supposed to be marked as “Junk”.]

That may be the problem. When I set this up, I had all spam marked in red and automatically sent to the Deleted Items folder. So it’s an easy one-step process to view the spam and delete it using an AppleScript. But I still get a few spam messages showing up in the Junk folder (marked in purple).

Sorry, I think I misread what you wrote. The stats say there were 32 spam messages that SpamSieve didn’t catch. If SpamSieve is correctly catching one or two spam messages per hour, that’s a good thing.

Well, I don’t know what red and purple signify in your setup. What are the names of the categories? Did you forget to turn off Entourage’s built-in junk mail filter?

The normal SpamSieve setup is to have spam messages marked as Junk (gray) and moved to the “Junk E-mail” folder.

From the log, it looks like most of the spam messages that got through did so in June, when you were re-training SpamSieve. There were only 10 false negatives in July. So the accuracy seems to be improving.

Secondly, there are a whole bunch of messages that you trained as spam but that SpamSieve was never asked to filter. That probably means that there’s a setup problem in Entourage. I cannot tell where the problem is because you didn’t send any screenshots.

Actually, what I said back then was that you should first check the setup to make sure that the problem was actually with the corpus. I’m starting to think that it wasn’t.

Entourage Rules
OK, it looks like the problem is indeed in the Rules order.

I just reordered them so the SpamSieve Rules are now at the top. I also deleted rules so old I forgot what they referred to.

I have four SpamSieve rules. Your instructions at 8.1.2, “Checking the Entourage Setup,” mention two rules, so maybe the additional two are from a previous setup.

I have, in this order from the top:

Spamsieve - Mark If Spam
Spamsieve - Move If Spam
Spamsieve - Move If Uncertain
Spamsieve - Move Messages

Spamsieve - Mark If Spam is unchecked (but the other three are enabled).

Spamsieve - Move If Uncertain puts messages into the Junk folder (this is probably the culprit).

Somewhere along the line I may have customized these rules.

Spamsieve - Move If Spam has one “criterion”: it simply runs your spam-finding AppleScript on all messages.

Spamsieve - Move If Uncertain finds messages that have been labeled with the category “Uncertain Junk” and puts them into the Junk folder.

Spamsieve - Move Messages finds messages that have been labeled with the category “Junk” and puts them into the Deleted Items folder.

So I guess all I need to know now is… where did I screw up? ;)

I think you can delete the “Spamsieve - Mark If Spam” rule. It sounds like everything will be fine so long as “SpamSieve - Move If Spam” is at the top.