Significantly more Obvious False Negatives than Usual

I am getting 3-4 False Negatives a day, up from about 1 a week. Most are obvious, and the log seems to think they were whitelisted. I would never whitelist such obvious spam, and am stumped as to how they got there. I am VERY careful about correcting every mistake, and I run about 98% accurate, with 70-100 spam messages a day. False Negatives are at 24%. I run SpamSieve on a single machine letting it do all the filtering for my iOS devices and laptop.

Example, showing the erroneous “Predicted: Good” and “Trained: Good (Auto)” entries, and the log from when I corrected the error:

=====================================================================
Predicted: Good (1)
Subject: Your Barbecue On STEROIDS
From: yoshigrill70@riddlequipmet.com
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Reason: (
“Yoshi Grill”
) matched rule <From (name) Is Equal to “Yoshi Grill”> in SpamSieve whitelist
Date: 2015-08-03 14:06:11 -0500 (CDT)

Trained: Good (Auto)
Subject: Your Barbecue On STEROIDS
From: yoshigrill70@riddlequipmet.com
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Actions: added rule <From (address) Is Equal to "yoshigrill70@riddlequipmet.com"> to SpamSieve whitelist, added rule <From (address) Is Equal to "yoshigrill19@riddlequipmet.com"> to SpamSieve whitelist, added to Good corpus (4778)
Date: 2015-08-03 14:06:11 -0500 (CDT)

=====================================================================
Trained: Spam (Manual)
Subject: Your Barbecue On STEROIDS
From: yoshigrill70@riddlequipmet.com
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Actions: disabled rule <From (address) Is Equal to "yoshigrill70@riddlequipmet.com"> in SpamSieve whitelist, disabled rule <From (name) Is Equal to “Yoshi Grill”> in SpamSieve whitelist, added rule <From (address) Is Equal to "yoshigrill70@riddlequipmet.com"> to SpamSieve blocklist, added rule <From (name) Is Equal to “Yoshi Grill”> to SpamSieve blocklist, added to Spam corpus (6632), removed from Good corpus (4777)
Date: 2015-08-03 14:23:23 -0500 (CDT)

Mistake: False Negative
Subject: Your Barbecue On STEROIDS
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Classifier: Whitelist
Score: 1
Date: 2015-08-03 14:23:28 -0500 (CDT)

Please see Spammy Whitelist Rules. It looks like SpamSieve auto-trained “Yoshi Grill” to the whitelist after you received an earlier message, and SpamSieve never saw a correction for that message, so it left the whitelist rule in place.

In short, false negatives like this either mean that you are not correcting the mistakes promptly or that there is an error that’s preventing the corrections from taking place. If you look in the log, you should be able to see which message added the “Yoshi Grill” rule, and then you can look for that message in your mail program. It’s very rare to see errors preventing the training from working, especially since the log shows that it did work for the message in your example, but if there were errors they would be noted in the Console log.
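If the log is long, searching it by hand for the entry that added a rule can be tedious. Below is a minimal Python sketch of that search, assuming the log text has been copied out and looks like the entries pasted in this thread (blank-line-separated entries with `Trained:`, `Subject:`, `Identifier:`, `Actions:` and `Date:` lines). The names `parse_entries` and `find_rule_origins` are made up for this illustration; nothing here is a SpamSieve feature or API.

```python
import re

# Matches "Key: value" lines such as "Subject: ..." or "Trained: Good (Auto)".
FIELD = re.compile(r"^([A-Z][A-Za-z]*): ?(.*)$")
# Matches each whitelist rule named in an "Actions:" line.
ADDED_RULE = re.compile(r"added rule <([^>]*)> to SpamSieve whitelist")


def parse_entries(log_text):
    """Turn pasted log text into a list of dicts, one per blank-line-separated entry."""
    entries = []
    for block in re.split(r"\n\s*\n", log_text.strip()):
        entry, last_key = {}, None
        for line in block.splitlines():
            line = line.strip()
            if not line or set(line) == {"="}:
                continue  # skip blank lines and "=====" separators
            match = FIELD.match(line)
            if match and not entry:
                # The first field names the entry, e.g. kind="Trained", result="Good (Auto)".
                entry["kind"], entry["result"] = match.groups()
            elif match:
                last_key = match.group(1)
                entry[last_key] = match.group(2)
            elif last_key:
                # Continuation of a multi-line field such as "Reason: (".
                entry[last_key] += " " + line
        if entry:
            entries.append(entry)
    return entries


def find_rule_origins(log_text, fragment):
    """Return (date, subject, identifier, rule) for each 'Trained: Good' entry
    that added a whitelist rule whose text contains `fragment`."""
    hits = []
    for entry in parse_entries(log_text):
        if entry.get("kind") != "Trained" or not entry.get("result", "").startswith("Good"):
            continue
        for rule in ADDED_RULE.findall(entry.get("Actions", "")):
            if fragment in rule:
                hits.append((entry.get("Date"), entry.get("Subject"),
                             entry.get("Identifier"), rule))
    return hits


# Sample entry copied from a log excerpt later in this thread.
SAMPLE = """\
Trained: Good (Auto)
Subject: Browse Best SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
From: info@sharunillwayrole.com
Identifier: ZYOZaqxvZ5lJoScp5DjOxw==
Actions: added rule <From (address) Is Equal to "info@sharunillwayrole.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to “SUV Models”> to SpamSieve whitelist, added to Good corpus (4792)
Date: 2015-08-04 12:03:12 -0500 (CDT)
"""

for date, subject, identifier, rule in find_rule_origins(SAMPLE, "SUV Models"):
    print(date, identifier, subject)
```

The subject and date it prints are what you would then search for in the mail program. If the rule was added before the start of the current log file, the search will simply come back empty.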

If there are a lot of messages that were not corrected, that will really reduce SpamSieve’s accuracy, and to fix that you would need to follow the Resetting SpamSieve instructions.

Not clear
I have been using SpamSieve for a while. I am meticulous in training and correcting. I correct EVERY message that is misclassified. I always have, and my interaction with the program has not changed since I started, so no, there are not a lot of (or any) spam messages that go uncorrected.

Every morning I check my spam folder, and about once a week, pull a good message out of it, using “Train As Good.”

Then I check my inbox, and if there is an obvious spam in it, I correct that using “Train As Spam”

That USED to happen perhaps once a week. Now it happens 3-4 times a day. In the Yoshi Grill example, if there was a message that added it to the whitelist (something I cannot imagine), it is long gone, as it was an obvious spam. I don’t think I ever let spam sit in my inbox, and I never do so without eventually training it as spam (e.g. after vacation, when I come back to a bunch of messages after several days away).

The log I provided is the typical procedure–spam comes in, is a false negative, I see it, mark it as spam, and it is removed. All this happens in about 1 minute. If I need to do that any faster I can’t.

Trying to avoid a corpus reset as that all but shoots the program dead for a while.

Do you have any rules other than the SpamSieve one that move messages to the Spam mailbox or trash? Such rules can prevent misclassifications from being noticed.

Do you always make the corrections using the “SpamSieve - Train as Spam” command, rather than the Junk button in Mail’s toolbar or its associated menu command?

If the message is gone, then I guess all we can do is make sure things are working properly going forward. As I said, any errors would be reported in the Console log. Also, when you train a message as spam, you could watch the Statistics window and make sure that the number of spam messages in the corpus increases.

That’s plenty fast. Even within a day or two is probably fine. The Yoshi Grill example was long enough ago that it was not in the current log file.

Another
I always train with the SpamSieve “Train As…” commands, both Good and Spam. For the few messages that go into “Junk” (thanks iCloud, for not letting me turn that off!), I just use “Apply Rules” on them; they are filtered normally and always marked as spam by SpamSieve, as expected.

Yesterday this came in–a bad prediction as Good

=====================================================================
Predicted: Good (26)
Subject: Browse Best SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
From: info@sharunillwayrole.com
Identifier: ZYOZaqxvZ5lJoScp5DjOxw==
Reason: P(spam)=0.000[0.468], bias=0.505, F:SUV(1.000), RT:SUV(0.999), F:Models(0.999), A:Favorite(0.001), U:OS(0.003), U:OS(0.003), U:8bfc(0.007), 191112(0.010), U:Ick(0.010), dual-purpose(0.010), U:JUs(0.010), U:TYl(0.010), U:77f(0.015), S:Nissan(0.019), S:Encore(0.019)
Date: 2015-08-04 12:03:12 -0500 (CDT)

Trained: Good (Auto)
Subject: Browse Best SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
From: info@sharunillwayrole.com
Identifier: ZYOZaqxvZ5lJoScp5DjOxw==
Actions: added rule <From (address) Is Equal to "info@sharunillwayrole.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to “SUV Models”> to SpamSieve whitelist, added to Good corpus (4792)
Date: 2015-08-04 12:03:12 -0500 (CDT)

Today this:

Predicted: Good (1)
Subject: Deals on SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
From: info@paracrystalvista.com
Identifier: dd+CROaXMe+K3Prs/jPVKg==
Reason: (
“SUV Models”
) matched rule <From (name) Is Equal to “SUV Models”> in SpamSieve whitelist
Date: 2015-08-05 12:17:22 -0500 (CDT)

I certainly never trained a message with that spam subject as good. And the auto train on yesterday’s is flawed.

Here the same sender is predicted Good, based on an auto whitelist rule using “SUV Models”, and then predicted Spam a few minutes later:

=====================================================================
Predicted: Good (1)
Subject: Deals on SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
From: info@paracrystalvista.com
Identifier: dd+CROaXMe+K3Prs/jPVKg==
Reason: (
“SUV Models”
) matched rule <From (name) Is Equal to “SUV Models”> in SpamSieve whitelist
Date: 2015-08-05 12:17:22 -0500 (CDT)

Predicted: Spam (99)
Subject: avatar@stevebasile.com, Your Oil-Change Coupons Are Delivered.
From: info@paracrystalvista.com
Identifier: wJxucMs+6YBkdCWTsmrmBw==
Reason: (
"info@paracrystalvista.com"
) matched rule <From (address) Is Equal to "info@paracrystalvista.com"> in SpamSieve blocklist
Date: 2015-08-05 12:20:02 -0500 (CDT)

I’m puzzled as to why after many years this is happening so often in the past 3-4 months.

Please answer my question about your other rules.

Does the log show that you trained this message as spam and that the whitelist rules were disabled?

Nothing says that you did.

It’s normal to get an auto-trained rule when SpamSieve predicts the message to be good. But it looks like either you did not train the message as spam or the training didn’t work, and so the rule was still enabled today.
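One rough way to check this across a whole log is to list every message that was auto-trained as good (and added a whitelist rule) but has no later “Trained: Spam (Manual)” entry with the same Identifier. The Python sketch below assumes the log text has been copied out in the format shown in this thread; the function name `uncorrected_auto_trains` is made up for the illustration, and the sample entries are abbreviated from the ones posted above. Genuinely good messages will also appear in the result, so it is only a list of candidates to review.

```python
import re

ADDED_RULE = re.compile(r"added rule <[^>]*> to SpamSieve whitelist")


def uncorrected_auto_trains(log_text):
    """Map Identifier -> Subject for auto-trained Good entries that added a
    whitelist rule and were never followed by a manual Spam training."""
    auto_good, manual_spam = {}, set()
    for block in re.split(r"\n\s*\n", log_text.strip()):
        ident = re.search(r"^Identifier: (\S+)", block, re.M)
        if not ident:
            continue  # not a log entry
        if re.search(r"^Trained: Good \(Auto\)", block, re.M) and ADDED_RULE.search(block):
            subject = re.search(r"^Subject: (.*)$", block, re.M)
            auto_good[ident.group(1)] = subject.group(1) if subject else ""
        elif re.search(r"^Trained: Spam \(Manual\)", block, re.M):
            manual_spam.add(ident.group(1))
    return {i: s for i, s in auto_good.items() if i not in manual_spam}


# Abbreviated copies of entries from this thread (Actions lines shortened).
SAMPLE = """\
Trained: Good (Auto)
Subject: Your Barbecue On STEROIDS
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Actions: added rule <From (address) Is Equal to "yoshigrill70@riddlequipmet.com"> to SpamSieve whitelist
Date: 2015-08-03 14:06:11 -0500 (CDT)

=====================================================================
Trained: Spam (Manual)
Subject: Your Barbecue On STEROIDS
Identifier: zjl6NWXeda6uUXhNiw+RDw==
Actions: disabled rule <From (address) Is Equal to "yoshigrill70@riddlequipmet.com"> in SpamSieve whitelist
Date: 2015-08-03 14:23:23 -0500 (CDT)

Trained: Good (Auto)
Subject: Browse Best SUVs! Honda CR-V, Buick Encore, Audi Q5, Nissan Rogue and More
Identifier: ZYOZaqxvZ5lJoScp5DjOxw==
Actions: added rule <From (name) Is Equal to “SUV Models”> to SpamSieve whitelist
Date: 2015-08-04 12:03:12 -0500 (CDT)
"""

for identifier, subject in uncorrected_auto_trains(SAMPLE).items():
    print(identifier, subject)
```

In this sample the Yoshi Grill message is paired with its manual correction and drops out, while the SUV message remains because no correction for its Identifier appears in the text.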

No other rules
I have a couple of rules that file good messages into mailboxes such as Finance and Sports, but nothing that manipulates junk or spam messages.

That is good.

Found the Issue I Think
Got to thinking about when this surge of missed spam began and about vacation, and I think that was the source. THAT was the break in my rigorous spam-training regimen.

I run SS on one machine, my home iMac, which is always on and filtering. While I was traveling in June, and again in July, the iMac went down after a power failure and stopped filtering for a week to 10 days each time. During that time, messages came to my laptop and iPhone, and because I don’t run SS there, I could not do anything but delete them until I got back home and restarted my iMac.

This would cause SS to think they were OK, and perhaps whitelist them? As I look through my whitelist, there are dozens of obviously spammy rules like that, each no doubt attributable to a message I just deleted.

So, there is a bunch of junk in my whitelist. I think I saw you say elsewhere not to delete bad rules in the list but turn them off? Is that my best approach to clear some of these bad auto-trains?

One option would be to move the messages to a TrainSpam mailbox so that you can train them when you get back.

Yes.

Correct.

No, because that only fixes the whitelist, not the corpus. However, you could reset the corpus and disable the bad whitelist rules instead of resetting the whitelist. That would let you keep your good whitelist rules.

How About This?
Now that I know what caused this (and I will set up a “For Training Later” holding mailbox), what if I just begin retraining/untraining the obvious spams using “Train As Spam” as they arrive, and tough it out?

This will avoid the hassle of resetting either Corpus or Whitelist and seems like it will eventually get SS back on track as it has been doing well before this…

It would improve the situation compared with where you are now, but the accuracy will never be as good as if you reset to get rid of the incorrect data. So it’s really up to you whether you want to put in a bit more time now in order to save time in the long run.