Huge Whitelist file

Hi. I’m a long-time SpamSieve user, currently running 2.9.2 on OS X 10.10.5. The number of spam emails arriving in my inbox has been going up for several weeks. In looking around, I discovered that my blacklist file contains 4,000 items and my whitelist file contains over 25,000 items. The vast majority of those items are clearly spam. I have attached a small example of the file. Is there an easy way to edit this file, or is there some way that you can, let’s say, erase any item that has never been applied?

Screen Shot 2015-08-18 at 10.01.29 AM.png

Please see Spammy Whitelist Rules. If you see spammy rules on the whitelist that are checked, that means that you have not been correcting all the mistakes. That will lead to all sorts of problems, such as spam messages in your inbox.

If the rules have 0 hits, they have never been applied, so they are not the cause of the problem (of spam in your inbox). Rather, the incorrect rules are a symptom (of not training the spam messages that get through as spam). If you wanted to delete these rules, you could sort the Whitelist window by Hits in order to group all the unused rules together. However, this (a) is not really necessary, and (b) won’t fix the root problem (because it doesn’t fix the incorrect information in SpamSieve’s corpus).

To fix the corpus, you would need to (a) find all the spam messages that got through and train them as spam, or (b) reset the corpus and retrain SpamSieve.

I’m not sure I’m thrilled to be told that this is my fault, as I have trained and trained according to your product’s instructions. I may have been sloppy a few times when the amount of spam became overwhelming, but more like 1% of the total times, and certainly not some 20,000 times.

You say, “To fix the corpus, you would need to (a) find all the spam messages that got through and train them as spam, or (b) reset the corpus and retrain SpamSieve.”

Well, I am unlikely to spend several weeks finding and training the thousands of errors that have been made, regardless of who made them, especially when I don’t think that was me.

So, tell me what happens to the whitelist and blacklist files when I reset the corpus, since that appears to be my only real choice. That should mean that I am essentially starting over with your product, I suppose, which means all of the valid information that I have built up in those files will be destroyed, and I will be starting totally over?

I see that it keeps the old blacklist / whitelist information. Will it still pay attention to it? I am more concerned about losing GOOD whitelist information than anything else, since that may well cause me to lose a lot of valid email.

Are you saying that 80% (20,000/25,000) of your whitelist rules are spammy and enabled? It’s fine if they are spammy and disabled.

I think the first thing to do is to figure out what happened here, so that it doesn’t happen again.

  • Which mail program are you using, and how are you training the messages as spam? For example, if you are using Apple Mail and clicking the Junk button (instead of choosing SpamSieve - Train as Spam), that will only train Mail’s junk filter.
  • Do you have any rules in your mail program (aside from SpamSieve) that move messages to the Spam mailbox or trash? Such rules can prevent you from knowing when you need to train SpamSieve.
  • If you are using SpamSieve on multiple Macs, please make sure that you are using one of the setups described here.
  • If you get spam messages in your iPhone’s inbox, do you delete them from the phone? That would prevent them from being trained as spam.
  • When you train a message as spam, does this show up in the Corpus numbers in the Statistics window?

Lastly, you could prevent the bad effects of not correcting all the mistakes if you turn off auto-training.

Yes, I was just trying to explain how it works. The second option is much easier and quicker.

Resetting the corpus does not affect the whitelist or blocklist. So, most importantly, it will still use all the valid whitelist rules that you have built up over the years.

Right. That’s another reason that I don’t recommend trying to clean out your whitelist.

Michael:

Some good news at least. I have not surveyed all twenty-odd-thousand to see if their boxes are checked. It appears to be a healthy mix of checked and unchecked as I cruise through quickly.

As for your initial group of questions:

I am using Apple Mail. I mark a message as spam by selecting it and striking control-command-S.

I have no other active rules in my Apple Mail rules list, just “SpamSieve.”

I am only using SpamSieve on my desktop rMBP, and no other computers.

I rarely delete spam from my phone, and never did until the last few weeks, when I have been getting perhaps a hundred or more spams a day in my inbox. I did not know that was a problem, and I still have not done it much, though I’m sure much more than you would like. When one is away from his desk, and one gets many spams per hour, it becomes difficult to keep track of one’s email.

Yes, it appears as if the statistics are moving along with my marking of a spam message.

Turning off auto-training: frankly, I don’t know enough about it to know if this would be a good idea or not.

So, in review, I seem to be doing not much wrong except deleting spam from my phone (my accounts are all IMAP, by the way, including the one via which 99.5% of my spam arrives) and you seem to be saying that I should reset the corpus.

Any other immediate questions or tips, or should I just reset the corpus without doing anything with the other files?

That all sounds good. A very standard setup.

Based on what you’ve written, I think it would be fine to leave it on so long as you are careful about not deleting untrained spams from your iPhone. (If you want to get the spams out of sight, you could move them to another mailbox for later training.)

You can go ahead and reset the corpus.