reinstall advice needed

I have been using SpamSieve for 5+ years. I was amazed when I first installed it and have highly recommended it to others. Over the past nine months SpamSieve’s effectiveness has been degrading. Please note this decline is most likely my fault, due to several issues relating to my usage, and not a problem with SpamSieve. I have looked at the log, and most of the offending messages have been whitelisted by auto-training. Also, my whitelist has 9,170 rules, which seems high to me.

To get SpamSieve back to peak performance, I want to change my setup/workflow and feel a reinstall is the best path forward. However, I am wondering how to retain the whitelisting of my previous recipients, as well as the best means of dealing with the many messages which are currently processed by SpamSieve as SPAM by my own choice even though they may not technically be SPAM.

Since you typically advise against reinstalling I will provide a bit more background. I suspect my problem stems from several issues relating to my usage.

(1) About nine months ago I had hardware problems with my Mac Pro, which resulted in my running Apple Mail and SpamSieve on my MacBook Pro for a month. This was done without moving the SpamSieve training between the Mac Pro and the MBP, thus causing a break in my training.

(2) I have been processing a higher percentage of my email from an iPhone or iPad. However, I have not set up a drone, and thus SPAM started getting through without SpamSieve being trained. This is the first change I want to make in my installation.

(3) I want to change one aspect of my usage. More than 33% of the messages I have trained SpamSieve to think of as SPAM are actually not SPAM. These are messages I wish to retain and may look at from time to time. However, I do not want to see them in my INBOX. These are mostly emails which are triggered by some registration for a subscription, mailing list, or service. Think advertisements from vendors I use on a regular basis, airline frequent flyer programmes, hotels, Facebook notifications, newsletters and the like. These messages are currently in my SPAM folder. This causes me problems because I cannot readily delete SPAM messages without a lot of manual sorting. These archive messages eventually age out, so currently I keep around 20,000 messages in the SPAM folder so as to have the archive messages available should I need them. On an annual basis I trim off some of the oldest ones.

This is the second change I wish to make in my usage pattern. Rather than having SpamSieve treat these as SPAM, I want to put all of these messages into an archive folder. Then I could simply delete the SPAM folder contents more regularly. For the moment, I am planning on creating a rule where emails from selected domains are placed in an archive folder. I am planning on adding this rule before the SpamSieve rule in Apple Mail, or possibly putting it on the server side via a sieve script. I would be even happier if SpamSieve were to add a feature for auto-archiving which one could train just as one trains SPAM, that is, an auto-archive where messages from certain addresses are saved in a specified archive folder. Of course this would also mean drone support for such auto-archiving.

Now as to my questions:

(A) If I reinstall, how do I retain the whitelisting of previous recipients but have SpamSieve start afresh in all other regards?

(B) Is there a script to generate whitelist entries for recipients by processing a set of sent messages?

(C) Are there any alternative approaches for dealing with the messages I wish to archive, that is, messages I do not want treated as SPAM but also do not want in my INBOX?

(D) In some cases, I would like to WHITELIST all messages from a domain (think clients) rather than a specific address. Since sender emails sometimes have different variants to the right of the @ in their address due to mailserver configurations, different sending regions and the like, is the best way to do this using a REGEX match in the WHITELIST rule? Also, which of BLOCKLIST and WHITELIST has precedence in SpamSieve’s processing? I am assuming WHITELIST, but thought I would check.

That doesn’t really mean anything because SpamSieve whitelists the addresses from all the messages that it thinks are good. If you later train the message as spam, it will disable the corresponding whitelist rules.

That seems fine to me.

That’s what the Reset Corpus command is for. You may also want to change the date in the Statistics window so that SpamSieve starts counting from when you reset the corpus.

Currently there isn’t for Apple Mail (since it has Previous Recipient support built-in) but one could be written. Please see this page.
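As a rough starting point for such a script, here is a minimal Python sketch that collects the recipient addresses from a set of sent messages. It assumes the Sent mailbox has already been exported to a standard mbox file (the path `sent.mbox` is hypothetical), and it only produces a list of addresses. It does not create SpamSieve whitelist rules; that step would still need SpamSieve's own scripting support.

```python
import mailbox
from email.message import Message
from email.utils import getaddresses

# Headers that can carry recipient addresses on a sent message.
RECIPIENT_HEADERS = ("To", "Cc", "Bcc")


def recipients_of(message: Message) -> set[str]:
    """Return the lower-cased recipient addresses found in one message."""
    pairs = []
    for header in RECIPIENT_HEADERS:
        # get_all returns every occurrence of the header, or [] if absent.
        pairs.extend(getaddresses(message.get_all(header, [])))
    return {addr.lower() for _name, addr in pairs if "@" in addr}


def recipients_in_mbox(path: str) -> list[str]:
    """Return the sorted, de-duplicated recipients of all messages in an mbox."""
    found: set[str] = set()
    for message in mailbox.mbox(path):
        found |= recipients_of(message)
    return sorted(found)


if __name__ == "__main__":
    # "sent.mbox" is a hypothetical export of the Sent mailbox.
    for address in recipients_in_mbox("sent.mbox"):
        print(address)
```

The output is one address per line, which could then be reviewed by hand before turning any of it into whitelist rules.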

I recommend making rules in your mail program to move these messages into separate mailboxes. (I have one for orders, one for press releases, etc.) You should tell SpamSieve that they’re good.

You could probably just use a rule like Ends With apple.com. Or, to be more precise, Matches Regex [@.]apple.com.
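To illustrate why the regex form is more precise than a plain suffix match, here is a small Python sketch. It assumes the intended pattern is a character class matching either `@` or `.` directly before the domain. The escaped dot and the trailing `$` anchor are additions for strictness, not part of the suggestion above, and Python's `re` module only stands in for whatever regex engine SpamSieve uses.

```python
import re

# Assumed pattern: "@" or "." immediately before the domain, anchored at the end.
# The "\." and "$" are stricter than the rule as suggested in the reply.
DOMAIN_RULE = re.compile(r"[@.]apple\.com$", re.IGNORECASE)


def ends_with(address: str, suffix: str = "apple.com") -> bool:
    """Mimic a plain 'Ends With' rule: any address ending in the suffix."""
    return address.lower().endswith(suffix)


def matches_domain(address: str) -> bool:
    """Mimic the regex rule: the domain or one of its subdomains only."""
    return DOMAIN_RULE.search(address) is not None


if __name__ == "__main__":
    samples = [
        "tim@apple.com",             # the domain itself
        "news@mail.eu.apple.com",    # a subdomain variant
        "promo@pineapple.com",       # different domain with the same suffix
    ]
    for addr in samples:
        print(addr, ends_with(addr), matches_domain(addr))
```

The suffix rule accepts `promo@pineapple.com` because the string merely ends in `apple.com`, while the regex rule rejects it and still accepts subdomain variants such as `mail.eu.apple.com`.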

The whitelist has precedence.

Thanks for the quick response.

So Reset Corpus only resets data for the Bayesian engine? There are probably 7K+ WHITELISTed addresses from which I never want to see an email. It seems like you don’t want me to reinstall, but then all of these WHITELISTed addresses will be retained until training catches something once again. If I don’t reinstall, should I go in and delete those addresses? I actually think it would be easier to retrain, particularly if my Previous Recipients are preserved. If Previous Recipients are retained, I believe tossing my WHITELIST and BLOCKLIST and training again will repopulate things more appropriately.

I had been thinking of putting such a rule prior to SpamSieve’s rule so as to guarantee that the messages get into the folder. Since you say to train them as good, it seems like your preference is that such a rule comes after the SpamSieve rule. Is that correct?

Right. I thought you wanted to retain the many useful whitelist (and probably also blocklist) rules that you’ve built up over the years.

Yes, but most of these rules are probably either disabled (because you had trained the message as spam) or won’t bother you again. So I suggest keeping them.

I think the only reason to get rid of them would be to reduce SpamSieve’s RAM usage a bit. However, you can achieve that end by sorting the whitelist by the Enabled or Hits column and bulk-deleting the rules that you don’t want.

Either way will work. If you put your rule first, some spam messages from forged addresses will likely end up in your archive mailbox. If you put the SpamSieve rule first, it should be able to catch the spam, though there is also the possibility of a good message occasionally going into the Spam mailbox.
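The trade-off between the two orderings can be sketched with a toy Python model. Everything in it is hypothetical: the domain list, the `looks_spammy` flag standing in for the spam filter's verdict, and the mailbox names. It is not how Apple Mail or SpamSieve evaluate rules internally, only an illustration of which rule gets the first say.

```python
from dataclasses import dataclass

# Hypothetical sender domains that the archive rule matches.
ARCHIVE_DOMAINS = {"airline.example", "hotel.example"}


@dataclass
class Msg:
    sender: str
    looks_spammy: bool  # stand-in for the spam filter's verdict


def archive_rule(msg: Msg) -> bool:
    """True if the sender's domain is on the archive list."""
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    return domain in ARCHIVE_DOMAINS


def route(msg: Msg, archive_first: bool) -> str:
    """Return the mailbox a message lands in; the first matching rule wins."""
    if archive_first:
        if archive_rule(msg):
            return "Archive"
        if msg.looks_spammy:
            return "Spam"
    else:
        if msg.looks_spammy:
            return "Spam"
        if archive_rule(msg):
            return "Archive"
    return "Inbox"


if __name__ == "__main__":
    # Spam with a forged sender address from an archived domain.
    forged = Msg("deals@airline.example", looks_spammy=True)
    print(route(forged, archive_first=True))
    print(route(forged, archive_first=False))
```

With the archive rule first, the forged message is filed under Archive because the spam check never sees it. With the spam rule first it goes to Spam, and so would a genuine newsletter that the filter happened to misjudge.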