SpamSieve doing bad job with embedded images

JackAubrey · September 15, 2011, 4:18pm

I have been a happy SpamSieve user for at least 3 years. For 2.75 years, I could say “I don’t get SPAM”. Recently, I have become a little disappointed. SPAM messages consisting of a single embedded jpg image are getting past SpamSieve at least 30% of the time.

A message with those attributes is almost certainly SPAM. Why is SpamSieve so easily fooled? Particularly since it used to be more than 99% accurate!

Michael_Tsai · September 15, 2011, 4:28pm

Please see the “Why is SpamSieve not catching my spam?” page.

JackAubrey · September 15, 2011, 4:41pm

Like I said, I am an experienced user and have certainly checked the configuration on all my Macs. None of my machines changed significantly, and all started missing messages at roughly the same time.

The latest wave of ‘single image’ SPAM messages began infecting my mailbox a few months ago. There is not much content to scan since the entire message is just an inline image surrounded by a hyperlink. SpamSieve stats show it is not adapting very well to this unique class of SPAM. Other types of SPAM (with more text) continue to be caught effectively by SpamSieve.

BTW: I bootstrap each new machine with a corpus from an existing machine and then train it with a few hundred new messages. SpamSieve is just not identifying this class of spam message very accurately.

Michael_Tsai · September 15, 2011, 5:16pm

That’s good to know. The most common problem for longtime users such as yourself is that something changes in their mail program’s configuration, perhaps without their knowledge, and this prevents a certain class of messages from being sent to SpamSieve for filtering. If you’ve verified that SpamSieve is indeed filtering these messages (just poorly), the next step is to follow the instructions on the “What information should I include when I report a problem?” page to send in your log file and some recent false negative files.

As for bootstrapping each new machine with a corpus from an existing one, it would probably work better just to use the few hundred new messages.

JackAubrey · September 15, 2011, 6:08pm

Good. I will do that this weekend.

My experience has been that copying the base corpus from my main iMac gets SpamSieve up and running much faster. Filtering is accurate on day one, instead of the usual 3-7 days.

I don’t think my problems are due to this type of bootstrapping. My machines have worked well for months after bootstrapping from the same corpus. Recent poor filtering has occurred independently on both my Leopard and Lion machines with no copying of files.

Michael_Tsai · September 16, 2011, 6:11am

It’s true that copying the corpus gets SpamSieve up and running faster, but it probably reduces long-term accuracy. For best results, I recommend training SpamSieve with recent messages from the actual account(s) that it will be filtering.