Stock spam immune to SpamSieve?



Ian
08-31-2006, 09:39 AM
A lot of the spam I receive consists of a jpeg touting some junk stock and a bunch of semirandom text, and SpamSieve almost never recognizes these as spam, no matter how many times I manually flag them as such. What's the magic trick that gets these things past SpamSieve (and Tuffmail's spam filters, for that matter)? It's like they're immune to spam filtering.

Michael Tsai
08-31-2006, 09:49 AM
In my experience, SpamSieve has no trouble catching that kind of spam. Please report them (http://c-command.com/spamsieve/manual-ah/what-information-should) so that I can see if there's a setup problem on your end or if I need to make any adjustments to SpamSieve.

Ian
09-01-2006, 11:11 AM
Will do, thanks.

hedgman
10-07-2006, 07:32 PM
I am getting the same kind of spam. Daily. I have checked the appropriate preference to save them and will report them to you per your instructions after I have a couple. It is the only stuff that slips through. I love SS.
hedgman

hedgman
10-25-2006, 03:07 PM
A suggestion from Michael Tsai: Try creating a blocklist rule in SpamSieve:

Body Matches Regex <body bgcolor="#ffffff" text="#000000">\s<img
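
To see what that pattern would and would not match, here is a minimal Python sketch. It assumes SpamSieve interprets this pattern the same way Python's `re` module does, which the thread does not confirm, and the two sample bodies are made up for illustration.

```python
import re

# Pattern from the blocklist rule above. \s matches exactly one whitespace
# character (for example a newline) between the <body> tag and the <img> tag.
PATTERN = re.compile(r'<body bgcolor="#ffffff" text="#000000">\s<img')


def body_matches(body: str) -> bool:
    """Return True if the pattern occurs anywhere in the message body."""
    return PATTERN.search(body) is not None


# Hypothetical message bodies, not taken from real mail.
image_spam = (
    '<html><body bgcolor="#ffffff" text="#000000">\n'
    '<img src="cid:stock.gif"></body></html>'
)
normal_mail = (
    '<html><body bgcolor="#ffffff" text="#000000">\n'
    '<p>Hi Ian,</p></body></html>'
)

print(body_matches(image_spam))   # the image follows the body tag directly
print(body_matches(normal_mail))  # text follows the body tag instead
```

Because `\s` allows only a single whitespace character, a message with extra spacing between the two tags would slip past the rule as written.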

pjmolite
10-25-2006, 10:39 PM
I'm getting the same penny stock spam sometimes twice a day. It's a .GIF file that's attached. No one sends me those so I tried to add the rule "any attachment ends with" in the blocklist but it will not work. I'm using Apple Mail, so I'm trying a second rule to block (it seems to work but I have to wait and see). My Network Solutions web mail doesn't block the attachment either so there must be more to this than I know.

I'll keep you posted on this. I'm new here, using SpamSieve for about six months. At least I know I'm not alone!

Michael Tsai
10-25-2006, 10:56 PM
Quoting pjmolite: "No one sends me those so I tried to add the rule 'any attachment ends with' in the blocklist but it will not work."

It sounds like that should have worked. Please send me some information (http://c-command.com/spamsieve/manual-ah/what-information-should) so that I can help you further.

pjmolite
10-27-2006, 03:04 AM
I think I resolved my problem. I re-trained SpamSieve from scratch (deleting everything), but this time applied Apple Mail rules in groups of 500 messages. Possibly, SpamSieve was over-trained initially. Also, the SpamSieve rule was not listed correctly. I read the manual more carefully.

Thanks.

Michael Tsai
10-30-2006, 08:55 PM
SpamSieve 2.5 (http://c-command.com/blog/2006/10/30/spamsieve-25/) should be much better at catching these spams.

garybx
11-02-2006, 04:49 PM
I just read a discussion on macosxhints (http://www.macosxhints.com/article.php?story=20061031221923601) about how to handle these new-fangled spam messages where they use an image to contain the "text" of the advertisement. Basically, they recommend using a rule which looks for messages that have an attached GIF file, are NOT coming from a person in my address book, and have a Content-Type with "multipart/related". That seems to be a good rule, but I notice that the new SpamSieve 2.5 mentions that it is improved in recognizing image messages (which I assume are the same as we're talking about here).
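
For anyone who wants to see the logic of that rule spelled out, here is a rough Python sketch using the standard `email` package. It is not the actual Apple Mail rule from the hint; the function name, the address book, and the sample messages are all hypothetical.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import parseaddr


def looks_like_gif_spam(raw_message: str, address_book: set) -> bool:
    """Flag mail from an unknown sender that has a multipart/related
    part and a GIF image part (the three conditions of the rule)."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in address_book:
        return False
    types = {part.get_content_type() for part in msg.walk()}
    return "multipart/related" in types and "image/gif" in types


def build_message(sender: str) -> str:
    """Build a made-up multipart/related message with filler text and a GIF."""
    msg = MIMEMultipart("related")
    msg["From"] = sender
    msg["Subject"] = "hot stock"
    msg.attach(MIMEText("random filler words", "plain"))
    # Placeholder bytes, not a real image; the subtype is given explicitly.
    msg.attach(MIMEImage(b"GIF89a", _subtype="gif"))
    return msg.as_string()


# Hypothetical address book; a mail client would supply the real one.
address_book = {"friend@example.com"}

print(looks_like_gif_spam(build_message("Stranger <stranger@example.net>"), address_book))
print(looks_like_gif_spam(build_message("Friend <friend@example.com>"), address_book))
```

Note that the check knows nothing about what the image contains, so it treats any such message from an unknown sender the same way.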

The folks at macosxhints were concerned that training a spam program such as yours to reject these would be counterproductive since they frequently also include random text with "good" words.

Here's the relevant part of the referenced discussion:

------------ start of quoted message --------------
This is less a hint about improving accuracy than one about not deteriorating the accuracy of your junk mail filter.

Most email spam filters classify spam based on word frequency. When you train the filter, you give the filter a list of bad words. If a particular bad word comes up frequently it increases the likelihood that that email is spam. Then when an email comes in with enough bad words it is classified as spam.

The .gif emails that are currently going around do not get filtered well by spam filters. These emails contain two parts: the first part, the .gif, contains the text that the filter would normally learn to trigger off. Since this text is in the image, the filter can't see it. Instead the filter sees the second part of the email: a list of random phrases and words. The filter picks up these words and calculates a spam score from them.

The best thing to do with these spams is simply delete them and not try to train your filter with them. If you do have your filter learn the words in these messages, it will only be learning common words, which will skew its results. Use a rule to highlight these emails or to move them to your spam folder, as explained in this hint; it will maintain the integrity of your spam filter.
------------ end of quoted message --------------
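
A toy Python illustration of the word-frequency idea described in the quoted hint. This is not how SpamSieve works internally (the thread does not say), and the word counts and the averaging formula are assumptions chosen only to show why a filter that sees just the filler text gives these messages a low score.

```python
from collections import Counter

# Hypothetical training counts: how often each word was seen in spam
# and in good mail.
spam_counts = Counter({"stock": 40, "buy": 30, "meeting": 1, "lunch": 1})
good_counts = Counter({"stock": 2, "buy": 5, "meeting": 35, "lunch": 25})


def word_spamminess(word: str) -> float:
    """Fraction of the word's occurrences that were in spam (0.5 if unseen)."""
    s, g = spam_counts[word], good_counts[word]
    return 0.5 if s + g == 0 else s / (s + g)


def spam_score(text: str) -> float:
    """Average spamminess of the words in the text (0.5 for empty text)."""
    words = text.lower().split()
    if not words:
        return 0.5
    return sum(word_spamminess(w) for w in words) / len(words)


text_spam = "buy stock"       # the pitch is visible as text
image_filler = "meeting lunch"  # the pitch is hidden in the image

print(round(spam_score(text_spam), 2))
print(round(spam_score(image_filler), 2))
```

With these made-up counts the text spam scores well above 0.5 and the filler text well below it, which is the effect the hint describes.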

Do you have an opinion about this, i.e., whether your spam filter's accuracy is degraded by flagging this particular type of email as spam (due to the good words included)?

Michael Tsai
11-02-2006, 07:51 PM
I don't recommend making a rule like MacOSXHints says, since non-spam messages with "multipart/related" are common. SpamSieve 2.5 uses a variety of new techniques to recognize image spams, but in order for it to do so, you must train it with them. In fact, I recommend that you always train SpamSieve with spam messages that it missed. Omitting messages because you think they might be counterproductive will seriously decrease SpamSieve's accuracy, because it's like telling it that you do want those messages in your inbox.

garybx
11-02-2006, 10:31 PM
Thanks for the advice. I will continue to let SpamSieve do its job. In fact, since it is currently telling me that its corpus is larger than necessary, I am going to reset it and retrain using the 600-700 spam messages I've received this week!