New spam is defeating SpamSieve

philipcaplan · August 21, 2007, 1:11pm

Hello All

In the last 2 days I’ve started receiving a lot of spam which SpamSieve is failing to recognise.

It all has a similar format; here’s a typical one:

Greetings,

Welcome To Pet World.

Confirmation Number: 57937125221437
Temp Login ID: user6020
Your Password ID: tt558

For security purposes please login and change the temporary Login ID and Password.

Use this link to change your Login info: http://24.176.213.20/

Enjoy,
Internet Support
Pet World

The subject will be something like “Registration Details” or “Membership Support”.

The “From” will be exactly what it says in the message, such as “Pet World” or “Entertaining Pros” or “Cat Lovers” or “Joke-A-Day”.

Obviously the spammer’s intention is to get the unwise recipient to visit the “link” as given, whereupon nasty things can be surreptitiously snuck into their computer.

Nothing new in this. But what I can’t understand is why isn’t SpamSieve recognising them?

I keep Control-Command-S’ing them, to train SpamSieve, but I’m still getting 20 or more a day.

I’m just curious as to how these are evading SpamSieve, which traps an average of 500 other spams for me each day.

Is anybody else seeing this?

PHILIP

Michael_Tsai · August 21, 2007, 1:24pm

Maybe they’re not. Did you check the log to make sure that SpamSieve is examining them and thinking that they’re not spam?

If they are, please send me a report.

philipcaplan · August 21, 2007, 2:12pm

I’ve looked at the log, and it seems that each of these spam messages has been added to my whitelist, so future arrivals from the same sender or with the same subject are getting through!

For example:

Predicted: Good (29)
Subject: Registration Details
From: m.coll3@btfinancialgroup.com
Identifier: /49HFVk2IPwtnhZueEtmQA==
Reason: P(spam)=0.753[0.555], bias=0.000, F:Pet(0.998), R:^cox^net(0.995), U:213(0.995), U:20(0.022), to:petgord34truew@(0.869), please(0.170), welcome(0.212), link(0.224), password(0.236), password(0.236), change(0.241), change(0.241), greetings(0.243), S:Registration(0.246), U:24(0.258)
Date: 2007-08-21 19:27:23 +0100

Trained: Good (Auto)
Subject: Registration Details
Identifier: /49HFVk2IPwtnhZueEtmQA==
Actions: added rule <From (address) Is Equal to "m.coll3@btfinancialgroup.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to "Pet World"> to SpamSieve whitelist, added to Good corpus (1128)
Date: 2007-08-21 19:27:23 +0100

Now, the question is: how are they getting added to my whitelist, when all I’ve done since they started arriving is mark each one as “spam” with Control-Command-S?

Using Mac OS X 10.4.10 and SpamSieve 2.6.4.

I’m not sure what or where to “send a report”.

Michael_Tsai · August 21, 2007, 4:21pm

It’s normal for addresses to get automatically added to the whitelist when SpamSieve thinks the message is good. However, this is not a problem because when you train the message as spam the corresponding whitelist rules will be disabled.

The log excerpt that you’ve quoted does not support this conclusion.

Basically this means that words like “please,” “welcome,” “link,” “password,” “change,” and “greetings” have historically appeared in your good mail but not in the spam. Therefore, their presence makes SpamSieve think that this message is good. Training it with more messages like this will eventually teach it otherwise, or you could speed things up by either (a) going to the corpus window and deleting these words, or (b) resetting SpamSieve’s corpus and re-training it with some recent messages.
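To see which tokens pulled the score toward “good,” the Reason line from the log excerpt quoted above can be split into token/score pairs. The following is only a minimal Python sketch: it assumes, in line with the explanation above, that scores below 0.5 lean good and scores above 0.5 lean spam. The function names are made up for illustration and are not part of SpamSieve.

```python
import re

# Reason line copied from the log excerpt quoted earlier in this thread.
REASON = (
    "P(spam)=0.753[0.555], bias=0.000, F:Pet(0.998), R:^cox^net(0.995), "
    "U:213(0.995), U:20(0.022), to:petgord34truew@(0.869), please(0.170), "
    "welcome(0.212), link(0.224), password(0.236), password(0.236), "
    "change(0.241), change(0.241), greetings(0.243), S:Registration(0.246), "
    "U:24(0.258)"
)

# A token field is some text followed by a score in parentheses, e.g. "please(0.170)".
TOKEN_RE = re.compile(r"(.+)\((\d\.\d+)\)")


def parse_reason(reason):
    """Return (token, score) pairs, skipping summary fields such as P(spam) and bias."""
    pairs = []
    for field in reason.split(", "):
        match = TOKEN_RE.fullmatch(field)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def split_by_direction(pairs, threshold=0.5):
    """Separate tokens into those leaning good (< threshold) and leaning spam (>= threshold)."""
    leaning_good = [(t, s) for t, s in pairs if s < threshold]
    leaning_spam = [(t, s) for t, s in pairs if s >= threshold]
    return leaning_good, leaning_spam


if __name__ == "__main__":
    good, spam = split_by_direction(parse_reason(REASON))
    print("Leaning good:", sorted(good, key=lambda p: p[1]))
    print("Leaning spam:", sorted(spam, key=lambda p: p[1], reverse=True))
```

Run against the quoted line, the low-scoring group includes the everyday words listed above (“please,” “welcome,” “link,” “password,” “change,” “greetings”), while only a few tokens such as F:Pet score high.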

If you click on “send a report” from my previous post, it will take you to the Web page that tells how and where to send a report.

Bruce · August 21, 2007, 7:54pm

Same issue here. I’m not getting more than 10 a day but they’re consistent.

Bruce

philipcaplan · August 22, 2007, 4:04am

Aha! So Bruce is seeing this, too. Bruce, I take it you mean that 10 a day or so are getting into your Inbox after having been accepted as good by SpamSieve?

Are any other forum members seeing a similar pattern?

Bruce · August 22, 2007, 6:39am

Oddly enough, none at all since I posted the message. By that I mean that none were detected nor passed over by SpamSieve.

Bruce

Michael_Tsai · August 22, 2007, 6:42am

So far no one has followed the instructions to send me a report about this. I’m much more interested in stopping these spams than in knowing how many people who read the forums are seeing them.

Bruce · August 22, 2007, 7:21am

I was in the process of doing this but wanted to wait to see if one of these messages came in this morning. One finally did and the log and false negatives should be at the requested address by the time you read this.

Bruce

philipcaplan · August 22, 2007, 10:13am

I’ve sent Michael the log file as he asked.

I didn’t mean to criticize his program – it’s worked wonderfully for me since I started using it.

I was just surprised that this spam (say 10 or more messages a day) was being passed through by SpamSieve, and kept coming even after I had trained SS on dozens of them.

Excuse my ignorance of programming etc, but isn’t the problem that if I tell SpamSieve that emails containing words such as “please,” “welcome,” “link,” “password,” “change,” and “greetings” are spam, I’m also at risk of losing genuine emails when I sign up for something?

Looking forward to Michael’s response, and observations from anyone else in the Forum who’s seeing similar types of spam getting through.

Thanks again, and all the best in our joint fight against those pesky spammers!

PHILIP CAPLAN

Bruce · August 22, 2007, 12:14pm

Good news. SpamSieve has recognized one of these e-mails as spam and colored it “blue.”

Bruce

Michael_Tsai · August 22, 2007, 1:32pm

No problem. Criticism is fine—I just like it to be something that I can act on.

I’ve now looked at several log files that people sent, and it seems to be the same pattern. There’s nothing unusual about the structure of the message. SpamSieve is processing it normally. But it happens that it contains a bunch of words that (for these users) have historically appeared mostly in good messages. The other factor is that in all these cases SpamSieve’s corpus was a bit larger than normal, so it takes longer for it to respond to training. If spams like this are bothering you, the quickest way to better accuracy is probably to reset the corpus and then re-train SpamSieve with a smaller number of recent messages.

At the simplest level, yes, if you tell it that messages containing “please” are spammy then, all things being equal, future messages with that word will be predicted to be more spammy.

But the idea is that, if you train SpamSieve with both good and spam messages that contain these words, it will learn to not treat them as a strong indicator in either direction. It will use other words in the messages to make its determination. So I don’t think this is cause for concern.
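The two effects described above (a large corpus responding slowly to training, and words seen in both good and spam mail becoming neutral) can be illustrated with a textbook per-word estimate. This is a hedged sketch in Python, not SpamSieve’s actual formula: the function name and all the counts are hypothetical, chosen only to show the trend.

```python
def word_spam_score(spam_hits, good_hits, spam_total, good_total):
    """Estimate how spammy a word is from how often it appears in each corpus.

    Returns a value between 0 (seen only in good mail) and 1 (seen only in spam).
    """
    spam_rate = spam_hits / spam_total
    good_rate = good_hits / good_total
    if spam_rate + good_rate == 0:
        return 0.5  # never seen: no evidence either way
    return spam_rate / (spam_rate + good_rate)


def after_training(spam_hits, good_hits, spam_total, good_total, new_spam):
    """Score for the word after training on `new_spam` spam messages that contain it."""
    return word_spam_score(
        spam_hits + new_spam, good_hits, spam_total + new_spam, good_total
    )


if __name__ == "__main__":
    # Hypothetical large corpus: the word is common in good mail, rare in spam.
    large = (10, 200, 1000, 1000)
    # Hypothetical small corpus with the same proportions.
    small = (1, 20, 100, 100)

    print("large, before:", word_spam_score(*large))
    print("small, before:", word_spam_score(*small))
    print("large, after 20 spams:", after_training(*large, new_spam=20))
    print("small, after 20 spams:", after_training(*small, new_spam=20))

    # A word seen equally often in good and spam mail carries no signal.
    print("balanced word:", word_spam_score(50, 50, 100, 100))
```

Both corpora start with the same score for the word, but the same 20 training messages move the small corpus much further, which is the reason resetting and re-training with a smaller set of recent messages responds faster. The balanced case shows why a word trained in both directions stops being a strong indicator.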

philipcaplan · August 22, 2007, 2:33pm

Thanks for the response, Michael.

Does this mean there’s not a “halfway house” to ‘thin down’ the corpus in some other way – that I either leave things as they are, or I have to reset the corpus and start the training over again, in which case these words are perhaps going to get a heavier weight as ‘spam’ than in my current corpus?

Seems to me that some spammer somewhere has decided to be very clever and go for the “lean, mean and simple” approach. Trying to slip “under the wire” instead of breaking through it or flying over it!!

I wonder how other “anti-spam” approaches are dealing with this. Do you know if this is causing problems in the outside world?

PHILIP

Michael_Tsai · August 22, 2007, 2:44pm

That’s right. Since spam changes over time, re-training will probably have other good effects, as well.

I don’t know. For my own mail, both SpamSieve and SpamAssassin have been finding most of these messages to be spam.

Bruce · August 22, 2007, 3:03pm

Maybe I’m misunderstanding the role of the corpus here but if I reset it, don’t I then have to retrain SpamSieve? Don’t I lose the accuracy built up over the last several months? While that training is going on, I’ll be getting a lot of spam which is now unrecognized as such.

Are the whitelists/blacklists part of the corpus, as well? (Excuse me if that’s explained in the help or other documentation.)

Bruce

Michael_Tsai · August 22, 2007, 6:21pm

Yes.

So long as you have a few hundred spam and good messages for the re-training, there shouldn’t be much accuracy drop.

Those are not part of the corpus, and they do not need to be reset.

jenlen · August 27, 2007, 9:53pm

spamsieve letting too much trained junk mail pass through
I’ve used SpamSieve for over 1.5 years with no problems. In the last few weeks, I’ve been getting very obvious spam: words with sexual connotations, garbage words, and greeting cards too, which are all bogus. It’s very baffling. I use Entourage on Mac OS X 10.4.10.

I don’t know where to reset it to train it again, nor why I should. It seemed to get worse with the last one or two updates from SpamSieve. I clicked on some links in this topic, but I’m clueless what to do (like reset anything) and how to do it.

I removed SpamSieve from my Dock (changed that code from 1 to 0 or whatever it is; every time I update it I have to go into a file to do that so it doesn’t appear on my already-crowded Dock).

Sorry if this seems rambling, but I work 12 hours a day, 7 days a week and really don’t have the focus to figure this out. I was soooo happy with the program and am not happy with the stupid, obvious junk mail I’m receiving. Of course I always click Train-Spam on each of these, but they keep getting through to my inbox. Help please and thanks!

Michael_Tsai · August 28, 2007, 6:37am

Please look at the log to see whether these messages are getting through SpamSieve or if there is a setup problem. If you don’t understand how to read the log, you can send it to me and I’ll recommend what you should do.

It’s normal for the Dock preference to be reset when updating to a new version of SpamSieve. A future version of SpamSieve will address this.

jenlen · August 28, 2007, 9:39am

spamsieve letting too much trained junk mail pass through
I DON’T KNOW HOW TO OPEN THE LOG!!! I read that info online about the log. I wrote that I am totally confused!!! Entourage is only showing the scripts logo.

Where is the log???

I am BEGGING you to walk me through this, please!!! Don’t send me to a manual page that does not say where the log is!!!
jenny

Michael_Tsai · August 28, 2007, 10:03am

The manual page that I linked to shows exactly where the log is. But, to be more specific:

Click the Finder icon in your Dock.
Choose Home from the Go menu.
Open the Library folder.
Open the Logs folder.
Open the SpamSieve folder.
You should now see the log file, which is called “SpamSieve Log.log”. Drag it onto the Entourage icon in your Dock to create a new message.
Type a subject and enter spamsieve-fn@c-command.com as the To address, and then send the message.
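For anyone comfortable with a script, the same folder path from the steps above can be built in code, and the log can be scanned for what SpamSieve predicted. This is a minimal Python sketch, not an official tool: the helper names are hypothetical, and it assumes the whole log uses the same “Predicted:” / “Subject:” layout as the excerpt quoted earlier in this thread.

```python
from pathlib import Path


def spamsieve_log_path(home=None):
    """Build the log path from the Finder steps: Home > Library > Logs > SpamSieve."""
    base = Path(home) if home is not None else Path.home()
    return base / "Library" / "Logs" / "SpamSieve" / "SpamSieve Log.log"


def predictions(log_text):
    """Yield (verdict, subject) for each 'Predicted:' entry that has a 'Subject:' line."""
    verdict = None
    for line in log_text.splitlines():
        if line.startswith("Predicted:"):
            # "Predicted: Good (29)" -> "Good"
            verdict = line.split(":", 1)[1].split("(")[0].strip()
        elif line.startswith("Subject:") and verdict is not None:
            yield verdict, line.split(":", 1)[1].strip()
            verdict = None
        elif not line.strip():
            # A blank line ends the current entry.
            verdict = None


if __name__ == "__main__":
    path = spamsieve_log_path()
    if path.exists():
        for verdict, subject in predictions(path.read_text(errors="replace")):
            print(verdict, "-", subject)
    else:
        print("No log found at", path)
```

If a spam message shows up in the output as “Good,” SpamSieve examined it and misjudged it; if it does not show up at all, the message never reached SpamSieve, which points to a setup problem instead.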
