eBay phishing spams are causing false positives

One of the phishing spams I get is of the form “You’ve received a question about your eBay item.” I’ve been following the standard advice of training the phishes as spam, but when I get legitimate eBay email of the same format, SpamSieve thinks it’s spam, too. I get probably 5 or 10 times as many spams as non-spams of this type, so I guess that confuses SpamSieve.

I’m wondering if I should set up a whitelist to counteract this problem. For instance, the phishing spams always seem to contain the text “Dear member”, whereas the legitimate eBay email contains the text “Dear <my user name>”. Maybe I could use that as a whitelist condition.
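
To make the idea concrete, here is a minimal Python sketch of the greeting check described above. It is not SpamSieve's rule engine or API; the function name `classify_ebay_greeting` and the example user name are hypothetical, and a real whitelist rule would be set up in SpamSieve itself as a body-contains condition.

```python
import re

def classify_ebay_greeting(body: str, username: str) -> str:
    """Guess whether an eBay-style message is legitimate from its greeting.

    Assumption (from the post above): real eBay mail greets you by user
    name, while the phishing copies say "Dear member".
    """
    # A greeting with the real user name is treated as a whitelist hit.
    personal = r"\bDear\s+" + re.escape(username) + r"(?!\w)"
    if re.search(personal, body, re.IGNORECASE):
        return "good"
    # The generic greeting is only a hint, not proof, of a phish.
    if re.search(r"\bDear\s+member\b", body, re.IGNORECASE):
        return "phish"
    return "unknown"
```

The check is only as reliable as the assumption behind it: a phisher who learns your user name would pass it.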

Training it with a significant number of the good ones should help.

Yes, that would work. Another option (depending on how your e-mail is set up) would be to set a unique address for eBay so that you can easily see which messages are legitimate. For example, you could tell them that your address is vocaro+ebay@domain.com. Many e-mail hosts support this.
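
As a rough illustration of how such a tagged address can be used, the sketch below pulls the tag out of the recipient address and compares it with the one you gave to a particular site. The helper names `plus_tag` and `sent_to_tagged_address` are made up for this example; in practice you would express the same test as a whitelist or mail rule on the To header.

```python
from typing import Optional

def plus_tag(address: str) -> Optional[str]:
    """Return the part after '+' in the local part, or None if absent."""
    local, _, _domain = address.rpartition("@")
    if "+" not in local:
        return None
    tag = local.split("+", 1)[1]
    return tag or None

def sent_to_tagged_address(to_address: str, expected_tag: str) -> bool:
    """True if the message was addressed to the tag given to this site."""
    return plus_tag(to_address) == expected_tag
```

A message claiming to be from eBay but addressed to the plain, untagged address would fail this check, which is what makes the legitimate ones easy to spot.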

These options haven’t worked for me. What about disabling the blocklist altogether? That should substantially reduce my false positive problem, but I’m worried about adverse side-effects.

Are you sure this problem is caused by the blocklist? What does the log say? There should be no reason to disable the blocklist as a whole because SpamSieve automatically disables any problematic blocklist rules when you train a message as good.

And did you try the vocaro+ebay@domain.com option? That should prevent phishing spams from causing problems because you can whitelist the good messages.

Below is one example. I took the SpamSieve log for May and filtered it to contain only the log entries for Garmin. (I had recently applied for a job with them.) Apparently, “Human Resources” was already in the blocklist, which caused several of Garmin’s emails to be classified as spam.


Predicted: Spam (99)
Subject: Profile submitted to Garmin International
From: hr-VIDE3@invalidemail.com
Identifier: H1Bz0gSoUjAjCe3e0p8MPg==
Reason: (
    "Human Resources"
) matched rule <From (name) Is Equal to "Human Resources"> in SpamSieve blocklist
Date: 2009-05-26 13:09:40 -0400
=====================================================================
Predicted: Spam (99)
Subject: Software Engineer-090004T at Garmin
From: hr-VIDE3@invalidemail.com
Identifier: bxkxS91ITtqjdDbvlzSlYQ==
Reason: (
    "Human Resources"
) matched rule <From (name) Is Equal to "Human Resources"> in SpamSieve blocklist
Date: 2009-05-26 13:14:39 -0400
=====================================================================
Predicted: Spam (99)
Subject: Applications Developer - Internet-08000LG at Garmin
From: hr-VIDE3@invalidemail.com
Identifier: XxKdTVZZkR3Wiope7JxsTg==
Reason: (
    "Human Resources"
) matched rule <From (name) Is Equal to "Human Resources"> in SpamSieve blocklist
Date: 2009-05-26 13:34:37 -0400
=====================================================================
Predicted: Spam (99)
Subject: Senior Software Engineer-090000X at Garmin
From: hr-VIDE3@invalidemail.com
Identifier: YS2YHk3cegubeeZmmhW6IA==
Reason: (
    "Human Resources"
) matched rule <From (name) Is Equal to "Human Resources"> in SpamSieve blocklist
Date: 2009-05-26 13:34:39 -0400
=====================================================================
Trained: Good (Manual)
Subject: Senior Software Engineer-090000X at Garmin
Identifier: YS2YHk3cegubeeZmmhW6IA==
Actions: added rule <From (address) Is Equal to "hr-VIDE3@invalidemail.com">
to SpamSieve whitelist, added rule <From (name) Is Equal to "Human
Resources"> to SpamSieve whitelist, disabled rule <From (name) Is Equal to
"Human Resources"> in SpamSieve blocklist, added to Good corpus (1723)
Date: 2009-05-26 14:14:40 -0400
=====================================================================
Trained: Good (Manual)
Subject: Applications Developer - Internet-08000LG at Garmin
Identifier: XxKdTVZZkR3Wiope7JxsTg==
Actions: added to Good corpus (1724)
Date: 2009-05-26 14:14:42 -0400
=====================================================================
Trained: Good (Manual)
Subject: Software Engineer-090004T at Garmin
Identifier: bxkxS91ITtqjdDbvlzSlYQ==
Actions: added to Good corpus (1725)
Date: 2009-05-26 14:14:44 -0400
=====================================================================
Trained: Good (Manual)
Subject: Profile submitted to Garmin International
Identifier: H1Bz0gSoUjAjCe3e0p8MPg==
Actions: added to Good corpus (1726)
Date: 2009-05-26 14:14:47 -0400
=====================================================================
Trained: Good (Manual)
Subject: Your application at ARA-New England Division
Identifier: a2nLEYFQaNCg20egz9BkSg==
Actions: added rule <From (address) Is Equal to "ara@hrdepartment.com"> to
SpamSieve whitelist, added to Good corpus (1727)
Date: 2009-05-26 14:14:49 -0400
=====================================================================
Mistake: False Positive
Subject: Your application at ARA-New England Division
Identifier: a2nLEYFQaNCg20egz9BkSg==
Classifier: Encoded HTML
Score: 100
Date: 2009-05-26 14:14:54 -0400
=====================================================================
Mistake: False Positive
Subject: Profile submitted to Garmin International
Identifier: H1Bz0gSoUjAjCe3e0p8MPg==
Classifier: Blocklist
Score: 99
Date: 2009-05-26 14:14:54 -0400
=====================================================================
Mistake: False Positive
Subject: Software Engineer-090004T at Garmin
Identifier: bxkxS91ITtqjdDbvlzSlYQ==
Classifier: Blocklist
Score: 99
Date: 2009-05-26 14:14:54 -0400
=====================================================================
Mistake: False Positive
Subject: Applications Developer - Internet-08000LG at Garmin
Identifier: XxKdTVZZkR3Wiope7JxsTg==
Classifier: Blocklist
Score: 99
Date: 2009-05-26 14:14:54 -0400
=====================================================================
Mistake: False Positive
Subject: Senior Software Engineer-090000X at Garmin
Identifier: YS2YHk3cegubeeZmmhW6IA==
Classifier: Blocklist
Score: 99
Date: 2009-05-26 14:14:54 -0400
=====================================================================
Predicted: Good (1)
Subject: Software Engineer-090004T at Garmin
From: hr-garmin@invalidemail.com
Identifier: DSAVa05b09D2Nf6ARsB4JA==
Reason: (
    "Human Resources"
) matched rule <From (name) Is Equal to "Human Resources"> in SpamSieve whitelist
Date: 2009-05-29 16:26:56 -0400

I didn’t try it because it’s only a piecemeal solution. I could set up a vocaro+ebay address with eBay, but what about Garmin and the other false positives caused by the blocklist? I’d have to implement this solution everywhere, which takes a lot of work. I don’t want to worry about implementing a special From address every time I register with a web site.

Well, the reason I suggested what I did is that your original post only mentioned false positives caused by eBay phishing scams. These Garmin messages seem to be related to neither eBay nor phishing. I seriously doubt that the eBay false positives were caused by the blocklist.

Of course, you can turn off the blocklist. But I wonder if something else is going on here because the blocklist doesn’t generally cause lots of false positives. If you send your latest log file to spamsieve-fn@c-command.com I can take a closer look.

Well, whatever is causing the false positives, the log says “Classifier: Blocklist”, so if I disable the blocklist, I should get fewer false positives, right?

I noticed that the Garmin logs say “Reason: (“Human Resources”) matched rule <From (name) Is Equal to “Human Resources”> in SpamSieve blocklist”. I take this to mean that I had previously received spam from “Human Resources” (not necessarily from Garmin, just some random spam), and SpamSieve added “Human Resources” to its blocklist. When the Garmin email came along, it therefore matched a name in the blocklist. Does this explain the false positive, or am I misunderstanding how the blocklist works?

Right. But I don’t want to generalize too much from a small excerpt of a single month’s log. The one Human Resources rule caused several false positives, and then one was caused by the Encoded HTML filter, but I don’t know how many problematic blocklist rules there really are.

Right—because you received a spam from “Human Resources” and trained it as spam.

Correct.

I looked at the logs that you sent and, since August 23, there has been only one false positive due to the blocklist. This was because ebay@ebay.com was on the blocklist, presumably because of one of the phishing messages that you mentioned. You’ve since trained that message as good, and so that blocklist rule is now disabled. So I see no reason to turn off the whole blocklist now.
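
For anyone trying to picture the behavior discussed in this thread, here is a toy Python model of it: training a message as spam adds blocklist rules for its sender, and training a message as good adds whitelist rules and disables any blocklist rule that matched it, as the "Trained: Good" log entry above shows. This is only a sketch and not SpamSieve's actual implementation. The class and method names are invented, and checking the blocklist before the whitelist is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Rule:
    header: str          # e.g. "From (name)" or "From (address)"
    value: str
    enabled: bool = True

    def matches(self, message: Dict[str, str]) -> bool:
        return self.enabled and message.get(self.header) == self.value

def make_message(name: str, address: str) -> Dict[str, str]:
    return {"From (name)": name, "From (address)": address}

@dataclass
class RuleLists:
    blocklist: List[Rule] = field(default_factory=list)
    whitelist: List[Rule] = field(default_factory=list)

    def predict(self, message: Dict[str, str]) -> str:
        # Assumption: the blocklist is consulted before the whitelist.
        if any(rule.matches(message) for rule in self.blocklist):
            return "spam"
        if any(rule.matches(message) for rule in self.whitelist):
            return "good"
        return "unknown"  # a real filter would fall through to other checks

    def train_spam(self, message: Dict[str, str]) -> None:
        for header, value in message.items():
            self.blocklist.append(Rule(header, value))

    def train_good(self, message: Dict[str, str]) -> None:
        # Disable only the blocklist rules that caught this message.
        for rule in self.blocklist:
            if rule.matches(message):
                rule.enabled = False
        for header, value in message.items():
            self.whitelist.append(Rule(header, value))
```

In this model, a spam from "Human Resources" blocks later mail with the same sender name until one of those messages is trained as good. After that, only the offending rule is off and the rest of the blocklist keeps working.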