Thread: PDF Spam Filtering

  1. #1

    PDF Spam Filtering

    Like almost everyone else, I have been hit with a lot of PDF spam recently. SpamSieve doesn't seem to be able to filter this type of spam yet. Are there any plans to issue an update with this feature? Soon, please :)

  2. #2

    I don’t think this requires an update. SpamSieve is catching all the PDF spams for me. If it isn’t for you, there may be something else going on. Please check your setup and/or report the ones that are getting through.

  3. #3

    No Luck Here

    PDF spam is filling my mailbox, and training does not seem to help.
    What specific settings might help?

  4. #4

    sfraw wrote:
    > PDF Spam is filling my mailbox and training does not seem to help.
    > What specific settings might help?

    No special settings are necessary, but you need to verify that the messages are actually being checked by SpamSieve, and it may be necessary to reset the corpus. Before you reset the corpus, though, please send me a report.

  5. #5

    PDF spam and unusual "trained as good"

    Michael,
    I am getting lots of PDF spam too, both with the PDF opened and displayed in the message and as PDF attachments.

    I almost always hit cntl-apple-s and the message disappears, but it seems many are still being trained as "good" and added to my whitelist; at least, that is what log.log seems to say. I am enclosing it.

    In fact, a PDF attachment arrived in my mailbox while I was typing this, and I saw it automatically added to my whitelist (the whitelist was open). I'm going to remove many of these from the whitelist and see if I can add them to the blacklist...

    Well, it looks like I can only remove them and not move them.

  6. #6

    sharkez wrote:
    > I am getting lots of pdf spam too, both with the pdf opened and displayed and also as pdf attachments.
    >
    > I always (almost) hit cntl-apple-s and the message disappears but it seems many are still being trained as "good" and added to my white list, at least that is what the log.log seems to say. I am enclosing it.

    Same here. I've tried training these types of files as spam, but for some reason they seem to be adding themselves to my whitelist, such as the ones below.

    =====================================================================
    Predicted: Good (25)
    Subject: Emailing: alert.pdf
    From: cemil@b1b2.com
    Identifier: QIiyNZ9bIdzKrMpYJdjKxQ==
    Reason: P(spam)=0.000[0.457], bias=0.000, x-mimeole:ProducedByMicrosoftMimeOLEV6.00.2900.3138(0.001), MT:MSHTML6.00.2900.3132(0.001), XM:MicrosoftOutlookExpress6.00.2900.3138(0.001), S:Emailing(0.001), R:^32(0.998), R:^112(0.995), X:HELO-DYNAMIC-DIALIN(0.909), X:INVALID-TZ-GMT(0.889), H:X-MSMail-Priority(0.842), x-msmail-priority:Normal(0.842), handled(0.160), handled(0.160), attachments(0.170), attachments(0.170), viruses(0.191)
    Date: 2007-07-18 09:37:15 -0400
    =====================================================================
    Trained: Good (Auto)
    Subject: Emailing: alert.pdf
    Identifier: QIiyNZ9bIdzKrMpYJdjKxQ==
    Actions: added rule <From (address) Is Equal to "cemil@b1b2.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to "cemil novac"> to SpamSieve whitelist, added to Good corpus (1104)
    Date: 2007-07-18 09:37:15 -0400
    =====================================================================
    Predicted: Good (25)
    Subject: Emailing: Invoice.pdf
    From: LEBEDEVxwx@leadbetterteam.com
    Identifier: 8UdhNEWDMBUpCOXrhuH2Jg==
    Reason: P(spam)=0.000[0.454], bias=0.000, x-mimeole:ProducedByMicrosoftMimeOLEV6.00.2900.3138(0.001), MT:MSHTML6.00.2900.3132(0.001), S:Emailing(0.001), XM:MicrosoftOutlookExpress6.00.2900.3138(0.001), invoice.pdf(0.005), ^ih-944(0.005), ^iw-896(0.005), invoice.pdf(0.005), R:^ono^com(0.995), R:^dyn^user^ono^com(0.995), R:^user^ono^com(0.995), X:RCVD-NUMERIC-HELO(0.895), handled(0.143), handled(0.143), H:X-MSMail-Priority(0.841)
    Date: 2007-07-18 09:47:15 -0400
    =====================================================================
    Trained: Good (Auto)
    Subject: Emailing: Report-41366.pdf
    Identifier: 6AhTovHAh389njylTVuoNg==
    Actions: added rule <From (address) Is Equal to "Litwiller@hurryupoffense.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to "bev Litwiller"> to SpamSieve whitelist
    Date: 2007-07-18 09:57:15 -0400

  7. #7

    sharkez wrote:
    > I am getting lots of pdf spam too, both with the pdf opened and displayed and also as pdf attachments.

    As I said above, I expect SpamSieve to catch the PDF spams. If this isn’t happening for you, please send me a report so that I can see what’s happening on your Mac.

    sharkez wrote:
    > I always (almost) hit cntl-apple-s and the message disappears but it seems many are still being trained as "good" and added to my white list, at least that is what the log.log seems to say. I am enclosing it.

    I did not receive an enclosure. In any case, it is normal for SpamSieve to auto-train the whitelist when it predicts that an incoming message is good. If it turns out that the message wasn’t good, when you use the “Train as Spam” command, SpamSieve will disable (uncheck) the rule that was created on the whitelist and add a rule to the blocklist.

    sharkez wrote:
    > I'm going to remove many of these from the whitelist and see if I can add them to the black list...

    After you use “Train as Spam”, the whitelist rule should be disabled, and a rule should have been added to the blocklist. This is actually better than removing the rule from the whitelist, because the presence of the disabled rule prevents SpamSieve from ever auto-adding that rule to the whitelist again. In short, it should never be necessary to manually add or remove rules from the whitelist or blocklist. You only need to do that if you are creating your own custom rules.
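    The rule bookkeeping described above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: the `Rule`/`RuleList` classes and the `auto_train_good`/`train_as_spam` functions are hypothetical names, only the "From (address)" rule type seen in the log in post #6 is modeled, and it is not SpamSieve's actual implementation. It illustrates why a disabled whitelist rule is better than a deleted one: the disabled rule blocks the next auto-add.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    field_name: str       # hypothetical label, e.g. "From (address)"
    value: str
    enabled: bool = True  # the checkbox shown in the rule list


@dataclass
class RuleList:
    rules: list = field(default_factory=list)

    def find(self, field_name: str, value: str):
        """Return the matching rule (enabled or not), or None."""
        for rule in self.rules:
            if rule.field_name == field_name and rule.value == value:
                return rule
        return None


def auto_train_good(whitelist: RuleList, sender: str) -> None:
    """Auto-whitelist the sender of a message predicted good, unless a rule
    for that sender already exists, even a disabled one."""
    if whitelist.find("From (address)", sender) is None:
        whitelist.rules.append(Rule("From (address)", sender))


def train_as_spam(whitelist: RuleList, blocklist: RuleList, sender: str) -> None:
    """Uncheck (rather than delete) the whitelist rule and add a blocklist rule."""
    rule = whitelist.find("From (address)", sender)
    if rule is not None:
        rule.enabled = False
    if blocklist.find("From (address)", sender) is None:
        blocklist.rules.append(Rule("From (address)", sender))


if __name__ == "__main__":
    whitelist, blocklist = RuleList(), RuleList()
    auto_train_good(whitelist, "cemil@b1b2.com")  # message predicted good
    train_as_spam(whitelist, blocklist, "cemil@b1b2.com")
    auto_train_good(whitelist, "cemil@b1b2.com")  # next spam from the same sender
    print(len(whitelist.rules), whitelist.rules[0].enabled)  # 1 False
```

    Had the rule been deleted instead of disabled, the second `auto_train_good` call would have re-added an enabled rule, which is exactly the situation the disabled rule avoids.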

  8. #8

    earlchr wrote:
    > Same here. I've tried training these types of files as SPAM

    Again, please e-mail me a report with your log file and false negative files. I need to see what’s going on in order to help you.

    earlchr wrote:
    > but for some reason, they seem to be adding themselves to my whitelist, such as the ones below.

    That’s probably normal.

  9. #9

    Train as Spam not changing rule, I think

    Michael,
    Sorry, I haven't been back for a while, but I don't think Train as Spam is changing the previously created whitelist rule.

    I think you didn't get the attachment because, as I'm noticing now, this forum doesn't allow .log attachments (or text, I guess), even though it looked like the file was attached. I'll zip up the files: I'm sending a Zip file with four false-negative PDF attachments and the log file to spamsieve-fn@c-command.com. Is there any way to send the blocklist and whitelist?

    Also, it appears that part of what is going on is just a filter on the address; at least, that is what is added to the whitelist. What type of filtering are you doing on the contents?

  10. #10

    Actually, it is unchecking...

    Actually, I think it is doing the "removal", since the check mark goes away when I train as spam. So if that is working, why am I still getting lots of PDF spam? Are you only filtering on the address? That would be only marginally useful, since spammers use thousands of addresses (even mine!).

    So how are you filtering on the content? Does Train as Spam also make it more likely that a PDF with one of the few typical names (advertisement.pdf, report.pdf, message.pdf, etc.) will be caught?

    How about actually looking at the PDF content itself (could be tough and slow...)?

  11. #11

    sharkez wrote:
    > I think you didn't get the attachment because I'm noticing now that the allowed type of attachments here is not .log (or text I guess) even though it looked like it was attached.

    The forum doesn’t allow text or Zip attachments (except from me), and you probably don’t want your log file on the Web, anyway.

    sharkez wrote:
    > I'll zip up the files. I'm sending (to spamsieve-fn@c-command.com) a zip with four false neg pdf attachments and the log file.)

    Sounds good.

    sharkez wrote:
    > Is there any way to send the blocklist and whitelist?

    Those are stored in:

    ~/Library/Application Support/SpamSieve/Rules

    but there’s probably no need to send them.

    sharkez wrote:
    > Also, it appears that part of what is going on is just a filter on the address

    That’s not correct. SpamSieve uses a variety of filters, the most important being the Bayesian classifier.
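    The Reason lines in post #6 list per-token spam probabilities, which is what a Bayesian classifier works from. The sketch below shows the textbook naive-Bayes way of combining such probabilities into one score, as popularized by Paul Graham's "A Plan for Spam". It is an illustration only: the thread does not say which formula SpamSieve uses, how many tokens it considers, or what the bias and bracketed values in the log mean. The `combine` function is a hypothetical name, and it assumes every probability is strictly between 0 and 1.

```python
import math


def combine(token_probs):
    """Naive-Bayes combination of per-token spam probabilities:
    P = prod(p) / (prod(p) + prod(1 - p)), computed in log space so that
    long token lists do not underflow to 0.
    Assumes every p satisfies 0 < p < 1."""
    log_spam = sum(math.log(p) for p in token_probs)
    log_good = sum(math.log(1.0 - p) for p in token_probs)
    diff = log_good - log_spam
    if diff > 700.0:  # exp() would overflow; the result is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    # A handful of token probabilities copied from the first Reason line in post #6
    sample = [0.001, 0.001, 0.001, 0.998, 0.995]
    print(f"{combine(sample):.6f}")
```

    With these five tokens, the three strongly "good" header and subject tokens outweigh the two spammy ones, so the combined score comes out close to 0. This mirrors how a few misleadingly good tokens can pull a spam message under the threshold.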

    sharkez wrote:
    > So If that is working then why am I still getting lots of pdf spam?

    I don’t know. That’s why I need you to send the log. :-)

    sharkez wrote:
    > Does train as spam also make more likely the catching of a pdf with one of the few typical names (advertisement.pdf, report.pdf, message.pdf, etc.)?

    Yes.

    sharkez wrote:
    > How about actually looking at the pdf content itself (could be tough--slow...)?

    SpamSieve 2.6.2 does that, and the next version will do more of it.

  12. #12

    Michael Tsai wrote:
    > That’s not correct. SpamSieve uses a variety of filters, the most important being the Bayesian classifier.

    I just looked at the corpus, and I know the examples I sent you this morning included one or two message.pdf attachments. The corpus lists S:message.pdf with only one spam count, from 7/10/07; there is no plain message.pdf entry.

  13. #13

    sharkez wrote:
    > Just looked at the corpus and I know my examples sent to you this morning included one or two message.pdf attachments. The corpus has S:message.pdf listed with only one Spam count from 7/10/07; there is no straight message.pdf listed.

    None of the four false negative files that you sent me included “message.pdf” attachments, though they did have PDFs with other names.

    The log shows two messages (one from 6/27 and one from 7/11) that had “message.pdf” in the subject (that’s what “S:message.pdf” means), both of which SpamSieve figured out were spam on its own. On 7/15 SpamSieve classified a message with subject “Re:” as spam, partially because it contained an attachment named “message.pdf”.

    The log file included some other information that I found useful and that I will use to improve the next version of SpamSieve.
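    The difference between the "S:message.pdf" corpus entry and an attachment named message.pdf can be illustrated with a hypothetical Python tokenizer. Only the "S:" prefix for subject words comes from this thread; the `tokenize` function, its word-splitting pattern, and the treatment of attachment names as unprefixed tokens are assumptions for illustration, not SpamSieve's actual tokenizer.

```python
import re


def tokenize(subject, attachment_names):
    """Hypothetical tokenizer: words from the subject get an "S:" prefix,
    while attachment file names are kept as plain tokens."""
    tokens = set()
    # Assumed word pattern: letters, digits, underscores, dots, and hyphens,
    # so "Emailing: alert.pdf" yields "Emailing" and "alert.pdf".
    for word in re.findall(r"[\w.\-]+", subject):
        tokens.add("S:" + word)
    for name in attachment_names:
        tokens.add(name)
    return tokens


if __name__ == "__main__":
    print(sorted(tokenize("Re:", ["message.pdf"])))  # ['S:Re', 'message.pdf']
    print(sorted(tokenize("message.pdf", [])))       # ['S:message.pdf']
```

    Under this model, a "Re:" message carrying message.pdf and a message whose subject is "message.pdf" produce different tokens, which is why the corpus can hold one entry without the other.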

  14. #14

    My mistake; it looks like you are correct ;-) SpamSieve found the two with message.pdf and marked them as spam before I saw them. The two I sent had "complaint.pdf" and "magazine.pdf".

    It makes me wonder: is it possible to have a rule that filters out a message if all it contains is a single PDF (no text) and it is not from someone in your address book?

  15. #15

    sharkez wrote:
    > It makes me wonder, is it possible to have a rule that filters it out if all that is in the message is a single pdf (no text) not from someone in you address book?

    You can’t do exactly that, but you could create a blocklist rule that says “Any Attachment Name Ends With .pdf”, and if you have Use Mac OS X Address Book checked it will automatically not affect messages from people in your address book. However, I don’t expect that to be necessary. You can improve SpamSieve’s accuracy by resetting the corpus and re-training it with some more recent messages or simply wait for the next version.
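    A minimal Python sketch of how the suggested rule would behave, assuming the address-book check simply takes precedence over the blocklist rule. The `Message` class and `pdf_rule_matches` function are hypothetical, and whether SpamSieve compares attachment names case-insensitively is an assumption noted in the code.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    attachment_names: list = field(default_factory=list)


def pdf_rule_matches(msg, address_book):
    """Hypothetical model of a blocklist rule "Any Attachment Name Ends With .pdf"
    when "Use Mac OS X Address Book" is checked: senders in the address book
    are never caught by the rule."""
    if msg.sender.lower() in address_book:
        return False
    # Case-insensitive comparison is an assumption, so "Invoice.PDF" also matches.
    return any(name.lower().endswith(".pdf") for name in msg.attachment_names)


if __name__ == "__main__":
    address_book = {"friend@example.com"}  # stored lowercase in this sketch
    print(pdf_rule_matches(Message("spammer@example.net", ["complaint.pdf"]), address_book))  # True
    print(pdf_rule_matches(Message("Friend@example.com", ["report.pdf"]), address_book))  # False
```

    Note that every PDF from a sender outside the address book matches, including legitimate ones, so such a rule is much blunter than retraining the corpus.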

  16. #16

    Distinguish good and bad PDFs?

    Michael Tsai wrote:
    > I don’t think this requires an update. SpamSieve is catching all the PDF spams for me. If it isn’t for you, there may be something else going on. Please check your setup and/or report the ones that are getting through.

    I've been getting quite a lot of PDF spam these days as well. I have not chosen to mark them as junk, because I also receive lots of legitimate PDFs. Will SpamSieve differentiate between them?

    best wishes
    T

  17. #17

    trb wrote:
    > I got quite a number of PDF spam these days as well. I have not elected to mark them as junk, because I do receive lots of legitimate pdfs.

    If you expect to get good accuracy it’s essential that you tell SpamSieve the truth. If you get a PDF spam and let SpamSieve think that it’s good, not only will more PDF spams get through, but SpamSieve will also start letting other spam messages through, and possibly even think that some good messages are spam.

    trb wrote:
    > Will Spamsieve differentiate between them?

    Yes, SpamSieve can learn the difference between spammy PDFs and good ones.

  18. #18

    SpamSieve 2.6.3 is much better at analyzing messages containing PDF attachments. Some improvements will take effect automatically, but to get the full benefit you’ll need to reset SpamSieve’s corpus and re-train it. I would only bother doing that if you find, several days after updating to 2.6.3, that PDF spams are still getting through. That won’t be the case for most users.
