SpamSieve 2.0

September 10th, 2003 (SpamSieve)

Version 2.0 of SpamSieve is now available.

This is a free update that includes the following changes:

  • SpamSieve now extracts a lot more information from each message. This makes it much more accurate and also makes it learn faster.
  • Now integrates with Eudora 6 (Sponsored or Paid) via a plug-in. It can now process every incoming Eudora message and can be trained using the Junk and Not Junk commands in Eudora’s Message menu.
  • SpamSieve now has a blocklist and a whitelist. These are automatically maintained based on the senders of messages that SpamSieve is trained with. The blocklist makes sure that all messages from known spammers are caught and speeds processing for these messages. The whitelist lets you be sure that certain messages will never be marked as spam; this was possible before, but now you don’t have to clutter your address book with addresses from online retailers, etc.
  • You can now control how conservative or aggressive SpamSieve is at catching spam.
  • SpamSieve can now play a sound or bounce its Dock icon after a batch of non-spam messages has arrived. This is meant to replace your e-mail client’s new mail notification, which you don’t want going off if all the new messages are spam.
  • Shows the number of new good messages in the Dock icon.
  • Now parses HTML so that it can better extract relevant information from HTML messages, and also handle various HTML-based tricks that spammers use to fool filters.
  • New method of calculating word probabilities makes SpamSieve better at discerning which words in the message are important.
  • Includes a corpus of seed spam, to jump-start spam recognition for users who do not have many saved spam messages.
  • The corpus is now stored in databases rather than in a property list. This makes SpamSieve launch faster and use much less memory, as the whole corpus no longer has to be in RAM at once.
  • The statistics file format (for History.db) has changed in order to enable performance improvements and more statistical displays in future versions.
  • Handles more types of plain text obfuscations, and is much faster at undoing them.
  • Added option for the address book whitelist to only use other people’s addresses, so that spam messages from your own address don’t match the whitelist.
  • Can mark all messages with Habeas headers as good.
  • Can mark all messages with some variant of “ADV” at the start of the subject as spam.
  • Can mark all base64-encoded HTML messages as spam.
  • New probability combiner increases accuracy.
  • Uses stop words to speed processing and reduce false negatives.
  • When filtering a message, considers the number of occurrences of the words, not just which words are present.
  • Can import messages from mbox files.
  • Can import the corpus from and export it to an XML property list (the same format used by 1.x).
  • SpamSieve can now check for updated versions of itself.
  • Added crash reporter.
  • Added Dock menu containing frequently used commands.
  • The entries in the log are more detailed.
  • The corpus now stores the date on which each word was last accessed.
  • Fixed bug where storing statistics would fail on systems that didn’t know about GMT.
  • Fixed bug where SpamSieve could throw away long runs of HTML thinking they were attachments.
  • Added button for opening the Mac OS X Address Book from inside SpamSieve.
  • The Statistics window now has a contextual menu item for copying the displayed information.
  • SpamSieve no longer wastes cycles updating the Statistics window after it’s been closed.
  • The Statistics window is smarter about updating only the portions that could have changed.
  • No longer shows Good Words and Spam Words stats.
  • Logging has less overhead.
  • Updates the history asynchronously, resulting in faster message processing.
  • Checks for mistakes in a background thread.
  • False negatives are now written to disk in a background thread.
  • Re-arranged the Corpus window.
  • Pruning the corpus now works by access date rather than by word counts. You can still prune manually the old way by sorting the Corpus window by Total.
  • Updated to SQLite 2.8.6 and tuned it for speed.
  • Updated to PCRE 4.3.
  • Updated to eSellerate 3.5, which should fix crashes some people saw after registering on 10.2.6.
  • Now looks at headers of subparts of messages from Mailsmith.
  • Time-consuming operations now have either a progress bar or a progress spinner.
  • Better at extracting malformed e-mail addresses from headers.
  • Copying rows from the Corpus window to the clipboard now uses the order of the columns in the window rather than the default column order.
  • Fixed regression where the Entourage scripts no longer created the Spam folder if it didn’t exist.
  • Fixed potential crash with regex replacements at the end of a string.
  • The history and the corpus files can now be aliases.
  • Automatically trims carriage returns and other illegal characters when you paste in your name and serial number.
  • Now saves the name and serial number to disk as soon as they’re entered.
  • The Spam folder in Entourage no longer has to be top-level.
  • Entourage can mark good messages as unread.
  • Type-selecting in table views is quicker.
  • No longer nags constantly when unregistered.
  • Fixed bug where it could look as though SpamSieve had hung if it started up in the background with an empty corpus.
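
To illustrate two of the rule-based options in the list above (the "ADV" subject check and the base64-encoded HTML check), here is a minimal Python sketch. It is not SpamSieve's actual code: the function names, the regular expression, and the set of "ADV" variants it accepts are all assumptions made for this example.

```python
import re
from email import message_from_string

# Assumed pattern: "ADV" at the start of the subject, allowing variants
# such as "ADV:", "[ADV]", or "A.D.V." (the real set of variants that
# SpamSieve recognizes is not documented here).
ADV_RE = re.compile(r"^\s*[\[\(]?\s*A\W?D\W?V\b", re.IGNORECASE)


def has_adv_subject(msg):
    """Return True if the subject starts with some variant of "ADV"."""
    return bool(ADV_RE.match(msg.get("Subject", "")))


def is_base64_html(msg):
    """Return True if any text/html part is transferred as base64."""
    for part in msg.walk():
        encoding = part.get("Content-Transfer-Encoding", "").strip().lower()
        if part.get_content_type() == "text/html" and encoding == "base64":
            return True
    return False


# Hypothetical sample message used only to exercise the two checks.
raw_spam = (
    "Subject: ADV: cheap pills\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+aGk8L2I+\n"
)
sample = message_from_string(raw_spam)
print(has_adv_subject(sample), is_base64_html(sample))
```

The word-boundary check after the "V" keeps ordinary subjects such as "Advice needed" from matching, which matters for a rule whose purpose is to mark messages as spam outright.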

For more information, please see the SpamSieve Manual.