The corpus is the collection of messages, both spam and good, that you have used to train SpamSieve. SpamSieve’s Bayesian classifier analyzes the contents of these messages and uses this information to predict whether future messages are spam or good. SpamSieve manages the contents of the corpus itself. Once you’ve trained SpamSieve with a message, the information from that message is stored in the corpus, so deleting the message from your e-mail program will not affect SpamSieve.
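To give a feel for how a Bayesian classifier can learn from a corpus, here is a minimal sketch of a textbook naive Bayes filter in Python. It is not SpamSieve's actual algorithm, which this section does not describe; the `TinyCorpus` class, its tokenizer, and the smoothing constants are all assumptions made for illustration. Note that the sketch keeps only word counts, which is why the original message is no longer needed after training.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class TinyCorpus:
    """Illustrative corpus that stores per-word message counts, not messages."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.word_counts[label].update(set(tokenize(text)))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        # Start from the (smoothed) prior odds of spam versus good.
        n_spam = self.message_counts["spam"]
        n_good = self.message_counts["good"]
        log_odds = math.log((n_spam + 1) / (n_good + 1))
        for word in set(tokenize(text)):
            # Add-one smoothing so unseen words never produce a zero.
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_good = (self.word_counts["good"][word] + 1) / (n_good + 2)
            log_odds += math.log(p_spam / p_good)
        # Convert log-odds back to a probability between 0 and 1.
        return 1 / (1 + math.exp(-log_odds))


corpus = TinyCorpus()
corpus.train("cheap pills buy now", "spam")
corpus.train("buy cheap watches", "spam")
corpus.train("meeting notes attached", "good")
corpus.train("lunch tomorrow at noon", "good")

print(corpus.spam_probability("cheap pills"))       # above 0.5
print(corpus.spam_probability("meeting tomorrow"))  # below 0.5
```

Words that appeared mostly in spam training messages push the score up, and words that appeared mostly in good messages push it down.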
Good Messages and Spam Messages
This section shows the lists of good and spam messages that you’ve trained, as well as ones that SpamSieve has auto-trained. If you find that SpamSieve classified a message incorrectly, it’s best to correct the mistake by training it from your mail client. However, you can also use the Train as Good/Spam commands in the corpus window to correct messages that you don’t see in the mail client. Looking at the lists of messages in the corpus is a good way to make sure that no messages were mis-trained and that no mistakes went uncorrected.
The following columns are shown:
- ⚑: Whether the message has been marked as flagged.
- #: The number of attachments.
- Subject: The message’s subject.
- Received: When the message was received by your mail server.
- Trained: When you (or SpamSieve’s auto-training feature) added the message to SpamSieve’s corpus.
- Size: The size of the message’s Raw Source, if it’s stored by SpamSieve. You can sort by this column to find messages that are using a lot of disk space.
The Info tab shows summary information about the message itself, as well as how it was trained.
The Message tab shows a preview of the message’s contents. SpamSieve does not load remote images here, so you are protected from Web bugs. You can also use the Open in External Viewer command to open the message in your mail client.
The Raw Source tab shows the message data that SpamSieve received from your mail client. You can export a message’s raw source by dragging the message from the list to the Finder.
The Structure tab shows information about how SpamSieve interpreted the raw source.
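As a rough illustration of what "structure" means for raw message source, the sketch below uses Python's standard `email` package to list the MIME parts of a small sample message. This is an assumption about the general idea only; it says nothing about how SpamSieve itself parses messages, and the sample message and the `outline` helper are made up for the example.

```python
from email import message_from_string

# A made-up raw message with a plain-text part and an HTML part.
RAW = (
    "From: sender@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "Plain body\n"
    "--b1\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>HTML body</p>\n"
    "--b1--\n"
)


def outline(raw_source):
    # Parse the raw source and return the content type of every part,
    # starting with the top-level container.
    msg = message_from_string(raw_source)
    return [part.get_content_type() for part in msg.walk()]


for content_type in outline(RAW):
    print(content_type)
```

A message's raw source exported from the corpus could be inspected in a similar way, provided the exported file contains standard message source.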
Searching
You can search the Corpus window by entering text to match a message’s metadata or a word. A multi-word query is treated as a phrase search. Searches support wildcards such as * (which matches any number of additional characters) and ? (which matches a single character). To search for a literal wildcard character, you can escape it, e.g. \? to search for a question mark.
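The wildcard and escaping rules above can be sketched by translating a query into a regular expression. This is only an illustration in Python, not SpamSieve's implementation; the function names are hypothetical, and case-insensitive, match-anywhere behavior is an assumption.

```python
import re


def wildcard_to_regex(query):
    # Hypothetical helper: translate *, ?, and backslash escapes to a regex.
    parts = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            # An escaped character is matched literally, e.g. \? or \*.
            parts.append(re.escape(query[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    # Assumption: searches ignore case.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(query, text):
    # Assumption: the query may match anywhere in the text.
    return wildcard_to_regex(query).search(text) is not None


print(matches("free*offer", "A FREE limited offer"))  # True
print(matches("v?agra", "buy viagra"))                # True
print(matches("really\\?", "really!"))                # False
print(matches("free offer", "offer free"))            # False
```

Because spaces are matched literally, a multi-word query behaves like a phrase search in this sketch: the words must appear together and in order.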
Words
This section shows the words that SpamSieve has extracted from the trained messages.
The following columns are shown:
You can remove words that you don’t want in the corpus by selecting them and pressing Delete. Deleting important words can greatly affect SpamSieve’s accuracy, so you shouldn’t make changes without good reason. It is normally better to let SpamSieve manage the words.