5.3.1   Show Corpus

The corpus is a collection of messages, both spam and good, with which you have trained SpamSieve. SpamSieve’s Bayesian classifier analyzes the contents of the messages and uses this information to predict whether future messages are spam or good. SpamSieve manages the contents of the corpus itself. Once you’ve trained SpamSieve with a message, deleting that message from your e-mail program will not affect SpamSieve, because the information from the message is already stored in the corpus.
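To illustrate why deleting a trained message has no effect, here is a minimal Python sketch of a corpus that keeps per-word counts rather than the messages themselves. It is not SpamSieve’s actual storage format; the `Corpus` class, its `train` method, and the simple tokenizer are assumptions made for illustration only.

```python
import re
from collections import Counter


class Corpus:
    """Toy corpus (hypothetical): stores word counts, not the messages."""

    def __init__(self):
        self.spam = Counter()  # word -> occurrences in spam messages
        self.good = Counter()  # word -> occurrences in good messages

    def train(self, text, is_spam):
        # Assumed tokenizer: lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        (self.spam if is_spam else self.good).update(words)


corpus = Corpus()
message = "Cheap pills, cheap prices"
corpus.train(message, is_spam=True)

# Discarding the original message leaves the stored counts untouched.
del message
cheap_in_spam = corpus.spam["cheap"]
```

After training, the counts for “cheap” remain available even though the message itself is gone.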

The Show Corpus command opens the Corpus window so that you can examine the words that SpamSieve has found in your e-mails. You can click on the name of a column to sort by that column. Click again on the column to reverse the sort direction. The meanings of the columns are as follows:

Word
A word in the corpus.
Spam
The number of times the word has occurred in spam messages.
Good
The number of times the word has occurred in good messages.
Total
The total number of times the word has occurred.
Prob.
The probability that a message is spam, given that it contains the word (and in the absence of other evidence).
Last Used
The date that the word was added to the corpus, or the date that it last appeared in a received message (whichever is later).
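As a rough illustration of how the count columns relate to the probability column, the following Python sketch derives a total and a per-word spam probability from spam and good counts. The formula is a common textbook estimate, not necessarily the one SpamSieve uses; the function name `word_stats` and the parameters `spam_total` and `good_total` (the sizes of the spam and good training sets) are assumptions.

```python
def word_stats(spam_count, good_count, spam_total, good_total):
    """Return (total, probability) for one word.

    Assumed estimate: compare how often the word appears relative to the
    size of each training set. This is not SpamSieve's documented formula.
    """
    total = spam_count + good_count
    spam_rate = spam_count / spam_total
    good_rate = good_count / good_total
    if spam_rate + good_rate == 0:
        return total, 0.5  # no evidence either way
    return total, spam_rate / (spam_rate + good_rate)


# A word seen 30 times in spam and 10 times in good mail,
# with 40 as the assumed size of each training set.
total, prob = word_stats(30, 10, 40, 40)

# Setting the good count to zero pushes the estimate to the extreme.
_, prob_edited = word_stats(30, 0, 40, 40)
```

Under this estimate the example word gets a total of 40 and a probability of 0.75, and zeroing its good count raises the probability to 1.0. This shows why small changes to the counts of an important word can shift the result considerably.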

You can copy the selected rows to the clipboard or drag and drop them into another application.

With the window sorted by Word, you can type the first few letters of a word to locate that word in the corpus. Similarly, you can sort by one of the other columns and type a number to locate the first word whose value for the sorted column matches the number you typed.
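The lookup described above behaves like a search in a sorted list. The Python sketch below shows one way such a lookup could work, using binary search from the standard `bisect` module; the function names are hypothetical and this is not a description of SpamSieve’s implementation.

```python
from bisect import bisect_left


def find_by_prefix(sorted_words, typed):
    """Index of the first word starting with the typed letters, else None."""
    key = typed.lower()
    i = bisect_left(sorted_words, key)
    if i < len(sorted_words) and sorted_words[i].startswith(key):
        return i
    return None


def find_first_value(sorted_values, number):
    """Index of the first row whose value equals the typed number, else None."""
    i = bisect_left(sorted_values, number)
    if i < len(sorted_values) and sorted_values[i] == number:
        return i
    return None


words = sorted(["free", "meeting", "mortgage", "viagra"])
word_row = find_by_prefix(words, "mo")    # row of "mortgage"

spam_counts = [1, 3, 3, 7]                # a column sorted in ascending order
count_row = find_first_value(spam_counts, 3)
```

Both functions assume the list is already sorted in ascending order, which corresponds to sorting the window by the relevant column first.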

You can edit the spam and good counts associated with a word by double-clicking on the number in the Spam or Good column. Changing the numbers for important words can greatly affect SpamSieve’s accuracy, so you shouldn’t make changes without good reason.

You can remove words that you don’t want in the corpus by selecting them and pressing Delete.

