SpamSieve supports searching in the Corpus, Log, Blocklist, and Allowlist
windows. Using the search field in the toolbar, you can filter the top of the
window to show only those items that match the search criteria. You
can open this help page by selecting Search Syntax Reference from the
search field menu.

Additionally, you can choose Edit ‣ Find ‣ Find to search within the
Info, Raw Source, and Structure tabs at the bottom of the window.
The rest of this section concerns the search field at the top.
Search Scope
When searching messages in the corpus or log entries in the log, you can
choose either a Standard search or one of the other search scopes:
- Standard
- This is normally what you want, as it will search almost everything in the
message or log entry. However, in some circumstances you may want to
choose a more specific scope, either to make the search faster or to
narrow the results. Note that this does not search the Raw Message
Source, the Words, or descriptive text in the Type or
Subject column that’s not part of the message or an error.
- Subject
- Searches the subject of the message.
- From
- Searches the name and address of the message’s sender.
- To
- Searches the address where you received the message (which may be
different from what’s shown in the message’s To: header).
- Identifier
- Searches for messages with the given SpamSieve identifier (which will look
something like xiNVGwM5KM7sGk71w7KNZQ==). You can find a message’s
identifier in the Info tab. The identifier is computed based on the
headers of the message, and SpamSieve uses this to determine whether it’s
seeing the same message again (e.g. so it can tell during training whether
you’re correcting a mistake or teaching it a new message).
- Message-ID
- Searches the message’s Message-ID: header. This is the identifier
generated by the message’s sender and may look something like
<4824BBE7-3B56-4702-9F6C-13C45C8D7C7E@c-command.com>. Note that
multiple messages with different SpamSieve identifiers may have the same
Message-ID. For example, if you receive two copies of a message sent to
different e-mail addresses, the Message-ID will be the same but the
SpamSieve identifiers will be different because the messages took
different paths (documented in the Received: header) to reach you.
- Raw Message Source
- Searches the message’s RFC 822 data, i.e. the full message data
(headers, body, attachments) that your mail client downloaded from the
server, as shown in the Raw Source tab. The data may be transfer
encoded (Quoted-Printable or Base64) and include HTML and CSS.
- Rules
- Searches the Text to Match of any Blocklist or Allowlist rules that
were created or edited or that matched a message.
- Matching Words
- Searches words that the Bayesian classifier used to predict whether a
message was good or spam. These are also shown in the Info tab of a
Predicted log entry. This includes regular words found in the message
body as well as special words like S:Apple, R:^relay2^apple^com,
and ^a-style-fontfamilyArialsansserifcolorwhite that SpamSieve uses to
track more specific message characteristics. (For examples, see the
Words tab of the Corpus window.)
- Words
- Searches the corpus words in the message. This is different from
Matching Words in that it searches all the words in a message (or in
a Predicted or Trained log entry) even if SpamSieve deemed them to
be neutral (not a strong indicator of good vs. spam) and so they do not
appear among the significant Words in the Predicted log entry.
Note that Raw Message Source and Words searches are slower than the
other types and are only possible for messages where SpamSieve is storing the
full message data. This includes all messages in the corpus that were trained
using SpamSieve 3.0 or later. If you’re using the Prune full message data in
log setting, only newer log entries will have their full data stored.
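To make the difference between the Identifier and Message-ID scopes concrete, here is a small Python sketch. It is only an illustration: the description above says the SpamSieve identifier is computed from the message’s headers but does not say how, so `toy_identifier` (an MD5 digest of all the headers, Base64-encoded) is a hypothetical stand-in, and the relay hosts and recipient addresses are made up.

```python
import base64
import email
import hashlib


def toy_identifier(headers: str) -> str:
    """Hypothetical fingerprint: Base64 of an MD5 digest of the header text.

    This is NOT SpamSieve's algorithm; it only shows how a header-based
    fingerprint reacts to differing headers.
    """
    digest = hashlib.md5(headers.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Two copies of one message, delivered to different addresses via different relays.
copy_a = (
    "Received: from relay1.example.com by mx.example.net\n"
    "Message-ID: <4824BBE7-3B56-4702-9F6C-13C45C8D7C7E@c-command.com>\n"
    "To: alice@example.net\n"
)
copy_b = (
    "Received: from relay2.example.com by mx.example.org\n"
    "Message-ID: <4824BBE7-3B56-4702-9F6C-13C45C8D7C7E@c-command.com>\n"
    "To: alice@example.org\n"
)

message_id_a = email.message_from_string(copy_a)["Message-ID"]
message_id_b = email.message_from_string(copy_b)["Message-ID"]

same_message_id = message_id_a == message_id_b                        # True
same_identifier = toy_identifier(copy_a) == toy_identifier(copy_b)    # False
```

Because the two copies carry different Received: and To: headers, a header-based fingerprint differs even though the Message-ID: header is identical, which is why a Message-ID search can return more than one message.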
Search Query Syntax
Except where noted below, searches are case-insensitive and
diacritic-insensitive. A multi-word query is treated as a phrase search.
Searches support wildcards such as * (which matches any number of
additional characters) and ? (which matches a single character). To search
for a literal wildcard character, you can escape it, e.g. \? to search for
a question mark.
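The matching rules in this paragraph can be approximated with a short Python sketch. It is not SpamSieve’s implementation: the function names are hypothetical, and it assumes that a query may match anywhere within the searched text, which the description above does not state.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Remove diacritics (combining marks) and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def query_to_regex(query: str) -> re.Pattern:
    """Translate * and ? wildcards, honoring backslash escapes."""
    parts = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            # Escaped character: match it literally, even if it is * or ?.
            parts.append(re.escape(query[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(query: str, text: str) -> bool:
    """Assumption: the whole query is one phrase that may occur anywhere in text."""
    return query_to_regex(fold(query)).search(fold(text)) is not None


print(matches("CAFE", "Café menu"))                        # True
print(matches("special offer", "A Special Offer for you")) # True
print(matches("offer special", "A Special Offer for you")) # False
print(matches("inv?ice", "Your invoice"))                  # True
print(matches("\\?", "No question mark here"))             # False
```

The multi-word query is kept as a single phrase, so the words must appear together and in order, while the escaped `\?` only matches a literal question mark.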
When searching by Identifier or Message-ID, you must search for the
entire identifier, not just some of its characters. This is case-sensitive, and
wildcards are not supported. Typically, you would know the exact identifier
because you are copying and pasting it from elsewhere. If you need to search
for a fragment of a Message-ID: header you can do that using Raw Message
Source.
When searching by Raw Message Source, searches are case-sensitive and do
not support wildcards. Non-ASCII search terms may not directly match the raw
source because it may be encoded.
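The following Python sketch illustrates why a non-ASCII term may fail to match the raw source. The sample text is made up, and `raw_contains` is a hypothetical helper that mimics a case-sensitive, wildcard-free search over the raw bytes; it is not how SpamSieve is implemented.

```python
import base64
import quopri

body = "Café gratuit"
utf8 = body.encode("utf-8")

# The same text as it might appear in the raw source under each transfer encoding.
qp_source = quopri.encodestring(utf8)    # b'Caf=C3=A9 gratuit'
b64_source = base64.b64encode(utf8)


def raw_contains(term: str, raw: bytes) -> bool:
    """Case-sensitive literal search of the raw bytes (no wildcards)."""
    return term.encode("utf-8") in raw


print(raw_contains("Café", qp_source))      # False: the é is stored as =C3=A9
print(raw_contains("gratuit", qp_source))   # True: plain ASCII survives Quoted-Printable
print(raw_contains("Gratuit", qp_source))   # False: the search is case-sensitive
print(raw_contains("gratuit", b64_source))  # False: Base64 hides even ASCII text
```

When a term might be encoded in the raw source, one of the other search scopes may be a better fit, since the limitation described above applies to Raw Message Source searches.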
Search Examples
- A From search for @apple.com will find messages sent to you from
Apple.
- If you have multiple mail accounts or aliases, a To search can help
you find messages sent to a particular address.
- If you see a message in the Log and copy its Identifier from the
Info tab, you can search for that identifier to find all the log
entries pertaining to that message. For example, if a spam message went
to your inbox, it may have been Predicted: Good and then Trained:
Good (Auto), and then, after you corrected the mistake, Trained: Spam
(Manual). Or there may be multiple Predicted entries for the same
message if your mail program kept seeing it as new and sent it to
SpamSieve for analysis multiple times.
- Searching by Message-ID can be useful if you are trying to find a
message that you see in your mail client or in a server log. In that
case, you don’t know the SpamSieve identifier to uniquely identify the
message, but this search will narrow the results down much more than
searching by Subject or From.
- If you see that a message was classified as good or spam due to a
particular rule, you can do a Rules search to see when and why that
rule was created and which other messages were also classified using
that rule. If a rule is not fully reliable, but you’ve locked it so
that SpamSieve keeps it enabled, you can find all the messages
classified using that rule—both correctly and incorrectly classified—to
help evaluate whether you still want to use that rule.
- If you see that the Bayesian classifier predicted a message to be good
or spam (in part) due to a particular word, a Matching Words search
can find other messages where that word also played a role. This can
also be useful (instead of a Words search, as below) in searching the
log for messages that contain a particular corpus word. It will not
find all messages with that word because it’s only searching messages
where that word was one of the ones that SpamSieve deemed important to
that classification. However, it may find some messages that a
Words search doesn’t because, if you’re using the Prune full
message data in log setting, older messages in the log won’t have
their full message data stored, so it won’t be possible to search all
their words.
- Words searches can be useful in checking the training when a
message was not classified correctly. For example, if the Bayesian
classifier incorrectly predicted a spam message to be good, the
Words in the Info tab will show which key words SpamSieve used
to make this determination. Suppose you see v1agra(0.005) there.
This seems like a word that would only appear in spam messages, but
the spam probability being very close to zero means that SpamSieve
thinks it’s a strong indication that the message is good. Something
isn’t right here. You could go to the Good Messages section of the
Corpus window and do a Words search for v1agra to find which
messages SpamSieve was trained with that made it think this word was good.
If you find messages in the Good section that are spam (e.g. trained
by mistake or auto-trained messages that you failed to correct)
you could fix that by training them as spam.
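The v1agra example can be illustrated with a toy calculation. This sketch assumes that a word’s spam probability is simply the fraction of trained messages containing the word that are spam; SpamSieve’s actual formula is not described here and is likely more elaborate. The function name and the message counts are invented, chosen only to reproduce the 0.005 figure.

```python
def toy_spam_probability(good_count: int, spam_count: int) -> float:
    """Simplified assumption: share of messages containing the word that are spam."""
    total = good_count + spam_count
    if total == 0:
        return 0.5  # no training evidence either way
    return spam_count / total


# Hypothetical corpus where 199 spam messages containing "v1agra" were
# mistakenly left in the Good section and only 1 was trained as spam.
before = toy_spam_probability(good_count=199, spam_count=1)   # 0.005

# After finding those messages with a Words search and training them as spam.
after = toy_spam_probability(good_count=0, spam_count=200)    # 1.0
```

Under this simplified model, correcting the mistrained messages moves the word from a strong indicator of good mail to a strong indicator of spam.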