Corpus Scripting Sample
Summary: Demonstrates how to control SpamSieve’s corpus using AppleScript.
Requires: SpamSieve
Install Location: ~/Library/Scripts/SpamSieve Scripts/
Last Modified: 2024-10-04
Description
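This script shows how to read SpamSieve’s corpus from AppleScript: it logs the corpus’s basic properties, iterates over all the words by index, and looks up a single word by name.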
Using get every word (and other operations on very large numbers of words) will not work, because AppleScript runs out of memory. (There are typically 100,000 words in SpamSieve’s corpus.) If you want to iterate over all the words, it is better to access the token infos by index instead.
Script
tell application "SpamSieve"
    tell current corpus
        -- Log the basic properties
        log {"Corpus File:", file as string}
        log {"Spam Messages:", spam message count as integer}
        log {"Good Messages:", good message count as integer}

        -- Log all the words. The "words" property is deprecated because a
        -- large list of strings can overwhelm AppleScript. Instead, we access
        -- the token infos by index so that AppleScript never has to load the
        -- entire corpus contents at once.
        set n to count token infos
        repeat with i from 1 to n
            set w to word of token info i
            log w as string
        end repeat

        -- If you know a word, you can view its information by looking up
        -- the token info with its name.
        set t to token info "foo"
        log t's good count
        log t's spam count
    end tell
end tell
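
The same by-index access can be used to compute a simple statistic without ever asking for the whole word list. The following is a minimal sketch, not part of the original script: it uses only the terms that already appear above (token infos, spam count, good count), and the maxWords cap and the spammyCount variable are hypothetical names introduced for illustration. It has not been checked against every SpamSieve version, so treat it as a starting point.

```applescript
-- Sketch: count how many words were seen more often in spam than in good mail.
-- maxWords is a hypothetical cap that keeps a trial run short; raise or remove it as needed.
set maxWords to 1000
set spammyCount to 0

tell application "SpamSieve"
    tell current corpus
        set n to count token infos
        if n > maxWords then set n to maxWords
        repeat with i from 1 to n
            -- Fetch one token info at a time so the full corpus is never loaded.
            set t to token info i
            set s to spam count of t
            set g to good count of t
            if s > g then set spammyCount to spammyCount + 1
        end repeat
    end tell
end tell

log {"Words seen more often in spam:", spammyCount}
```

Because each pass through the loop sends a few requests to SpamSieve, covering a corpus of around 100,000 words this way may take a while. That is the reason for the cap in this sketch.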