Corpus Scripting Sample
Summary: Demonstrates how to control SpamSieve’s corpus using AppleScript.
Requires: SpamSieve
Install Location: ~/Library/Scripts/SpamSieve Scripts/
Last Modified: 2024-10-04
Description
This script demonstrates how to access SpamSieve’s corpus using AppleScript.
Using "get every word" (and other operations on very large numbers of words) will not work, because AppleScript runs out of memory. (There are typically 100,000 words in SpamSieve’s corpus.) If you want to iterate over all the words, it is better to access the token infos by index instead.
Script
tell application "SpamSieve"
    tell current corpus
        -- Log the basic properties
        log {"Corpus File:", file as string}
        log {"Spam Messages:", spam message count as integer}
        log {"Good Messages:", good message count as integer}

        -- Log all the words. The "words" property is deprecated because a
        -- large list of strings can overwhelm AppleScript. Instead, we access
        -- the token infos by index so that AppleScript never has to load the
        -- entire corpus contents at once.
        set n to count token infos
        repeat with i from 1 to n
            set w to word of token info i
            log w as string
        end repeat

        -- If you know a word, you can view its information by looking up
        -- the token info with its name.
        set t to token info "foo"
        log t's good count
        log t's spam count
    end tell
end tell