Corpus Scripting Sample

Summary: Demonstrates how to control SpamSieve’s corpus using AppleScript.
Requires: SpamSieve
Install Location: ~/Library/Scripts/SpamSieve Scripts/
Last Modified: 2024-10-04

Description

This script demonstrates how to access SpamSieve’s corpus using AppleScript.

Using get every word (or any other operation that returns a very large number of words at once) will not work, because AppleScript runs out of memory; SpamSieve’s corpus typically contains 100,000 words. To iterate over all the words, access the token infos by index instead.


Script

tell application "SpamSieve"
    tell current corpus
        -- Log the basic properties
        log {"Corpus File:", file as string}
        log {"Spam Messages:", spam message count as integer}
        log {"Good Messages:", good message count as integer}
        
        -- Log all the words. The "words" property is deprecated because a large list of strings can overwhelm AppleScript. Instead, we access the token infos by index so that AppleScript never has to load the entire corpus contents at once.
        set n to count token infos
        repeat with i from 1 to n
            set w to word of token info i
            log w as string
        end repeat
        
        -- If you know a word, you can view its information by looking up the token info with its name.
        set t to token info "foo"
        log t's good count
        log t's spam count
    end tell
end tell