Tag PDFs that Need OCR

Summary: Adds the “NeedsOCR” tag to the selected PDF files that do not have any text.
Requires: EagleFiler
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-02-25

Description

When importing from a scanner, you might not have run your OCR program before importing the scanned document into EagleFiler. This script looks at the records that you’ve selected and tags any PDF files that contain no text, which usually means they have not yet been run through OCR. You can then find the tagged files and process them, e.g. using the OCR With PDFpen script.

Script

-- PDFs whose extracted text is shorter than this many characters get tagged.
property pMinimumTextLengthThatCounts : 1
-- When true, whitespace is stripped before the length is measured.
property pIgnoreWhitespace : true

tell application "EagleFiler"
    set _records to selected records of browser window 1
    repeat with _record in _records
        if _record's universal type identifier is "com.adobe.pdf" then
            -- Allow up to five minutes (AppleScript's default is two) for the text to come back.
            with timeout of 5 * 60 seconds
                set _string to _record's text content
            end timeout
            if pIgnoreWhitespace then
                set _string to my removeWhitespace(_string)
            end if
            if length of _string < pMinimumTextLengthThatCounts then
                -- Append the tag while keeping the record's existing tags.
                set _oldTagNames to _record's assigned tag names
                set _record's assigned tag names to _oldTagNames & {"NeedsOCR"}
            end if
        end if
    end repeat
end tell

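on removeWhitespace(_string)
    -- Strip spaces, tabs, carriage returns, and linefeeds.
    set _string to my replace(_string, " ", "")
    set _string to my replace(_string, tab, "")
    set _string to my replace(_string, return, "")
    set _string to my replace(_string, linefeed, "")
    return _string
end removeWhitespace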

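on replace(_string, _source, _replacement)
    -- Split on _source and rejoin with _replacement, then restore the previous delimiters.
    set _oldDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to _source
    set _items to every text item of _string
    set AppleScript's text item delimiters to _replacement
    set _result to _items as Unicode text
    set AppleScript's text item delimiters to _oldDelimiters
    return _result
end replace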
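
To see how the whitespace check decides whether a PDF counts as having text, the following standalone sketch can be run in Script Editor without EagleFiler. The sample strings are made up for illustration. Stripping linefeeds as well as returns is an assumption about what extracted PDF text might contain, and restoring the text item delimiters afterwards is an optional tidy-up, not something the script above depends on.

```applescript
-- Split on _source and rejoin with _replacement, restoring the delimiters afterwards.
on replace(_string, _source, _replacement)
    set _oldDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to _source
    set _items to every text item of _string
    set AppleScript's text item delimiters to _replacement
    set _result to _items as Unicode text
    set AppleScript's text item delimiters to _oldDelimiters
    return _result
end replace

-- Remove spaces, tabs, carriage returns, and linefeeds (linefeed is an assumption).
on removeWhitespace(_string)
    repeat with _character in {" ", tab, return, linefeed}
        set _string to my replace(_string, contents of _character, "")
    end repeat
    return _string
end removeWhitespace

-- Hypothetical sample text: a scan with only whitespace, and a page with real text.
set _scannedPage to " " & tab & return & linefeed
set _textPage to "Invoice 42" & return

set _scannedStripped to removeWhitespace(_scannedPage)
set _textStripped to removeWhitespace(_textPage)

-- Result: {0, 9}
{length of _scannedStripped, length of _textStripped}
```

With pMinimumTextLengthThatCounts set to 1, the first sample would be tagged “NeedsOCR” because nothing remains after stripping, and the second would be left alone.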