Tag PDFs that Need OCR
Summary: Adds the “NeedsOCR” tag to the selected PDF files that do not have any text.
Requires: EagleFiler
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-10-02
Description
When importing from a scanner, you might not have run your OCR program before importing the scanned document into EagleFiler. This script looks at the records that you’ve selected and tags any PDF file that has no text content, i.e. one that has not yet been run through OCR. You can then find the tagged files and process them later, e.g. using the OCR With PDFpen script.
Script
property pMinimumTextLengthThatCounts : 1
property pIgnoreWhitespace : true

tell application "EagleFiler"
    set _records to selected records of browser window 1
    repeat with _record in _records
        if _record's universal type identifier is "com.adobe.pdf" then
            -- Allow up to 5 minutes for EagleFiler to return the text.
            with timeout of 5 * 60 seconds
                set _string to _record's text content
            end timeout
            if pIgnoreWhitespace then
                set _string to my removeWhitespace(_string)
            end if
            -- Too little text: treat the PDF as not yet run through OCR.
            if length of _string < pMinimumTextLengthThatCounts then
                set _oldTagNames to _record's assigned tag names
                set _record's assigned tag names to _oldTagNames & {"NeedsOCR"}
            end if
        end if
    end repeat
end tell

on removeWhitespace(_string)
    set _string to my replace(_string, " ", "")
    set _string to my replace(_string, tab, "")
    set _string to my replace(_string, return, "")
    -- Also strip line feeds, which "return" (a carriage return) does not match.
    set _string to my replace(_string, linefeed, "")
    return _string
end removeWhitespace

on replace(_string, _source, _replacement)
    -- Split on _source, rejoin with _replacement, then restore the delimiters.
    set _oldDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to _source
    set _items to every text item of _string
    set AppleScript's text item delimiters to _replacement
    set _result to _items as Unicode text
    set AppleScript's text item delimiters to _oldDelimiters
    return _result
end replace
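
The “does this PDF have any text?” check can be tried on its own in Script Editor, without EagleFiler. The following is a minimal sketch, not part of the script above: the handler names needsOCR and stripWhitespace and the sample strings are hypothetical, and it assumes an AppleScript version that accepts a list of text item delimiters (Mac OS X 10.6 or later).

```applescript
-- Standalone sketch: the handler names and sample strings are illustrative only.
property pMinimumTextLengthThatCounts : 1

on needsOCR(_text)
    -- Strip whitespace, then compare what is left against the threshold.
    set _stripped to my stripWhitespace(_text)
    return (length of _stripped) < pMinimumTextLengthThatCounts
end needsOCR

on stripWhitespace(_text)
    -- Splitting on a list of delimiters and rejoining with "" removes them all.
    set _saved to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {" ", tab, return, linefeed}
    set _items to every text item of _text
    set AppleScript's text item delimiters to ""
    set _result to _items as text
    set AppleScript's text item delimiters to _saved
    return _result
end stripWhitespace

log needsOCR(" " & tab & return & linefeed) -- whitespace only: true
log needsOCR("Invoice 42") -- has real text: false
```

Raising pMinimumTextLengthThatCounts in this sketch shows how a scan that contains only a few stray characters of text would still be treated as needing OCR.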