Tag PDFs that Need OCR

Summary: Adds the “NeedsOCR” tag to the selected PDF files that do not have any text.
Requires: EagleFiler
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-10-02

Description

When importing from a scanner, you might not have run your OCR program before importing the scanned document into EagleFiler. This script examines the records that you’ve selected and tags any PDF files that contain no text, i.e. those that have not yet been run through OCR. You can then find the tagged files and process them, e.g. using the OCR With PDFpen script.


Script

-- A PDF with fewer than this many characters of text is tagged as needing OCR.
property pMinimumTextLengthThatCounts : 1
-- If true, spaces, tabs, and returns are not counted as text.
property pIgnoreWhitespace : true

tell application "EagleFiler"
	set _records to selected records of browser window 1
	repeat with _record in _records
		if _record's universal type identifier is "com.adobe.pdf" then
			-- Extracting the text of a large PDF can be slow, so allow up to 5 minutes.
			with timeout of 5 * 60 seconds
				set _string to _record's text content
			end timeout
			if pIgnoreWhitespace then
				set _string to my removeWhitespace(_string)
			end if
			if length of _string < pMinimumTextLengthThatCounts then
				-- Append the tag, keeping the record's existing tags.
				set _oldTagNames to _record's assigned tag names
				set _record's assigned tag names to _oldTagNames & {"NeedsOCR"}
			end if
		end if
	end repeat
end tell

-- Returns _string with all spaces, tabs, and returns removed.
on removeWhitespace(_string)
	set _string to my replace(_string, " ", "")
	set _string to my replace(_string, tab, "")
	set _string to my replace(_string, return, "")
	return _string
end removeWhitespace

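-- Replaces every occurrence of _source in _string with _replacement
-- by splitting on _source and rejoining with _replacement.
on replace(_string, _source, _replacement)
	set AppleScript's text item delimiters to _source
	set _items to every text item of _string
	set AppleScript's text item delimiters to _replacement
	return _items as Unicode text
end replace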
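
The replace handler above leaves AppleScript's text item delimiters set to the replacement string when it returns, and removeWhitespace does not strip linefeed characters. The following is a minimal sketch of one possible variant, not part of the original script. The handler names replacePreservingDelimiters and removeAllWhitespace are hypothetical, and the idea that a PDF's extracted text may contain linefeeds is an assumption. It can be run on its own in Script Editor and does not require EagleFiler.

```applescript
-- Hypothetical variant of the replace handler (not from the original script).
-- It restores the previous text item delimiters before returning.
on replacePreservingDelimiters(_string, _source, _replacement)
	set _savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to _source
	set _items to every text item of _string
	set AppleScript's text item delimiters to _replacement
	set _joined to _items as text
	set AppleScript's text item delimiters to _savedDelimiters
	return _joined
end replacePreservingDelimiters

-- Hypothetical variant of removeWhitespace that also strips linefeeds,
-- on the assumption that extracted PDF text may contain them.
on removeAllWhitespace(_string)
	repeat with _character in {" ", tab, return, linefeed}
		set _string to my replacePreservingDelimiters(_string, contents of _character, "")
	end repeat
	return _string
end removeAllWhitespace

-- Example: count the non-whitespace characters of a sample string.
length of removeAllWhitespace(" a" & tab & "b" & linefeed & return)
```

For the sample string, only "a" and "b" remain after stripping, so the final expression evaluates to 2. To try this in the script above, the call to removeWhitespace could be pointed at removeAllWhitespace instead.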