Tag PDFs that Need OCR
Summary: Adds the “NeedsOCR” tag to the selected PDF files that do not have any text.
Requires: EagleFiler
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-10-02
Description
When importing from a scanner, you might not have run your OCR program before importing the scanned document into EagleFiler. This script looks at the records that you’ve selected and tags any PDF files that contain no text, so that you can find them later and run OCR on them, e.g. using the OCR With PDFpen script.
Script
-- A PDF counts as having text only if it has at least this many characters.
property pMinimumTextLengthThatCounts : 1
-- When true, spaces, tabs, and returns are not counted as text.
property pIgnoreWhitespace : true

tell application "EagleFiler"
    set _records to selected records of browser window 1
    repeat with _record in _records
        if _record's universal type identifier is "com.adobe.pdf" then
            -- Allow up to five minutes for the text of a large PDF.
            with timeout of 5 * 60 seconds
                set _string to _record's text content
            end timeout
            if pIgnoreWhitespace then
                set _string to my removeWhitespace(_string)
            end if
            if length of _string < pMinimumTextLengthThatCounts then
                set _oldTagNames to _record's assigned tag names
                set _record's assigned tag names to _oldTagNames & {"NeedsOCR"}
            end if
        end if
    end repeat
end tell

on removeWhitespace(_string)
    set _string to my replace(_string, " ", "")
    set _string to my replace(_string, tab, "")
    set _string to my replace(_string, return, "")
    return _string
end removeWhitespace

-- Replaces every occurrence of _source in _string with _replacement.
on replace(_string, _source, _replacement)
    set AppleScript's text item delimiters to _source
    set _items to every text item of _string
    set AppleScript's text item delimiters to _replacement
    return _items as Unicode text
end replace
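
To see how the "has no text" test behaves without touching any EagleFiler records, the check can be tried on its own in Script Editor. The following is only a sketch: the handler names hasNoUsableText and stripWhitespace are hypothetical and do not appear in the script above. It also makes two changes that are assumptions rather than part of the original: it strips linefeed characters as well, and it restores AppleScript's text item delimiters after each replacement.

```applescript
-- Illustrative sketch only: hasNoUsableText and stripWhitespace are
-- hypothetical names, not part of the original script or of EagleFiler.
property pMinimumTextLengthThatCounts : 1

-- Returns true if _string has fewer non-whitespace characters than the minimum.
on hasNoUsableText(_string)
    set _stripped to my stripWhitespace(_string)
    return (length of _stripped) < pMinimumTextLengthThatCounts
end hasNoUsableText

on stripWhitespace(_string)
    -- Assumption: linefeeds are removed too, which the original does not do.
    repeat with _character in {" ", tab, return, linefeed}
        set _string to my replace(_string, contents of _character, "")
    end repeat
    return _string
end stripWhitespace

on replace(_string, _source, _replacement)
    -- Save and restore the delimiters so the caller's setting is unchanged.
    set _oldDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to _source
    set _items to every text item of _string
    set AppleScript's text item delimiters to _replacement
    set _joined to _items as text
    set AppleScript's text item delimiters to _oldDelimiters
    return _joined
end replace

-- A whitespace-only string and a string with real text.
{hasNoUsableText(" " & tab & return), hasNoUsableText("Invoice 42")}
```

Run as written, the last line should evaluate to {true, false}: the first string is empty once whitespace is removed, while the second still has nine characters. Raising pMinimumTextLengthThatCounts would also flag PDFs that contain only a few stray characters.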