OCR With OCRmyPDF
Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, OCRmyPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2025-08-15
Description
This script uses OCRmyPDF to perform optical character recognition on a scanned PDF file, which makes the contents of the PDF searchable in EagleFiler. Initially, the PDF has only an image layer; after running the script, it has both an image layer and an invisible text layer. If the PDF has the “NeedsOCR” tag (for example, because you used the Tag PDFs that Need OCR script), the script removes the tag after OCR has been applied.
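To make the tag behavior concrete, here is a minimal sketch of a helper that checks whether a record carries a given tag. The handler name hasTag is hypothetical and not part of the script; it assumes the same "assigned tags" and "name" terms that the script itself uses, and it needs EagleFiler installed to compile.

```applescript
-- Hypothetical helper (not part of the original script):
-- returns true if _record has a tag named _tagName, false otherwise.
on hasTag(_record, _tagName)
    tell application "EagleFiler"
        set _tags to _record's assigned tags
        repeat with _tag in _tags
            if _tag's name is _tagName then return true
        end repeat
    end tell
    return false
end hasTag
```

A run handler could call my hasTag(_record, "NeedsOCR") to skip records that were never tagged, although the script as published processes every selected record.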
There are several ways to use this script:
- Run the script by itself to operate on the selected PDFs in EagleFiler.
- Save the script as an application and drop PDF files onto it to OCR them and then import them into EagleFiler.
- Attach the script to a folder as a folder action and save files into that folder.
See also the Import From Scanner script and EagleFiler’s OCR documentation.
Installation Instructions · Download in Compiled Format · Download in Text Format
Script
-- Prerequisite: The script assumes OCRmyPDF is installed via MacPorts.
-- Language parameter: By default the script runs -l eng. You can add more languages, e.g. -l eng+nld.
on run
    tell application "EagleFiler"
        -- Get all selected records (PDF files) in EagleFiler's front browser window
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file -- the file (alias) associated with this EagleFiler record
            my ocr(_file) -- perform OCR on the file using OCRmyPDF
            tell _record to update checksum -- update EagleFiler's stored checksum now that the file content changed
            my removeTag(_record, "NeedsOCR") -- remove the "NeedsOCR" tag from the record (if it had one)
        end repeat
    end tell
end run
on open _files
    -- Handles files dropped onto this script: OCR them and then import into EagleFiler
    my ocrAndImport(_files)
end open
on adding folder items to _folder after receiving _files
    -- Handles folder action: OCR new files added to the folder and then import into EagleFiler
    my ocrAndImport(_files)
end adding folder items to
on ocrAndImport(_files)
    -- Iterate over each file, perform OCR, then import all into EagleFiler
    repeat with _file in _files
        my ocr(_file) -- OCR each file in the list
    end repeat
    tell application "EagleFiler"
        import files _files -- import the newly OCR'd files into EagleFiler's library
    end tell
end ocrAndImport
on ocr(_file)
    -- Convert the file reference (alias) to a POSIX path string for use in the shell command
    set inputPath to POSIX path of _file
    -- Create a temporary file path for the OCR output (in the system temp directory)
    -- This will generate a unique filename like /tmp/ocrmypdfABC123.pdf for the output
    set tempPath to do shell script "mktemp /tmp/ocrmypdfXXXXXX.pdf"
    -- Run the OCRmyPDF command on the input file, outputting the OCR'd PDF to the temporary file.
    -- The PATH is adjusted to include the MacPorts directories, and the quoted form of each path is used for safety.
    do shell script "PATH=/opt/local/bin:/opt/local/sbin:$PATH; /opt/local/bin/ocrmypdf -l eng --redo-ocr " & quoted form of inputPath & " " & quoted form of tempPath
    -- Move (rename) the temporary OCR output file back to the original file path, overwriting the original PDF
    do shell script "mv -f " & quoted form of tempPath & " " & quoted form of inputPath
end ocr
on removeTag(_record, _tagName)
    tell application "EagleFiler"
        -- Get the list of all tags currently assigned to the record
        set _tags to _record's assigned tags
        set _newTags to {}
        -- Build a new list of tags excluding the tag we want to remove
        repeat with _tag in _tags
            if _tag's name is not _tagName then
                copy _tag to the end of _newTags
            end if
        end repeat
        -- Update the record's tags to the new list (the specified tag is now removed)
        set _record's assigned tags to _newTags
    end tell
end removeTag
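
The header comment notes that more languages can be passed with -l, e.g. eng+nld. One way to avoid editing the command string each time is a variant of the ocr handler that takes the language string as a parameter. The following is only a sketch: the handler name ocrWithLanguages and its _languages parameter are hypothetical, and it assumes the corresponding Tesseract language data is already installed alongside OCRmyPDF.

```applescript
-- Hypothetical variant of ocr() (not part of the original script).
-- _languages is a text value such as "eng" or "eng+nld".
on ocrWithLanguages(_file, _languages)
    -- Same steps as ocr(): resolve the path, OCR into a temporary file, then replace the original
    set inputPath to POSIX path of _file
    set tempPath to do shell script "mktemp /tmp/ocrmypdfXXXXXX.pdf"
    -- The language string is quoted like the paths so the shell passes it through unchanged
    do shell script "PATH=/opt/local/bin:/opt/local/sbin:$PATH; /opt/local/bin/ocrmypdf -l " & quoted form of _languages & " --redo-ocr " & quoted form of inputPath & " " & quoted form of tempPath
    do shell script "mv -f " & quoted form of tempPath & " " & quoted form of inputPath
end ocrWithLanguages
```

With this in place, a call such as my ocrWithLanguages(_file, "eng+nld") could stand in for my ocr(_file) in the run and ocrAndImport handlers.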