OCR With OCRmyPDF
Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, OCRmyPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2025-08-15
Description
This script uses OCRmyPDF to perform optical character recognition on a scanned PDF file, which makes the contents of the PDF searchable in EagleFiler. Before the script runs, the PDF has only an image layer; afterwards it has an image layer and an invisible text layer. The OCR'd file replaces the original. If the PDF has the “NeedsOCR” tag (for example, because you ran the Tag PDFs that Need OCR script), the tag is removed once OCR has been applied.
There are several ways to use this script:
- Run the script by itself to operate on the selected PDFs in EagleFiler.
- Save the script as an application and drop PDF files onto it to OCR them and then import them into EagleFiler.
- Attach the script to a folder as a folder action; files saved into that folder are then OCR'd and imported into EagleFiler.
See also the Import From Scanner script and EagleFiler’s OCR documentation.
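To check an OCRmyPDF installation outside EagleFiler, the OCR step that the script performs can be approximated as a shell function for Terminal. This is only a sketch and not part of the script: the function name `ocr_in_place` and the `OCRMYPDF` override variable are hypothetical, and the default path assumes the MacPorts install location that the script uses. The `-l` and `--redo-ocr` options are the ones the script passes.

```shell
#!/bin/sh
# Sketch of the script's OCR step for use in Terminal.
# OCRMYPDF is a hypothetical override variable (not part of the script);
# it defaults to the MacPorts location that the script assumes.
OCRMYPDF="${OCRMYPDF:-/opt/local/bin/ocrmypdf}"

# ocr_in_place FILE [LANGS]
# LANGS uses OCRmyPDF's -l syntax, e.g. "eng" or "eng+nld" (default: eng).
ocr_in_place() {
    input=$1
    langs=${2:-eng}

    # Work in a private temporary directory so the output name cannot collide.
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/ocrmypdf.XXXXXX") || return 1
    output="$tmpdir/output.pdf"

    # --redo-ocr matches the flag used by the script.
    if "$OCRMYPDF" -l "$langs" --redo-ocr "$input" "$output"; then
        mv -f "$output" "$input"   # replace the original only on success
        status=0
    else
        status=1                   # leave the original untouched on failure
    fi

    rm -rf "$tmpdir"               # clean up the temporary directory either way
    return "$status"
}
```

After sourcing the function, a call such as `ocr_in_place scan.pdf eng+nld` (with a hypothetical `scan.pdf`) would OCR the file in place using English and Dutch. Unlike the script, this sketch keeps the original file when OCRmyPDF exits with an error.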
Script
-- Prerequisite: The script assumes OCRmyPDF is installed via MacPorts.
-- Language parameter: By default the script runs -l eng. You can add more languages, e.g. -l eng+nld.
on run
    tell application "EagleFiler"
        -- Get all selected records (PDF files) in EagleFiler's front browser window
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file -- the file (alias) associated with this EagleFiler record
            my ocr(_file) -- perform OCR on the file using OCRmyPDF
            tell _record to update checksum -- update EagleFiler's stored checksum now that the file content changed
            my removeTag(_record, "NeedsOCR") -- remove the "NeedsOCR" tag from the record (if it had one)
        end repeat
    end tell
end run
on open _files
    -- Handles files dropped onto this script: OCR them and then import into EagleFiler
    my ocrAndImport(_files)
end open
on adding folder items to _folder after receiving _files
    -- Handles folder action: OCR new files added to the folder and then import into EagleFiler
    my ocrAndImport(_files)
end adding folder items to
on ocrAndImport(_files)
    -- Iterate over each file, perform OCR, then import all into EagleFiler
    repeat with _file in _files
        my ocr(_file) -- OCR each file in the list
    end repeat
    tell application "EagleFiler"
        import files _files -- import the newly OCR'd files into EagleFiler's library
    end tell
end ocrAndImport
on ocr(_file)
    -- Convert the file reference (alias) to a POSIX path string for use in the shell command
    set inputPath to POSIX path of _file
    -- Create a unique temporary directory to hold the OCR output.
    -- (BSD mktemp on macOS only substitutes trailing X's, so a template ending in ".pdf" would not yield a unique name.)
    set tempDir to do shell script "mktemp -d /tmp/ocrmypdf.XXXXXX"
    set tempPath to tempDir & "/output.pdf"
    -- Run OCRmyPDF on the input file, writing the OCR'd PDF to the temporary path.
    -- PATH is extended with the MacPorts directories, and quoted form guards against spaces and special characters in paths.
    do shell script "PATH=/opt/local/bin:/opt/local/sbin:$PATH; /opt/local/bin/ocrmypdf -l eng --redo-ocr " & quoted form of inputPath & " " & quoted form of tempPath
    -- Replace the original PDF with the OCR output, then remove the now-empty temporary directory
    do shell script "mv -f " & quoted form of tempPath & " " & quoted form of inputPath & " && rmdir " & quoted form of tempDir
end ocr
on removeTag(_record, _tagName)
    tell application "EagleFiler"
        -- Get the list of all tags currently assigned to the record
        set _tags to _record's assigned tags
        set _newTags to {}
        -- Build a new list of tags excluding the tag we want to remove
        repeat with _tag in _tags
            if _tag's name is not _tagName then
                copy _tag to the end of _newTags
            end if
        end repeat
        -- Update the record's tags to the new list (the specified tag is now removed)
        set _record's assigned tags to _newTags
    end tell
end removeTag