OCR With OCRmyPDF

Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, OCRmyPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2025-08-15

Description

This script uses OCRmyPDF to perform optical character recognition on a scanned PDF file, which makes the contents of the PDF searchable in EagleFiler. Initially, the PDF has only an image layer; after running the script, it has an image layer and an invisible text layer. If the PDF file has the “NeedsOCR” tag (for example, because you used the Tag PDFs that Need OCR script), the tag is removed after OCR has been applied.

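There are several ways to use this script:

- Select one or more PDF records in EagleFiler and run the script. The selected files are OCRed in place, their checksums are updated, and the “NeedsOCR” tag is removed.
- Drop PDF files onto the script. The files are OCRed and then imported into EagleFiler.
- Attach the script to a folder as a folder action. Files added to the folder are OCRed and then imported into EagleFiler.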

See also the Import From Scanner script and EagleFiler’s OCR documentation.
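
The script below calls OCRmyPDF at the MacPorts location /opt/local/bin/ocrmypdf, so it can help to confirm from Terminal that the tool is actually there before installing the script. The following is a minimal shell sketch, not part of the original script. The function name find_tool is hypothetical, and the fallback to a PATH lookup is an assumption for installations outside MacPorts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the EagleFiler script): print the path of
# a command-line tool, preferring a given directory over the rest of PATH.
find_tool() {
    tool=$1
    preferred_dir=$2
    if [ -x "$preferred_dir/$tool" ]; then
        printf '%s\n' "$preferred_dir/$tool"
        return 0
    fi
    # Fall back to whatever the current PATH resolves, if anything.
    command -v "$tool"
}

# /opt/local/bin is where the script expects MacPorts to have put ocrmypdf.
if path=$(find_tool ocrmypdf /opt/local/bin); then
    echo "Found OCRmyPDF at: $path"
else
    echo "OCRmyPDF not found; the script's ocr handler would fail." >&2
fi
```

If the tool turns up somewhere other than /opt/local/bin, the path inside the ocr handler's shell command would need to be adjusted to match.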


Script

-- Prerequisite: The script assumes OCRmyPDF is installed via MacPorts.
-- Language parameter: By default the script runs -l eng. You can add more languages, e.g. -l eng+nld.

on run
    tell application "EagleFiler"
        -- Get all selected records (PDF files) in EagleFiler's front browser window
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file -- the file (alias) associated with this EagleFiler record
            my ocr(_file) -- perform OCR on the file using OCRmyPDF
            tell _record to update checksum -- update EagleFiler's stored checksum now that the file content changed
            my removeTag(_record, "NeedsOCR") -- remove the "NeedsOCR" tag from the record (if it had one)
        end repeat
    end tell
end run

on open _files
    -- Handles files dropped onto this script: OCR them and then import into EagleFiler
    my ocrAndImport(_files)
end open

on adding folder items to _folder after receiving _files
    -- Handles folder action: OCR new files added to the folder and then import into EagleFiler
    my ocrAndImport(_files)
end adding folder items to

on ocrAndImport(_files)
    -- Iterate over each file, perform OCR, then import all into EagleFiler
    repeat with _file in _files
        my ocr(_file) -- OCR each file in the list
    end repeat
    tell application "EagleFiler"
        import files _files -- import the newly OCR'd files into EagleFiler's library
    end tell
end ocrAndImport

on ocr(_file)
    -- Convert the file reference (alias) to a POSIX path string for use in the shell command
    set inputPath to POSIX path of _file
    -- Create a temporary file path for the OCR output (in the system temp directory)
    -- This will generate a unique filename like /tmp/ocrmypdfABC123.pdf for the output
    set tempPath to do shell script "mktemp /tmp/ocrmypdfXXXXXX.pdf"
    -- Run the OCRmyPDF command on the input file, outputting the OCR'd PDF to the temporary file.
    -- The PATH is adjusted to include the MacPorts directories, and the quoted form of each path is used for safety.
    -- To recognize more languages, change "-l eng" below, e.g. to "-l eng+nld".
    do shell script "PATH=/opt/local/bin:/opt/local/sbin:$PATH; /opt/local/bin/ocrmypdf -l eng --redo-ocr " & quoted form of inputPath & " " & quoted form of tempPath
    -- Move (rename) the temporary OCR output file back to the original file path, overwriting the original PDF
    do shell script "mv -f " & quoted form of tempPath & " " & quoted form of inputPath
end ocr

on removeTag(_record, _tagName)
    tell application "EagleFiler"
        -- Get the list of all tags currently assigned to the record
        set _tags to _record's assigned tags
        set _newTags to {}
        -- Build a new list of tags excluding the tag we want to remove
        repeat with _tag in _tags
            if _tag's name is not _tagName then
                copy _tag to the end of _newTags
            end if
        end repeat
        -- Update the record's tags to the new list (the specified tag is now removed)
        set _record's assigned tags to _newTags
    end tell
end removeTag
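
The ocr handler writes OCRmyPDF's output to a temporary file and only then moves it over the original. If the OCR command fails, do shell script raises an error before the mv step runs, so the original PDF is untouched, but the temporary file is left behind. The sketch below illustrates the same write-then-move pattern in plain shell, with cleanup on failure. It is an illustration under assumptions rather than the author's code: the function name replace_via_temp is hypothetical, and the transforming command is passed in as arguments so the pattern can be tried without OCRmyPDF installed.

```shell
#!/bin/sh
# Hypothetical helper: run "COMMAND [ARGS...] INPUT TEMP" and, only if it
# succeeds, move TEMP over INPUT. On failure the temp file is removed and
# INPUT is left as it was.
# Usage: replace_via_temp INPUT COMMAND [ARGS...]
replace_via_temp() {
    input=$1
    shift
    # Trailing Xs keep the template portable across BSD and GNU mktemp.
    temp=$(mktemp "${TMPDIR:-/tmp}/ocrtmp.XXXXXX") || return 1
    if "$@" "$input" "$temp"; then
        mv -f "$temp" "$input"
    else
        status=$?
        rm -f "$temp"
        return "$status"
    fi
}

# With OCRmyPDF installed, a call might look like this (assumed usage):
#   replace_via_temp scan.pdf /opt/local/bin/ocrmypdf -l eng --redo-ocr
```

Passing the command as arguments keeps the helper independent of OCRmyPDF: any tool that takes an input path followed by an output path fits the same shape.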