OCR With OCRmyPDF

Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, OCRmyPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2025-08-15

Description

This script uses OCRmyPDF to perform optical character recognition on a scanned PDF file. This makes the contents of the PDF searchable in EagleFiler. Initially, the PDF has only an image layer; after running the script it has an image layer and an invisible text layer. If the PDF file had the “NeedsOCR” tag because you had used the Tag PDFs that Need OCR script, the tag will be removed after OCR has been applied.

There are several ways to use this script:

Run the script by itself to operate on the selected PDFs in EagleFiler.
Save the script as an application and drop PDF files onto it to OCR them and then import them into EagleFiler.
Attach the script to a folder as a folder action and save files into that folder.

See also the Import From Scanner script and EagleFiler’s OCR documentation.


Script

-- Prerequisite: The script assumes OCRmyPDF is installed via MacPorts.

-- Language parameter: By default the script runs -l eng. You can add more languages, e.g. -l eng+nld.

on run
    tell application "EagleFiler"
        -- Get all selected records (PDF files) in EagleFiler's front browser window
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file -- the file (alias) associated with this EagleFiler record
            my ocr(_file) -- perform OCR on the file using OCRmyPDF
            tell _record to update checksum -- update EagleFiler's stored checksum now that the file content changed
            my removeTag(_record, "NeedsOCR") -- remove the "NeedsOCR" tag from the record (if it had one)
        end repeat
    end tell
end run

on open _files
    -- Handles files dropped onto this script: OCR them and then import into EagleFiler
    my ocrAndImport(_files)
end open

on adding folder items to _folder after receiving _files
    -- Handles folder action: OCR new files added to the folder and then import into EagleFiler
    my ocrAndImport(_files)
end adding folder items to

on ocrAndImport(_files)
    -- Iterate over each file, perform OCR, then import all into EagleFiler
    repeat with _file in _files
        my ocr(_file) -- OCR each file in the list
    end repeat
    tell application "EagleFiler"
        import files _files -- import the newly OCR'd files into EagleFiler's library
    end tell
end ocrAndImport

on ocr(_file)
    -- Convert the file reference (alias) to a POSIX path string for use in the shell command
    set inputPath to POSIX path of _file
    -- Create a temporary file path for the OCR output (in /tmp)
    -- This is intended to generate a unique filename like /tmp/ocrmypdfABC123.pdf for the output
    set tempPath to do shell script "mktemp /tmp/ocrmypdfXXXXXX.pdf"
    -- Run the OCRmyPDF command on the input file, outputting the OCR'd PDF to the temporary file.
    -- The PATH is adjusted to include the MacPorts directories, and the quoted form of each path is used for safety.
    do shell script "PATH=/opt/local/bin:/opt/local/sbin:$PATH; /opt/local/bin/ocrmypdf -l eng --redo-ocr " & quoted form of inputPath & " " & quoted form of tempPath
    -- Move (rename) the temporary OCR output file back to the original file path, overwriting the original PDF
    do shell script "mv -f " & quoted form of tempPath & " " & quoted form of inputPath
end ocr

on removeTag(_record, _tagName)
    tell application "EagleFiler"
        -- Get the list of all tags currently assigned to the record
        set _tags to _record's assigned tags
        set _newTags to {}
        -- Build a new list of tags excluding the tag we want to remove
        repeat with _tag in _tags
            if _tag's name is not _tagName then
                copy _tag to the end of _newTags
            end if
        end repeat
        -- Update the record's tags to the new list (the specified tag is now removed)
        set _record's assigned tags to _newTags
    end tell
end removeTag
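
The shell steps that the ocr handler runs (create a temporary file, run OCRmyPDF into it, move the result over the original) can also be sketched as a standalone shell function. This is only an illustration, not part of the script above. The names join_langs, ocr_in_place, OCR_CMD, and OCR_LANGS are hypothetical, and the sketch assumes OCRmyPDF is at the MacPorts path used above and will write to an output path without a .pdf extension. Unlike the handler, it removes the temporary file if OCR fails.

```shell
#!/bin/sh
# Sketch of the shell steps behind the ocr handler, with cleanup on failure.
# OCR_CMD and OCR_LANGS are hypothetical overrides, not part of the original script.

# Join language codes with "+", the form passed to -l (for example eng+nld).
join_langs() {
    (IFS='+'; printf '%s\n' "$*")
}

# OCR a PDF in place: write to a temporary file, then replace the original on success.
ocr_in_place() {
    input=$1
    langs=${OCR_LANGS:-eng}
    # Trailing Xs keep the template valid for both BSD (macOS) and GNU mktemp.
    tmp=$(mktemp "${TMPDIR:-/tmp}/ocrmypdf.XXXXXX") || return 1
    if "${OCR_CMD:-/opt/local/bin/ocrmypdf}" -l "$langs" --redo-ocr "$input" "$tmp"; then
        # OCR succeeded: overwrite the original PDF with the OCR'd output.
        mv -f "$tmp" "$input"
    else
        # OCR failed: leave the original untouched and remove the temporary file.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, after sourcing the function definitions, OCR_LANGS=$(join_langs eng nld) followed by ocr_in_place scan.pdf would run OCR with English and Dutch on a file named scan.pdf (a placeholder name).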