OCR With PDFpen

Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, PDFpen or PDFpen Pro
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2020-06-03

Description

This script uses PDFpen to perform optical character recognition on a scanned PDF file. This makes the contents of the PDF searchable in EagleFiler. Initially, the PDF has only an image layer; after running the script it has an image layer and an invisible text layer. If the PDF file had the “NeedsOCR” tag because you had used the Tag PDFs that Need OCR script, the tag will be removed after OCR has been applied.

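There are several ways to use this script, one for each of its handlers. You can select one or more PDF files in EagleFiler and run the script, which applies OCR to the selected records in place (the run handler). You can save the script as an application and drop PDF files onto it (the open handler). Or you can attach it to a folder as a folder action (the adding folder items to handler). In the last two cases the files are OCRed first and then imported into EagleFiler.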

See also the Import From Scanner script.

Script

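on run
    -- Run directly: OCR the records selected in the frontmost EagleFiler browser window.
    tell application "EagleFiler"
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file
            my ocr(_file)
            -- OCR changed the file's contents, so refresh the stored checksum.
            tell _record to update checksum
            my removeTag(_record, "NeedsOCR")
        end repeat
    end tell
end run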

on open _files
    my ocrAndImport(_files)
end open

on adding folder items to _folder after receiving _files
    my ocrAndImport(_files)
end adding folder items to

on ocrAndImport(_files)
    -- OCR every file first, then import them all into EagleFiler.
    repeat with _file in _files
        my ocr(_file)
    end repeat
    tell application "EagleFiler"
        import files _files
    end tell
end ocrAndImport

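on ocr(_file)
    tell application "PDFpen"
        open _file as alias
        tell document 1
            ocr
            -- Poll once per second until PDFpen reports that OCR has finished.
            repeat while performing ocr
                delay 1
            end repeat
            -- Pause briefly before saving the document with its new text layer.
            delay 1
            close with saving
        end tell
    end tell
end ocr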

on removeTag(_record, _tagName)
    tell application "EagleFiler"
        set _tags to _record's assigned tags
        -- Rebuild the tag list, keeping every tag except the one named _tagName.
        set _newTags to {}
        repeat with _tag in _tags
            if _tag's name is not _tagName then
                copy _tag to end of _newTags
            end if
        end repeat
        set _record's assigned tags to _newTags
    end tell
end removeTag