OCR With PDFpen

Summary: Uses optical character recognition to add a text layer to a scanned PDF.
Requires: EagleFiler, PDFpen or PDFpen Pro
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2011-09-23

Description

This script uses PDFpen to perform optical character recognition on a scanned PDF file. This makes the contents of the PDF searchable in EagleFiler. Initially, the PDF has only an image layer; after running the script it has an image layer and an invisible text layer.

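There are several ways to use this script, one for each of its entry points:

- Select one or more records in EagleFiler and run the script. Each selected file is OCRed in place and its checksum is updated.
- Save the script as an application and drop PDF files onto it. The files are OCRed and then imported into EagleFiler.
- Attach the script to a folder as a folder action. Files added to the folder are OCRed and then imported into EagleFiler.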

See also the Import From Scanner script.


Script

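-- Run directly: OCR the files of the records selected in EagleFiler.
on run
    tell application "EagleFiler"
        set _records to selected records of browser window 1
        repeat with _record in _records
            set _file to _record's file
            my ocr(_file)
            -- The file was modified in place, so refresh its stored checksum.
            tell _record to update checksum
        end repeat
    end tell
end run

-- Files dropped onto the script saved as an application.
on open _files
    my ocrAndImport(_files)
end open

-- Files added to a folder that has this script attached as a folder action.
on adding folder items to _folder after receiving _files
    my ocrAndImport(_files)
end adding folder items to

-- OCR every file first, then import them all into EagleFiler.
on ocrAndImport(_files)
    repeat with _file in _files
        my ocr(_file)
    end repeat
    tell application "EagleFiler"
        import files _files
    end tell
end ocrAndImport

-- Open the file in PDFpen, run OCR, wait for it to finish, then save and close.
on ocr(_file)
    tell application "PDFpen"
        open _file as alias
        tell document 1
            ocr
            -- Poll once per second until PDFpen reports that OCR is done.
            repeat while performing ocr
                delay 1
            end repeat
            -- Brief extra pause before saving.
            delay 1
            close with saving
        end tell
    end tell
end ocr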
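
The droplet and folder action entry points pass every file they receive to the ocr handler, whatever its type. If non-PDF files might end up in the mix, one possible guard is sketched below. The handler name ocrIfPDF is hypothetical and not part of the original script, and the check looks only at the file name, not at the file's contents.

```applescript
-- Hypothetical wrapper (an assumption, not part of the original script):
-- calls the script's own ocr handler only for files whose name ends in ".pdf".
on ocrIfPDF(_file)
    set _path to POSIX path of (_file as alias)
    -- AppleScript text comparisons ignore case by default, so ".PDF" also matches.
    if _path ends with ".pdf" then
        my ocr(_file)
        return true
    end if
    -- Not a PDF by name: leave the file untouched.
    return false
end ocrIfPDF
```

To try it, the call to my ocr(_file) inside ocrAndImport could be replaced with my ocrIfPDF(_file). Skipped files would still be imported into EagleFiler, just without a text layer.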