OCR With UNPDF

Summary: Uses optical character recognition to extract the text of a scanned PDF into a Microsoft Word file.
Requires: EagleFiler, UNPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-10-02

Description

This script uses UNPDF to perform optical character recognition on a scanned PDF file. It creates a Microsoft Word file with the text of the PDF and then imports both the PDF and the Word file into EagleFiler.
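
To make the file naming concrete, the following minimal sketch traces how the destination path is derived from the source path. The input path is hypothetical and chosen only for illustration; the steps mirror the removeExtension and ocr handlers in the script.

```applescript
-- Hypothetical input path, used only for illustration.
set _sourcePath to "/Users/me/Scans/invoice.pdf"

-- Drop the trailing "pdf" but keep the dot, as removeExtension does.
set _basePath to text 1 thru -4 of _sourcePath

-- Append the output format to get the Word file's path.
set _destPath to _basePath & "doc"

-- These are the two files that would be handed to EagleFiler.
return {_sourcePath, _destPath}
```

Run in Script Editor, this should return the original PDF path together with "/Users/me/Scans/invoice.doc", so the Word file lands next to the PDF before both are imported.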

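There are several ways to use this script. For example, it can be saved as a droplet application so that PDF files dragged onto it are processed by the open handler, or it can be attached to a folder as a folder action so that PDF files added to that folder are processed by the adding folder items to handler.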


Script

property _format : "doc"

on open _files
    -- Droplet entry point: runs when files are dropped onto the saved application.
    my ocrAndImport(_files)
end open

on adding folder items to _folder after receiving _files
    -- Folder action entry point: runs when files are added to the watched folder.
    my ocrAndImport(_files)
end adding folder items to

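on ocrAndImport(_files)
    repeat with _file in _files
        -- Dereference the loop variable to get the actual file item.
        set _sourceFile to contents of _file
        set _sourcePath to POSIX path of _sourceFile
        set _destPath to my ocr(_sourcePath, _format)
        -- Import each PDF with its converted file inside the loop,
        -- so that every file is imported rather than only the last one.
        set _importFiles to {_sourceFile, POSIX file _destPath}
        tell application "EagleFiler"
            import files _importFiles
        end tell
    end repeat
end ocrAndImport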

on ocr(_sourcePath, _format)
    -- Removing "pdf" leaves the trailing dot, so appending the format gives e.g. "scan.doc".
    set _basePath to my removeExtension(_sourcePath, "pdf")
    set _destPath to _basePath & _format
    my unpdf(_sourcePath, _destPath, _format)
    return _destPath
end ocr

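on removeExtension(_path, _extension)
    if _path ends with _extension then
        set _end to (length of _extension) + 1
        -- Take a text range directly; coercing a list of characters to text
        -- would insert the current text item delimiters between the characters.
        set _path to text 1 thru -_end of _path
    end if
    return _path
end removeExtension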

on unpdf(_sourcePath, _destPath, _format)
    set _unpdf to "/Applications/deskUNPDF for Mac/Command Line Scripts/deskUNPDF"
    -- Build the command line, quoting each value so that spaces in paths are safe in the shell.
    set _script to _unpdf's quoted form & " "
    set _script to _script & "-convert -silent -closeOnExit -autolaunch false "
    set _script to _script & "-outfile " & _destPath's quoted form & " "
    set _script to _script & "-outputType " & _format's quoted form & " "
    set _script to _script & _sourcePath's quoted form
    do shell script _script
end unpdf
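
As one more way to run the script, a run handler along the following lines could be added so that launching the script directly prompts for the PDFs to process. This is a minimal sketch and not part of the original script: it assumes the ocrAndImport handler above is present in the same script, and the prompt text is only illustrative.

```applescript
on run
    -- Ask for one or more PDF files; with multiple selections the result is a list of aliases.
    set _files to choose file with prompt "Choose scanned PDFs to OCR:" of type {"com.adobe.pdf"} with multiple selections allowed
    -- Hand the chosen files to the same handler used by the droplet and the folder action.
    my ocrAndImport(_files)
end run
```

In a script saved as an application, the run handler fires when it is opened without any files, while the open handler continues to handle files dropped onto it.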