OCR With UNPDF
Summary: Uses optical character recognition to extract the text of a scanned PDF into a Word file.
Requires: EagleFiler, UNPDF
Install Location: ~/Library/Scripts/Applications/EagleFiler/
Last Modified: 2019-10-02
Description
This script uses UNPDF to perform optical character recognition on a scanned PDF file. It saves a Microsoft Word file containing the PDF's text next to the original, then imports both the PDF and the Word file into EagleFiler.
There are several ways to use this script:
- Save the script as an application and drop PDF files onto it to OCR them and then import them into EagleFiler.
- Save the script as an application and set it as the target of your scanner’s software. For example, go to the Application tab of the ScanSnap Manager’s settings, click “Add or Remove,” and choose the script application.
- Attach the script to a folder as a folder action and save files into that folder.
Script
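property _format : "doc" -- output type passed to deskUNPDF

-- Called when PDFs are dropped onto the saved application (or sent to it by scanner software)
on open _files
	my ocrAndImport(_files)
end open

-- Called when files are added to a folder that this script is attached to
on adding folder items to _folder after receiving _files
	my ocrAndImport(_files)
end adding folder items to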
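
on ocrAndImport(_files)
	repeat with _fileRef in _files
		set _file to contents of _fileRef
		set _sourcePath to POSIX path of _file
		-- Only OCR PDFs; in a watched folder this also skips the Word file
		-- that the script itself saves there.
		if _sourcePath ends with ".pdf" then
			set _destPath to my ocr(_sourcePath, _format)
			-- Import each PDF together with the Word file created from it.
			set _toImport to {_file, POSIX file _destPath}
			tell application "EagleFiler"
				import files _toImport
			end tell
		end if
	end repeat
end ocrAndImport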

on ocr(_sourcePath, _format)
	set _basePath to my removeExtension(_sourcePath, "pdf")
	set _destPath to _basePath & _format
	my unpdf(_sourcePath, _destPath, _format)
	return _destPath
end ocr
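
on removeExtension(_path, _extension)
	-- e.g. "/Users/me/scan.pdf" becomes "/Users/me/scan." (the dot is kept)
	if _path ends with _extension then
		set _end to (length of _extension) + 1
		-- "text 1 thru -n of" returns a string directly; coercing a list of
		-- characters would insert AppleScript's text item delimiters.
		set _path to text 1 thru -_end of _path
	end if
	return _path
end removeExtension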

on unpdf(_sourcePath, _destPath, _format)
	set _unpdf to "/Applications/deskUNPDF for Mac/Command Line Scripts/deskUNPDF"
	-- Build the deskUNPDF command line, quoting each argument for the shell.
	set _script to _unpdf's quoted form & " "
	set _script to _script & "-convert -silent -closeOnExit -autolaunch false "
	set _script to _script & "-outfile " & _destPath's quoted form & " "
	set _script to _script & "-outputType " & _format's quoted form & " "
	set _script to _script & _sourcePath's quoted form
	do shell script _script
end unpdf
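The script above defines only the open and folder-action handlers. Its implicit run handler is empty, so running it directly, for example from the Scripts menu at the install location listed above, does nothing. If you want that to work too, one possible sketch is to add a run handler that asks for PDFs in a standard file dialog and passes them to the existing ocrAndImport handler. This handler is a hypothetical addition, not part of the original script. It assumes the "com.adobe.pdf" type identifier is an acceptable filter for choose file on your system.

```applescript
-- Hypothetical addition: lets the script do something when run directly.
-- Asks for one or more PDFs, then OCRs and imports them with ocrAndImport.
on run
	set _files to choose file with prompt "Choose PDF files to OCR:" of type {"com.adobe.pdf"} with multiple selections allowed
	my ocrAndImport(_files)
end run
```

Because choose file with multiple selections allowed returns a list of aliases, the result has the same shape as the list the open handler receives. Clicking Cancel raises error -128, which simply ends the script.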