11-27-2009, 03:31 PM   #1
nonsequito
New Member
Join Date: Nov 2009
Posts: 4
Searchable PDFs

Question from a trial user:

I have many PDFs of books I use in my research. Is there a way to get them into EF in a searchable way?

Thanks

11-27-2009, 09:55 PM   #2
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

Quote:
Originally Posted by nonsequito
I have many PDFs of books I use in my research. Is there a way to get them into EF in a searchable way?
If the PDFs contain text, EagleFiler can search them. If the PDFs only contain images, you would need to run them through OCR software, which adds a text layer that EagleFiler and other PDF software can read.
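
One way to tell the two cases apart in bulk is to try extracting text from each PDF and see whether anything comes back. The following is a minimal sketch, not part of EagleFiler: it assumes Poppler's `pdftotext` command-line tool is installed, and the function names and the `min_chars` threshold are hypothetical choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def extract_text_with_pdftotext(pdf_path: Path) -> str:
    """Return whatever text layer the PDF has, using Poppler's pdftotext."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) is not installed")
    # Passing "-" as the output file sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def needs_ocr(
    pdf_path,
    extract_text: Callable[[Path], str] = extract_text_with_pdftotext,
    min_chars: int = 20,  # assumed threshold; tune for your documents
) -> bool:
    """Guess whether a PDF is image-only and should be run through OCR."""
    text = extract_text(Path(pdf_path))
    # Ignore whitespace and page-break characters when counting.
    visible = "".join(text.split())
    return len(visible) < min_chars
```

A PDF for which `needs_ocr` returns `True` is probably a scan without a text layer. The check is only a heuristic: a scanned book with a short typed cover page, for example, would still pass as searchable.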

11-30-2009, 07:07 PM   #3
bob
Join Date: Oct 2008
Posts: 17

ABBYY FineReader and Acrobat can OCR PDFs into searchable PDFs (with a text overlay).

11-30-2009, 07:30 PM   #4
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

Another option is PDFpen.

12-14-2009, 11:08 PM   #5
CTS
Join Date: Nov 2008
Posts: 89
OCR

I am always confused by the topic of OCR. My Brother printer/scanner appears to come with built-in OCR software. By using the "Brother Control Center" I can select an OCR option and get searchable text.

At the same time, doesn't Image Capture, which comes on the iMac, provide OCR capability? I think I've used that as an alternative to the Brother stuff.

What is gained by using the other programs mentioned in this thread?

12-15-2009, 05:32 AM   #6
brab
Join Date: Nov 2006
Posts: 211

Are there plans to integrate an OCR engine in EF?

If not, could someone share a script that automatically does OCR on PDFs using PDFpen on import?

Thanks!

12-15-2009, 10:01 AM   #7
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

Quote:
Originally Posted by CTS
At the same time, doesn't Image Capture, which comes on the iMac, provide OCR capability?
Image Capture does not have built-in OCR, but if you have other OCR software you can probably set up Image Capture to invoke it after scanning.

Quote:
Originally Posted by brab
Are there plans to integrate an OCR engine in EF?
That’s something I’m considering. Sorry, that’s all I want to say for now.

Quote:
Originally Posted by brab
If not, could someone share a script that automatically does OCR on PDFs using PDFpen on import?
That sounds like a great idea for a script! I’ll see what I can do.

12-15-2009, 01:38 PM   #8
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

Quote:
Originally Posted by brab
If not, could someone share a script that automatically does OCR on PDFs using PDFpen on import?
I’ve just written a script to do this, but there seem to be two bugs in PDFpen that prevent it from working. I’ve reported them to SmileOnMyMac, and I’ll update this thread when we have a resolution.

12-16-2009, 05:44 AM   #9
brab
Join Date: Nov 2006
Posts: 211

Great news, thanks!

02-03-2010, 08:16 AM   #10
brab
Join Date: Nov 2006
Posts: 211

I guess that no update to this thread means there has been no news on this front?

02-03-2010, 09:57 AM   #11
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

Quote:
Originally Posted by brab
I guess that no update to this thread means there has been no news on this front?
Correct. The last PDFpen update was on December 16, just one day after I reported this bug. Hopefully they’ll get to it in the next update.

02-15-2010, 09:39 PM   #12
Michael Tsai
Developer
Join Date: Aug 2006
Posts: 4,128

The PDFpen developer sent me a workaround for the problem, so I’ve posted the OCR With PDFpen script.

02-16-2010, 03:50 AM   #13
brab
Join Date: Nov 2006
Posts: 211

Great news, thanks!

I have a tiny suggestion: maybe the script could add the tag "ocred" or something similar to help find PDFs that have not yet been OCRed.

02-18-2010, 11:57 AM   #14
cmoore
Join Date: Sep 2009
Posts: 19

Another solution would be to use Michael's OCR to EagleFiler script. Make a target folder, attach that script to the folder as a folder action, and have your scanner deposit the PDFs it makes into that folder; the rest just happens. I know it works because I've been using it to scan my receipts into EagleFiler. My backlog went away in short order with that script, which is here.
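
The watched-folder pattern described here can be sketched in a few lines. This is not the script mentioned above, and it is not how macOS folder actions are implemented (those are AppleScripts that the system triggers); it is a simple polling illustration, and `process_new_pdfs`, `watch`, and the `handle_pdf` callback are hypothetical names. The callback stands in for whatever does the real work, such as running OCR and importing the result.

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def process_new_pdfs(
    inbox: Path,
    handle_pdf: Callable[[Path], None],
    seen: Set[Path],
) -> List[Path]:
    """Run handle_pdf once on every PDF in inbox that has not been seen yet."""
    handled = []
    for pdf in sorted(inbox.glob("*.pdf")):
        if pdf in seen:
            continue
        handle_pdf(pdf)  # e.g. OCR the file, then import it
        seen.add(pdf)
        handled.append(pdf)
    return handled


def watch(
    inbox: Path,
    handle_pdf: Callable[[Path], None],
    interval_seconds: float = 5.0,
) -> None:
    """Poll the inbox folder forever, handling new PDFs as they appear."""
    seen: Set[Path] = set()
    while True:
        process_new_pdfs(inbox, handle_pdf, seen)
        time.sleep(interval_seconds)
```

Calling `watch(Path("~/Scans").expanduser(), print)` would print each new PDF path as the scanner drops it in, with `~/Scans` being a placeholder folder. A real version would also need to wait until the scanner has finished writing a file before handling it.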

02-22-2010, 12:49 PM   #15
kaybee
Join Date: Jan 2010
Posts: 8

Thank you, thank you, thank you for the OCR With PDFpen script! This was really the last feature that made me vacillate between EagleFiler and DevonThink. Now, for me, EagleFiler is a clear winner!

02-22-2010, 08:21 PM   #16
kaybee
Join Date: Jan 2010
Posts: 8

There is a 20% discount on PDFpen and PDFpen Pro available until 02/28/10:

http://www.smileonmymac.com/mpu/
All times are GMT -4.