Import Web PDF Screen Readability Issue

Michael,

I’m a long-time user of DEVONthink who has been searching for an alternative that would keep my files accessible via the filesystem (which DEVONthink does not). I’d like to be able to access my data in the future even if the app I’m using to file it stops being developed.

Obviously, this leads me to think about shifting either to EagleFiler or Together.

After evaluating the two apps for a bit, I’m strongly leaning towards EagleFiler for a variety of reasons (first among them being that Together seems a bit buggy). However, there is one showstopper that is keeping me away from EagleFiler at the moment.


  • If I import a web PDF into Together from a URL, I get a nice single-page PDF with no extensive borders around the web page.

  • If I import a web PDF into EagleFiler from a URL, I get a multi-page PDF that chops up the web page with page breaks. EagleFiler also adds an (unwanted) extensive border around the web page.

The extensive border EagleFiler adds is annoying. The page breaks are the showstopper for me. It all adds up to a web PDF that lacks screen readability.

I have no idea where the difference in the way EagleFiler and Together import web PDFs comes from. But I strongly prefer Together’s way of doing so, in that it creates a far more screen-readable version of the web page.

But, as stated, I prefer the rest of EagleFiler over Together. So I wish you’d change the way EagleFiler imports web PDFs, or at least add an esoteric preference to allow users to import web PDFs that lack borders and page breaks.

(I assume I don’t need to provide sample web PDFs for you to examine, as you can test the difference between your app and Together for yourself.)

Thanks for reading, and keep up the good indie developing work.

I consider the multi-page PDF a feature. You can set a multi-page PDF to view in continuous mode or in paginated mode. Single-page PDFs are essentially unprintable—either you print at actual size and only the first page fits, or you print the entire document on a single page, at an absurd scale.
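To put rough numbers on the printing trade-off described above, here is a minimal sketch. It assumes US Letter paper (612 x 792 points) and ignores printer margins; the function names are hypothetical and are not part of EagleFiler or Together.

```python
import math

# Assumption: US Letter paper, measured in points (72 points per inch).
LETTER_WIDTH = 612
LETTER_HEIGHT = 792

def fit_scale(doc_width, doc_height, page_width=LETTER_WIDTH, page_height=LETTER_HEIGHT):
    """Scale factor needed to shrink the whole document onto one sheet."""
    return min(page_width / doc_width, page_height / doc_height)

def sheets_at_actual_size(doc_height, page_height=LETTER_HEIGHT):
    """Number of sheets needed to cover the document's height at 100% scale."""
    return math.ceil(doc_height / page_height)

# A single-page PDF as wide as a letter page but ten pages tall:
scale = fit_scale(612, 7920)            # the page must shrink to a tenth of its size
sheets = sheets_at_actual_size(7920)    # or be spread over ten sheets
print(f"fit-to-page scale: {scale:.0%}, sheets at actual size: {sheets}")
```

With a page that tall, fitting everything on one sheet leaves the text at about a tenth of its original size, which is the "absurd scale" problem.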

Aside: When I used Together to create a PDF of my blog, it cut off the bottom inch or two. I’m not sure whether that’s an application bug or a bug in the OS related to creating this sort of PDF.

EagleFiler has an esoteric preference for setting the margin when creating a multi-page PDF. The default is 36 points (half an inch), but you could set it to zero if you wanted. (Beware that removing the margin may cause clipping when printing.) Then there would be no white border, although you would still see page breaks. How does that work for you?

I’d be interested to know why the margins and/or page breaks matter so much to you. I prefer to read a page at a time, and it’s very natural for me. Do you use a window that’s much smaller than the printed page? If reducing the margin doesn’t address the problem for you, I will consider adding an option for single-page PDFs, but I’d like to try to understand the underlying readability issue first.

“EagleFiler has an esoteric preference for setting the margin when creating a multi-page PDF. The default is 36 points (half an inch), but you could set it to zero if you wanted.”

Very cool. Solves the problem.

“I will consider adding an option for single-page PDFs, but I’d like to try to understand the underlying readability issue first.”

Great. Explanation below. The beauty of esoteric prefs is you can satisfy the edge cases without confusing everyone else. (And kudos to you for adopting the Eudora style of clickable esoteric pref links…)

“Single-page PDFs are essentially unprintable—either you print at actual size and only the first page fits, or you print the entire document on a single page, at an absurd scale.”

Partially true. I’m already aware of the printing issue, but there are workarounds.


“I’d be interested to know why the margins and/or page breaks matter so much to you. I prefer to read a page at a time, and it’s very natural for me. Do you use a window that’s much smaller than the printed page?”

Let me try to explain my workflow.

I skim through many, many web pages per day. I see a web page I’m interested in, and wish to 1) read it later in a form as close as possible to how it appears in my browser and 2) archive (and tag) it for retrieval (and content search) should I want it in the future.

The obvious solution would be to use webarchives, as they display the same as the original URL does.

However, webarchives are a bit wonky in needing to re-connect to the internet in odd ways, and more importantly, I want to be able to access these documents 5 or 10 years down the line. I have no confidence that Apple will continue to support the file format that far out. (Obviously, Adobe could choose to orphan PDF as well, but I feel better about the chances with PDF than webarchive. And, of course, PDF leaves me free to leave the Apple platform down the line, even if I don’t think that likely at the moment.)

So I essentially want web PDFs that mimic webarchives as closely as possible. And I find the arbitrary page breaks (and margins) incredibly distracting when reading the documents. It may seem like a minor deal to you, but my desired workflow involves reading many, many web pages in PDF form, and the page breaks are distracting enough for me to be trying to explain the issue to you in this kind of detail.

FWIW, I do most of my work on a small-screen laptop, but I’d object to the page breaks just as much if I were working on a large screen. I want to read the web pages as they were created, without the flow being chopped up arbitrarily.

I do relatively little printing, and am willing to go through workarounds to print long single-page PDFs when necessary.

I’d be interested to know what your workarounds are, since I’m sure people will ask if I add this feature.

Since the Web archive format is easy to reverse-engineer, and most of the code is open source, I’m not worried about it going away. I expect it won’t be long before it’s readable on Windows. Worst case, I will convert my Web archives to another format when the time comes.

PDF is about as future-proof as you can get aside from plain text. However, the cost is that you throw away some flexibility now, e.g. about font choices and sizes, page width, access to semantic markup, etc.

I’m not trying to convince you one way or the other; I just want to explain some of the issues for whoever else is reading this.

Fair enough. If you send me your e-mail address, I’ll contact you regarding the development of the single-page PDF feature.

Hi again Michael!
I hope the new year is going well!
I am very interested in importing single-page PDFs too, because I think, like both of you, that the webarchive format may not be future-proof.
So I would be very thankful if you could add an option for single-page PDFs, like the one in Together.

Sorry for the delay. I’ve been playing around with one of your competitors’ v2.0 beta.

There are many, many ways to skin this cat.

The simplest to explain would be to use the built-in Preview Automator action “Render PDF Pages as Images” to convert the PDF to TIFF format. Then choose any of several ways to automate splitting the TIFF into page-sized chunks for physical printing.
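For anyone curious what the splitting step involves, here is a minimal sketch of the arithmetic only. It assumes the rendered image will be scaled so its width fills a US Letter page (612 x 792 points), and it returns crop rectangles in image pixels. The function name is hypothetical, and the actual cropping would still have to be done with whatever image tool you prefer.

```python
def page_chunks(image_width, image_height, page_width=612, page_height=792):
    """Return (left, top, right, bottom) crop boxes, in image pixels,
    that cut a tall image into pieces with the page's aspect ratio."""
    if min(image_width, image_height, page_width, page_height) <= 0:
        raise ValueError("all dimensions must be positive")

    # Height of one chunk in image pixels once the image width is
    # scaled to fill the page width.
    chunk_height = max(1, image_width * page_height // page_width)

    boxes = []
    top = 0
    while top < image_height:
        # The last chunk may be shorter than a full page.
        bottom = min(top + chunk_height, image_height)
        boxes.append((0, top, image_width, bottom))
        top = bottom
    return boxes

# A 612-pixel-wide, 2000-pixel-tall rendering needs three sheets.
for box in page_chunks(612, 2000):
    print(box)
```

Note that this cuts at fixed intervals, so a cut can land in the middle of a line of text or an image, which is the same arbitrary page break problem, just deferred until printing.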

But again, in my workflow, I print a small fraction of a percentage of all of the documents I want to archive into PDF for long-term search. I understand this is an edge-case requirement, which is why I was thinking an esoteric pref would be perfect. That way, no one will be requiring support from you should they need to print out long vertically sized documents.

I’m perpetually in a state of email disarray. It’d actually be easier if you notified me via this thread if you ever get around to implementing this. I’ll check back periodically, since the competitor’s beta doesn’t seem to fit my requirements as well as I think your product would.

That will work, but I don’t think it’s a suitable solution for the average user.

EagleFiler 1.4.5 adds an experimental preference to create single-page PDFs for reading on-screen. It has not been extensively tested, and if it does make it into EagleFiler proper it would probably be a real option in the preferences window, but hopefully this will be useful in the interim. You can turn it on or off by clicking these links.

Agreed!

That’s why I thought it belonged as an esoteric pref rather than in the Preferences window. The user should assume responsibility for the consequences should they wish to capture pages in this manner, which means that a user who isn’t aware of esoteric prefs should never have the choice.

Woo-hoo!

I shall test and report back.

EagleFiler 1.5 adds a new Web page format called PDF (Single Page).