Scalability?

Over at the 43Folders board (link below), someone posted that they imported 4000 web archives into EF and found it to be very slow. There wasn’t any information about the specifics of the PowerBook they were running EF on.

Has anyone else loaded a lot of data into EF and found issues with slowness or other problems with large databases?

Thanks.

http://board.43folders.com/showthread.php?p=6471#post6471

A few notes:

  • EagleFiler is designed to scale up well. By that I mean that the memory use and database size will remain reasonable even with thousands of files or millions of e-mail messages.

  • It’s a new product. The priorities for 1.0 were getting the architecture right, making it work reliably, and adding the necessary features. There has not, as yet, been a lot of optimization or performance tuning.

  • If you import 4,000 Web archives, the first thing it will do is start indexing them all. EagleFiler is multi-threaded, so it will let you work during the indexing, but this will slow down what you’re doing in the foreground, especially on a single-core Mac. On my PowerBook, with a similar EagleFiler library, I find browsing to be reasonably fast, although there are some pauses when I make changes to the library.

  • Performance is much better on Intel-based Macs. Obviously, the Intel Macs are relatively new and fast, but what I mean is that compared to “equivalent” PowerPC Macs, it will run faster on Intel. The kind of processing that EagleFiler does is just better suited to the Intel chips.

  • Don’t forget that EagleFiler supports multiple libraries. My goal is to make libraries with thousands of files fast (I have many thousands of files and several million messages myself), but if the current code isn’t fast enough on your current hardware, you can increase the speed by separating your files into two or more libraries.

  • Going forward, there are a number of areas that I want to optimize, but it would really help if people could contact me at eaglefiler@c-command.com about specific things that they are finding to be slow.

Awesome, thanks for the information Michael.

I have to say, I’m very impressed with the attention and support you give.

I use a ScanSnap to obtain large numbers of un-OCRed PDF files. PDFpen is great for annotation, and also to remove unwanted pages.

For storage, I moved from DevonThink to plain Finder. To Yep. And now trying out EF, Yojimbo. Tried Journler, which I don’t think is meant for large numbers of PDF storage.

The above post has made me lean towards EF. However, I’ve imported a subset of my PDFs: about 1000 files. At this time, things like drag and drop into folders are sluggish enough that I’m reluctant to do it. If EF is going to get faster, I think I’ll wait till it does, before organising my imported PDF files.

Any chance the developer can extend the 30-day trial period for this purpose?

Other things on my wish list:
When selecting multiple PDF files, can we see multiple icons of the PDFs, instead of just “3 PDFs selected”? In fact, could we then drag and drop those icons?

I batch scan my PDFs, and so the “creation date” is not meaningful. Unfortunately it’s uneditable, even via AppleScript. Any chance of using this field, especially in the event that smart folders are implemented? At this time, I’m considering having tags that say “10October”, “11November”, etc.
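
For what it’s worth, the month-tag naming scheme could be generated along these lines. This is only a rough sketch in Python: the function name `month_tag` is made up for illustration, the month names are hardcoded in English, and nothing here talks to EagleFiler itself. It just builds the tag text from a date.

```python
from datetime import date

# English month names are hardcoded so the result does not depend on the
# system locale.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_tag(day: date) -> str:
    """Return a tag such as '10October' for the given date (hypothetical helper)."""
    # Zero-pad the month number so the tags sort in calendar order.
    return f"{day.month:02d}{MONTH_NAMES[day.month - 1]}"


if __name__ == "__main__":
    print(month_tag(date(2006, 10, 5)))   # 10October
    print(month_tag(date(2006, 11, 20)))  # 11November
```

Zero-padding the month number means “01January” through “12December” stay in calendar order when the tag list is sorted alphabetically.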

Anyway, the ability to find files via a calendar selection (à la iPhoto) would be nice.

Real smart folders, including the ability to “Not” show files based on some criteria.

A keystroke to delete the current search text.

A list of keyboard shortcuts (à la Gmail).

The ability to secondary sort all the documents, e.g., sort by container, and then secondary sort by kind (so that I can group the folders together).

AppleScript renaming of the item titles is quite slow (using set title of xxx to yyy).

3 panes in horizontal format?

One question:
Why do people use EagleFiler to archive their email from Mail.app? Mail.app seems capable of storing email by itself, and aren’t we just swapping one storage mechanism for another?

Thanks.

dmc asked
Has anyone else loaded a lot of data into EF and found issues with slowness or other problems with large databases?

I have a lump of data amounting to about 20GB that I’ve imported, mounted in a sparse disk image on a Buffalo TeraStation as a network attached storage device. Not, I would have thought, the fastest implementation.
I do not call it a database because it is so unstructured - really just a collection of mainly graphic and PDF files. I went to EF so I did not have to sort it out; as long as I can find what I need it can stay a disorganised mess.

Adding data took several hours, but finding stuff is almost instantaneous.

I would find it useful to know when EF has finished indexing - it does not seem to show in the activity box, and sometimes I want to put the system to bed.

Ian

If there are particular tasks that you find slow, please contact me at eaglefiler@c-command.com and describe exactly what you did. I will try to duplicate the problem here and then eventually send you a pre-release build of EagleFiler that’s faster in that regard. This is the best way to ensure that I’m aware of the areas that matter to you.

No, but I sometimes reset the trial period after a major update.

Could you tell me more about what you’re trying to do? How are you ending up with three PDFs selected that you want to do something with? Why not just drag and drop them from the records list?

You can use AppleScript to get the file from EagleFiler and then tell the Finder to change its creation date. I do plan to make the creation date more accessible from the interface, though.

Noted. In the meantime, you could probably use tags for this.

That’s on the to-do list.

What do you suggest? Right now you can use Command-Option-F, followed by Delete.

What do you mean?

This is on the to-do list.

Again, please contact me via e-mail and send me the script that you’re using.

This is on the to-do list.

EagleFiler provides more powerful searching, much faster browsing and searching when there is a lot of mail, more efficient on-disk storage, and storage in a standard format. Also, moving messages out of Mail and into EagleFiler speeds up Mail.

EagleFiler 1.2 is much faster.

In my short time playing with 1.2, it does seem much faster.

As usual, nice work. Thanks.

Here’s a data point: I am using EF 1.2 with a 1.2 GB library collection, of which roughly half are 80K+ emails, and most of the rest are several hundred PDFs and web archives. I can confirm that EF works impressively well on a modern high-end MacBook Pro. RAM use is in the dozens of MB and CPU use is 0% when idling - IOW, perfect. I can definitely see myself using this for many years to come, scaling up easily as more contents get added (and Moore’s law will do the rest).

Kudos and many thanks to M.T. for getting everything “just right”.

This was added in EagleFiler 1.4.

Here’s a list of shortcuts.