Import performance

I set up an import to bring in a large number of files, perhaps 30 GB worth. It did not look like it was going quickly, but I let it run overnight.

In the morning, I found that EagleFiler had stalled at some point, and I had to kill the process. But many files had been imported (about 32,000), so I decided to try working with the library as it was.

Each time I open EF, it takes 2-3 minutes to open, and then several more minutes to re-index. (I want to leave it open all the time, so I trust that those delays will not be important; but they still seem a bit excessive, judging by your responses to other users, Michael.)

Sometimes the response to a search - always simple, single-word - is almost instantaneous. At other times, the beachball spins and spins…and I go to Spotlight and find what I was looking for quickly.

Now I’m trying to import the balance of my files. I tried to do it in a large clump, as I had done at first; but EF, after a few minutes, started showing up as “not responding” in Activity Monitor, even though it was using 70%-104%(!?) of CPU capacity.

So I backed off and am putting in a few folders at a time. Activity Viewer shows several “barber poles” but Growl notifications of files being inserted are very infrequent. Activity Monitor now says EF is “not responding,” and in fact I can’t command-tab to it; I have to hide obscuring windows to see EF.

I love, and very much need, what EF promises. Any suggestions will be warmly received!

Warmly,
Joel

More observations
If I am reading the Activity Viewer correctly, it is taking 30-45 seconds to import each document. Is that reasonable?

Joel

Do you mean that it was beachballing or that the importing progress in the Activity Viewer had halted?

What kind of processor does your Mac have, and how much RAM do you have?

The indexing scan when you open the library is just checking the modification dates of the files to see if they have changed since EagleFiler last indexed them (since it’s possible to modify the files without going through EagleFiler). If the files had previously been indexed, and they haven’t changed, this should be relatively quick (and it’s in the background, of course).
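As a rough illustration of the kind of scan described above (not EagleFiler's actual code), a modification-date check could look like the following Python sketch. The function name and the `last_indexed` mapping are hypothetical; the mapping stands in for whatever record of "last indexed" times the application keeps.

```python
import os

def files_needing_reindex(root, last_indexed):
    """Return paths under root that are new or modified since last indexed.

    last_indexed maps path -> modification time recorded at the previous
    indexing pass (a hypothetical stand-in for the application's own records).
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            recorded = last_indexed.get(path)
            # Files with no record, or a newer mtime, need to be indexed again.
            if recorded is None or mtime > recorded:
                stale.append(path)
    return sorted(stale)
```

Only a `stat` call is needed per file, with no reading of file contents, which is why such a scan should be fairly cheap when nothing has changed, though it still has to visit every file in a large library.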

2-3 minutes to open a library does seem excessive, although your library has more files than most. This is why I’m curious what hardware you have.

I think the fast response is the “normal” speed. However, if you’ve made a change and EagleFiler is updating the index (this should be visible in the Activity Viewer), then the search will wait until the index file has been updated. That doesn’t necessarily mean waiting until everything has been indexed, only until the batch of documents currently being written to the index file is done, but that can take a long time if the index is large.

EagleFiler is multi-threaded, so if you have more than one processor core, it can use more than 100% CPU. Which items were showing in the Activity Viewer when it stopped responding? If you wait long enough it should eventually start responding again. It would help if you could use Activity Monitor to sample EagleFiler and then e-mail me the sample files so that I can see what it was doing.

Importing speed has improved a lot in the past few versions, but obviously there’s still more work to be done. Here are some tips:

  • If you deselect all the sources in the source list, EagleFiler won’t have to update the main browser window after each file is imported. This can make a big difference, especially if the Library source was selected, since EagleFiler would then have to display and sort all the files after each addition.
  • It will import faster while you have a menu pulled down, since this postpones interface updates and writing database changes to disk.
  • Importing a folder of files is currently faster than importing a large selection of individual files.
  • Indexing speed is determined by the number of files that have text content, and the amount of text that each has. I’m working on some improvements to speed the indexing and get rid of the long pauses, but one thing that you can do now is reduce the size of the index by putting your files into multiple smaller libraries rather than one huge one.

Unless they’re huge files, that sounds like way too long, so I will be interested to see your sample reports.

It was beachballing, but I could see no progress in importing.

It’s a new 15" MacBook Pro: 2.16 GHz Core 2 Duo, 2 GB RAM, 120 GB drive, Mac OS X 10.4.9.

I suspect the index is large.

What shows in the Activity Viewer seems to be the current import line at the top, which attempts a time estimate (that seems to get longer and longer); the current folder beneath that; and the current file in the folder. Scrollbar says there are other lines, but beachball prevents me from getting to them.

Alas, Activity Monitor doesn’t sample right for me. I tried this with another program that wouldn’t install right, and got the same result: the wait bar on the sample window fills up, indicating “done,” but it won’t go away or do anything, or even allow me to close it. Any ideas?

Were you doing anything else other than importing, e.g. were you moving a file to a different folder?

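That’s too bad. I’d really like to know what’s causing the freeze. You could try sampling from the command line in Terminal. Something like this, which samples EagleFiler for 10 seconds at 10-millisecond intervals and saves the report to your Desktop: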

sample EagleFiler 10 10 -file ~/Desktop/EagleFilerSample.txt

Also: what kind of files were you importing, and about how large were they?

No.

I did this, and it worked. (Wonder why the GUI version hangs?) I’ve attached the file.

The Activity Viewer predicted 13 hours yesterday; now it is saying (after importing roughly 1/6 of the files) that it will be another 74 hours.

The import is running, however; I will let it keep going. Any idea why I cannot Command-Tab to it? And the beachball does not allow me to get in to do anything, except that if I rest on a menu item for, say, 30 seconds, a menu drops down, and importing apparently speeds up significantly (as you said it would), down to 1-3 seconds per item. Wish I could leave it down…hmm, I’ll check and see if OS X’s accessibility options will allow that.

But right now, it won’t let me even see the EF menu bar.

Files are very diverse; docs, images, ppt files, etc. But mostly Word docs.

It doesn’t look like it completely worked. There are no symbols in the sample file, just addresses such as “0x93268ddb.” Does this happen when you sample other applications? I wonder if there’s something wrong with your OS installation.

Probably, not being able to Command-Tab is related to it beachballing, but I’m not sure why the latter is happening. Did you have all the sources de-selected?

Hmm, for reference, on my machine (also a MacBook Pro, Core 2 Duo) 1.1.6 takes about 1 second per item without holding down the menu. Is it much faster for you if you import into a new library rather than into the existing large one?

I went through hours with SuperDuper!'s author, trying to see why it wouldn’t work, and he suggested something similar. Also couldn’t sample. At his recommendation, did an archive/update of OS X. Didn’t seem to change anything.

I have never seen a sample dump, so didn’t know if what I sent was normal.

Yes.

I will try that and report back.

Sorry, I don’t know what to suggest other than a clean install.

It should look something like this:

Analysis of sampling pid 208 every 10.000000 milliseconds
Call graph:
    300 Thread_0f07
      300 0x5f471
        300 0x5f54a
          300 NSApplicationMain
            300 -[NSApplication run]
              300 0x6cea
                300 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
                  300 _DPSNextEvent
                    300 BlockUntilNextEventMatchingListInMode
                      300 ReceiveNextEventCommon
                        300 RunCurrentEventLoopInMode
                          300 CFRunLoopRunInMode
                            300 CFRunLoopRunSpecific
                              300 mach_msg_trap
                                300 mach_msg_trap
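For anyone unsure whether their own sample report came out symbolicated, a quick check is to count how many call-graph frames are bare addresses. The Python sketch below assumes the simple "count, then frame name" line layout shown in the excerpt above; real reports may carry extra columns, so treat the patterns as assumptions.

```python
import re

# A call-graph line looks like "<indent><count> <frame>", for example
# "300 0x5f471" or "300 -[NSApplication run]" (layout assumed from the excerpt).
FRAME_LINE = re.compile(r"^\s*(\d+)\s+(\S.*)$")
HEX_ONLY = re.compile(r"^0x[0-9a-fA-F]+$")

def symbol_ratio(report_text):
    """Return the fraction of call-graph frames that carry a symbol name."""
    named = total = 0
    for line in report_text.splitlines():
        match = FRAME_LINE.match(line)
        if not match:
            continue  # header lines such as "Call graph:"
        frame = match.group(2).strip()
        if frame.startswith("Thread_"):
            continue  # thread headers are not frames
        total += 1
        if not HEX_ONLY.match(frame):
            named += 1
    return named / total if total else 0.0
```

A ratio near zero would match the "just addresses" symptom described earlier in the thread; a healthy report like the excerpt above has mostly named frames with a few raw addresses mixed in.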

I guess there is a volume issue in the db. I opened a new library and began importing into it, and behavior is pretty much what you report, Michael - 1-2 seconds per item with no menus held down; faster, with.

Searches on the 35,000-record original library are basically instantaneous.

Questions that are probably in the manual: Can I search across multiple libraries at the same time? Is combining libraries a big deal, or desirable? How about splitting up libraries, if performance degrades?

I really appreciate your personal and patient attention!

Warmly,
Joel

Right. As the library gets larger, it takes longer to add additional records. I think most of the slowness comes not from the database but from EagleFiler updating the user interface as records are added. This is also the period during which it will beachball. I’m working to speed that up.

I’m curious to know how many records were in the library when you said it was taking 30-45 seconds per record. I have a test library with about 10,000 records, and it takes about 2 seconds to add each additional one.
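To see why a time estimate can keep growing during a long import, here is a small Python sketch under a purely hypothetical cost model: adding a record costs `base + per_existing * n` seconds when the library already holds `n` records. The function names and default parameter values are illustrative assumptions, not measurements of EagleFiler.

```python
def seconds_to_import(start, count, base=1.0, per_existing=0.0001):
    """Total seconds to add `count` records to a library holding `start`.

    Assumes (hypothetically) that each addition costs
    base + per_existing * n seconds when the library holds n records.
    """
    end = start + count
    # Sum of (base + per_existing * n) for n = start .. end - 1,
    # using the arithmetic-series formula for the sum of n.
    return base * count + per_existing * (start + end - 1) * count / 2

def naive_eta(current_size, remaining, base=1.0, per_existing=0.0001):
    """Estimate that extrapolates the current per-record cost, ignoring growth."""
    return remaining * (base + per_existing * current_size)
```

Under this model, `naive_eta(10000, 20000)` comes out lower than `seconds_to_import(10000, 20000)`, because every record added makes the next one slightly slower. An estimate based on the current rate therefore keeps being revised upward as the import proceeds.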

Sorry, that’s not currently possible (except via Spotlight).

There is not currently a way to move files from one library to another except to import the files from one into another and then delete the originals. Importing from another library does not preserve the EagleFiler metadata such as tags and notes, although this will be addressed in an upcoming release.

As to whether multiple libraries are desirable, I suppose it depends on the particular documents and on how they’re organized. Personally, I find it convenient to organize my files into different libraries, based on how I access them. But if there’s no natural way to divide your documents, then I guess it would only make sense for performance reasons. In that case, of course, it would be much more efficient to build multiple libraries from the beginning, rather than to make a huge one and then try to split it up.

EagleFiler 1.2 makes importing much faster and also allows you to drag files between libraries.