How do I figure out why indexing is hanging?

Hi,

I’ve been trying to import a lot of files into EagleFiler. I used it a while back (in the 1.1 or 1.2 timeframe) but got out of the habit for a while. Now I’m trying to catch up by bringing all of the documents and notes I’ve saved in the meantime into an existing EagleFiler library.

I upgraded to 1.3.5 and started importing. My problem is that while the import went fine, indexing did not. After several attempts to get the indexing to complete, I followed instructions I found in another indexing thread and asked EagleFiler to rebuild the indices.

At this point, the Activity window shows a single task, indexing records, and the progress bar is stuck at 16749 out of 81866 with an estimated completion time of 17 hours or so. That estimate is up from 2 or 3 hours and EagleFiler has been stuck there for over an hour.

This all brings me to my question: assuming that a particular file is causing my trouble, how do I tell which file is the culprit? That is, which one is #16749?

The EagleFiler.log shows a line like:

tools.pyc:155 Spotlight importer found no text content for…

for the last few lines.

The main system log (“All messages” under Leopard) shows a message from mdimport stating that it imported those same files “with no plugIn”.

Please help! I’d really like to get back to using EagleFiler but I’m a bit stuck unless I can get the indexing to complete.

Thanks,

Dave

If you click the “x” button to cancel the indexing, EagleFiler will show in the log which files it was working on at the time. Also, if you stop the indexing and open the library again, EagleFiler will resume the indexing process. It indexes the files in a random order, so even if there’s a troublesome file, chances are good that it will get a lot more of the files indexed before encountering it again.

Thanks for the quick response!

I had to force quit the app earlier but I’ve restarted and it is reindexing. At this point, it seems to be stuck again so I’ll wait a bit longer to make sure and then cancel so I can see what the trouble is.

I’ll post again once I have more information.

Thanks,

Dave

Well, I was hoping to have more information by now, but EagleFiler is now stuck canceling the indexing operation. The indexing got stuck at 34653 of 81865 overnight, so I clicked cancel when I noticed about 2 hours ago. Two hours later and we’re still canceling.

According to top, efindextool is hogging one whole CPU (100% with an RSIZE of around 1200K). EagleFiler itself is puttering along at around 1 or 2 percent with an RSIZE of only 25MB, so that’s not bad at all.

I’ve sampled both but I don’t see anything immediately obvious. Shall I email or attach those samples?

Dave

There are two phases to indexing. First, EagleFiler reads a bunch of files (with help from eftexttool), and then it adds them to the index (using efindextool). The first part can be cancelled at any time: EagleFiler will kill the current eftexttool processes. The second part cannot be cancelled because the Search Kit APIs are not interruptible, and killing the efindextool process might corrupt the index. So if you try to cancel while efindextool is running, EagleFiler simply waits for it to finish. (If you need to force-cancel the indexing, it’s best to kill efindextool yourself. Then EagleFiler itself can be quit cleanly.)
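For anyone who prefers to script the force-cancel step, here is a minimal sketch in Python 3, not anything EagleFiler ships. It assumes the helper shows up in `ps` output under the name efindextool, and it sends SIGTERM, which is my choice; the post above does not say which signal to use. The function names are made up for illustration. As noted above, killing efindextool might corrupt the index, so treat this as a last resort.

```python
import os
import signal
import subprocess


def parse_ps_output(text, name):
    """Return PIDs from 'pid command' lines whose executable basename is `name`."""
    pids = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        # On macOS the command column is a full path; on Linux it is a bare name.
        if os.path.basename(parts[1].strip()) == name:
            pids.append(int(parts[0]))
    return pids


def find_pids(name):
    """List the PIDs of all running processes named `name`, using ps."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout, name)


def terminate_by_name(name):
    """Send SIGTERM to every process named `name` and return the PIDs signalled."""
    pids = find_pids(name)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling `find_pids("efindextool")` first shows whether the helper is still running; `terminate_by_name("efindextool")` then signals it, after which EagleFiler itself should be able to quit cleanly.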

If updating the index is hanging, there’s probably nothing EagleFiler can do about that. Either the index file is damaged or there’s a bug in Search Kit, or else everything is fine and it’s just taking a really long time. This last possibility is doubtful because EagleFiler only updates the index in small bits at a time.

Resetting the indexes and turning off indexing for phrase searches would probably speed it up considerably.

If you e-mail me a sample of efindextool, I’ll take a look and pass it on to Apple.

Thanks, that explains the behavior I’m seeing. Somewhere along the line, efindextool gets wedged trying to update the index but EagleFiler can’t cancel that process. (As a feature request, perhaps EF shouldn’t offer to cancel indexing at that stage or at least warn the user that it may not work?)

Okay, I’m trying the reset again. I think this is reset #3 or 4 and at least one of those times I manually deleted the Records index file before rebuilding.

I’ve emailed you the sample from efindextool. Hopefully that will give you or somebody at Apple a hint as to what is going wrong.

Thanks,

Dave

Perhaps the latter, a warning. Offering to cancel is definitely the right thing, since in most cases it will be able to do so cleanly in just a few seconds.

Thanks. This is completely different from other samples I’ve seen, so I’m going to paste it below in case other people run into this problem.

Analysis of sampling efindextool (pid 28324) every 10 milliseconds
Call graph:
987 Thread_2503
  987 start
    987 _start
      987 main
        987 processActions
          987 SKIndexAddDocumentWithText
            987 IAIndexAddDocWithTextStream
              987 TIAIndex::Add(OpaqueIADocKeyRef*, OpaqueIATextStreamRef*, unsigned char)
                987 TermIndex::AddDocInternal(IADoc*, unsigned int, OpaqueIATextStreamRef*)
                  987 TermIndex::DefaultInvertDocumentToUpdateSet(OpaqueIATextAnalysisRef*, IADoc*, OpaqueIATextStreamRef*, unsigned long*)
                    987 IADefaultTokenizerGetNextToken
                      987 UniCharParserGetNextWord
                        987 _CFStringTokenizerTokenize
                          987 MeCab::TaggerImpl::parseToNode(char const*, unsigned long)
                            987 MeCab::Viterbi::analyze(char const*, unsigned long)
                              987 MeCab::Viterbi::viterbi(char const*, unsigned long)
                                987 MeCab::TokenizerImpl<mecab_node_t, mecab_path_t>::lookup(char const*, char const*)
                                  987 MeCab::TokenizerImpl<mecab_node_t, mecab_path_t>::lookup(char const*, char const*)

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
MeCab::TokenizerImpl<mecab_node_t, mecab_path_t>::lookup(char const*, char const*) 987

Previously, I’ve seen about half a dozen cases where efindextool got stuck while trying to update the index file. As I said, I think this was due to a damaged file or a bug in Search Kit.

From your sample, it looks like it was not in the process of reading or writing to the index file. (Thus, I would not expect resetting it to help.) Rather, it got stuck trying to tokenize (split into words) the text that EagleFiler asked it to index. MeCab seems to be a tokenizer for Japanese, so most likely the problem is triggered by indexing a particular document in that language. If we can find the text in question and reproduce the problem, perhaps Apple can fix the tokenizer.

If it gets stuck indexing again, try Control-clicking the .eflibrary file in the Finder and choosing “Show Package Contents.” Inside, there will be a “Temporary Items.noindex” folder that contains a bunch of numbered folders. One of the more recently modified ones will contain a file called “Queue.plist”. This contains the data that EagleFiler sent to efindextool. If you open it, you should be able to see the text that’s being added to the index and also the files that it came from. If you can figure out which file is causing the hang (e.g. by adding it to a new library and seeing if that causes its indexing to hang), please send it to me.
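The same hunt can be done from a script. The following is a rough Python 3 sketch that looks inside the library package for “Queue.plist” files, newest first, and prints an outline of what each contains. The layout of Queue.plist is not documented in this thread, so the script assumes no particular keys and simply walks whatever structure it finds. The function names are invented for this example.

```python
import plistlib
import sys
from pathlib import Path


def newest_queue_plists(library, limit=3):
    """Return up to `limit` Queue.plist paths, most recently modified first."""
    temp = Path(library) / "Temporary Items.noindex"
    found = sorted(temp.rglob("Queue.plist"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    return found[:limit]


def load_plist(path):
    """Parse a property list; plistlib detects XML vs. binary format itself."""
    with Path(path).open("rb") as handle:
        return plistlib.load(handle)


def describe(value, indent=0, max_items=5, preview=60):
    """Return indented lines outlining a plist value without assuming a schema."""
    pad = "  " * indent
    if isinstance(value, dict):
        lines = []
        for key, item in list(value.items())[:max_items]:
            lines.append(f"{pad}{key}:")
            lines.extend(describe(item, indent + 1, max_items, preview))
        return lines
    if isinstance(value, (list, tuple)):
        lines = [f"{pad}[{len(value)} items]"]
        for item in value[:max_items]:
            lines.extend(describe(item, indent + 1, max_items, preview))
        return lines
    if isinstance(value, (str, bytes)):
        kind = type(value).__name__
        return [f"{pad}{kind}, length {len(value)}: {value[:preview]!r}"]
    return [f"{pad}{value!r}"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py /path/to/Library.eflibrary
    for plist_path in newest_queue_plists(sys.argv[1]):
        print(plist_path)
        print("\n".join(describe(load_plist(plist_path))))
```

Unusually long strings in the outline are a reasonable place to start looking, since a very large block of text is one plausible cause of a slow tokenizer.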

Okay, I think I’ve found at least one of the culprits.

It turns out that I’ve got one or more binary files with an extension of VGA that are getting interpreted as SimpleText documents for some reason. In my most recent hang, the file is 6.6MB of binary data and that’s evidently choking the MeCab::TokenizerImpl::lookup() call.

It may be that it will eventually finish but the last few times I let it sit for several hours before giving up so I’m not hopeful.

I guess the follow-on question now is whether there’s a way to mark that file as binary so that I can keep it with its related files rather than have to remove it from EagleFiler’s control. Is there a way built into EagleFiler to say “don’t index the contents of this file”?

Barring that, does anybody know how to get EagleFiler (and the Finder, for that matter) to treat the file as binary?

Thanks,

Dave

Not currently, but that’s a feature I’m considering.

EagleFiler treats a file as text if the OS says its UTI is text, if it starts with a BOM, or if the file command says it’s text. You can use this Terminal command:

defaults write com.c-command.EagleFiler NonTextExtensions -array vga

to tell EagleFiler not to treat the “.vga” files as text. However, since it then won’t recognize the file type, EagleFiler will try to index them using Spotlight importers, and it’s possible that it will end up trying to put the same text into the index. It’s worth a try, though.
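To find other files that might trip the same problem, a quick scan for binary data hiding behind a given extension can help. The Python 3 sketch below is only a heuristic and is not how EagleFiler decides what counts as text: it checks for a BOM and for NUL bytes in the first few kilobytes, and it does not reproduce the UTI lookup or the file command. The function names and the 8192-byte sample size are arbitrary choices for this example.

```python
import codecs
from pathlib import Path

# Byte-order marks for UTF-8, UTF-16 and UTF-32.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE,
)


def starts_with_bom(data):
    """True if the bytes begin with a Unicode byte-order mark."""
    return data.startswith(BOMS)


def looks_binary(data):
    """Heuristic: NUL bytes without a leading BOM suggest binary data."""
    return b"\x00" in data and not starts_with_bom(data)


def suspicious_files(root, extension, sample_size=8192):
    """Yield (path, size) for files with `extension` whose first bytes look binary."""
    wanted = "." + extension.lower().lstrip(".")
    for path in sorted(Path(root).rglob("*")):
        # Compare case-insensitively so both .vga and .VGA match.
        if not path.is_file() or path.suffix.lower() != wanted:
            continue
        with path.open("rb") as handle:
            head = handle.read(sample_size)
        if looks_binary(head):
            yield path, path.stat().st_size
```

For example, `list(suspicious_files("/path/to/folder", "vga"))` would list the .vga files under that folder that look binary, along with their sizes in bytes. Files starting with a BOM are excluded from the binary guess because UTF-16 text legitimately contains NUL bytes.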