Index Issues

Hi. I’ve been importing a lot of email from Apple Mail and have several indexing questions.

I’ve read the manual (and other threads here) about the indexing options. My initial imports used the default phrase indexing option, and after about 200,000 messages the indexing during each import began to take hours. I then switched to word indexing, and the estimate for importing the next mailbox dropped from 14 hours to 5. Several mailboxes later, the imports were estimated at 4 hours yet took 5-6 hours each. This library now has ~663,500 messages, and its .eflibrary file is 5.06 GB.

I started a second library for a separate group of Mail folders and set it to word indexing before importing anything. This library continues to import much faster than my first one: the latest folder of ~14,000 messages took ~15 minutes to index. It currently has ~219,000 records, and its .eflibrary file is 175 MB.

According to the manual:

If you hold down the Command and Option keys when opening a library, EagleFiler will show the Rebuild Indexes dialog. When you rebuild an index, EagleFiler deletes the old index file and builds a new one from scratch.

I followed these instructions for both libraries, yet I’m not convinced that all records in the first library were actually reindexed with word indexing. Does the total number of messages in a library affect the time required to index? If not, imports of similar size should take about the same time, and that’s not what I’m seeing.

I just started another rebuild of the first library, selecting Records, Notes, and Messages to be reindexed using word indexing. The .eflibrary file was reset as expected, which reassures me that it is building a completely new index. However, EagleFiler estimates that this will take ~20 hours. Does that make sense to you? If the second library can index ~14,000 messages in 15 minutes, shouldn’t 663,500 messages take about 12 hours?
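The back-of-the-envelope figure in the question can be checked with a simple linear extrapolation. This is only a sketch: it assumes every message indexes at the same constant rate, which (as the reply notes) may not hold once an index gets large. The function name is hypothetical.

```python
def hours_to_index(total_messages: int, batch_messages: int, batch_minutes: float) -> float:
    """Linear extrapolation from one observed batch.

    Assumes a constant per-message indexing rate regardless of index size.
    """
    rate_per_minute = batch_messages / batch_minutes  # messages indexed per minute
    return total_messages / rate_per_minute / 60      # minutes -> hours


# ~14,000 messages in ~15 minutes, scaled to ~663,500 messages
print(round(hours_to_index(663_500, 14_000, 15), 1))  # prints 11.8
```

So a constant rate would predict roughly 12 hours; a 20-hour estimate implies the rate is expected to be well below what the smaller library achieved.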

Is there a practical limit to the number of email messages per library?

Suggestions?

Those sound like abnormally long indexing times. What kind of Mac are you using?

If you contact me via e-mail, I can send you a test version of EagleFiler that will log some information such as which type of indexing is being used for each file and how many documents are in the index.

Yes, the number of records does affect indexing time, especially for phrase indexing. It’s more work to update an index file that has more records, and file fragmentation can also play a role.

The estimate may not be accurate. When starting from scratch, it’s based on the assumption that each mailbox will take the same amount of time to index as the mailbox that’s currently being indexed. However, 20 hours does seem too long for that number of messages.
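A rough sketch of that kind of per-mailbox projection follows. It is an illustration under the stated assumption only, not EagleFiler’s actual code; the function and parameter names are hypothetical. It shows why one unusually slow (or fast) mailbox early in a rebuild can skew the whole estimate.

```python
def estimate_remaining_hours(current_elapsed_h: float,
                             current_fraction_done: float,
                             mailboxes_left_after_current: int) -> float:
    """Project remaining time, assuming every mailbox takes as long as the current one."""
    if not 0 < current_fraction_done <= 1:
        raise ValueError("current_fraction_done must be in (0, 1]")
    # Project the total time for the mailbox currently being indexed.
    current_total_h = current_elapsed_h / current_fraction_done
    current_remaining_h = current_total_h - current_elapsed_h
    # Assume each remaining mailbox takes exactly that long.
    return current_remaining_h + current_total_h * mailboxes_left_after_current


# 30 minutes in, 25% through the current mailbox, 9 more mailboxes to go
print(estimate_remaining_hours(0.5, 0.25, 9))  # prints 19.5
```

If the current mailbox happens to be much larger than the rest, this projection overstates the total, which is one way a figure like 20 hours could arise.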

Per library, no. Per mailbox, indexing does get slower if you have more than 100,000 or so messages per index (especially with phrase indexing). On the other hand, once the index is built it will never need to be updated, and searching will be faster than with multiple smaller indexes.