Criteria for 2 files to be "duplicates"?

I ran into some unexpected behavior today that is forcing me to re-evaluate how I thought EF works. I have an old PDF file already in EF. I then moved it to my iPad, edited it there, and tried to drop it back into EF. I expected that this would give me 2 files, one of which had the original version, and the second with all my edits (annotations such as highlighting and comments). Instead, I got a message that the new file was not imported because it already existed. Yet the sizes of the two files are quite different: 523 KB versus 468 KB.

When I looked carefully at the manual, I found that it says:

Normally EagleFiler will prevent you from importing a file if a file with the same contents (in the data fork) is already in the library.
Is my problem that all the annotations are outside the data fork? In any case, is there an easy way to get the behavior I want?

One option is to turn off the esoteric preference for detecting duplicates temporarily, import the updated versions, then turn the preference back on. But is there any way to redefine what EF treats as a “duplicate,” so that two files must be completely identical in order to be treated as duplicates?

thanks
Roger

Which iPad app are you using? All the ones I’m aware of store edits in the data fork, so that should cause EagleFiler to see them as different files.

It’s a regular preference, but turning it off shouldn’t be necessary. When EagleFiler reports the duplicate, there is a button to reveal the file in the library that it thinks is identical. Does this show the file that you expected?

Also, you can use the md5 command in Terminal to get the checksum of a file:

$ md5 EagleFiler-1.8.dmg 
MD5 (EagleFiler-1.8.dmg) = 1f9eb4d33d2d63a94be1d5b1500c5642

This is what EagleFiler uses to determine duplicates. Do your two PDF files have different checksums?
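If you would rather compare the two files from a script, here is a minimal Python sketch of the same check. It is only an illustration using the standard `hashlib` module, not EagleFiler's own code, and the helper names `md5_of_file` and `same_contents` are made up for this example. It hashes the bytes that a normal file read returns, which should give the same digest that the `md5` command prints.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the bytes read from the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large PDFs do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have the same MD5 digest (hypothetical helper)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For example, `same_contents("original.pdf", "annotated.pdf")` (placeholder file names) should return False if the annotations really were written into the file's contents.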


The problem is more basic than I realized. The files can be very different, and still result in refusing to import the second one.

EXPERIMENT: Two files, same file name but different number of pages, different file size, and different MD5 checksums.

MD5 (/Users/Rbohn/Desktop/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf) = c8df9839b8f50efba395756b5540d8f5

MD5 (/Users/Rbohn/Documents/REB docs '12/Teaching/Cases & teaching material 2011/Files/Capstone industry dynamics/Readings/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf) = 3fcd0dbcd9a8afd48e7b84de7e16d96d

The first file is the one I am trying to drop in from the Finder. The second is the one already in EagleFiler.
The resulting error message is:

Duplicate Record: “What Litigation Finance Is Really About | The New Yorker.pdf” matches “What Litigation Finance Is Really About | The New Yorker.pdf” in library

Console error log. I have selected all lines with the string ‘Litigation Finance’ in them:

2017-10-15 23:10:36.413939-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:11:42.228418-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:12:02.677481-0700 0x7fe4 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:811 index.pyc:811 [LogFileIndexing] File needs indexing because ctime or mtime [2017-10-16 06:11:28 +0000, 2017-10-02 23:58:25 +0000] is later than indexing date (2017-10-02 23:58:27 +0000): <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf>
2017-10-15 23:12:03.003209-0700 0x7fe4 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:735 [LogFileIndexing] 0.33 seconds to extract text from <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf> of size 468 KB
2017-10-15 23:12:03.003358-0700 0x7fe4 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:743 [LogFileIndexing] Adding text of size 15 KB to indexing queue: <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf>
2017-10-15 23:12:03.007764-0700 0x7fdf Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:702 u"/Users/Rbohn/Documents/REB docs '12/Teaching/Cases & teaching material 2011/Files/Capstone industry dynamics/Readings/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf")]
2017-10-15 23:12:33.915922-0700 0x7fdf Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:710 u"/Users/Rbohn/Documents/REB docs '12/Teaching/Cases & teaching material 2011/Files/Capstone industry dynamics/Readings/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf")]
2017-10-15 23:15:45.076267-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:16:23.964257-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:16:36.830706-0700 0x8fd6 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:811 index.pyc:811 [LogFileIndexing] File needs indexing because ctime or mtime [2017-10-16 06:16:04 +0000, 2017-10-02 23:58:25 +0000] is later than indexing date (2017-10-16 06:12:03 +0000): <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf>
2017-10-15 23:16:37.109341-0700 0x8fd6 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:735 [LogFileIndexing] 0.28 seconds to extract text from <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf> of size 468 KB
2017-10-15 23:16:37.109515-0700 0x8fd6 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:743 [LogFileIndexing] Adding text of size 15 KB to indexing queue: <PDFRecord [9293] What Litigation Finance Is Really About | The New Yorker.pdf>
2017-10-15 23:16:37.113539-0700 0x8fd3 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:702 u"/Users/Rbohn/Documents/REB docs '12/Teaching/Cases & teaching material 2011/Files/Capstone industry dynamics/Readings/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf")]
2017-10-15 23:16:54.523438-0700 0x8fd3 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] index.pyc:710 u"/Users/Rbohn/Documents/REB docs '12/Teaching/Cases & teaching material 2011/Files/Capstone industry dynamics/Readings/IID week 1 readings/What Litigation Finance Is Really About | The New Yorker.pdf")]
2017-10-15 23:17:12.874953-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:25:31.924559-0700 0x15c1 Error 0x0 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] jobs.pyc:618 Error: <AddFileJob What Litigation Finance Is Really About | The New Yorker.pdf> DuplicateRecordError: DuplicateRecordError
2017-10-15 23:25:47.203557-0700 0x15c1 Error 0x1c03b 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Preparing to search for “"what litigation finance”: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:25:47.203965-0700 0x15c1 Error 0x1c03b 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Tag search for “"what litigation finance”: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:25:47.215209-0700 0x15c1 Error 0x1c03b 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Searched 3 indexes (8224 documents) for “"what litigation finance” and found 0 records (partial words: 0, phrases: 1): 0.01 seconds (0.01 CPU seconds)
2017-10-15 23:25:47.466299-0700 0x15c1 Error 0x1c03e 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Preparing to search for ““what litigation finance””: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:25:47.466512-0700 0x15c1 Error 0x1c03e 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Tag search for ““what litigation finance””: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:25:47.496510-0700 0x15c1 Error 0x1c03e 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Searched 3 indexes (8224 documents) for ““what litigation finance”” and found 5 records (partial words: 0, phrases: 1): 0.03 seconds (0.02 CPU seconds)
2017-10-15 23:26:02.646887-0700 0x15c1 Error 0x1c047 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Preparing to search for ““what litigation finance””: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:26:02.647107-0700 0x15c1 Error 0x1c047 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Tag search for ““what litigation finance””: 0.00 seconds (0.00 CPU seconds)
2017-10-15 23:26:02.667602-0700 0x15c1 Error 0x1c047 463 14 EagleFiler: (MJTFoundation) [com.c-command.EagleFiler.MJTLogger] init.pyc:101 [LogSearchSpeed] Searched 3 indexes (8224 documents) for ““what litigation finance”” and found 5 records (partial words: 0, phrases: 1): 0.02 seconds (0.02 CPU seconds)

What have I done wrong?
Roger

Did you edit the one in the library after importing it into EagleFiler? If so, you may need to use the Update Checksum command on it. You can also use the Verify command to find files in the library that have been modified since their last checksum update.
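To illustrate why a stale checksum could produce this result, here is a toy Python model. It assumes that duplicates are detected by comparing a new file's checksum against the checksums recorded when each file was imported or last updated. That is an inference from the description above, not EagleFiler's actual code, and `ToyLibrary` and all of its methods are hypothetical names.

```python
import hashlib


def md5_bytes(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


class ToyLibrary:
    """Hypothetical stand-in for a library that stores a checksum per file."""

    def __init__(self):
        self.contents = {}   # record name -> bytes currently on disk
        self.checksums = {}  # record name -> checksum recorded at import/update

    def import_file(self, name, data):
        # Assumption: new files are compared against the *stored* checksums.
        checksum = md5_bytes(data)
        for existing, stored in self.checksums.items():
            if stored == checksum:
                raise ValueError(f"duplicate of {existing!r}")
        self.contents[name] = data
        self.checksums[name] = checksum

    def edit_in_place(self, name, data):
        # The file changes on disk, but the stored checksum is left alone.
        self.contents[name] = data

    def verify(self):
        # Names of records whose current contents no longer match the checksum.
        return [name for name, data in self.contents.items()
                if md5_bytes(data) != self.checksums[name]]

    def update_checksum(self, name):
        # Record the checksum of the current contents.
        self.checksums[name] = md5_bytes(self.contents[name])
```

In this model, if a file is imported and then changed with `edit_in_place`, importing the original bytes again raises the duplicate error even though the two files now differ, and `verify()` lists the edited record. After `update_checksum` on that record, the same import goes through.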

If you want, you can turn off the indexing debug logging using the links at the bottom of this page.
