Find duplicates?

I have a very “tree based” system of file management which just does not work anymore, so I want to try a database-based approach. This seems to work rather nicely, but over time a lot of duplicate files have accumulated in different folders. This is usually unwanted but was sometimes necessary. I am in the process of basically mirroring the file structure, and then I want to rearrange the files with tags, etc.
This means I had to allow duplicate files, because I just don’t have the time right now to review hundreds of “could not import” errors.

Is it possible to find duplicates after the database has been created? I have a basic idea of how this should work and hope to get the basic transition done within this week. Then I’d like to search for the duplicates and, depending on what and where they are, remove them or change the filing accordingly. This would be easy if a smart folder containing all duplicates could be created: is this possible, e.g. displaying all files that do not have a unique hash?

There’s no built-in feature to do this, but it should be straightforward to write an AppleScript that finds such records and assigns a special tag to them so that you can easily display them.

Thank you for your comment. I have read a little bit about AppleScript. Can you tell me where to find the commands that EagleFiler understands?

“Export to CSV” shows how a record is “dissected” into its parts, but I am missing how the hash value is accessed. (This is how EagleFiler searches for duplicates, right?)

I would basically iterate over all records, comparing the hash of the first, second, etc., and mark those that have the same hash.
Or is there a getAllRecordsWithHash(X) command in EagleFiler?

Sorry, I am a Java/C programmer 🙂

You can use the Open Dictionary command in AppleScript Editor.

Right. You’re looking for the checksum property.
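For a quick look before writing a full script, a minimal one-liner can read the property directly (assuming a library is open as the frontmost document; per the note below, folders return an empty checksum):

```applescript
-- Minimal sketch: inspect the checksum of one record in the frontmost library.
tell application "EagleFiler"
    tell library document 1
        checksum of library record 1 -- hash as text; "" for folder records
    end tell
end tell
```

Running this in Script Editor and checking the result pane is an easy way to confirm the property name from the dictionary.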

Right. Here’s a script to get you started:

tell application "EagleFiler"
    tell library document 1
        set _processedIDs to {}
        set _records to library records
        repeat with _record in _records
            set _id to _record's id
            set _checksum to _record's checksum
            if _id is not in _processedIDs and _checksum is not "" then
                set _duplicates to (every library record where its checksum is _checksum and its id is not _id)
                copy _id to end of _processedIDs
                repeat with _duplicate in _duplicates
                    set _tagNames to _duplicate's assigned tag names
                    set _duplicate's assigned tag names to _tagNames & {"duplicate"}
                    copy _duplicate's id to end of _processedIDs
                end repeat
            end if
        end repeat
    end tell
end tell

The filter expression that sets _duplicates is like getAllRecordsWithHashExceptSelf(X). The _processedIDs list ensures that we don’t count A as a duplicate of B if we had previously found B as a duplicate of A. The check for an empty checksum avoids counting folders.
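After the script has run, a quick sanity check is to count how many records ended up tagged. This is just a sketch that assumes the “duplicate” tag name used above; it iterates rather than filtering, since `assigned tag names` is a list property:

```applescript
-- Sketch: count records carrying the "duplicate" tag assigned by the script above.
tell application "EagleFiler"
    tell library document 1
        set _count to 0
        repeat with _record in library records
            if _record's assigned tag names contains "duplicate" then
                set _count to _count + 1
            end if
        end repeat
        _count -- result shown in Script Editor's result pane
    end tell
end tell
```

You can then click the “duplicate” tag in EagleFiler’s source list to see all the tagged records in one place, which gives you essentially the smart-folder view you asked about.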