Suggestion: add search method to find duplicates within smart groups

The smart group feature is really great, but I miss one thing in it.
It would be awesome if you could implement a new search method to find all duplicates in EagleFiler within a smart group.
When I download lots of web archives from links (the Import URLs function), I know I end up with duplicate URLs. The best thing would be to see all the URLs in EagleFiler, like you can in DEVONthink, and then delete the web archives that share a URL, so that only one copy of each URL remains.
Or does a feature to find duplicates already exist in EagleFiler?

EagleFiler normally tracks and removes duplicates by content, not URL. For bookmarks, this amounts to the same thing, but for other file formats you can end up with multiple files with the same URL and different content. (Sometimes this is desirable, e.g. if you want to track how a page has changed over time.) This script will scan the selected records and remove the duplicates by URL.
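
To illustrate what removing duplicates by URL means, here is a minimal sketch in Python. It is not the actual script and does not use EagleFiler's scripting interface: `Record`, `source_url`, and `split_duplicates_by_url` are hypothetical names made up for this example. It assumes the first record seen for each URL is the one to keep, and that records without a URL are never treated as duplicates.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """Hypothetical stand-in for a library record (not EagleFiler's real model)."""
    title: str
    source_url: str


def split_duplicates_by_url(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (kept, duplicates), keeping the first record seen for each URL."""
    seen: Set[str] = set()
    kept: List[Record] = []
    duplicates: List[Record] = []
    for record in records:
        if not record.source_url:
            # Assumption: a record with no URL is always kept.
            kept.append(record)
        elif record.source_url in seen:
            # Same URL as an earlier record, so this one is a duplicate.
            duplicates.append(record)
        else:
            seen.add(record.source_url)
            kept.append(record)
    return kept, duplicates
```

The records in `duplicates` are the ones that would go to the trash. Two records with the same URL but different content still count as duplicates here, which is the difference from matching by content.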

Thanks for the script, Michael!
I tried to run the script after placing it in the right location.
I selected all my web archive files and ran the script from the script icon.
But nothing happened. I have 38840 records, and I am sure some of them have duplicate URLs, so something must be wrong here.

Couldn’t you add a new column in EagleFiler that shows the whole URL, like DEVONthink has?
If you did this, then it would be possible to delete duplicate URLs manually. Right now it’s impossible to know which links are duplicates.

Do you mean you ran it from Apple’s script menu?

Did you select the records first? With a large number of records, it might simply be slow, and you might not know if the script is still running. How about testing it with a smaller number of records?

Do you want a source URL column other than to remove duplicates manually?

Yes, that’s correct!

Yup. I tried duplicating one web archive and selecting only 3 records: the two identical web archives and one unique one. One of the two identical archives doesn’t go to the trash.

Yeah, that would be nice. But it would also be great if the remove-duplicates script worked too. Still, it wouldn’t hurt to have this source URL column as well :slight_smile:

In that situation, only one should go to the trash. If it’s still not working as you expect, please create a test library containing a few of your Web archives and e-mail it to me so I can see what’s happening.

The remove-duplicates script works now. I created a new database, copied a record so that two had the same source URL, and added some records with unique source URLs too. EF removed the duplicates and kept only one record for each duplicated URL. So now the script works like it should :slight_smile:

So I think the remove-duplicates script has some limit on the number of web archives it can handle. The question is: how many web archives can I have in a database before I exceed this limit and the script stops working?

I think it will work with any number; it will just take a while. If you run the script in Script Editor instead of from the menu, you can see that it keeps running. I’ve optimized the script, and it’s much faster now, but it will still be slow when dealing with thousands of records. For more speed, I would need to implement this in EagleFiler itself.

OK. I tried to move about 12000 records from one database to another. It would be great if I could move files from one database to another without copying them. So I moved the files in the Finder from one database to the other database’s “To Import” folder. After I imported all these records into the other database, the “old” remove-duplicates script works, but as you say, it takes a really long time, and I don’t think it has finished yet.
So I will try the optimized script ASAP to see the difference.
Edit:
This optimized script was much faster than the old one. Really great work, Michael!

But as I said in an earlier post, couldn’t you also add a source URL column, in case I want to sort the list by source URL and delete the duplicates manually?

That’s breaking the rules. You should not move files in the Finder. In order to move files from one library to another, drag and drop within EagleFiler and then delete the records from the source library after it’s done copying. Or, you can copy the files in the Finder.

How is it going to be faster for you to go through thousands of URLs yourself?

Hmm, it’s really great that a script takes care of removing the duplicates. I don’t know how long it would take to do by hand. But maybe it would still be good to have the option to see the source URLs in a column. It’s not as necessary for me as it was before the optimized script, though.

I’ve added this in EagleFiler 1.5.