Removing duplicate email messages

I’m using EagleFiler to archive my messages from Apple Mail. Due to some duplicate downloading, and possibly some accidentally repeated archiving, I have duplicates of hundreds, if not thousands, of messages in EagleFiler. I thought I could use the script to Remove Duplicates, but it does nothing. On checking, I see in the AppleScript dictionary that “selected records” excludes email messages for some reason.

Is there no automated way to remove duplicate email messages?

By the way, Andreas Amann’s AppleScript from Mail Scripts, “Remove Duplicates…”, which used to remove duplicate messages within Mail, no longer works under Lion, and will not be updated, according to the author’s website.

EagleFiler checks for duplicates at the file level. For e-mail, this normally means the mailbox level, since multiple e-mail messages are stored in a single mailbox file. EagleFiler currently does not support surgery within a mailbox file. The plan is to eventually add a built-in command for removing duplicate messages. But, for now, there are basically two approaches that you could take:

  1. Make a copy of the mailbox file and use a utility such as formail to remove the duplicates. Then import the (slimmed) mailbox back into EagleFiler. This will not preserve the tags or notes on the messages, but it should be relatively fast.
  2. Create a folder in EagleFiler next to your mailbox. Drag the messages from the mailbox to the folder. EagleFiler will create a separate .eml file for each message. Import the folder into a separate EagleFiler library. Since each message is a file, the normal EagleFiler duplicate checking will kick in. Then you can select the non-duplicate messages and tell EagleFiler to merge them back into a mailbox. This method will preserve any message metadata.
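
For anyone who is not comfortable with formail, here is a rough sketch of approach 1 in Python instead, using the standard library's mailbox module. To be clear, this is not formail and not an EagleFiler feature; it is only an illustration. It assumes the file is in mbox format and that two messages are duplicates when they have the same Message-ID header (messages with no Message-ID are always kept). The function name and the file names in the usage note are made up for the example. Work on a copy of the mailbox file, and as with formail, tags and notes on the individual messages will not survive the round trip.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy an mbox file, keeping only the first message for each Message-ID.

    Assumption: a repeated Message-ID header means a duplicate message.
    Returns a (kept, dropped) tuple of message counts.
    """
    seen = set()
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)  # fail if the source is missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    try:
        for msg in src:
            msg_id = msg.get("Message-ID")
            key = str(msg_id).strip() if msg_id is not None else None
            if key and key in seen:
                dropped += 1      # already copied a message with this ID
                continue
            if key:
                seen.add(key)
            dst.add(msg)          # appended to the destination file
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped
```

You would call it on a copy, for example dedupe_mbox("Copy of Archive.mbox", "Archive-slim.mbox"), check the counts it returns, and then import the slimmed file back into EagleFiler.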

I also would be grateful for an easy way to remove duplicate emails. I use EagleFiler to store old emails and have ~200,000 stored.

I imported a mailbox with 32,500 messages – for some reason there are now 2 of each (65,000 in total). I’d like to remove the duplicates in this mailbox, as well as the smaller number in my other EagleFiler libraries.

I have read the other posts on good importing practices; I don’t know how to use formail.

So I am using method #2 from Michael’s post: the first step (moving messages to the new folder) is super-slow on my 8-core MacPro – estimated time to complete is >24 hours.

Any suggestions would be appreciated.

Still 21h left on step 1… Is there any way to speed this up? If 65,000 messages takes 3 days, it will take 9-10 days to process the remaining 200,000 messages.

Any suggestions will be gratefully received.

I don’t recommend the separate .eml file method if you have lots of messages. It will take a long time, and the OS can run into problems with that many individual files.

I suggest that you cancel the process, delete the .eml files, and use the mailbox files for now. I’m working on an AppleScript to make it easier to remove duplicates using formail.

thanks!

Please try the new Remove Duplicate Messages script.

I’m sure that I’m doing something wrong, but I can’t get the script to work.

I open the library, select the mailbox (messages) in the record list, and select the script from the scripts menu. But nothing seems to happen and the duplicates remain.

I looked at the bullet points under “description” for known issues, but none seem to apply to me.

You’re supposed to select the mailbox file, not the messages, in the records list.

Thanks Michael,

appreciate it, really useful for me.

Jan

Hi there Michael,

Is it just me, or has the location of where to place this script in Mac OS X 10.8.4 changed?
I can’t seem to find ~/Library/Scripts/Applications/EagleFiler/?

I feel a little foolish but I tried everything. :)

Chris

I added this script to FastScripts and assigned it a keyboard shortcut.

I do not think it can be started from within EagleFiler, though maybe it could be set up as a service?

Thanks for the response. So basically you load the script into FastScripts, run it, and it will automatically detect duplicates for whatever file structure you have selected in the right-hand pane of EagleFiler?

That’s correct. I can run it from the FastScripts menu or from a keyboard shortcut. Saves me from a lot of mess.

The location has not changed. You may have to create the folder if necessary. After enabling the system Script menu (in AppleScript Editor’s preferences) you can (in EagleFiler) go to the Script menu and choose Open Scripts Folder > Open EagleFiler Scripts Folder to have it create the folder for you. This works similarly with FastScripts.
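
If you would rather create the folder by hand, something like the following Python sketch should do it. It only builds the path mentioned above under your home folder (which is what the “~” stands for) and creates any missing folders; the function names are made up for the example, and the optional home argument is there only so the code can be tried against a scratch folder.

```python
from pathlib import Path


def eaglefiler_scripts_dir(home=None):
    """Return ~/Library/Scripts/Applications/EagleFiler for the given home folder.

    With no argument, the current user's home folder is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / "Library" / "Scripts" / "Applications" / "EagleFiler"


def ensure_scripts_dir(home=None):
    """Create the scripts folder (and any missing parents) and return its path."""
    folder = eaglefiler_scripts_dir(home)
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder
```

Calling ensure_scripts_dir() with no arguments creates the folder in your own user Library, not the system one.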

I would like to remove duplicate emails from my EagleFiler library. It seems that the best current method for dealing with a mailbox containing 20,000 messages is to use the Remove Duplicates script referenced above. I have a couple of questions before I run this operation…

Is this still the best method for removing duplicate emails? I am running the latest EagleFiler in Mavericks.

The description of the script states that tags will be lost. It says EagleFiler metadata will be preserved. I have many messages with MailTags data added in. Is there a way for me to preserve those tags while removing duplicate messages?

Thank you!

Yes.

The tags and notes on the individual messages will be lost. The tags and notes on the mailbox itself will be preserved.

Not currently. It’s best if you can avoid importing duplicate messages in the first place. For example, delete the messages from your mail program after importing them into EagleFiler; then you won’t re-import them the next time.

The documentation of the Remove Duplicate Messages script says - Install Location: ~/Library/Scripts/Applications/EagleFiler/

Is that in the system library or in the user library?

Thank you!

User. The “~” means your home folder.