Merging mailboxes and duplicate messages

The manual wasn’t quite clear; if I merge two mailboxes in EagleFiler, will it automatically detect duplicate messages and either remove them or give me the option of reviewing them? Or does it simply add the two mailbox files together regardless of whether or not there are duplicates?

Merging does not remove the duplicates.

Hmm. Would it make sense to include this functionality? Personally, while I’m thinking that I’ll store the majority of my email in EagleFiler since I don’t really need access to it, I also have some archive folders in Mail for messages that I do occasionally need to reference (or need to show up in Mail’s search lists when they go through all the mailboxes).

Maybe I just need to shift my workflow to import messages straight out of the inbox, merge the resulting mbox in EagleFiler with the older archive, and then move the messages in Mail. That would probably work just as well.

I may add duplicate removal at some point, but I think it would be better for you to shift your workflow such that duplicates aren’t a problem. For example, if you moved the messages into a different mailbox after archiving them into EagleFiler, then duplicates wouldn’t make their way into EagleFiler and you wouldn’t have to move messages from EagleFiler back to Mail.

Duplication removal a great idea
I just started utilizing EagleFiler as an email archive application and I do think that it is quite a great program! Previously, I was utilizing MailSteward, but I did not like its search functionality at all - it was quite limiting. EagleFiler is very user-friendly due to the well put together interface.

I have older MBox mailboxes that I exported for importation into a program such as MailSteward. I knew that I would probably find a better program to handle MBox files and that was my reasoning for the general MBox export. My one primary worry was incrementally updating MBox mailbox archives. When I discovered EagleFiler, my problems were just about solved! EagleFiler solves this problem as I can now combine mailboxes, export an MBox archive and overwrite the previous archive.

My problem lies here:

I have duplicate messages that are present due to the merging of new and prior MBox records. The presence of a duplication removal feature would be a great addition to help clear out these past and present mergings.

I don’t understand the source of the duplicates. There are two easy ways to avoid them:

  1. If you delete the messages from your mail program after exporting to mbox, then the next time you could export just the new messages, and when you merge them (or not) in EagleFiler there will be no duplicates.
  2. Likewise, if you don’t delete the messages after exporting, the next time you could export a new mailbox, import it into EagleFiler, and delete the old mailbox in EagleFiler, and again you would have no duplicates.

Hmmm…
One way or another, users can still end up with duplicate messages in an archive. Instructing users on how to avoid duplicate emails is nice customer service, but it’s hardly a feature of the application itself. A handy feature would be one that locates such duplicates. This idea has been expressed by more than one person at more than one time. As a developer, it is usually a good idea to listen to your customer base. After all, duplicate removal is one more feature to add to the feature list and, hence, boost the price of the product.

Even MailSteward, a similar application, has a duplication removal feature and it’s quite nice.

If there’s an easy way to prevent duplicates in the first place, it’s better for users to do that than to import duplicates and then ask EagleFiler to remove them. At minimum, this will be faster and one less step. And it works today. I was not clear, reading your first post, whether you understood how to prevent duplicates.

That said, I happen to think that a duplicate removal feature would be useful, and I’ll probably add it at some point. As you say, it’s possible (though less likely) to end up with duplicates when following proper import practices.

Of course, but it’s not simply a matter of listening to customers and doing exactly what they say. First, I want to understand why they want the feature. There may be a better way of addressing the underlying need.

Second, which ones should I listen to? There are far too many ideas (even good ones) suggested by more than one person to implement all of them in a reasonable amount of time. Duplicate message removal is probably not among the 100 most common requests. So, simply by counting requests, it would probably not be a good idea to work on this right now. Fortunately, the number of votes isn’t all that matters. For example, I’m planning some features that no one has requested, because I think they’ll be great. The bottom line is that if you think something should be a high priority, it’s much better if you can explain why you think it’s important.

Touché…to a point. Everything outside of numerical statistics or concrete facts is subjective. As you stated, you think features you add will be great. That’s an opinion, unless you have concrete facts and statistics to back it up. Barring that, many people may find various features a waste of time.

I can say the same thing for duplication removal. It’s a subjective process to say that such a feature is important unless I have statistics and facts to back it up and many may call such a feature a waste of time.

You’re the developer and I applaud you for your efforts and the very fact that EagleFiler actually exists. Outside of statistics, facts or even actual coding implementation, all I can state is what I believe is useful, much as you have stated what you believe is useful.

I’ll leave this one to time or, perhaps, an AppleScript…hmm…I suppose I could do that one as I do know some AppleScript…

I’m not asking for statistics; what I want to know is, why is it important to you? That is, how is it that you have lots of mbox files with duplicates? Certainly this can happen, but I don’t think it’s very common. Maybe there’s a situation or common workflow that I didn’t know about. Secondly, are the duplicates causing problems for you (if so, how), or is it more a matter of “cleanliness” (which I can understand)? These kinds of details are much more interesting to me than statistics like how many people would vote for a particular feature.

If you’re looking to do this yourself, I think the easiest way would be using the formail -D command in Terminal. This should be done before importing the mbox into EagleFiler.
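
For anyone who would rather script this than use formail, here is a minimal Python sketch of the same idea: keep the first message seen for each Message-ID and skip later copies. It is not how EagleFiler or formail work internally. The function name dedupe_mbox and the file paths are made up for illustration, and it assumes that two messages with the same Message-ID header really are duplicates. Messages without a Message-ID are always kept.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy messages from src_path to dst_path, skipping repeated Message-IDs.

    Hypothetical helper for illustration. Returns (kept, dropped).
    No file locking is done, so this assumes no other program is
    writing to either mbox while it runs.
    """
    seen = set()
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)  # fail if the source is missing
    dst = mailbox.mbox(dst_path)                # created if absent, appended to otherwise
    try:
        for msg in src:
            msg_id = msg.get("Message-ID")
            key = str(msg_id).strip() if msg_id is not None else None
            if key is not None and key in seen:
                dropped += 1                    # later copy of a message already written
                continue
            if key is not None:
                seen.add(key)
            dst.add(msg)                        # first copy, or no Message-ID at all
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped
```

You would call it as something like dedupe_mbox("Archive.mbox", "Archive-deduped.mbox") and then import the new file. Since it appends to the destination if that file already exists, point it at a fresh path, and keep the original mbox until you have checked the result.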

I just imported a bunch of emails from an Entourage/Exchange environment and have noticed many duplicates (which existed in the source mailboxes; I think something happened during an Exchange upgrade to cause them). Has anything changed in EagleFiler to make this easier, or do I still need to export each mbox, run formail -D on it and reimport it?

Thanks.

I haven’t had a chance to add a “Remove Duplicate Messages” feature to EagleFiler yet, so I think “formail -D” is the way to go.

Please try the new Remove Duplicate Messages script.