Mailbox Sieve: expanded filtering

dhs · May 21, 2007, 6:40am

I have always wondered why no one implements a Bayesian filtering mechanism for all folders. It seems like this would be an easy enhancement to something like SpamSieve.

It would work by allowing the user to direct mail to whatever mailbox they wanted. There would be a separate corpus for each mailbox. After a mailbox has been “trained” with several messages, it should be able to automatically determine which messages go where. An enhancement would be for it to dynamically update the corpus by training on any message in a mailbox. So the user simply puts mail in a mailbox, and SpamSieve learns what goes where.
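The per-mailbox corpus idea above can be sketched roughly as follows. This is only a minimal illustration using multinomial naive Bayes with add-one smoothing; it is not how SpamSieve or DEVONthink actually work, and the names `MailboxClassifier`, `train`, `scores`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MailboxClassifier:
    """Hypothetical n-way naive Bayes filter: one token corpus per mailbox."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # mailbox -> token -> count
        self.message_counts = Counter()           # mailbox -> messages trained
        self.vocabulary = set()                   # every token seen in training

    def train(self, mailbox, text):
        """Add one message to the corpus of the mailbox it was filed in."""
        tokens = tokenize(text)
        self.token_counts[mailbox].update(tokens)
        self.message_counts[mailbox] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-score per mailbox; higher means a better match."""
        tokens = tokenize(text)
        total_messages = sum(self.message_counts.values())
        # One extra slot is reserved for tokens never seen in training.
        vocab_size = len(self.vocabulary) + 1
        result = {}
        for mailbox, n_messages in self.message_counts.items():
            counts = self.token_counts[mailbox]
            total_tokens = sum(counts.values())
            score = math.log(n_messages / total_messages)  # mailbox prior
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            result[mailbox] = score
        return result

    def classify(self, text):
        """Return the best-scoring mailbox, or None if nothing is trained."""
        s = self.scores(text)
        return max(s, key=s.get) if s else None


# Made-up training messages, as if the user had dragged them into mailboxes.
clf = MailboxClassifier()
clf.train("friends", "dinner on saturday with julia")
clf.train("friends", "julia sent photos from the trip")
clf.train("work", "quarterly budget report attached")
clf.train("work", "meeting agenda for the budget review")

guess_friends = clf.classify("photos from julia")
guess_work = clf.classify("budget meeting agenda")
```

Here "training" is just calling `train` whenever a message lands in a mailbox, so the corpus follows whatever the user files by hand.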

DEVONthink does this, but it is very inconvenient to use with mail, as it is not integrated and must be used as a separate application.

Comments?

Michael_Tsai · May 21, 2007, 8:23am

This is something that I’m considering. It could be useful in certain cases, though I think the benefit is less clear than with spam filtering. For regular mail filing, in most cases I think it’s easy to make a 100%-accurate rule that does what you want. Another issue is that it’s much more difficult to integrate n-way classification into a mail program. Lastly, there are a lot of engineering issues that make this a less straightforward enhancement to the classifier than you might think.

dhs · May 21, 2007, 8:46am

It is true that making a filter is much more accurate. But I think if you look at the average person’s mail, they don’t use filters at all; it’s just too much trouble. For example, say I get a new friend, Julia. Yes, I can add a new filter or modify an old one, just like for my other 50 friends.
But the alternative is just to add a new mailbox and put this one message there. Simple and effective. It could even rebuild the corpus every time messages were moved. So if I created a mailbox called oldfriends and dragged a bunch of emails there, the corpus would be rewritten without my having to go in and edit and figure out 10 old filters.

DEVONthink also has a mechanism where, before a message (file) is moved, the user is offered the most likely mailboxes (folders). I find that most of the time the first choice is correct, even with small numbers of files in a folder.
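A rough sketch of that kind of "most likely mailboxes" prompt, assuming some classifier has already produced a log-score per mailbox. The function name `suggest_mailboxes` and the example scores are made up for illustration; this is not DEVONthink's actual mechanism.

```python
import math


def suggest_mailboxes(log_scores, k=3):
    """Return the k most likely mailboxes with normalized probabilities.

    log_scores is a hypothetical dict of mailbox -> log-score, such as a
    naive Bayes classifier might produce.
    """
    if not log_scores:
        return []
    # Subtract the best score before exponentiating to avoid underflow.
    best = max(log_scores.values())
    weights = {box: math.exp(s - best) for box, s in log_scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(box, weight / total) for box, weight in ranked[:k]]


# Made-up scores for one incoming message.
example_scores = {"friends": -10.0, "work": -12.0, "receipts": -15.0, "lists": -11.0}
suggestions = suggest_mailboxes(example_scores, k=2)
```

The user would then confirm the first suggestion or pick another, and that choice could be fed back in as a new training message.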

thanks for reading all this!

Michael_Tsai · May 21, 2007, 10:34am

I agree that this is how it should work. The problem is that to get this kind of interaction, the classifier would need to be very tightly integrated with the mail program. No mail program offers anywhere near the kind of hooks that would be needed to implement this feature as an add-on. And if it can’t be tightly integrated, manually creating rules is probably easier in most cases.

dhs · May 21, 2007, 10:58am

Yes, I agree it would have to be tightly integrated; otherwise it would cause more problems. The frustrating part is that DEVONthink comes very close to doing this. With scripts it can automatically receive and filter messages, and even reply. I think the hooks are already available in Mail: if one used DEVONthink to determine the correct mailbox (mirroring the mailbox structure in Apple Mail), it could tell Mail where to put messages. I don’t really have the time to look into this.

In reality, I wish the entire file system were organized using some AI mechanism. It couldn’t do a worse job than most people (including myself), who just dump everything in the Documents folder or on the desktop.