Parts of a bigger plan

My name is Bertrand Mansion, I live in Paris (France) where I mostly do programming and design for the company I founded in 2000. I enjoy reading sf, heroic fantasy, thrillers, comics...

Filtering bounces with a Bayesian filter

Handling bounced mail can be very annoying. The content of a bounce message depends heavily on the server that sent it, so trying to classify it with regular expressions is not very effective. I have seen professional, commercial solutions fail to correctly identify a bounce from Gmail…

As I am currently improving our professional emailing system, I had to find a bulletproof solution, and I thought: why not use the same technology that classifies spam to classify bounced mail?

Many filters use Bayesian analysis to filter spam based on its content. The same could be done with bounces. So far, I have identified 5 main reasons for bounces:

  • recipient unknown
  • over quota
  • detected as spam
  • recipient on vacation
  • others

For what I am trying to do, I don’t need to go into too much detail. Using these categories, my goal is to use Bayesian text analysis to find the category that a bounced mail fits best and then apply the corresponding rule. For example, “recipient unknown” means a hard bounce, while “recipient on vacation” means a soft bounce, so the recipient won’t be unsubscribed immediately.
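
To make the idea concrete, here is a minimal sketch in Python of a naive Bayes text classifier over the five categories. It is not CRM114 and not the PHP wrapper mentioned in this post; the class name `BounceClassifier`, the `tokenize` helper, the sample messages and the "manual review" fallback are all assumptions made for illustration. Only the two rules in `ACTIONS` come from the text above.

```python
import math
import re
from collections import Counter, defaultdict

CATEGORIES = ["recipient unknown", "over quota", "detected as spam",
              "recipient on vacation", "others"]

# Only these two rules are stated in the post; other categories have no
# rule yet and fall back to a hypothetical "manual review" in the usage below.
ACTIONS = {
    "recipient unknown": "hard bounce",
    "recipient on vacation": "soft bounce",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BounceClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> training messages
        self.vocabulary = set()

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the trained category with the highest log posterior."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                # Smoothed log likelihood of the token in this category.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training messages, one per category, just to exercise the code.
clf = BounceClassifier()
clf.train("recipient unknown", "550 5.1.1 user unknown: no such mailbox here")
clf.train("over quota", "552 5.2.2 mailbox full: quota exceeded")
clf.train("recipient on vacation", "I am out of the office on vacation until Monday")

category = clf.classify("The mailbox is full and the quota has been exceeded")
print(category, "->", ACTIONS.get(category, "manual review"))
```

A real filter would need many training messages per category, and CRM114 offers more elaborate classifiers than this word-count model. The sketch only shows the shape of the approach: score each category, pick the best one, then look up the rule.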

For this project, I have developed a PHP wrapper for CRM114. I will first use it to learn the different types of bounces by training it with the bounces I receive, and then to guess what new bounces are about.

I am currently writing a web administration tool to help CRM114 decide which mail header fields and body parts are interesting for its content analysis.
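
As a rough illustration of what "choosing header fields and body parts" could look like, the sketch below uses Python's standard `email` package to keep only selected headers and the plain-text parts of a message before handing the text to a classifier. The lists `INTERESTING_HEADERS` and `INTERESTING_TYPES`, the function `extract_features` and the sample message are hypothetical; which fields actually help is exactly what the administration tool is meant to find out.

```python
from email import message_from_string
from email.policy import default

# Hypothetical selection; the post does not say which fields are useful.
INTERESTING_HEADERS = ("Subject", "From", "X-Failed-Recipients")
INTERESTING_TYPES = ("text/plain",)


def extract_features(raw_message):
    """Return the selected headers and body parts as one block of text."""
    msg = message_from_string(raw_message, policy=default)
    parts = []
    for name in INTERESTING_HEADERS:
        value = msg.get(name)
        if value is not None:
            parts.append(f"{name}: {value}")
    # walk() yields the message itself and, for multipart mail, every subpart.
    for part in msg.walk():
        if part.get_content_type() in INTERESTING_TYPES:
            parts.append(part.get_content())
    return "\n".join(parts)


# Made-up bounce message for demonstration purposes.
raw = (
    "From: MAILER-DAEMON@example.com\n"
    "Subject: Delivery Status Notification (Failure)\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "550 5.1.1 user unknown\n"
)
features = extract_features(raw)
print(features)
```

The returned text could then be passed to whatever classifier is in use, so that training and classification always see the same selection of fields.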

— 1 year ago