Frustrated by the barrage of spam that my email addresses receive daily, I decided to look into the myriad spam filtering solutions freely available on the internet. Research on the topic led me to POPFile, and I have been using it for the past six months. To date it has classified 22,842 emails with 98.98% accuracy. I’ve received a total of 17,966 spam emails during those six months, give or take a few that were either unclassified or classified incorrectly. Wow!
POPFile describes itself as “an automatic mail classification tool.” The software uses a naïve Bayes algorithm to sort through incoming emails and classify them as spam or not spam. Basically, it creates a bridge between your mail client (Outlook / Outlook Express / Thunderbird / etc.) and your email server.
When you check your email, all incoming messages pass through POPFile, which runs statistical tests on their content to decide whether each one is spam or a valid email. It does this by matching the words in the message against a dictionary of words that POPFile creates and manages automatically. Most spam emails have similar content (“viagra, cialis, mortgage, payment, average, online, pharmacy, satisfaction, deals” and so on). POPFile keeps count of how often each word has appeared in spam and in valid email, turns those counts into a probability of each word being found in a spam message, and then combines the probabilities of all the words in the email to classify it. The “naïve” part of the name refers to the simplifying assumption that each word counts independently of the others.
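To make the idea concrete, here is a rough sketch of a naïve Bayes word classifier in Python. This is not POPFile's own code; the class name NaiveBayesSketch, the bucket names and the add-one smoothing are all my assumptions, chosen only to illustrate the counting-and-combining approach described above.

```python
import math
import re
from collections import Counter


class NaiveBayesSketch:
    """Toy word-count classifier (hypothetical; not POPFile's implementation)."""

    def __init__(self, buckets):
        # One word-frequency table and one message counter per bucket.
        self.word_counts = {bucket: Counter() for bucket in buckets}
        self.message_counts = Counter({bucket: 0 for bucket in buckets})

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        # Record every word of a message whose bucket is already known.
        self.word_counts[bucket].update(self.tokenize(text))
        self.message_counts[bucket] += 1

    def classify(self, text):
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        words = self.tokenize(text)
        scores = {}
        for bucket, counts in self.word_counts.items():
            # Start from the (smoothed) share of messages seen in this bucket.
            score = math.log(
                (self.message_counts[bucket] + 1)
                / (total_messages + len(self.word_counts))
            )
            total_words = sum(counts.values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                # Summing logs multiplies the per-word probabilities, which is
                # the "naive" independence assumption.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
            scores[bucket] = score
        # The bucket with the highest combined score wins.
        return max(scores, key=scores.get)
```

You would call train() once for every email you have already sorted by hand, then classify() on each new message. The more mail it has seen, the better the word counts reflect your own inbox.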
If the email is classified as spam, POPFile modifies the email’s Subject line by prefixing it with “[SPAM]”. Your mail client’s built-in sorting rules can then move emails with “[SPAM]” in the subject line into a temporary folder that you can either empty right away or sort through later at your convenience. You also have the option to automatically quarantine and delete all incoming email that POPFile has marked as spam.
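The tagging and sorting steps can be sketched in a few lines of Python using the standard email module. The function names tag_subject and folder_for, and the folder names, are made up for illustration; POPFile and your mail client each do their own version of this.

```python
from email import message_from_string


def tag_subject(raw_message, bucket, tag_buckets=("spam",)):
    """Prefix the Subject with a tag such as [SPAM] (hypothetical helper)."""
    msg = message_from_string(raw_message)
    if bucket in tag_buckets:
        subject = msg.get("Subject", "")
        tag = f"[{bucket.upper()}]"
        if not subject.startswith(tag):
            # Remove the old header first so the message keeps one Subject.
            del msg["Subject"]
            msg["Subject"] = f"{tag} {subject}".strip()
    return msg.as_string()


def folder_for(raw_message):
    """Mimic a mail-client rule that files tagged mail separately."""
    msg = message_from_string(raw_message)
    return "Spam" if "[SPAM]" in msg.get("Subject", "") else "Inbox"
```

In a real setup the second half is simply a filter rule in Outlook or Thunderbird that looks for “[SPAM]” in the subject.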
Furthermore, you can create Magnets, which force incoming mail from specific email addresses and domains, or with specific subject lines, to bypass POPFile’s checks. This ensures that emails from your work and/or friends always reach your mail client.
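Conceptually, a magnet is just a rule that is checked before any statistics are run. Here is a minimal sketch, assuming a made-up rule format of (field, pattern, bucket) and a hypothetical match_magnet function; POPFile's real magnet storage and matching may well differ.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical magnet list: (header field, text to look for, target bucket).
MAGNETS = [
    ("from", "boss@example.com", "work"),
    ("from", "@example.org", "friends"),
    ("subject", "invoice", "work"),
]


def match_magnet(raw_message, magnets=MAGNETS):
    """Return the bucket of the first matching magnet, or None."""
    msg = message_from_string(raw_message)
    # parseaddr strips the display name and leaves the bare address.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    subject = msg.get("Subject", "").lower()
    for field, pattern, bucket in magnets:
        value = sender if field == "from" else subject
        if pattern.lower() in value:
            return bucket
    # No magnet matched, so the message would go on to the classifier.
    return None
```

If match_magnet() returns a bucket, the statistical check is skipped for that message. If it returns None, the email is classified as usual.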
As I mentioned earlier, this freely available piece of software has classified the 22,842 emails I’ve received since January 2nd with 98.98% accuracy. I no longer look at the spam folder in my mail client; I just delete its contents right away. I’ve created magnets for important email contacts and the rest are sorted for me automatically :)
Also, here are some official real-time POPFile stats (popfile_stats.html) collected from POPFile users who opted to take part in the reporting feature. Quite impressive!