Saturday, October 18, 2008

Computing: Filtering Forged Junk Mail Bounces with Procmail

Thanks to the multi-level defence in depth I've deployed against junk E-mail, described in earlier postings here and here, I have been receiving very little junk mail addressed directly to me—maybe one or two on a typical day (out of the five or ten thousand which are blocked at various steps in the mail processing pipeline). One problem remains: in addition to directly addressed mail, there are the bounces from junk mail forged as originating from my E-mail address. When you've had the same E-mail address for 14 years, this is an almost inevitable occurrence. Since the bounces originate from legitimate mail transfer agents, the connection-level filters and greylist which are so effective in deterring impatient non-standards-compliant robot mailers do not filter these messages, which must be caught by subsequent content filtering. Since most bounces quote the original message, content filtering has worked pretty well, but still I'd continue to get around ten of these bounces (out of several hundred which arrive) in my mailbox every day.

Then last week things blew up. A new, massive junk mail campaign has been launched, sending forged messages that look something like the following:

From: "Latisha Voss" <REDACTED@fourmilab.ch>
X-Mailer: The Bat! (v2.00.2) Educational
Reply-To: REDACTED@fourmilab.ch
To: REDACTED@voliacable.com
Subject: qcjpl

http://REDACTED.com
tjf b, d konu.

I have redacted my E-mail address in the interest of privacy, and the name of the site to which the recipient is directed to avoid furthering the cause of the junk mailer. The target sites in these messages (they change once or twice a day to prevent filtering based on the URL) are all Flash pages which use the Flash redirect scam to send the user to a pill-pushing site. The forged name of the sender and the random text which follows the URL are different in every message.

Messages like this slide right past most content filtering; there is nothing constant in the content which identifies them as junk, and the frequently-changing target URL makes it impractical to filter based upon it. The sheer volume of these messages since they exploded last week makes it imperative to do something: more than 500 per day were making it past all the filters and landing in my mailbox. Since I run Procmail as the penultimate line of defence (the last is the Bayesian filter in Mozilla Thunderbird), I decided to see if I could devise a rule which would catch these messages. After several experiments, I came up with the following, which I'll show here as it would appear for a user named “Chef Rodent” with an E-mail address of chef@ratburger.org:

:0 HB:
* -1^0
*   1^0 ^From +.*MAILER\-DAEMON
*   1^0 ^From:.*<chef@ratburger\.org>
*   -1^0 ^From:.*Chef +Rodent
blowback

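This uses a Procmail weighted test in a very simple manner to identify messages with a “From” line including “MAILER-DAEMON” and a “From:” line with the user's E-mail address (which will appear in the body, showing the rejected message), but which do not include the user's correct name on the “From:” line. The “HB” flags make each test search both the header and the body. The score starts at -1, each of the first two tests adds 1 when it matches, and a match on the user's correct name subtracts 1; Procmail delivers to the folder only if the total is positive, which requires both of the first two tests to match and the third to fail. These characteristics are true of these forged messages, yet should be sufficiently rare as to generate few false positives. (And if there are a few, I don't care; it's just E-mail. Anybody who wants to be sure to contact me should use the feedback form or send a FAX.)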
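
To make the arithmetic of the weighted test concrete, here is a rough Python emulation of how the conditions combine. It is not Procmail itself, only a sketch: the function names blowback_score and is_blowback and the three sample messages are invented for illustration. It assumes that each condition counts at most once (the effect of the “^0” exponent) and that matching ignores case, as Procmail does by default.

```python
import re

# (weight, pattern) pairs mirroring the recipe's condition lines.
# Assumption: with an exponent of 0, a condition adds its weight at most once.
CONDITIONS = [
    (-1, r""),                               # bare "* -1^0": always counts
    (1, r"^From +.*MAILER-DAEMON"),          # mbox "From " line of the bounce
    (1, r"^From:.*<chef@ratburger\.org>"),   # user's address on a "From:" line
    (-1, r"^From:.*Chef +Rodent"),           # user's correct display name
]

def blowback_score(message: str) -> int:
    """Sum the weights of conditions matching anywhere in header or body."""
    flags = re.MULTILINE | re.IGNORECASE
    return sum(w for w, pat in CONDITIONS if re.search(pat, message, flags))

def is_blowback(message: str) -> bool:
    """A weighted recipe matches only when the final score is positive."""
    return blowback_score(message) > 0

# Hypothetical sample messages, invented for this sketch.
FORGED_BOUNCE = """From MAILER-DAEMON Sat Oct 18 20:15:00 2008
From: Mail Delivery System <MAILER-DAEMON@example.net>
To: chef@ratburger.org
Subject: Undelivered Mail Returned to Sender

--- Original message follows ---
From: "Latisha Voss" <chef@ratburger.org>
Subject: qcjpl
"""

GENUINE_BOUNCE = """From MAILER-DAEMON Sat Oct 18 20:15:00 2008
From: Mail Delivery System <MAILER-DAEMON@example.net>
To: chef@ratburger.org
Subject: Undelivered Mail Returned to Sender

--- Original message follows ---
From: "Chef Rodent" <chef@ratburger.org>
Subject: Dinner menu
"""

ORDINARY_MAIL = """From friend@example.com Sat Oct 18 20:15:00 2008
From: A Friend <friend@example.com>
To: chef@ratburger.org
Subject: Lunch?

See you at noon.
"""
```

In this sketch the forged bounce scores -1 + 1 + 1 = 1 and is filed as blowback. The bounce of a message the user really sent scores 0 because the correct name cancels one point, and ordinary mail stays at -1, so neither is caught.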

This isn't perfect; nothing is in the world of junk mail filtering. Some bounces don't include “MAILER-DAEMON” in the “From” line, and others don't quote the bounced message. But, in my experience, this rule will catch about 99% of the bounced forged messages. You may want to mop up others with additional rules, but for the moment I'm happy with the results of this rule by itself. The rule files forgery bounces in a “blowback” folder; after I gain more confidence in it, I'll just send them directly to “/dev/null”.

If you have a complicated .procmailrc file, this rule should probably be placed after any whitelist and blacklist rules and before content-based filters.

Posted at October 18, 2008 20:15