Spam Filtering - Some useful perspectives.

Was recently asked by a colleague about the current spam situation at a generic level - and more specifically 'what can be done about it'.

I proceeded to send him what could almost have been called a tirade on exactly what will need to happen before spam is a thing of the past. Not sure I'll go there just yet...

On an unrelated tack, a recent discussion on NZLUG covered some problems an individual was having sending email to Hotmail.com. Hotmail was apparently requiring SPF records from domains sending it mail, and deferring inbound mail that lacked one until delivery eventually failed.

I checked - they're not. (I don't currently publish SPF, yet I can send to them fine.) But I threw some terms into Google and found a gem of a link - http://www.richi.co.uk.

http://richi.co.uk/blog/2005/06/yet-more-on-hotmails-move.html

which references http://www.computerworld.com/blogs/node/440

http://richi.co.uk/blog/2005/06/hopefully-last-on-this-subject_24.html

http://richi.co.uk/blog/2005/05/why-challengeresponse-is-bad.html

I consider all of the above to be useful reads.
I also have to confess I do agree with much of the following - anyone looking to implement spam filtering needs to bear the below in mind...

A common theme in spam filtering is that there is no one, single, silver bullet to fix this problem. Not CTA, not Bayesian, and certainly not challenge/response.

Examples of the techniques employed at the first stage:

* Valid HELO or EHLO?
* Valid PTR or RDNS?
* Greylisting/tempfailing
* Throttling (prevents illegal pipelining)
* IP reputation/blacklists
* SPF/SenderID/DKIM
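
To make one of those first-stage checks concrete, here is a rough sketch of greylisting in Python. It is only an illustration of the general idea, not how any particular MTA does it: the class name, the 300 second delay, the reply text and the in-memory dictionary are all assumptions of mine.

```python
import time


class Greylist:
    """Tempfail the first attempts from an unseen (ip, sender, recipient) triplet.

    Hypothetical sketch: a real deployment would persist and expire entries.
    """

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait (assumed value)
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> timestamp of first attempt

    def check(self, client_ip, mail_from, rcpt_to):
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        # Remember when this triplet was first seen; keep the old time on retries.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # 4xx is a temporary failure: a legitimate MTA queues and retries.
            return "451 Greylisted, please try again later"
        return "250 OK"
```

The bet is that a real mail server retries after a 4xx, while a lot of spamware never comes back, so the cost to legitimate senders is a delay rather than a lost message.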

More broadly, there's a general problem with content filtering: it's expensive. In a world where 70-90% of the port 25 connection attempts are unwanted, we don't want to be wasting MTA horsepower on receiving the message and performing complex analysis on the content. Moreover, we want to be able to reject the message with an SMTP 5xx code—this allows us to avoid the collateral-damage causing "backscatter" of bounce messages. This means that we need to keep the connection open while we run the rules, which isn't pleasant.

In other words...

To sum up, spam filters are increasingly running an initial set of anti-spam rules at the connection level, before the SMTP DATA transaction even starts. If these rules generate a high enough score, it's 5xx no spam for you, and goodnight Vienna. Only if the filter's unsure will the message make it to the second, content filtering stage.
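
As a rough sketch of that two-stage idea (again in Python, and again with made-up numbers): the rule names, weights and threshold below are hypothetical, chosen only to show the shape of the decision, and are not taken from Hotmail or any real filter.

```python
# Hypothetical connection-level rules and weights (assumptions, not real values).
WEIGHTS = {
    "bad_helo": 2.0,
    "no_rdns": 2.0,
    "blacklisted_ip": 4.0,
    "pipelining_violation": 3.0,
}
REJECT_AT = 5.0  # assumed threshold for an outright 5xx


def connection_score(findings):
    """Sum the weights of every first-stage rule that fired."""
    return sum(WEIGHTS[name] for name in findings)


def first_stage(findings):
    """Decide before DATA: reject cheaply, or hand over to content filtering."""
    if connection_score(findings) >= REJECT_AT:
        return "reject"          # answer 5xx now; no bounce is ever generated
    return "content_filter"      # unsure: pay for the expensive second stage
```

For example, first_stage({"blacklisted_ip", "no_rdns"}) scores 6.0 and is rejected at connection time, while first_stage({"bad_helo"}) scores 2.0 and goes on to content filtering.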

Why load your system up unnecessarily? And why accept the message (and risk a bounced bounce, due to the use of invalid source addresses)?

Adding SPF presence checks to the existing SPF rule allows Hotmail and others to reject more spam without expensive content filtering. This shouldn't cause any additional false positives, unless Hotmail does something dumb with the score weights.

Note we're talking about score weights. Not about rejecting mail outright on the basis of an SPF failure.
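
To illustrate the difference, here is a small Python sketch where the SPF result is just one input to the score. The result names are the standard SPF ones; the weights and the threshold are purely my own assumptions, picked so that an SPF failure on its own never crosses the line.

```python
# Hypothetical weights per SPF result (assumed values for illustration only).
SPF_WEIGHTS = {
    "pass": -1.0,
    "none": 0.5,       # the "presence check": sender publishes no SPF record
    "neutral": 0.5,
    "softfail": 1.5,
    "fail": 3.0,
    "temperror": 0.0,
    "permerror": 1.0,
}
REJECT_AT = 5.0  # assumed reject threshold


def total_score(spf_result, other_score):
    """Combine the SPF contribution with the score from all other rules."""
    return other_score + SPF_WEIGHTS[spf_result]


def should_reject(spf_result, other_score):
    """SPF only nudges the score; it never decides the outcome by itself."""
    return total_score(spf_result, other_score) >= REJECT_AT
```

With these numbers should_reject("fail", 0.0) is False, because a bare SPF failure only scores 3.0, whereas should_reject("fail", 2.5) is True because other rules also fired. A domain with no SPF record at all only picks up 0.5.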

Any providers out there who're using SPF to reject mail outright (and not simply treating it as 'part of the formula') are asking for trouble. In fact, as noted above, there is really no 'silver bullet', and any defence against spam these days should be multi-tier. The biggest risk is that spam filtering itself becomes too prone to false positives, because that'll make email entirely too unreliable.

And that's part of the challenge, isn't it? We're on the defensive when it comes to spam, because going on the offensive is too hard / too gutless / too illegal and/or immoral.