[nycbug-talk] upcoming meeting on Postfix

Miles Nordin carton at Ivy.NET
Fri Oct 24 16:31:30 EDT 2008


>>>>> "rp" == Robin Polak <robin.polak at gmail.com> writes:

    rp> how Postfix handles SPAM

Well, it doesn't.  A bunch of add-on widgets and doo-hickeys do this,
and in general each of them can be combined with Postfix, qmail, or
big honkin' messes like Zimbra or Scalix or Sun
JMS/IMTA/iPlanet/One/whateverthefuckitscalledthisweek, so it's not
really Postfix-related.

The spam stuff built into Postfix is a bunch of standards-compliance
checks, which Max dumped in an earlier message.  They are not very
interesting, and are described in non-Postfix-specific terms here:

  http://en.wikipedia.org/wiki/Anti-spam_techniques_(e-mail)#Enforcing_RFC_standards

The interesting part of spam fighting is elsewhere: the filters like
crm114/dspam/spamassassin, and filter autotraining by users who
manually move their messages into or out of a spam folder or forward
them to some magic address.  The other thing that interests me is
collecting rrdtool statistics about spam to numerically compare
filtering methods by accuracy and CPU efficiency---to do that, I'd
like a way to operate a filter without using its decisions so I could
compare it to other filters and see if it's generating false
positives.  I haven't heard of stuff like this---``How much of the
spam I'm currently rejecting is also rejected by this much cheaper
filter?'' and ``how much more mail would I be rejecting if I used this
filter, and can I have a look at the new rejects?''  I doubt this
speaker knows about such things since the topic spreads far out from
Postfix.
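
A minimal sketch of the comparison described above, assuming each
message's verdict from the live filter and from a candidate filter
run in log-only mode has already been recorded somewhere.  The
Verdict record and the compare function are hypothetical names made
up for illustration, not part of any filter's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    msg_id: str
    live_reject: bool    # decision of the filter actually in use
    shadow_reject: bool  # decision of the candidate filter, logged only


def compare(verdicts):
    """Answer the two questions: overlap with current rejects, and new rejects."""
    live = {v.msg_id for v in verdicts if v.live_reject}
    shadow = {v.msg_id for v in verdicts if v.shadow_reject}
    both = live & shadow
    return {
        "live_rejects": len(live),
        "shadow_rejects": len(shadow),
        # Share of currently rejected mail the candidate would also reject.
        "overlap_fraction": len(both) / len(live) if live else None,
        # Mail the candidate would reject that is accepted today: the
        # list to eyeball for false positives.
        "new_rejects": sorted(shadow - live),
    }


# Made-up sample data, only to show the shape of the report.
sample = [
    Verdict("m1", True, True),
    Verdict("m2", True, False),
    Verdict("m3", False, True),
    Verdict("m4", False, False),
]
report = compare(sample)
```

The counts could be fed to rrdtool per interval; the new_rejects list
is the part a human would have to review by hand.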

A way of relating spam to postfix and queueing strategies would be to
talk about clustering---how do you run backscatter-proof spam scanning
for a domain with enough traffic that the spam scanning is too CPU
intensive for one host to handle?  What variables determine how big
the cluster needs to be, and is it always CPU-bound or is it
disk-bound sometimes?  Postfix has a lot of clustering stuff built in
but there are probably non-obvious things.  Obviously you need a list
of valid usernames in LDAP which can be pushed out to the edge, or
else mail sent to invalid usernames will turn into backscatter.  And
you may have per-user spam preferences as well, like whitelists,
likes-spam flags, spamminess-level knobs, or
please-dont-delete-viruses-i'm-researching-them, that need to be
replicated on all the spam-filter nodes.  Less obviously, the
greylisting database probably has to be shared through LDAP or SQL, or
else the whole idea of greylisting won't work any more: a retry that
lands on a different node would look like a first attempt.  Are there
other tiny secret things which need to be replicated cluster-wide?
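
A rough sketch of why the greylisting state has to be shared,
assuming the usual triplet scheme (client IP, sender, recipient) with
a minimum retry delay.  sqlite3 stands in here for whatever shared SQL
or LDAP store a real cluster would use; open_store, check, and the
300-second delay are assumptions for illustration, not anything
Postfix ships.

```python
import sqlite3

MIN_DELAY = 300  # seconds a new triplet must wait before acceptance (assumed)


def open_store(path=":memory:"):
    """Open a greylist store; each ':memory:' connection is a separate database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS greylist ("
        "client TEXT, sender TEXT, rcpt TEXT, first_seen REAL, "
        "PRIMARY KEY (client, sender, rcpt))"
    )
    return db


def check(db, client, sender, rcpt, now):
    """Return 'defer' for an unseen or too-recent triplet, else 'accept'."""
    row = db.execute(
        "SELECT first_seen FROM greylist WHERE client=? AND sender=? AND rcpt=?",
        (client, sender, rcpt),
    ).fetchone()
    if row is None:
        db.execute(
            "INSERT INTO greylist VALUES (?, ?, ?, ?)",
            (client, sender, rcpt, now),
        )
        db.commit()
        return "defer"
    return "accept" if now - row[0] >= MIN_DELAY else "defer"


triplet = ("192.0.2.1", "a@example.org", "b@example.net")

# Two nodes consulting one shared store: the retry is recognized.
shared = open_store()
first = check(shared, *triplet, now=0)
retry = check(shared, *triplet, now=600)

# Two nodes with private stores: the retry looks like a first attempt.
node_a, node_b = open_store(), open_store()
check(node_a, *triplet, now=0)
split = check(node_b, *triplet, now=600)
```

With private stores, a legitimate sender whose retries keep landing
on different nodes could be deferred over and over, which is the
failure described above.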

But I think a spam meeting will quickly degenerate into a raucous
bikeshed discussion, which is arguably what meetings are best for but
in this case it might be a waste of the speaker's famousness.  I've
not much interest in a generalist meeting like the extremely tiresome
Asterisk meeting at Unigroup which could be replaced by one URL
linking to a blog post.  On like five occasions there were two-minute
digressions about ``where the links would be posted,'' and I wanted to
scream.