Encryption on Mailing lists senseless?

Wed Nov 19 12:17:19 CET 2014

On 19/11/14 01:31, Robert J. Hansen wrote:
> No.  Client-side, you get to inspect (fully) only your data, and you
> have to develop a statistical model of spam based on only your data.
> When Gmail filters, it inspects (fully) traffic to *millions* of users,
> and uses that to create a model no individual user can hope to match.

I agree with several other important points you raise, but this one is not a big
deal. I have a highly customized mail setup. My SpamAssassin downloads rules
from the internet, but trains its Bayesian filter on only the e-mail I
personally receive.

Everyone who has ever sent me a non-spam mail is added to a whitelist. Mail from
whitelisted people never gets automatically moved to the Spam box, and my mail
client shows their messages in a different color. As soon as I receive a spam
mail from such an address, I manually remove that address from the whitelist
(actually, I move it to the greylist, so it is not added to the whitelist
again next time).

I have an empty blacklist. It exists, though. It would cause mail to be silently
deleted. Somebody once had the honour of having me create it and put him on it :).
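
As a rough illustration of the list handling described above, and not the
actual configuration, here is a minimal Python sketch. The class and method
names are hypothetical, and record_spam() stands in for the manual step of
moving an address from the whitelist to the greylist.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLists:
    """Hypothetical in-memory whitelist/greylist/blacklist store."""
    whitelist: set = field(default_factory=set)
    greylist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)

    def record_ham(self, sender: str) -> None:
        # A non-spam mail whitelists the sender, unless the sender was
        # moved to the greylist (or blacklisted) earlier.
        sender = sender.lower()
        if sender not in self.greylist and sender not in self.blacklist:
            self.whitelist.add(sender)

    def record_spam(self, sender: str) -> None:
        # Spam from a whitelisted address: move it to the greylist so
        # later non-spam mail does not put it back on the whitelist.
        sender = sender.lower()
        if sender in self.whitelist:
            self.whitelist.discard(sender)
            self.greylist.add(sender)

    def route(self, sender: str, filter_says_spam: bool) -> str:
        # Blacklisted mail is silently dropped, whitelisted mail never
        # goes to the Spam folder, everything else follows the filter.
        sender = sender.lower()
        if sender in self.blacklist:
            return "drop"
        if sender in self.whitelist:
            return "inbox"
        return "spam" if filter_says_spam else "inbox"
```

The greylist here only blocks re-whitelisting; it does not change where mail
is delivered.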

SpamAssassin puts spam in a Spam folder, which I check every few weeks. I
sort the messages by subject line so I can quickly scan through them. Spam that
I have checked and confirmed as spam is still kept around for quite a while,
just in case someone writes to me "I wrote you months ago and you haven't
replied". Then I can go back through everything I've already written off as
spam to see if I overlooked their mail.

This setup works great for me. If I get a few false positives in a year, it is a
lot. They are so scarce that I'm completely unsure what the actual number is. I
do get false negatives, but it doesn't feel like more than 10 each week. Every
now and then there is a short surge of nearly identical spam, though.[1]

I still think your overall point stands, and stands tall. But as for the spam
filtering issue: from personal experience, I don't think it is a major one.

If it were, I'm sure we could think of some way to have publicly available
training data that can be refined by individuals, who feed their changes back
into the public data. It might need some thought: you don't want a genuinely
confidential mail that was misclassified as spam to upload new words to the
public data. So probably most individuals would only adjust existing weights,
and only some setups would contribute new words. These could come from
spamtraps and organisations, or even individuals who send in complete training
mails. And perhaps none of this is even necessary, and the system would be just
as effective with a big corpus of data where only weights are changed by
submissions.
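
To make the "only adjust existing weights" idea concrete, here is a minimal
Python sketch. Everything in it is an assumption for illustration: the shared
data is modelled as a plain dict of per-token spam and non-spam counts, and
merge_submission() and its "trusted" flag are hypothetical names, not part of
SpamAssassin or any existing service.

```python
from collections import Counter


def merge_submission(public, submission, is_spam, trusted=False):
    """Fold the tokens of one classified mail into the shared table.

    public:     dict mapping token -> [spam_count, ham_count]
    submission: iterable of tokens taken from one classified mail
    trusted:    True for sources (e.g. spamtraps) allowed to add new tokens
    """
    column = 0 if is_spam else 1
    for token, count in Counter(submission).items():
        if token in public:
            # Known token: anyone may adjust its counts.
            public[token][column] += count
        elif trusted:
            # Unknown token: only trusted sources may introduce it.
            public[token] = [0, 0]
            public[token][column] = count
        # Untrusted submissions never introduce unknown tokens, so words
        # from a misclassified private mail do not reach the public data.
    return public
```

For example, an ordinary user submitting a mail containing a word that is not
in the table changes nothing for that word, while a spamtrap submission would
add it.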

But this is all a bit beside the point. The point is that spam filtering works
just fine on an individual level, for me. And if it did create problems, I'm
sure we could think of things that would solve that specific issue.

Peter.

PS: By the way, some mail is already rejected at the mailserver and never enters
the system. The most important instance of this is mail purporting to come from
myself, but not originating from within my own network. Lots of spammers send
you spam that appears to come from your own address, be it in the envelope or
in the headers. I run my own webmail server, so even if I need to send myself a
message and I didn't bring my laptop, it would still originate from my own
webmail server.
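
The check described in this PS can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the address, the network
range and the function name should_reject() are hypothetical, and a real
mailserver would express this rule in its own configuration language.

```python
import ipaddress

# Hypothetical values, for illustration only.
MY_ADDRESSES = {"me@example.org"}
MY_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def should_reject(client_ip, envelope_from, header_from):
    """Reject mail that claims to be from me but arrives from outside."""
    # The claim can sit in the envelope sender or in the From: header.
    claims_me = (envelope_from.lower() in MY_ADDRESSES
                 or header_from.lower() in MY_ADDRESSES)
    # Mail I really sent always originates from my own network.
    addr = ipaddress.ip_address(client_ip)
    from_inside = any(addr in net for net in MY_NETWORKS)
    return claims_me and not from_inside
```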

[1] Actually that is a case where the distributed solution truly excels:
quickly homing in on the latest mass mailing. The sheer number of identical
mails alone is a big warning sign, and a lot of people will start reporting them
as spam.
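
A minimal Python sketch of the "sheer number of identical mails" signal,
assuming a very crude normalisation (lowercase, digits dropped, whitespace
collapsed) so that trivially varied copies count as the same message. The
function names and the threshold are hypothetical, and this is not how any
particular provider does it.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body):
    # Crude normalisation so trivially varied copies hash the same:
    # lowercase, drop digits, collapse whitespace.
    text = re.sub(r"\d+", "", body.lower())
    text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bulk_fingerprints(bodies, threshold=3):
    """Return the fingerprints seen at least `threshold` times."""
    counts = Counter(fingerprint(b) for b in bodies)
    return {fp for fp, n in counts.items() if n >= threshold}
```

With only one mailbox as input the counts stay low, which is why this works
so much better when many users' mail is visible at once.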

-- 
I use the GNU Privacy Guard (GnuPG) in combination with Enigmail.
You can send me encrypted mail if you want some privacy.
My key is available at <http://digitalbrains.com/2012/openpgp-key-peter>