hashed user IDs [was: Re: Security of the gpg private keyring?]

Robert J. Hansen rjh at sixdemonbag.org
Wed Mar 9 14:39:35 CET 2011


On 3/9/2011 8:11 AM, Ben McGinnes wrote:
> * Anyone trawling through keys on a public server or downloading
>   random keys cannot see who owns that key or what their email address
>   is, but anyone who knows Joe or his email address can search the
>   keyservers for that data because the hash can be calculated from the
>   data they do have (e.g. joe at example.net) and search for the key with
>   the matching hash.

There are a couple of major problems here:

1.  There's not all that much entropy in an email address.  Let's say
that I want to harvest email addresses.  I create a list of, say, the
top thousand email providers in the world, and a list of every
five-character lowercase username.  For each username, I compute the
hash of that username at each of the thousand providers, then look each
hash up in the database.  Total work factor: 26^5 (about 11.9 million)
usernames times 1,000 providers, or about 12 billion hashes, probably
under a terabyte of data -- very practical.

        (a) And don't forget that with services like Amazon's cloud,
            massive data crunching distributed across hundreds of
            machines costs a few pennies per processor-hour.  This
            has the potential to ruin your entire day: cloud computing
            shifts the fulcrum of computational leverage *immensely*.
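
To make the harvesting attack in (1) concrete, here is a minimal sketch
in Python.  It is an illustration only: the proposal doesn't say which
hash would be used, so SHA-256 over the lowercased address is an
assumption, and hash_address, work_factor, harvest and the example.*
domains are all hypothetical names made up for this sketch.

```python
import hashlib
import itertools
import string


def hash_address(addr: str) -> str:
    # Assumption: the scheme publishes SHA-256 of the lowercased address.
    return hashlib.sha256(addr.lower().encode("utf-8")).hexdigest()


def work_factor(username_len: int, providers: int) -> int:
    # Number of hashes needed to cover every lowercase username of the
    # given length at every provider.
    return len(string.ascii_lowercase) ** username_len * providers


def harvest(published_hashes, providers, username_len):
    # Enumerate every candidate address and keep the ones whose hash
    # appears among the published (hashed) user IDs.
    found = []
    for letters in itertools.product(string.ascii_lowercase,
                                     repeat=username_len):
        user = "".join(letters)
        for domain in providers:
            addr = f"{user}@{domain}"
            if hash_address(addr) in published_hashes:
                found.append(addr)
    return found


# Toy "keyserver": one guessable address, one that is not.
published = {
    hash_address("joe@example.net"),
    hash_address("x7-long.unusual_name@example.net"),
}

print(work_factor(5, 1000))   # 11881376000
print(harvest(published, ["example.net", "example.org"], 3))
# ['joe@example.net']
```

The toy run only enumerates three-character usernames at two domains
(about 35,000 hashes), but the loop is the same one that would be
farmed out across rented machines for the full 12 billion.  Note that
the short address falls out immediately while the long, unusual one
survives, which is exactly the split described in (2) below.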

2.  To really gain benefit from this scheme, you must:

        (a) have a non-trivially-brute-forceable email address
        (b) want to be able to hide your email address

If you don't care ("b" fails), then this scheme is just an
inconvenience.  If you have a brute-forceable email address ("a" fails),
then this scheme offers no benefit.

3.  Deploying this scheme means:

        (a) people can no longer do fuzzy searches for email
            addresses ("show me all user IDs that look like this
            pattern")
        (b) finding people's certificates may be made more
            difficult due to (a)
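
A small sketch of why (3a) follows, again in Python.  As before, the
choice of SHA-256 and the hash_address helper are assumptions for
illustration, not part of the proposal, and shell-style wildcards via
fnmatch merely stand in for whatever fuzzy search a keyserver offers.

```python
import fnmatch
import hashlib


def hash_address(addr: str) -> str:
    # Assumption: the scheme publishes SHA-256 of the lowercased address.
    return hashlib.sha256(addr.lower().encode("utf-8")).hexdigest()


addresses = ["joe@example.net", "joey@example.net", "bob@example.org"]

# Plaintext user IDs: a pattern search finds everything that looks right.
plain_hits = fnmatch.filter(addresses, "joe*@example.net")

# Hashed user IDs: the server only holds digests.
hashed_ids = sorted(hash_address(a) for a in addresses)

# An exact lookup still works if you already know the full address...
exact_hit = hash_address("joe@example.net") in hashed_ids

# ...but a pattern has nothing to match against: hex digests of similar
# addresses are unrelated strings.
pattern_hits = fnmatch.filter(hashed_ids, "joe*")

print(plain_hits)    # ['joe@example.net', 'joey@example.net']
print(exact_hit)     # True
print(pattern_hits)  # []
```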

4.  My suspicion is the number of users covered by (2) is pretty small.
 My suspicion is the number of users impacted by (3) is pretty large.
My suspicion is we do not have a very good handle on just how difficult
we need to make things, given the resources available to spammers in (1a).


