keys.openpgp.org not sending confirmation email

Tue Sep 17 21:21:43 CEST 2019

On 17.09.2019 17:21, Werner Koch wrote:
> On Tue, 17 Sep 2019 15:08, gnupg-users at gnupg.org said:
> 
>> See also dkg's thoughts on the matter on the openpgp-wg mailing list, to align
>> the specification with reality:
> 
> OpenPGP has never defined what goes into the User ID except for the
> encoding which should be UTF-8.  Anything else does not belong into the
> specs unless the X.509 mess is a desired outcome.  Thus the current
> wording is sufficient and has served us well over the last 25 years [1]:
> 
> | ## User ID Packet (Tag 13)
> | 
> | A User ID packet consists of UTF-8 text that is intended to represent
> | the name and email address of the key holder.  By convention, it
> | includes an RFC 2822 [](#RFC2822) mail name-addr, but there are no
> | restrictions on its content.  The packet length in the header specifies
> | the length of the User ID.

I totally agree that the ID in general should be encoded in UTF-8 and
that there should be no further restrictions. That would have saved me
from the hassle which started this discussion (which, by the way, is
getting more and more interesting).

However, just in case this discussion really leads to updates of
conventions or even RFCs, I'd like to throw in the following thought:

Some years ago (it might be more than ten years; I really don't
remember), there was an article in the well-known German computer
magazine c't, written by one of Heise's long-time authors. In that
article, the author complained, really upset, that some "idiots" (I am
quoting him here) kept generating PGP keys with *his* email address in
the ID and uploading those keys to every key server they knew.

Consequently, he was constantly receiving encrypted emails (probably
with important content; think of him as an investigative journalist)
which he could not decrypt. It was unclear whether the "idiots" just
wanted to bother him, whether they seriously wanted to prevent sensitive
material from being mailed to him, or whether this was part of a larger
attack on the PGP system as a whole (assuming that other journalists
were suffering in a similar manner).

The current policy of some key servers (including keys.openpgp.org) is
to send a confirmation email to each email address contained in an
uploaded key's ID. Although this policy probably arose mainly from data
protection obligations, it makes the scam described above much more
difficult, if it doesn't prevent it entirely.

Of course, a key server can only send a confirmation email if there is a
valid email address in the ID of a key. Hence, a future convention or
standard should eventually provide some sort of structure:

One part of the ID must contain the (valid) email addresses (addr) which
should be associated with the key, and another part has absolutely no
restrictions, or perhaps only the restriction that the @ character is
forbidden (to prevent people from smuggling in something which resembles
an email address).

Key servers then must send a confirmation email each time a key is
uploaded (if its ID contains an address entry), and must use only the
address entries (addr) to match search queries which contain the @
character.

I am aware that this is similar to what we have today, i.e. to the
convention described above. However, as experience shows, that
convention is interpreted differently by different software packages,
and it is not mandatory.

I am also aware that this would not prevent other sorts of scams. For
example, an attacker could upload a key with his own correct email
address in the respective ID "field", but with my name in the "free
text" part. However, I believe that the overwhelming majority of people
who search for my public key search by my email address (and not by my
name). Perhaps Vincent has some statistics ...

If there were dedicated "fields" for the address part (addr) in the ID,
the complicated parsing of the name-addr wouldn't be necessary; just the
addr part (i.e. each field) would have to be parsed. The rest of the ID
wouldn't have to be checked, except for the absence of the @ character,
which is trivial.

A very simple ID structure providing that sort of "protection" which
immediately came to my mind is the following:

- The ID may consist of several lines, separated by LF.

- Lines are not restricted in length (as long as the ID as a whole does
not get too big).

- A line is said to be empty if and only if it contains the LF character
and no other character.

- If the first line is empty, the ID does not contain an email address
(addr), and the free text part of the ID starts at the second line.

- If the first line is not empty, it must contain exactly one valid
email address (addr), which is the first email address to be associated
with the key, and all following non-empty lines must contain valid email
addresses (addr) as well, one per line, which are the further email
addresses to be associated with the key. That array of lines with valid
email addresses (addr) is ended by an empty line. The free text part of
the ID starts after that empty line.

The free text part of the ID is encoded in UTF-8 and may contain any
character except the @ character.
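To make the proposed layout concrete, here is a minimal parser sketch in
Python. It is an illustration under stated assumptions, not a reference
implementation: it treats LF as a line separator (as str.split("\n")
does), which is one of two possible readings of the "empty line" rule,
and it uses a deliberately simplistic regular expression as a stand-in
for a real RFC 5322 addr-spec check. The names parse_user_id,
StructuredUserId and UserIdError are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simplistic addr check (assumption): a real
# implementation would follow the addr-spec grammar of RFC 5322.
_ADDR_RE = re.compile(r"[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+")


class UserIdError(ValueError):
    """Raised when a User ID does not follow the proposed structure."""


@dataclass
class StructuredUserId:
    addrs: list[str] = field(default_factory=list)
    free_text: str = ""


def parse_user_id(raw: bytes) -> StructuredUserId:
    # The whole ID must be UTF-8; decode() raises UnicodeDecodeError otherwise.
    text = raw.decode("utf-8")
    # Assumption: LF separates lines, so an empty line is an empty string here.
    lines = text.split("\n")

    addrs = []
    i = 0
    # Address block: every line up to the first empty line holds one addr.
    while i < len(lines) and lines[i] != "":
        if not _ADDR_RE.fullmatch(lines[i]):
            raise UserIdError(f"line {i + 1} is not a single addr: {lines[i]!r}")
        addrs.append(lines[i])
        i += 1
    if i == len(lines):
        raise UserIdError("address block is not terminated by an empty line")

    # Free text: everything after the terminating empty line, '@' forbidden.
    free_text = "\n".join(lines[i + 1:])
    if "@" in free_text:
        raise UserIdError("free text part must not contain '@'")
    return StructuredUserId(addrs, free_text)
```

With this sketch, b"alice@example.org\n\nAlice Example" yields one address
and the free text "Alice Example", while a conventional name-addr such as
b"Alice <alice@example.org>\n\n" is rejected, because the first line is
not a bare addr.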

Software must not generate IDs other than described above. Key servers
must refuse publishing keys with IDs other than described above. Key
servers must send a confirmation email to all email addresses (addr)
which are in a key's ID, and must not publish an email address as being
associated with a certain key until it has received an answer to the
respective confirmation email.
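The publishing rule above can be sketched as a small in-memory model in
Python. Everything here is hypothetical: the Keyserver class, its
methods, and the send_mail callback (which stands in for whatever mail
transport a real key server would use) are assumptions for illustration.
The point it shows is that an address is only ever returned as belonging
to a key after the token from its confirmation email has come back.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KeyRecord:
    fingerprint: str
    addrs: list[str]  # addrs taken from the key's structured User ID
    confirmed: set[str] = field(default_factory=set)


class Keyserver:
    """Toy model; send_mail(addr, token) is a hypothetical callback."""

    def __init__(self, send_mail: Callable[[str, str], None]):
        self.send_mail = send_mail
        self.keys: dict[str, KeyRecord] = {}
        self.pending: dict[str, tuple[str, str]] = {}  # token -> (fpr, addr)

    def upload(self, fingerprint: str, addrs: list[str]) -> None:
        record = self.keys.get(fingerprint)
        if record is None:
            record = self.keys[fingerprint] = KeyRecord(fingerprint, list(addrs))
        else:
            record.addrs = list(addrs)
            # Forget confirmations for addrs that are no longer in the ID.
            record.confirmed &= set(addrs)
        # Send a confirmation email to every address not yet confirmed.
        for addr in record.addrs:
            if addr not in record.confirmed:
                token = secrets.token_urlsafe(16)
                self.pending[token] = (fingerprint, addr)
                self.send_mail(addr, token)

    def confirm(self, token: str) -> bool:
        # Each token is single-use: pop() removes it from the pending table.
        entry = self.pending.pop(token, None)
        if entry is None:
            return False
        fingerprint, addr = entry
        record = self.keys.get(fingerprint)
        if record is None or addr not in record.addrs:
            return False
        record.confirmed.add(addr)
        return True

    def published_addrs(self, fingerprint: str) -> list[str]:
        # Only confirmed addresses are ever published for a key.
        record = self.keys.get(fingerprint)
        if record is None:
            return []
        return [a for a in record.addrs if a in record.confirmed]
```

In this sketch, a key uploaded with someone else's address simply never
gets that address published, because the owner of the mailbox never
answers the confirmation email.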

As far as I have understood, that structure could easily be put into tag
13; no new tag would have to be created (although a new tag for the
email addresses (addr) would be a huge improvement).

We could even allow the @ character in the free text part of the ID if
the key servers provided two different kinds of search: search by mail
address (addr), where the key server may only match the search string
against the dedicated address entries, and search by free text, where
the key server may only match the search string against the free text
part.
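The two kinds of search could look roughly like this in Python. This is
a hypothetical sketch: PublishedKey, search_by_addr and
search_by_free_text are illustrative names, and comparing addresses
case-insensitively is a simplifying assumption (strictly speaking, the
local part of an address may be case-sensitive).

```python
from dataclasses import dataclass


@dataclass
class PublishedKey:
    fingerprint: str
    confirmed_addrs: list[str]  # only addrs whose confirmation was answered
    free_text: str              # may contain '@' in this variant


def search_by_addr(keys: list[PublishedKey], query: str) -> list[str]:
    # Exact match against the dedicated, confirmed address entries only.
    q = query.strip().lower()
    return [k.fingerprint for k in keys
            if any(a.lower() == q for a in k.confirmed_addrs)]


def search_by_free_text(keys: list[PublishedKey], query: str) -> list[str]:
    # Substring match against the free text part only, never the addrs.
    q = query.casefold()
    return [k.fingerprint for k in keys if q in k.free_text.casefold()]
```

A key whose free text merely mentions alice@example.org is then found by
the free-text search, but never by the address search, so it cannot pose
as Alice's key to someone who looks her up by address.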

As a final note, I am aware that many problems with scams can be
prevented by mutual key signing or by publishing keys on web sites. But
to be honest, I don't know many people who have their PGP / GPG keys
signed by a noteworthy number of other people. Meeting others just for
this purpose is painful, and validating the other party correctly
requires some knowledge. And of course, not every private person has a
web site, and only a few companies publish keys for more than one email
address on their web site, even if they have dozens of employees who
have PGP / GPG installed.

Actually, I currently don't know anybody whom I could ask to sign my
keys, and the problem is even bigger the other way around. Can I trust
the key which I found on the key server for the intended recipient's
email address? Can I at least be sure that the key server has sent a
confirmation email to that address and has received an answer? Or did
it fail to do so because of a malformed email address, yet still finds
that address because it performs a full-text search against the key
IDs?

So, in summary, my point is that there should be as few restrictions on
the ID as possible, but it must be ensured that everything which
resembles an email address (addr) and which could be found by querying a
key server has actually been validated by sending a confirmation email
and receiving an answer. In particular, this means that email addresses
(addr) can't be embedded in surrounding free text, because parsing (and
possibly removing) them from such a context is technically complicated,
error-prone to implement and subject to differing interpretations.
Otherwise, there is a risk that the confirmation / validation fails due
to a malformed or misinterpreted addr while that addr can nevertheless
be found on the key server. Therefore, we need to structure the ID so
that there is one dedicated place where addrs are stored and can be
easily parsed, which in turn means disallowing them in ordinary free
text.

Just my two cents, and I'm happy and grateful for what we already have!
Maybe it's too late; I've had a long day ...

Regards,

Binarus