Encoding of user ID strings

Werner Koch wk at gnupg.org
Tue May 24 08:26:54 CEST 2016


On Mon, 23 May 2016 20:19, rjh at sixdemonbag.org said:

> At first blush it appears the answer is "no, but most people use UTF-8."
>  If so that's fine, but I'll have to silently discard a number of user

OpenPGP requires that the user id is UTF-8 encoded.  Older PGP versions
did not care about encoding and used whatever the system provided.  Thus
there are lots of (e.g.) Müllers with wrong encodings.  This is what GPA
uses to fix the problem for most of the western world's PGP users:

--8<---------------cut here---------------start------------->8---
  /* Due to a bug in old and not so old PGP versions user IDs have
     been copied verbatim into the key.  Thus many users with Umlauts
     et al. in their name will see their names garbled.  Although this
     is not an issue for me (;-)), I have a couple of friends with
     Umlauts in their name, so let's try to make their life easier by
     detecting invalid encodings and converting them from Latin-1.  We use
     this even for X.509 because it may make things even better given
     all the invalid encodings often found in X.509 certificates.  */
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && ((s[1] & 0xc0) == 0x80) && ( ((*s & 0xe0) == 0xc0)
                                         || ((*s & 0xf0) == 0xe0)
                                         || ((*s & 0xf8) == 0xf0)
                                         || ((*s & 0xfc) == 0xf8)
                                         || ((*s & 0xfe) == 0xfc)) )
    {
      /* Possible utf-8 character followed by continuation byte.
         Although this might still be Latin-1 we better assume that it
         is valid utf-8. */
      return g_strdup (string);
    }
  else if (*s && !strchr (string, 0xc3))
    {
      /* No 0xC3 character in the string; assume that it is Latin-1.  */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* Everything else is assumed to be UTF-8.  We do this even though
         we know the encoding is not valid.  However as we only test
         the first non-ascii character, valid encodings might
         follow.  */
      return g_strdup (string);
    }
--8<---------------cut here---------------end--------------->8---

[ Feel free to reuse - I put this snippet into the public-domain or under
  CC-0 ]
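
For readers without GLib at hand, here is a minimal standalone sketch of
the same three-way heuristic using only the C standard library.  The
function names user_id_to_utf8, latin1_to_utf8 and copy_string are
hypothetical and not part of GPA; malloc and a hand-written conversion
loop stand in for g_strdup and g_convert.  The 0xC3 test is useful because
every Latin-1 character from U+00C0 to U+00FF is encoded in UTF-8 as 0xC3
followed by a continuation byte, so a string that contains 0xC3 may
already be UTF-8 and is left alone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of STRING or NULL.  */
char *
copy_string (const char *string)
{
  char *result = malloc (strlen (string) + 1);

  if (result)
    strcpy (result, string);
  return result;
}

/* Hypothetical helper: convert an ISO-8859-1 string to a malloc'ed
   UTF-8 string.  Every byte >= 0x80 becomes exactly two bytes, so
   twice the input length plus the terminator is always enough.  */
char *
latin1_to_utf8 (const char *string)
{
  const unsigned char *s;
  char *result = malloc (2 * strlen (string) + 1);
  char *d = result;

  if (!result)
    return NULL;
  for (s = (const unsigned char *)string; *s; s++)
    {
      if (*s < 0x80)
        *d++ = (char)*s;
      else
        {
          *d++ = (char)(0xc0 | (*s >> 6));    /* Lead byte.  */
          *d++ = (char)(0x80 | (*s & 0x3f));  /* Continuation byte.  */
        }
    }
  *d = 0;
  return result;
}

/* Hypothetical wrapper around the heuristic from the snippet above.
   The caller must free the result; NULL means out of memory.  */
char *
user_id_to_utf8 (const char *string)
{
  const unsigned char *s;

  /* Skip to the first non-ASCII byte, if any.  */
  for (s = (const unsigned char *)string; *s && !(*s & 0x80); s++)
    ;
  if (*s && ((s[1] & 0xc0) == 0x80) && ( ((*s & 0xe0) == 0xc0)
                                         || ((*s & 0xf0) == 0xe0)
                                         || ((*s & 0xf8) == 0xf0)
                                         || ((*s & 0xfc) == 0xf8)
                                         || ((*s & 0xfe) == 0xfc)) )
    return copy_string (string);     /* Looks like UTF-8.  */
  else if (*s && !strchr (string, 0xc3))
    return latin1_to_utf8 (string);  /* No 0xC3: assume Latin-1.  */
  else
    return copy_string (string);     /* ASCII or undecidable.  */
}
```

With this sketch, a Latin-1 encoded "Müller" (the bytes 4D FC 6C 6C 65 72)
should come back with the FC replaced by C3 BC, while a string that is
pure ASCII or already UTF-8 is returned as an unchanged copy.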



Shalom-Salam,

   Werner


-- 
Die Gedanken sind frei.  Ausnahmen regelt ein Bundesgesetz.
    /* EFH in Erkrath: https://alt-hochdahl.de/haus */



