Charset Problems

Miguel Coca mcoca at
Wed Dec 11 16:27:02 CET 2002

On Wed, Dec 11, 2002 at 15:03:13 +0100, Werner Koch wrote:
> Furthermore, I bet that this user ID is not correctly encoded.  PGP
> for example does not get it right and does always assume Latin-1.
> BTW, did anyone check whether PGP 8 fixes this.
> IIRC, Miguel implemented heuristics in GPA to workaround bad encodings
> right before displaying.

Actually, you suggested the code and I used it almost verbatim. This is the
function that makes sure a string is in the proper format (as this is done
for GTK+2, we want it in UTF-8; if you want Latin-1, reverse the conversion):

static gchar *
string_to_utf8 (const gchar *string)
{
  const gchar *s;

  if (!string)
    {
      return NULL;
    }
  /* Make sure the encoding is UTF-8.
   * Test structure suggested by Werner Koch */
  for (s = string; *s && !(*s & 0x80); s++)
    ; /* skip the leading run of 7-bit ASCII */
  if (*s && !strchr (string, 0xc3))
    {
      /* The string is Latin-1 */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* The string is already in UTF-8 */
      return g_strdup (string);
    }
}

(this comes from GPA's src/gpgmetools.c)
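For anyone who wants to try the test without GLib, here is a rough standalone sketch of the same idea. The names looks_like_latin1, latin1_to_utf8 and string_to_utf8_sketch are made up for this example and are not part of GPA; the hand-written conversion only stands in for the g_convert() call and assumes the input really is ISO-8859-1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the heuristic's test as a predicate.  Returns
 * nonzero if the string has a byte with the high bit set but contains
 * no 0xC3 byte anywhere. */
static int
looks_like_latin1 (const char *s)
{
  const char *p;

  for (p = s; *p && !(*p & 0x80); p++)
    ; /* skip the leading run of 7-bit ASCII */
  return *p && !strchr (s, 0xc3);
}

/* Hypothetical stand-in for g_convert (..., "UTF-8", "ISO-8859-1", ...):
 * every Latin-1 byte >= 0x80 becomes a two-byte UTF-8 sequence. */
static char *
latin1_to_utf8 (const char *s)
{
  const unsigned char *p;
  char *out, *q;

  assert (s != NULL);
  out = malloc (2 * strlen (s) + 1); /* worst case: every byte doubles */
  if (!out)
    return NULL;
  q = out;
  for (p = (const unsigned char *) s; *p; p++)
    {
      if (*p < 0x80)
        *q++ = (char) *p;
      else
        {
          *q++ = (char) (0xc0 | (*p >> 6));
          *q++ = (char) (0x80 | (*p & 0x3f));
        }
    }
  *q = '\0';
  return out;
}

/* Same control flow as string_to_utf8, minus GLib.  The caller frees
 * the result with free(). */
static char *
string_to_utf8_sketch (const char *s)
{
  char *copy;

  if (!s)
    return NULL;
  if (looks_like_latin1 (s))
    return latin1_to_utf8 (s);
  copy = malloc (strlen (s) + 1);
  if (copy)
    strcpy (copy, s);
  return copy;
}
```

With this, the Latin-1 bytes "Jos\xe9" come back as "Jos\xc3\xa9", while input that is already UTF-8 or plain ASCII is just copied.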

It always returns a newly allocated string, so that you can safely free it
when you are done with it. Also, there are a couple of cases where this
doesn't work and leads to funny characters on screen (as the input is
wrongly interpreted as UTF-8), but this is a lot less common.
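To make the failure cases concrete, here is a small sketch. It is not what GPA does; guessed_latin1 and is_wellformed_utf8 are hypothetical names, and the second function is only a structural check (it does not reject overlong forms or surrogates). A Latin-1 string that happens to contain the byte 0xC3 (the letter "Ã") is taken for UTF-8 by the heuristic, and a UTF-8 string whose non-ASCII characters never use 0xC3 as a lead byte (the euro sign, for example) is taken for Latin-1.

```c
#include <assert.h>
#include <string.h>

/* The heuristic's test as a predicate (hypothetical name): high-bit
 * byte present, but no 0xC3 byte anywhere in the string. */
static int
guessed_latin1 (const char *s)
{
  const char *p;

  for (p = s; *p && !(*p & 0x80); p++)
    ; /* skip the leading run of 7-bit ASCII */
  return *p && !strchr (s, 0xc3);
}

/* Structural UTF-8 check (hypothetical alternative): each lead byte
 * must be followed by the right number of continuation bytes.  It does
 * not reject overlong encodings or surrogate code points. */
static int
is_wellformed_utf8 (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  assert (s != NULL);
  while (*p)
    {
      int n;

      if (*p < 0x80)
        n = 0;
      else if ((*p & 0xe0) == 0xc0)
        n = 1;
      else if ((*p & 0xf0) == 0xe0)
        n = 2;
      else if ((*p & 0xf8) == 0xf0)
        n = 3;
      else
        return 0; /* stray continuation byte or invalid lead byte */
      p++;
      while (n-- > 0)
        {
          /* A terminating NUL fails this test too, so a sequence cut
           * short at the end of the string never reads past it. */
          if ((*p & 0xc0) != 0x80)
            return 0;
          p++;
        }
    }
  return 1;
}
```

For the Latin-1 bytes "S\xc3o Paulo", guessed_latin1 returns 0 (so the heuristic would treat them as UTF-8) while is_wellformed_utf8 also returns 0, which shows the input is not UTF-8 after all. For the UTF-8 euro sign "\xe2\x82\xac" it is the other way round. If GLib is available anyway, g_utf8_validate() is probably a better choice than a hand-rolled check like this one.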

Miguel Coca (mcoca at     
GNU SpaceChart:
       OpenPGP: E60A CBF4 5C6F 914E B6C1  C402 8C4D C7B6 27FC 3CA8
