Charset Problems

Miguel Coca mcoca at gnu.org
Wed Dec 11 16:27:02 CET 2002


On Wed, Dec 11, 2002 at 15:03:13 +0100, Werner Koch wrote:
> Furthermore, I bet that this user ID is not correctly encoded.  PGP
> for example does not get it right and does always assume Latin-1.
> BTW, did anyone check whether PGP 8 fixes this.
> 
> IIRC, Miguel implemented heuristics in GPA to workaround bad encodings
> right before displaying.

Actually, you suggested the code and I used it almost verbatim. This is the
function that makes sure a string is in the proper encoding. As this is done
for GTK+2, we want it in UTF-8; if you want Latin-1, reverse the
conversion:

static gchar *
string_to_utf8 (const gchar *string)
{
  const gchar *s;

  if (!string)
    {
      return NULL;
    }
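  /* Make sure the encoding is UTF-8.
   * Test structure suggested by Werner Koch: skip the leading ASCII bytes,
   * then treat the string as Latin-1 if it has a high-bit byte but no 0xc3
   * byte (the UTF-8 lead byte of the characters U+00C0 to U+00FF). */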
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && !strchr (string, 0xc3))
    {
      /* The string is Latin-1 */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* The string is already in UTF-8 */
      return g_strdup (string);
    }
}

(this comes from GPA's src/gpgmetools.c)
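
For anyone who wants to try the test without GLib, here is a minimal sketch
of the same heuristic in plain C. The names latin1_to_utf8 and
string_to_utf8_plain are hypothetical (they are not in GPA), and the
hand-written conversion only stands in for the g_convert call above; it
assumes the input really is ISO-8859-1, where every byte maps to the code
point of the same value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GPA: expand each Latin-1 byte into its
 * UTF-8 form. Bytes below 0x80 are copied, all others become two bytes. */
static char *
latin1_to_utf8 (const char *latin1)
{
  size_t len = strlen (latin1);
  /* Worst case: every input byte becomes two output bytes. */
  char *out = malloc (2 * len + 1);
  char *p = out;
  const unsigned char *s;

  if (!out)
    return NULL;
  for (s = (const unsigned char *) latin1; *s; s++)
    {
      if (*s < 0x80)
        *p++ = (char) *s;
      else
        {
          *p++ = (char) (0xc0 | (*s >> 6));
          *p++ = (char) (0x80 | (*s & 0x3f));
        }
    }
  *p = '\0';
  return out;
}

/* Same test structure as string_to_utf8: a high-bit byte with no 0xc3
 * byte anywhere in the string is taken to mean Latin-1. */
static char *
string_to_utf8_plain (const char *string)
{
  const char *s;
  char *copy;

  if (!string)
    return NULL;
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && !strchr (string, 0xc3))
    return latin1_to_utf8 (string);

  /* Pure ASCII, or assumed to be UTF-8 already: return a copy. */
  copy = malloc (strlen (string) + 1);
  if (copy)
    strcpy (copy, string);
  return copy;
}
```

With this sketch, the Latin-1 bytes "Jos\xe9" come back as "Jos\xc3\xa9",
while input that is already UTF-8 is returned as an unchanged copy. As the
memory comes from malloc here, release it with free rather than g_free.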

For non-NULL input it always returns a newly allocated string, so that you
can safely release it with g_free() when you are done with it. Also, there
are a couple of cases where this doesn't work and leads to funny characters
on screen: for example, Latin-1 input that contains the byte 0xc3 (an "Ã")
is wrongly interpreted as UTF-8. This is a lot less common, though.
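
One possible way to narrow down the failing cases, offered only as a sketch
and not as what GPA does, is to replace the 0xc3 test with a structural
check of the whole string. The function name looks_like_utf8 is
hypothetical; in GLib code the existing g_utf8_validate function would be
the more natural choice. The check below only verifies that each lead byte
is followed by the right number of continuation bytes, so it does not
reject every overlong or surrogate sequence.

```c
/* Hypothetical helper: return 1 if the bytes are structurally valid UTF-8
 * (each lead byte followed by the right number of continuation bytes),
 * 0 otherwise. */
static int
looks_like_utf8 (const char *string)
{
  const unsigned char *s = (const unsigned char *) string;

  while (*s)
    {
      int extra;

      if (*s < 0x80)
        extra = 0;
      else if (*s >= 0xc2 && *s <= 0xdf)
        extra = 1;
      else if (*s >= 0xe0 && *s <= 0xef)
        extra = 2;
      else if (*s >= 0xf0 && *s <= 0xf4)
        extra = 3;
      else
        return 0;   /* stray continuation byte or invalid lead byte */
      s++;
      for (; extra > 0; extra--, s++)
        if ((*s & 0xc0) != 0x80)
          return 0; /* also catches a string that ends too early */
    }
  return 1;
}
```

Under this check the Latin-1 string "S\xc3O PAULO" is rejected as UTF-8
even though it contains 0xc3, and the UTF-8 euro sign "\xe2\x82\xac" is
accepted even though it does not. Some inputs stay ambiguous whatever the
test: the Latin-1 bytes 0xc3 0xa9 are also a valid UTF-8 sequence.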

Regards,
-- 
Miguel Coca (mcoca at gnu.org):               http://zipi.fi.upm.es/~e970095/
GNU SpaceChart:     http://www.gnu.org/software/spacechart/spacechart.html
       OpenPGP: E60A CBF4 5C6F 914E B6C1  C402 8C4D C7B6 27FC 3CA8


More information about the Gnupg-devel mailing list