Charset Problems

Miguel Coca mcoca@gnu.org
Wed Dec 11 16:27:02 2002



On Wed, Dec 11, 2002 at 15:03:13 +0100, Werner Koch wrote:
> Furthermore, I bet that this user ID is not correctly encoded.  PGP,
> for example, does not get it right and always assumes Latin-1.
> BTW, did anyone check whether PGP 8 fixes this?
>
> IIRC, Miguel implemented heuristics in GPA to work around bad encodings
> right before displaying.

Actually, you suggested the code, and I used it almost verbatim. This is the
function that makes sure a string is in the proper encoding. Since this is
done for GTK+ 2, we want the result in UTF-8; if you want Latin-1 instead,
reverse the conversion:

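#include <string.h>   /* strchr */
#include <glib.h>     /* gchar, g_convert, g_strdup */

static gchar *
string_to_utf8 (const gchar *string)
{
  const gchar *s;

  if (!string)
    {
      return NULL;
    }
  /* Make sure the encoding is UTF-8.
   * Test structure suggested by Werner Koch: a string that contains
   * bytes with the high bit set but no 0xC3 byte is taken to be
   * Latin-1; anything else is assumed to be UTF-8 already. */
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && !strchr (string, 0xc3))
    {
      /* The string is Latin-1 */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* The string is already in UTF-8 (or is plain ASCII) */
      return g_strdup (string);
    }
}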

(this comes from GPA's src/gpgmetools.c)

It always returns a newly allocated string (or NULL if the input is NULL),
so you can safely free it with g_free() when you are done with it. There
are a few cases where the heuristic fails and leads to garbled characters
on screen: Latin-1 input that happens to contain the byte 0xC3 ("Ã") is
wrongly treated as UTF-8. These cases are much less common, though.
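For readers without GLib, here is a minimal, dependency-free sketch of both
directions, assuming plain C99 and the standard library only. The names
latin1_guess_to_utf8() and utf8_to_latin1() are hypothetical, and the
hand-written byte arithmetic stands in for g_convert(); it works because
Latin-1 maps each byte directly to the code points U+0000 to U+00FF. The
reverse function is a strict converter, not a heuristic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical GLib-free sketch.  Results are newly allocated; the
 * caller frees them with free(). */

/* Same heuristic as string_to_utf8(): high-bit bytes and no 0xC3 byte
 * means Latin-1, which is then encoded as UTF-8. */
static char *
latin1_guess_to_utf8 (const char *string)
{
  const unsigned char *s;
  unsigned char *out, *d;
  int is_latin1;

  if (!string)
    return NULL;
  for (s = (const unsigned char *) string; *s && !(*s & 0x80); s++)
    ;
  /* Pure ASCII or assumed UTF-8 is copied unchanged. */
  is_latin1 = *s && !strchr (string, 0xc3);
  /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
  out = malloc ((is_latin1 ? 2 : 1) * strlen (string) + 1);
  if (!out)
    return NULL;
  for (s = (const unsigned char *) string, d = out; *s; s++)
    {
      if (!is_latin1 || *s < 0x80)
        *d++ = *s;
      else
        {                               /* U+0080..U+00FF: two bytes */
          *d++ = 0xc0 | (*s >> 6);
          *d++ = 0x80 | (*s & 0x3f);
        }
    }
  *d = '\0';
  return (char *) out;
}

/* Reverse direction: UTF-8 to Latin-1.  Returns NULL for malformed
 * input or characters above U+00FF, which Latin-1 cannot hold. */
static char *
utf8_to_latin1 (const char *string)
{
  const unsigned char *s;
  unsigned char *out, *d;

  if (!string)
    return NULL;
  out = malloc (strlen (string) + 1);   /* output never grows */
  if (!out)
    return NULL;
  for (s = (const unsigned char *) string, d = out; *s; s++)
    {
      if (*s < 0x80)
        *d++ = *s;
      else if ((*s == 0xc2 || *s == 0xc3) && (s[1] & 0xc0) == 0x80)
        {
          *d++ = ((*s & 0x1f) << 6) | (s[1] & 0x3f);
          s++;                          /* consumed two bytes */
        }
      else
        {
          free (out);
          return NULL;
        }
    }
  *d = '\0';
  return (char *) out;
}
```

For example, the Latin-1 string "caf\xe9" comes out as the UTF-8 bytes
"caf\xc3\xa9". The failure case mentioned above is also easy to reproduce:
the Latin-1 bytes "\xc3\xa9" ("Ã©") contain 0xC3, so they pass through
unchanged and are displayed as "é".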

Regards,
-- 
Miguel Coca (mcoca@gnu.org):               http://zipi.fi.upm.es/~e970095/
GNU SpaceChart:     http://www.gnu.org/software/spacechart/spacechart.html
       OpenPGP: E60A CBF4 5C6F 914E B6C1  C402 8C4D C7B6 27FC 3CA8
