Japanese and UTF8

Edmund GRIMLEY EVANS edmundo@rano.org
Thu, 17 Feb 2000 12:03:54 +0000


Werner Koch <wk@gnupg.org>:


> now that we have a Japanese translation, we have to do a conversion from
> EUC-JP to UTF-8, because UTF-8 is the required encoding for user IDs
> and some other strings in OpenPGP.
>
> I don't think that the currently used simple mapping approach works
> with that character set, because it is a simple one-to-one mapping and
> I expect that EUC-JP uses state shifting.
Probably.
> What is a portable way to do this conversion? I had some talks about
> that in Tokyo and it boiled down to letting the OS/libc do it. Okay, how?
The official API consists of iconv_open, iconv and iconv_close, and is
declared in iconv.h. The version in glibc-2.1 doesn't do Japanese and
deviates from the standard. I hope glibc-2.2 will have a more correct
and complete implementation.

Bruno Haible has a portable libiconv that provides the same functions
and does do Japanese. (I'm using it now, linked with mutt, on a
glibc-2.1 machine.) There's concise info and relevant links at:

ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO-5.html#ss5.1

If you're going to use iconv, then you might want to get rid of the
charset tables in util/strgutil.c. On the other hand, you might want
to do what I did with mutt: leave them in but use them only if
configure fails to detect iconv.

I can send you my configure.in for mutt, if you want; it's mostly
adapted from Bruno Haible's configure.in for clisp, if I remember
correctly. You probably know more about autoconf and can tell me what
I did wrong ...

Edmund
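
For illustration, the iconv_open / iconv / iconv_close sequence mentioned
above could be used roughly as follows. This is only a sketch under
stated assumptions: the helper name to_utf8 and the buffer-growing
strategy are hypothetical and not taken from GnuPG or mutt, and it
assumes the installed iconv accepts the charset name "EUC-JP" and uses
the standard prototype with a non-const char ** input argument.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from GnuPG or mutt): convert `inlen` bytes
 * of `in`, encoded in `fromcode` (for example "EUC-JP"), into a newly
 * malloc'd, NUL-terminated UTF-8 string. Returns NULL on any failure;
 * the caller frees the result. */
char *to_utf8(const char *fromcode, const char *in, size_t inlen)
{
    assert(fromcode != NULL && in != NULL);

    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return NULL;               /* this iconv cannot do the conversion */

    size_t outsize = inlen * 2 + 16;   /* initial guess, grown on E2BIG */
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;        /* iconv does not modify the input bytes */
    size_t inleft = inlen;
    size_t used = 0;
    int flushing = 0;              /* 0: converting input, 1: final flush */

    for (;;) {
        char *outp = out + used;
        size_t outleft = outsize - used - 1;   /* keep room for the NUL */
        size_t r = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)   /* emit pending state */
            : iconv(cd, &inp, &inleft, &outp, &outleft);
        used = (size_t)(outp - out);

        if (r != (size_t)-1) {
            if (flushing)
                break;             /* input converted and state flushed */
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) {      /* EILSEQ or EINVAL: invalid input */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        outsize *= 2;              /* output buffer full: grow and retry */
        char *bigger = realloc(out, outsize);
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
    }

    out[used] = '\0';
    iconv_close(cd);
    return out;
}
```

A caller would use it as to_utf8("EUC-JP", buf, strlen(buf)) and fall
back to the built-in tables when it returns NULL. With a separately
installed libiconv the program normally also has to be linked with
-liconv.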