Japanese and UTF8

Edmund GRIMLEY EVANS edmundo@rano.org
Fri, 18 Feb 2000 17:02:27 +0000


Werner Koch <wk@gnupg.org>:


> > * To assume outside is always fixed charset, say
> > ISO-2022-JP (or EUC-JP or whatever).
> > * To assume that the charset is specified explicitly.
>
> You now have to do this using the --charset option which defaults to
> latin-1. Therefore I should use libiconv.
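The libiconv approach Werner mentions could look roughly like the following minimal sketch. It assumes the strings to be printed are held internally as UTF-8 and converts them to whatever charset --charset selected. The helper name `utf8_to_charset` is hypothetical; `iconv_open`, `iconv` and `iconv_close` are the standard iconv(3) interface shared by glibc and GNU libiconv.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string IN to
 * CHARSET, writing at most OUTSIZE bytes (including the NUL) to OUT.
 * Returns 0 on success, -1 if the conversion is unsupported, the input
 * cannot be converted, or OUT is too small. */
int utf8_to_charset(const char *charset, const char *in,
                    char *out, size_t outsize)
{
    iconv_t cd;
    char *inbuf = (char *) in;   /* iconv() takes char ** for historical reasons */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft;
    int rc = 0;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;       /* reserve one byte for the terminator */

    cd = iconv_open(charset, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;               /* this conversion is not supported */

    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1)
        rc = -1;                 /* invalid input, or output buffer too small */
    /* Flush any shift sequence a stateful charset such as ISO-2022-JP needs. */
    else if (iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1)
        rc = -1;

    iconv_close(cd);
    *outbuf = '\0';
    return rc;
}
```

Whether an unrepresentable character makes `iconv` fail or is silently substituted depends on the implementation (glibc reports an error), so a caller that must never print raw data should still sanitize the result.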

There is a function that lets you discover the charset of the current locale. This would be a better default than ISO-8859-1, when the function is available. The trouble is that this function is not very widely available, and some people have broken locales, so you probably also need a --charset option that overrides the locale, if you want gnupg to be portable and easy to install.

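Presumably the function meant here is `nl_langinfo(CODESET)` from `<langinfo.h>`. A minimal sketch of "explicit option first, then the locale, then latin-1" might look like this; `choose_charset` is a hypothetical name, and the sketch assumes the program has already called `setlocale(LC_CTYPE, "")` at startup (otherwise the C locale's charset is reported).

```c
#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: pick the output charset.
 * 1. An explicit --charset value (OVERRIDE) always wins.
 * 2. Otherwise ask the current locale via nl_langinfo(CODESET).
 * 3. If the locale reports nothing, fall back to ISO-8859-1.
 * Note: the string returned by nl_langinfo() may be overwritten by a later
 * call, so copy it if it must be kept. */
const char *choose_charset(const char *override)
{
    const char *cs;

    if (override != NULL && *override != '\0')
        return override;

    cs = nl_langinfo(CODESET);
    if (cs == NULL || *cs == '\0')
        return "ISO-8859-1";
    return cs;
}
```

Where configure does not find `nl_langinfo`, the middle step would simply be dropped, leaving the option and the latin-1 default.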
> There is still one problem: Are there any control characters outside
> of the 0..127 range? I assume yes and then I need a way to test for
> them. For security reasons we can't print any data without checking
> first.

I think the officially correct way to remove control characters from a string before printing it is to use mbtowc and iswprint. Both these functions take account of the locale. As usual, you would have to supply your own simplified version for use where configure fails to find them in a library.

Do you have to line non-ASCII things up in columns at all? I need that in mutt. I use wcwidth to tell me how many character cells a character will occupy on the display. For a printable character, the result is 0, 1 or 2. Since wcwidth returns -1 for a non-printable character (and 0 for the null character), you don't need iswprint when you're using wcwidth.

If I remember correctly, wcwidth is in glibc-2.2, but it's not part of the UNIX98 standard. In mutt I supply my own definition, copied from Markus Kuhn. You probably don't need wcwidth.

    int mbtowc(wchar_t *pwc, const char *s, size_t n);
    int iswprint(wint_t wc);
    int wcwidth(wchar_t wc);

Edmund
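A minimal sketch of the mbtowc/iswprint approach described above, plus a wcwidth-based column count. The names `sanitize_mb` and `display_width` are hypothetical. The sketch assumes `setlocale(LC_CTYPE, "")` has been called and that the locale uses a stateless encoding such as UTF-8 or EUC-JP (ISO-2022-JP escape sequences are not treated specially). On glibc, `wcwidth` is only declared when `_XOPEN_SOURCE` is defined.

```c
#define _XOPEN_SOURCE 700   /* wcwidth() is an X/Open extension */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Copy SRC to DST (capacity DSTSIZE, which must be > 0), replacing every
 * non-printable character, and every byte that does not start a valid
 * multibyte sequence, with '?'.  Output is truncated at a character
 * boundary.  Returns the number of bytes written, excluding the NUL. */
size_t sanitize_mb(char *dst, size_t dstsize, const char *src)
{
    size_t left = strlen(src);
    size_t out = 0;

    mbtowc(NULL, NULL, 0);                      /* reset the shift state */
    while (left > 0) {
        wchar_t wc;
        int n = mbtowc(&wc, src, left);
        int printable = (n > 0 && iswprint((wint_t) wc));
        size_t need = printable ? (size_t) n : 1;
        size_t skip = (n > 0) ? (size_t) n : 1; /* invalid: skip one byte */

        if (out + need >= dstsize)              /* keep room for the NUL */
            break;
        if (printable)
            memcpy(dst + out, src, need);
        else
            dst[out] = '?';
        out += need;
        if (n < 0)
            mbtowc(NULL, NULL, 0);              /* resynchronise after an error */
        src += skip;
        left -= skip;
    }
    dst[out] = '\0';
    return out;
}

/* Return the number of character cells SRC occupies on the display, or -1
 * if it contains an invalid sequence or a character for which wcwidth()
 * returns -1 (i.e. a non-printable one). */
int display_width(const char *src)
{
    size_t left = strlen(src);
    int width = 0;

    mbtowc(NULL, NULL, 0);
    while (left > 0) {
        wchar_t wc;
        int n = mbtowc(&wc, src, left);
        int w;

        if (n < 0)
            return -1;
        w = wcwidth(wc);
        if (w < 0)
            return -1;
        width += w;
        src += n;
        left -= (size_t) n;
    }
    return width;
}
```

`mbtowc` keeps hidden state and is not thread-safe; `mbrtowc` with an explicit `mbstate_t` is the reentrant alternative. In a UTF-8 locale, a single CJK ideograph would typically count as 2 cells in `display_width`.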