Japanese and UTF8
Edmund GRIMLEY EVANS
edmundo@rano.org
Fri, 18 Feb 2000 17:02:27 +0000
Werner Koch <wk@gnupg.org>:
> > * To assume outside is always fixed charset, say
> > ISO-2022-JP (or EUC-JP or whatever).
> > * To assume that the charset is specified explicitly.
>
> You now have to do this using the --charset option, which defaults to
> latin-1. Therefore I should use libiconv.
There is a function, nl_langinfo(CODESET), that lets you discover the
charset of the current locale. When it is available, this would be a
better default than ISO-8859-1.
The trouble is that this function is not very widely available, and
some people have broken locales, so you probably also need a --charset
option that overrides the locale, if you want gnupg to be portable and
easy to install.
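A minimal sketch of that selection order, assuming the function meant above is nl_langinfo(CODESET) and that configure defines HAVE_LANGINFO_CODESET when it exists (that macro name and the helper choose_charset are assumptions for illustration, not gnupg code). A --charset value wins, so users with broken locales can override; otherwise the locale is asked; ISO-8859-1 is the last resort.

```c
#include <stddef.h>
#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>
#endif

/* Hypothetical helper: decide which charset to use for output.
 * CHARSET_OPTION is the value of a --charset option, or NULL if the
 * user gave none.  The option wins so that people with broken locales
 * can override the locale; otherwise the locale's codeset is used when
 * nl_langinfo is available (HAVE_LANGINFO_CODESET is assumed to be
 * set by configure), and ISO-8859-1 is the last resort. */
const char *choose_charset(const char *charset_option)
{
    if (charset_option != NULL && *charset_option != '\0')
        return charset_option;

#ifdef HAVE_LANGINFO_CODESET
    {
        /* Only meaningful after setlocale(LC_CTYPE, "") at startup;
         * in the "C" locale this typically names plain ASCII. */
        const char *cs = nl_langinfo(CODESET);
        if (cs != NULL && *cs != '\0')
            return cs;
    }
#endif

    return "ISO-8859-1";
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and could then pass the result to libiconv, e.g. iconv_open(choose_charset(opt), "UTF-8") to convert UTF-8 data for display.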
> There is still one problem: Are there any control characters outside
> of the 0..127 range? I assume yes, and then I need a way to test for
> them. For security reasons we can't print any data without checking
> first.
I think the officially correct way to remove control characters from a
string before printing it is to use mbtowc and iswprint. Both of these
functions take the locale into account.
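A sketch of that approach, assuming setlocale(LC_CTYPE, "") has been called and a stateless encoding such as EUC-JP or UTF-8 (shift states in ISO-2022-JP are not handled). The name sanitize_for_print and the choice of '?' as the replacement are illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical helper: copy the multibyte string SRC into DST, which
 * must have room for strlen(SRC) + 1 bytes, replacing each
 * non-printable character, and each byte that does not start a valid
 * character in the current locale, with '?'.  Output is never longer
 * than the input because '?' always replaces at least one byte. */
void sanitize_for_print(char *dst, const char *src)
{
    size_t len = strlen(src);
    wchar_t wc;
    int n;

    mbtowc(NULL, NULL, 0);              /* reset any conversion state */
    while (len > 0) {
        n = mbtowc(&wc, src, len);
        if (n <= 0) {
            /* invalid or truncated sequence: drop one byte, resync */
            *dst++ = '?';
            mbtowc(NULL, NULL, 0);
            src++;
            len--;
        } else if (!iswprint((wint_t) wc)) {
            /* a valid character, but a control or other non-printable */
            *dst++ = '?';
            src += n;
            len -= (size_t) n;
        } else {
            /* printable: copy its bytes unchanged */
            memcpy(dst, src, (size_t) n);
            dst += n;
            src += n;
            len -= (size_t) n;
        }
    }
    *dst = '\0';
}
```

Because the test is made on the decoded character rather than the raw byte, a control character above 127 (for example one of the C1 controls in ISO-8859-1) is caught whenever the locale's charset classifies it as non-printable.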
As usual, you would have to supply your own simplified version for use
where configure fails to find them in a library.
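One possible simplified fallback, sketched under the assumption that the data is ISO-8859-1 (the default charset discussed above), so each byte is one character whose code equals the byte value. The names my_mbtowc and my_iswprint are hypothetical, and my_iswprint takes an unsigned long so it does not depend on <wctype.h> being present.

```c
#include <stddef.h>

/* Hypothetical fallback for mbtowc, ISO-8859-1 only: one byte is one
 * character and the wide value is the byte value. */
int my_mbtowc(wchar_t *pwc, const char *s, size_t n)
{
    if (s == NULL)
        return 0;               /* ISO-8859-1 has no shift states */
    if (n == 0)
        return -1;              /* no bytes to look at */
    if (pwc != NULL)
        *pwc = (wchar_t) (unsigned char) *s;
    return *s != '\0' ? 1 : 0;
}

/* Hypothetical fallback for iswprint, ISO-8859-1 only: accept
 * 0x20-0x7E and 0xA0-0xFF; reject the C0 controls (0x00-0x1F),
 * DEL (0x7F) and the C1 controls (0x80-0x9F). */
int my_iswprint(unsigned long wc)
{
    return (wc >= 0x20 && wc <= 0x7E) || (wc >= 0xA0 && wc <= 0xFF);
}
```

The 0x80-0x9F range is the answer to the "control characters outside 0..127" question for Latin-1: it includes 0x9B (CSI), which some terminals treat like ESC [.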
Do you have to line non-ASCII things up in columns at all? I need that
in mutt. I use wcwidth to tell me how many character cells a character
will occupy on the display. For a printable character, the result is
0, 1 or 2. Since wcwidth returns -1 for a non-printable character
(except the null character, for which it returns 0), you don't need
iswprint when you're using wcwidth. If I remember correctly, wcwidth
is in glibc-2.2, but it's not part of the UNIX98 standard. In mutt I
supply my own definition, copied from Markus Kuhn.
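A sketch of measuring a whole string for column layout, combining mbtowc with wcwidth and returning -1 if anything is invalid or non-printable. It assumes setlocale(LC_CTYPE, "") has been called; display_width is a hypothetical name, and on systems without wcwidth a private definition like the one mutt uses would be substituted.

```c
#define _XOPEN_SOURCE 700   /* wcwidth is declared as an X/Open extension */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of character cells the multibyte string
 * S occupies on the display, or -1 if it contains an invalid sequence
 * or a non-printable character (wcwidth itself returns -1 for those,
 * so no separate iswprint check is needed). */
int display_width(const char *s)
{
    size_t len = strlen(s);
    int width = 0;
    wchar_t wc;
    int n, w;

    mbtowc(NULL, NULL, 0);              /* reset any conversion state */
    while (len > 0) {
        n = mbtowc(&wc, s, len);
        if (n <= 0)
            return -1;                  /* invalid multibyte sequence */
        w = wcwidth(wc);
        if (w < 0)
            return -1;                  /* non-printable character */
        width += w;                     /* 0, 1 or 2 cells */
        s += n;
        len -= (size_t) n;
    }
    return width;
}
```

To pad a field to a column, a caller would print the string followed by (column - display_width(s)) spaces; a double-width kanji counts as 2 here, whereas strlen would count its bytes.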
You probably don't need wcwidth.
int mbtowc(wchar_t *pwc, const char *s, size_t n);  /* <stdlib.h> */
int iswprint(wint_t wc);                             /* <wctype.h> */
int wcwidth(wchar_t wc);                             /* <wchar.h>  */
Edmund