[Patch] international domain names for email addresses
David Shaw
dshaw at jabberwocky.com
Thu Jan 6 21:26:16 CET 2005
On Thu, Jan 06, 2005 at 09:13:06PM +0100, Thomas Kuehne wrote:
> David Shaw wrote:
> >On Thu, Jan 06, 2005 at 07:05:15PM +0100, Thomas Kuehne wrote:
> >
> >>>>>Will the regexp code in GnuPG do the right thing when matching utf8
> >>>>>against utf8 ?
> >>>>
> >>>>I'm currently somehow failing to test it ;)
> >>>>
> >>>>Does the UID "___\u00FC___" match the regexp "___\u0075\u0308___" ?
> >>>
> >>>
> >>>It shouldn't.
> >>
> >>"\u00FC" and "\u0075\u0308" are canonical-equivalent.
> >>They should match.
> >>
> >>http://www.unicode.org/faq/normalization.html
> >>http://www.unicode.org/reports/tr15/
> >
> >
> >Are both of those strings utf8? I was under the impression that the
> >utf8 spec disallowed multiple ways to encode a particular character
> >for security reasons (so people couldn't "hide" illegal characters in
> >the encoding).
>
> UTF-8 only encodes codepoints (0x00FC, 0x0075 and 0x0308).
>
> Unicode defines how those codepoints are interpreted.
> Some "characters" can be represented by more than one codepoint
> sequence.
>
> A simple example is "ü"
>
> U+00FC LATIN SMALL LETTER U WITH DIAERESIS
> or
> U+0075 LATIN SMALL LETTER U
> U+0308 COMBINING DIAERESIS
>
> There are far more complicated cases in polytonic Greek, Hangul (Korean)
> and Hebrew.
>
> One way to ease the problem would be to specify one of the four so-called
> normalization forms in RFC 2440, section 3.4 (Text).

Ah, I misunderstood. Wow, that's a massive headache.
David
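To make the distinction in this thread concrete, here is a minimal Python sketch (not GnuPG's regexp code, which is not shown here). It illustrates that UTF-8 itself rejects overlong encodings (David's point), that "___\u00FC___" and "___\u0075\u0308___" are nevertheless two different, valid UTF-8 strings (Thomas's point), and that they only compare or match equal once both sides are normalized to the same form. The helper name `matches_normalized` and the default choice of NFC are illustrative assumptions; the thread only proposes specifying *one* of the four forms, not which one.

```python
import re
import unicodedata

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS (precomposed)
precomposed = "___\u00fc___"
# U+0075 LATIN SMALL LETTER U + U+0308 COMBINING DIAERESIS (decomposed)
decomposed = "___\u0075\u0308___"

# Both strings are valid UTF-8, but their byte sequences differ.
print(precomposed.encode("utf-8"))  # b'___\xc3\xbc___'
print(decomposed.encode("utf-8"))   # b'___u\xcc\x88___'

# UTF-8 does forbid overlong forms: a 3-byte encoding of U+00FC is rejected.
try:
    b"\xe0\x83\xbc".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True


def matches_normalized(pattern: str, subject: str, form: str = "NFC") -> bool:
    """Hypothetical helper: regex-search `pattern` in `subject` after
    normalizing both to the same Unicode normalization form.

    Assumes the pattern contains its non-ASCII characters literally
    (not as regex escapes), so normalizing the pattern text is meaningful.
    """
    norm_pattern = unicodedata.normalize(form, pattern)
    norm_subject = unicodedata.normalize(form, subject)
    return re.search(norm_pattern, norm_subject) is not None


# Without normalization the canonically equivalent strings do not match.
raw_match = re.search(decomposed, precomposed) is not None
# After normalizing both sides (NFC or NFD) they do.
nfc_match = matches_normalized(decomposed, precomposed, "NFC")
nfd_match = matches_normalized(decomposed, precomposed, "NFD")
print(raw_match, nfc_match, nfd_match)  # False True True
```

The key design point is that normalization must be applied consistently to *both* the user ID and the regexp; normalizing only one side, or using different forms on each side, still leaves canonically equivalent strings unequal.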