[Patch] international domain names for email addresses

David Shaw dshaw at jabberwocky.com
Thu Jan 6 21:26:16 CET 2005


On Thu, Jan 06, 2005 at 09:13:06PM +0100, Thomas Kuehne wrote:
> David Shaw wrote:
> >On Thu, Jan 06, 2005 at 07:05:15PM +0100, Thomas Kuehne wrote:
> >
> >>>>>Will the regexp code in GnuPG do the right thing when matching utf8
> >>>>>against utf8?
> >>>>
> >>>>I'm currently somehow failing to test it ;)
> >>>>
> >>>>Does the UID "___\u00FC___" match the regexp "___\u0075\u0308___" ?
> >>>
> >>>
> >>>It shouldn't.
> >>
> >>"\u00FC" and "\u0075\u0308" are canonical-equivalent.
> >>They should match.
> >>
> >>http://www.unicode.org/faq/normalization.html
> >>http://www.unicode.org/reports/tr15/
> >
> >
> >Are both of those strings utf8?  I was under the impression that the
> >utf8 spec disallowed multiple ways to encode a particular character
> >for security reasons (so people couldn't "hide" illegal characters in
> >the encoding).
> 
> UTF-8 only encodes code points (0x00FC, 0x0075 and 0x0308).
> 
> Unicode defines how those code points are interpreted.
> Some "characters" can be represented by more than one
> code point sequence.
> 
> A simple example is "ü"
> 
> U+00FC LATIN SMALL LETTER U WITH DIAERESIS
> or
> U+0075 LATIN SMALL LETTER U
> U+0308 COMBINING DIAERESIS
> 
> There are far more complicated cases in polytonic Greek, Hangul (Korean)
> and Hebrew.
> 
> One way to ease the problem would be to specify one of the four so-called
> normalization forms in RFC 2440, section 3.4 (Text).
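A minimal sketch of the distinction drawn above, written in Python with the standard `unicodedata` and `re` modules (this is an illustration only, not GnuPG's regexp code, which is in C). Both spellings of "ü" are valid UTF-8 yet encode to different bytes; what UTF-8 actually forbids is an overlong encoding of a single code point. The helper `uid_matches` is hypothetical: it normalizes both the UID and the pattern to one form before matching, which is a simplification (normalizing arbitrary regexp syntax could alter character classes).

```python
import re
import unicodedata

# Two canonically equivalent spellings of "u with diaeresis".
precomposed = "___\u00FC___"        # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "___\u0075\u0308___"   # U+0075 LATIN SMALL LETTER U + U+0308 COMBINING DIAERESIS

# Both are valid UTF-8, but the encoded bytes differ.
print(precomposed.encode("utf-8"))  # b'___\xc3\xbc___'
print(decomposed.encode("utf-8"))   # b'___u\xcc\x88___'

# UTF-8 does forbid overlong forms: U+00FC squeezed into three bytes
# (E0 83 BC) is rejected by a conforming decoder such as Python's.
try:
    b"\xe0\x83\xbc".decode("utf-8")
except UnicodeDecodeError:
    print("overlong encoding rejected")


def uid_matches(uid: str, pattern: str, form: str = "NFC") -> bool:
    """Hypothetical helper: normalize both sides to one Unicode form
    (NFC, NFD, NFKC or NFKD), then search the UID with the pattern."""
    norm_uid = unicodedata.normalize(form, uid)
    norm_pattern = unicodedata.normalize(form, pattern)
    return re.search(norm_pattern, norm_uid) is not None


# Without normalization the code point sequences simply differ.
print(re.search(decomposed, precomposed) is not None)  # False
# After normalizing both sides to the same form they match.
print(uid_matches(precomposed, decomposed))            # True (NFC)
print(uid_matches(precomposed, decomposed, "NFD"))     # True (NFD)
```

Which of the four forms is chosen matters less than both sides agreeing on one; NFKC and NFKD additionally fold compatibility characters, which may or may not be desirable for user IDs.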

Ah, I did not understand.  Wow, that's a massive headache.

David



More information about the Gnupg-devel mailing list