[Patch] international domain names for email addresses

Thomas Kuehne thomas-gmane at kuehne.cn
Thu Jan 6 21:13:06 CET 2005


David Shaw wrote:
> On Thu, Jan 06, 2005 at 07:05:15PM +0100, Thomas Kuehne wrote:
>
>>>>>Will the regexp code in GnuPG do the right thing when matching utf8
>>>>>against utf8 ?
>>>>
>>>>I'm currently somehow failing to test it ;)
>>>>
>>>>Does the UID "___\u00FC___" match the regexp "___\u0075\u0308___" ?
>>>
>>>
>>>It shouldn't.
>>
>>"\u00FC" and "\u0075\u0308" are canonical-equivalent.
>>They should match.
>>
>>http://www.unicode.org/faq/normalization.html
>>http://www.unicode.org/reports/tr15/
>
>
> Are both of those strings utf8?  I was under the impression that the
> utf8 spec disallowed multiple ways to encode a particular character
> for security reasons (so people couldn't "hide" illegal characters in
> the encoding).

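Yes, both are valid UTF-8. UTF-8 only encodes code points (here U+00FC,
U+0075 and U+0308). The rule you are thinking of forbids encoding the
same code point in more than one way (overlong forms); it says nothing
about which code points may follow each other.

Unicode defines how those code points are interpreted.
Some "characters" can be represented by more than one code point
sequence.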

A simple example is "ü":

U+00FC LATIN SMALL LETTER U WITH DIAERESIS
or
U+0075 LATIN SMALL LETTER U
U+0308 COMBINING DIAERESIS
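
To make the two forms concrete, here is a small Python sketch (Python is
only used for illustration, it is not GnuPG code). It encodes both
sequences as UTF-8 and compares them before and after NFC normalization.
The variable names are my own.

```python
import unicodedata

# The two canonical-equivalent spellings of "ü".
precomposed = "\u00FC"       # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "\u0075\u0308"  # U+0075 followed by U+0308 COMBINING DIAERESIS

# Both are well-formed UTF-8, but the byte sequences differ.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
same_bytes = precomposed_bytes == decomposed_bytes

# After normalizing both to the same form (NFC here) they compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(precomposed_bytes, decomposed_bytes, same_bytes, same_after_nfc)
```

A plain byte-wise or code-point-wise comparison therefore treats the two
spellings as different strings, even though Unicode considers them
canonical-equivalent.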

There are much more complicated cases in polytonic Greek, Hangul (Korean)
and Hebrew.

One way to ease the problem would be to specify one of the four so-called
normalization forms (NFC, NFD, NFKC or NFKD) in RFC 2440, section 3.4
(Text).

Nevertheless, user input still needs to be normalized before it is
compared against a UID.
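
As a rough sketch of what "normalize the user input" could look like,
again in Python and not taken from the GnuPG sources: the helper
normalized_search() below is hypothetical, and the choice of NFC as the
default form is an assumption, since no form is specified anywhere yet.

```python
import re
import unicodedata


def normalized_search(pattern: str, uid: str, form: str = "NFC") -> bool:
    """Hypothetical helper: bring pattern and UID to one form, then match."""
    norm_pattern = unicodedata.normalize(form, pattern)
    norm_uid = unicodedata.normalize(form, uid)
    return re.search(norm_pattern, norm_uid) is not None


# The UID and regexp from earlier in this thread.
uid = "___\u00FC___"
pattern = "___\u0075\u0308___"

# Without normalization the regexp does not match the UID.
raw_match = re.search(pattern, uid) is not None

# With both sides normalized to the same form it does.
normalized_match = normalized_search(pattern, uid)

print(raw_match, normalized_match)
```

Which form is chosen matters less than using the same one on both sides.
Normalizing a whole regexp as a string is a simplification; a real
implementation would have to take care with character classes and other
regexp syntax.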

Thomas