Encoding of user ID strings

Andrew Gallagher andrewg at andrewg.com
Mon May 23 23:03:06 CEST 2016


> On 23 May 2016, at 20:19, Robert J. Hansen <rjh at sixdemonbag.org> wrote:
> 
> Is there any way to determine the encoding for a user ID string?
> 
> At first blush it appears the answer is "no, but most people use UTF-8."

You can tell fairly reliably whether someone is using plain ASCII or UTF-8: the former contains only 7-bit characters, and the latter contains 8-bit characters only in strictly valid UTF-8 sequences. An off-the-shelf UTF-8 validator should be able to do that for you.
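
For what it's worth, here is a rough sketch of that check in Python. The function name classify_user_id is made up for illustration, and I'm assuming you already have the raw user ID bytes in hand; the only validator used is the strict UTF-8 decoder built into Python.

```python
def classify_user_id(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown' for a raw user ID (hypothetical helper)."""
    # Only 7-bit bytes: plain ASCII (which is also valid UTF-8).
    if all(b < 0x80 for b in raw):
        return "ascii"
    # Some 8-bit bytes: accept only if they form strictly valid UTF-8 sequences.
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Could be Latin-1, a Windows code page, or anything else.
        return "unknown"
    return "utf-8"


if __name__ == "__main__":
    print(classify_user_id(b"Alice <alice@example.org>"))
    print(classify_user_id("J\u00fcrgen".encode("utf-8")))
    print(classify_user_id("J\u00fcrgen".encode("latin-1")))
```

Note that "unknown" only means "neither ASCII nor valid UTF-8"; it tells you nothing about which legacy encoding was actually used.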

In the case of "all 8-bit characters, no 7-bit" you're dealing with either a practical joker or EBCDIC. Same thing really...

After that you're into heuristics. There are quite a few programs out there that attempt to detect encodings statistically, but with such a short string of data you might as well pick a number. ;-)

A


