Unicode and --with-colons
Robert J. Hansen
rjh at sixdemonbag.org
Sat Apr 1 10:57:04 CEST 2017
C:\Users\Robert J. Hansen\Desktop> gpg --fixed-list-mode --with-colons --list-key 0x3ADBFA6D00A1E6FE
=====
[... trimmed ...]
uid:-::::1436536488::100E4A12486A5261E374B3B0CA16CF0516F4367C::Ludwig Hügelschäfer <ludwig at hammernoch.net>:
=====
"That's an odd encoding," I said to myself. "It must be UTF-8 presented
as ASCII or Windows-1252. Let's look, shall we?"
=====
C:\Users\Robert J. Hansen\Desktop> gpg --fixed-list-mode --with-colons --list-key 0x3ADBFA6D00A1E6FE > ludwig.asc
C:\Users\Robert J. Hansen\Desktop> python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("ludwig.asc") as fh:
...     bytes = fh.read()
...
>>> bytes
'ÿþt\x00r\x00u\x00:\x00:\x001\x00:\x001\x004\x009\x001\x000\x003\x004\x004\x004\x009\x00:\x000\x00:\x003\x00:\x001\x00...'
=====
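The leading 'ÿþ' and the interleaved \x00 characters are what UTF-16LE bytes look like when a text-mode open() decodes them with the Windows default code page. A minimal sketch of that effect follows. It uses a short stand-in byte string rather than the real contents of ludwig.asc, and it assumes the default code page is cp1252:

```python
import codecs

# Stand-in for the first few bytes of a UTF-16LE file with a BOM.
# This is illustrative data, not the real contents of ludwig.asc.
raw = codecs.BOM_UTF16_LE + "tru::1:".encode("utf-16-le")

# Assumption: a text-mode open() on this machine decodes with cp1252.
# Every byte becomes one character, so the BOM shows up as two
# characters and each ASCII letter is followed by a NUL.
as_cp1252 = raw.decode("cp1252")

# The "utf-16" codec reads the BOM, picks the byte order from it,
# and strips it from the result.
as_text = raw.decode("utf-16")

print(repr(as_cp1252))
print(repr(as_text))
```

Opening the file with open("ludwig.asc", "rb"), or passing encoding="utf-16", would avoid the intermediate code-page decoding.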
Weirder and weirder. GnuPG is outputting data in UTF-16LE, complete
with a correct byte-order mark... but it is first taking what is
(apparently) the UTF-8 encoding of Ludwig's name, padding each byte
with a null byte, and calling the result UTF-16.
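To make that guess concrete, here is a minimal Python sketch of the suspected mangling and one way to reverse it. It assumes each UTF-8 byte is widened to the code point with the same value, which is equivalent to a Latin-1 decode. That is a hypothesis about what is happening, not confirmed GnuPG behaviour:

```python
import codecs

name = "H\u00fcgelsch\u00e4fer"  # Hügelschäfer

# Hypothesised mangling: take the UTF-8 bytes and widen each byte to
# one 16-bit code unit (byte 0xC3 becomes U+00C3, and so on).
utf8_bytes = name.encode("utf-8")
widened = utf8_bytes.decode("latin-1")
mangled = codecs.BOM_UTF16_LE + widened.encode("utf-16-le")

# Reversal under the same assumption: decode the UTF-16 (the BOM is
# consumed), narrow each code point back to one byte, then decode the
# resulting bytes as UTF-8.
recovered = mangled.decode("utf-16").encode("latin-1").decode("utf-8")

print(repr(widened))
print(recovered == name)
```

If some other code page were involved in the widening step, the Latin-1 round trip could fail for bytes in the 0x80 to 0x9F range, so this only holds under the stated assumption.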
Looking at the output from just a plain --list-key, it appears correct:
=====
\x00H\x00ü\x00g\x00e\x00l\x00s\x00c\x00h\x00ä\x00f\x00e\x00r
=====
So -- what's the canonically approved way to convert this mangled form
back into Unicode? Is this mangled form a deliberate design choice, or
is this a bug?