gpg --with-colon contains non-UTF-8 data

Ingo Klöcker kloecker at kde.org
Wed May 7 09:26:00 CEST 2025


On Tuesday, 6 May 2025 21:58:33 Central European Summer Time Uwe Kleine-König
wrote:
> [I would have put this in the bug tracker, but I cannot create an
> account there. The hint says to request access on the mailing list, but
> https://lists.gnupg.org/pipermail/gnupg-users/2025-April/067578.html
> didn't work. So here comes another bug report on this mailing list.]

I guess Werner overlooked your request for an account.

> 	$ gpg --recv-keys --keyserver hkps://keyserver.ubuntu.com DE6162B5616BA9C9CAAC03074A55C497F744F705 ...
> 	$ gpg --list-keys --with-colons DE6162B5616BA9C9CAAC03074A55C497F744F705 | file -
> 	/dev/stdin: ISO-8859 text

The relevant output is
uid:-::::1254225361::DAE1213A33A1912544622223DEA25CE02D212561::Toke Høiland-Jørgensen <toke at toke.dk>:::::::::1746601390:1:
uid:-::::1390257493::F7C00C8A2EAFD18CBEAA5AD703F7DAAF0A9FF40D::Toke Høiland-Jørgensen <toke.hoiland-jorgensen at kau.se>:::::::::1746601390:1:
uid:-::::1254436708::5BB9504785E4BDFB1A528C5657D2548881DE3267::Toke Høiland-Jørgensen <toke at tohojo.dk>:::::::::1746601390:1:
uid:-::::1254436103::F9CE85F6CBFD7011C8318DBF4E373959E9324987::Toke H�iland-J�rgensen <toke at tohojo.dk>:::::::::1746601390:1:
uid:-::::1254225297::2639847E42B838E0E5114670190C51AFC09B1E7E::Toke Høiland-Jørgensen <tohojo at ruc.dk>:::::::::1746601390:1:
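
For illustration, a minimal Python sketch of how a script could spot such a 
record. It assumes the output is read as raw bytes and that the user ID is the 
tenth colon-separated field, as in the uid records above. The function name 
non_utf8_uids and the sample bytes are made up for this example.

```python
def non_utf8_uids(colon_output: bytes) -> list[bytes]:
    """Return the raw user ID field of every uid record that is not valid UTF-8."""
    bad = []
    for line in colon_output.split(b"\n"):
        fields = line.split(b":")
        if fields[0] != b"uid" or len(fields) < 10:
            continue
        uid = fields[9]  # field 10 (counting from 1) holds the user ID
        try:
            uid.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(uid)
    return bad


# Hypothetical sample: the same user ID once UTF-8 encoded, once latin1 encoded.
sample = (
    "uid:-::::1254436708::5BB9504785E4BDFB1A528C5657D2548881DE3267::"
    "Toke Høiland-Jørgensen <toke at tohojo.dk>:::::::::1746601390:1:\n"
).encode("utf-8") + (
    "uid:-::::1254436103::F9CE85F6CBFD7011C8318DBF4E373959E9324987::"
    "Toke Høiland-Jørgensen <toke at tohojo.dk>:::::::::1746601390:1:\n"
).encode("latin-1")

print(non_utf8_uids(sample))
```

Only the latin1-encoded record is reported. The bytes are kept as-is so that 
the caller can decide how to display them.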

> file considers this ISO-8859 because one of the UIDs isn't proper UTF-8,
> but "Toke Høiland-Jørgensen <toke at tohojo.dk>" encoded in latin1.

Yes, apparently one UID was created with a non-compliant OpenPGP app. I'm 
wondering why they didn't revoke it since they added the same UID properly 
encoded just a few minutes later.

> It's clear that this UID is invalid, but I'd claim that there is a bug
> in the documentation (gpg(1) claims in the description for --with-colons
> that the output is UTF-8) or in gpg itself (because it doesn't quote the
> invalid chars).

I don't think there's a bug. --with-colons assumes correctly UTF-8 encoded 
UIDs and therefore outputs the UIDs as-is. It's impossible to guess the 
encoding that was used if it's not UTF-8. More than 20 years ago I added some 
heuristics for correcting wrongly encoded UIDs to KMail and it worked somewhat 
for my sample of OpenPGP keys, but this sample was very latin-1-biased.
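
To make the problem concrete, here is a rough sketch of the simplest possible 
fallback: try UTF-8 first and otherwise guess latin1. This is not the KMail 
heuristic mentioned above, and decode_uid is a hypothetical name. It only shows 
why the result of such a fallback is a guess.

```python
def decode_uid(raw: bytes) -> tuple[str, str]:
    """Decode a user ID and report which encoding was used."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to some character, so this never fails,
        # but nothing guarantees that it is the encoding the author used.
        return raw.decode("latin-1"), "latin-1 (guessed)"


raw = b"Toke H\xf8iland-J\xf8rgensen"
print(decode_uid(raw))
# The same bytes are also valid in other 8-bit encodings, with other letters.
print(raw.decode("iso-8859-2"))
```

The second print shows a different name for the same bytes, which is why a 
latin1 fallback only looks right on a latin1-biased sample.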

> Fun fact: Without --with-colons the UID is emitted as:
> 
> 	uid           [ unknown] Toke H\xf8\x69land-J\xf8\x72gensen <toke at tohojo.dk>
> 
> which is UTF-8 and needlessly encodes the i and the r following the two
> ø as hex escape.

The output function probably assumes an unprintable 2-byte UTF-8 character and 
therefore escapes 2 bytes.
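
A small Python model of that guess, purely as an illustration: it is not taken 
from the gpg sources, and escape_like_listing is a made-up name. The assumption 
is that a non-ASCII byte which does not start a valid UTF-8 sequence is escaped 
together with the byte that follows it.

```python
def escape_like_listing(raw: bytes) -> str:
    """Model of the guessed behaviour: escape 2 bytes per invalid lead byte."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] < 0x80:
            out.append(chr(raw[i]))  # plain ASCII is passed through
            i += 1
            continue
        # Accept a valid 2-, 3- or 4-byte UTF-8 sequence if there is one.
        for n in (2, 3, 4):
            try:
                out.append(raw[i:i + n].decode("utf-8"))
                i += n
                break
            except UnicodeDecodeError:
                pass
        else:
            # Assumption: treat the byte as the start of a 2-byte character
            # and hex-escape both bytes.
            chunk = raw[i:i + 2]
            out.append("".join(f"\\x{c:02x}" for c in chunk))
            i += len(chunk)
    return "".join(out)


print(escape_like_listing("Toke Høiland-Jørgensen".encode("latin-1")))
print(escape_like_listing("Toke Høiland-Jørgensen".encode("utf-8")))
```

With the latin1 input the model swallows the i (0x69) and the r (0x72) into the 
escapes, as in the quoted listing. The UTF-8 input is printed unchanged.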

> I wonder if cleaning the key should remove that UID?

Why should it? The UID is neither expired nor revoked. That the UID is not 
UTF-8 encoded has no influence whatsoever on the validity of the UID.

Regards,
Ingo