gpg --with-colon contains non-UTF-8 data
Ingo Klöcker
kloecker at kde.org
Wed May 7 09:26:00 CEST 2025
On Tuesday, 6 May 2025 21:58:33 CEST, Uwe Kleine-König wrote:
> [I would have put this in the bug tracker, but I cannot create an
> account there. The hint says to request access on the mailing list, but
> https://lists.gnupg.org/pipermail/gnupg-users/2025-April/067578.html
> didn't work. So here comes another bug report on this mailing list.]
I guess Werner overlooked your request for an account.
> $ gpg --recv-keys --keyserver hkps://keyserver.ubuntu.com DE6162B5616BA9C9CAAC03074A55C497F744F705
> ...
> $ gpg --list-keys --with-colons DE6162B5616BA9C9CAAC03074A55C497F744F705 | file -
> /dev/stdin: ISO-8859 text
The relevant output is
uid:-::::1254225361::DAE1213A33A1912544622223DEA25CE02D212561::Toke Høiland-Jørgensen <toke at toke.dk>:::::::::1746601390:1:
uid:-::::1390257493::F7C00C8A2EAFD18CBEAA5AD703F7DAAF0A9FF40D::Toke Høiland-Jørgensen <toke.hoiland-jorgensen at kau.se>:::::::::1746601390:1:
uid:-::::1254436708::5BB9504785E4BDFB1A528C5657D2548881DE3267::Toke Høiland-Jørgensen <toke at tohojo.dk>:::::::::1746601390:1:
uid:-::::1254436103::F9CE85F6CBFD7011C8318DBF4E373959E9324987::Toke H�iland-J�rgensen <toke at tohojo.dk>:::::::::1746601390:1:
uid:-::::1254225297::2639847E42B838E0E5114670190C51AFC09B1E7E::Toke Høiland-Jørgensen <tohojo at ruc.dk>:::::::::1746601390:1:
> file considers this ISO-8859 because one of the UIDs isn't proper UTF-8,
> but "Toke Høiland-Jørgensen <toke at tohojo.dk>" encoded in latin1.
Yes, apparently one UID was created with a non-compliant OpenPGP app. I'm
wondering why they didn't revoke it since they added the same UID properly
encoded just a few minutes later.
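
To find out which UID in such a listing is the offending one, the colon output can be read as raw bytes and each user ID checked separately. The following is only a minimal sketch: the function name non_utf8_uids is made up for this example, and it assumes the user ID sits in the 10th colon-separated field of a "uid" record, as in the lines above.

```python
def non_utf8_uids(colon_output: bytes) -> list[bytes]:
    """Return the raw user IDs of all uid records that are not valid UTF-8.

    Assumption: the user ID is field 10 (index 9) of a "uid" record,
    matching the sample output quoted in this thread.
    """
    bad = []
    for line in colon_output.split(b"\n"):
        fields = line.split(b":")
        if fields[0] != b"uid" or len(fields) < 10:
            continue
        try:
            fields[9].decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes; the real encoding is unknown.
            bad.append(fields[9])
    return bad
```

The input would be the undecoded stdout of "gpg --list-keys --with-colons <fingerprint>", for example as captured with subprocess.run(..., capture_output=True).stdout.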
> It's clear that this UID is invalid, but I'd claim that there is a bug
> in the documentation (gpg(1) claims in the description for --with-colons
> that the output is UTF-8) or in gpg itself (because it doesn't quote the
> invalid chars).
I don't think there's a bug. --with-colons assumes correctly UTF-8 encoded
UIDs and therefore outputs the UIDs as-is. It's impossible to guess the
encoding that was used if it's not UTF-8. More than 20 years ago I added some
heuristics for correcting wrongly encoded UIDs to KMail and it worked somewhat
for my sample of OpenPGP keys, but this sample was very latin-1-biased.
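
As an illustration of what such a heuristic can look like in its simplest form (this is not the KMail code, just a sketch with a made-up function name): try UTF-8 first and fall back to Latin-1 if that fails. The fallback never raises an error because every byte is a valid Latin-1 character, which is exactly why it remains a guess.

```python
def decode_uid(raw: bytes) -> tuple[str, str]:
    """Decode a raw user ID, returning (text, encoding that was assumed)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps each byte to one code point, so this cannot fail.
        # It is a guess biased towards Western European names and gives
        # wrong text for any other legacy encoding.
        return raw.decode("latin-1"), "latin-1"
```

Applied to the bytes of the broken UID above, this yields "Toke Høiland-Jørgensen" with "latin-1" as the assumed encoding.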
> Fun fact: Without --with-colons the UID is emitted as:
>
> uid [ unknown] Toke H\xf8\x69land-J\xf8\x72gensen <toke at tohojo.dk>
>
> which is UTF-8 and needlessly encodes the i and the r following the two
> ø as hex escape.
The output function probably assumes an unprintable 2-byte UTF-8 character and
therefore escapes 2 bytes.
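
For comparison, escaping only the bytes that are not valid UTF-8 would leave the i and the r untouched. The sketch below uses Python's built-in "backslashreplace" error handler to show that behaviour; it says nothing about how gpg's output function is actually implemented, and the function name is made up for this example.

```python
def escape_invalid(raw: bytes) -> str:
    """Decode as UTF-8, turning each undecodable byte into a \\xNN escape."""
    return raw.decode("utf-8", errors="backslashreplace")


print(escape_invalid(b"Toke H\xf8iland-J\xf8rgensen"))
# prints: Toke H\xf8iland-J\xf8rgensen
```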
> I wonder if cleaning the key should remove that UID?
Why should it? The UID is neither expired nor revoked. That the UID is not
UTF-8 encoded has no influence whatsoever on the validity of the UID.
Regards,
Ingo