Some thoughts on working with GnuPG

Mon Feb 9 20:27:09 CET 2015

Hi!

Back in October Smári posted an article with the problems he encountered
while integrating GnuPG into mailpile.  See
https://www.mailpile.is/blog/2014-10-07_Some_Thoughts_on_GnuPG.html .

I asked him whether I may comment on this over at gnupg-user and with
2.1.0 out of the way I started to draft a response.  However, I then
figured that 2.1 bug fixes are more important and thus I did not
finish that response.  Find below what I already wrote.  I yanked the
text from the browser, thus there may be formatting issues; I also
skipped parts which are not relevant for my comments.

> support. Financially supporting the GnuPG project is also something
> people should be doing.

Thanks.

> One of the things I'm largely to blame for in Mailpile is the GnuPG
> interface. It's a chunk of Python code that executes the GnuPG binary,
> tosses information at it, and figures out what to do with the
> output. There are lots of libraries for doing this, but after a great
> deal of exploration I found that all of the Python libraries that did
> this were insufficient for our needs, and the only thing crazier than
> manually forking out GnuPG in our situation would be to use the PGPME
> library.

Well, it is called GPGME and that is what I suggested a year ago.  I do
not have much experience with Python bindings.  Back when we did the
Freenigma thing (a crypto gateway), the folks responsible for the code
also used direct calls to gpg instead of writing or using a Python
binding.  I only joined the project later.

PYME (https://bitbucket.org/malb/pyme) is a SWIG-generated interface to
GPGME and is actively developed.  It is not a high-level interface that
integrates with common Python coding patterns, but it should be a solid
interface to GPGME and thus GnuPG.  To make it more visible it would be
nice to have it in the GPGME distribution, much like we have a CL
binding there too.  However, this requires a person who takes care of it
(and someone feeling responsible for CL would also be appreciated).  I
definitely can't take care of language bindings in addition to the
core GPGME maintenance.

> PGPME is almost as confusing and annoying as calling GnuPG directly, but
> it also requires us to ship architecture-specific libraries to
> everybody, something we're actively avoiding. Having to ship GnuPG

Those are two issues.  Confusing?  Obviously I have a different opinion.
Architecture-specific stuff - sure it is, and I wish we could have a
better distribution system.  I may have remarked on this in the past
already: I wish we could work on a separate GnuPG core system, including
the core utils and libraries, which is installed on Windows at a standard
place so it may be used by any software, not just GnuPG/Gpg4win/Whatever.

That is something we should really improve on.

> we're not, so it isn't. On top of that, the available Python bindings
> for PGPME are very flaky (last updated in 2008!), and not developed or
> maintained by the GnuPG team.

I noticed that there is new activity since January this year.  Too late
for Mailpile, but fixing PYME would probably be better than running into
all the problems again.  GPGME is well maintained, with new releases
going hand in hand with new GnuPG features.

> As a result, we've got a roughly 1200 line chunk of code in Mailpile
> that has the fun and useful task of chatting with GnuPG, and the
> stupifyingly annoying task of working around all of GnuPG's
> inconsistencies.

Your choice ;-)

> The problems with GnuPG seem to fall roughly into two broad categories:
> inconsistent output structure, inconsistent interfaces. These are both
> ripe with surprising behaviour and confusing failure modes. In addition
> to these categories, it appears that the larger meta problem is that no
> single statement about its problems is going to remain a stable
> statement, as these problems disappear and reappear at odd intervals as

Marcus (the former GPGME maintainer) and I put a lot of work into APIs
which are backward compatible.  If there is an occasional bug, it will
be fixed - but we need to know about it.

> wit, I have over the course of Mailpile development added, removed, and
> readded a workaround for a bug, although I think I'm safe to say that it
> does not exist post GnuPG 2.1. The comment of that workaround in the
> code illustrates the issue perfectly:
>
> def list_secret_keys(self):
>        #
>        # Note: The "." parameter that is passed is to work around a bug
>        #       in GnuPG < 2.1, where --list-secret-keys does not list
>        #       details about key capabilities or expiry for
>        #       --list-secret-keys unless a selector is provided. A dot
>        #       is reasonably likely to appear in all PGP keys, as it is
>        #       a common component of e-mail addresses (and @ does not
>        #       work as a selector for some reason...)

This was reported in 2008 as bug 945.  It is in general not a
problem because most key managers work by listing the public keys and
then checking whether a corresponding secret key exists.  Actually this
is how 2.1 works internally.  Fixing this bug is hard because
the secring.gpg is not an identical copy of pubring.gpg and thus you will
run into a lot of problems even with the missing details fixed.
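
A minimal sketch of the "list the public keys, then check for a
corresponding secret key" approach.  It assumes the two listings have
already been captured as text, for example from
`gpg --with-colons --list-keys` and `gpg --with-colons --list-secret-keys`,
and it matches records on the key ID in field 5.  The function names and
the sample records are hypothetical; this illustrates the idea and is not
how GnuPG 2.1 implements it internally.

```python
def key_ids(listing, record_type):
    """Return the key IDs (field 5) of all records of one type, in order."""
    ids = []
    for line in listing.splitlines():
        fields = line.split(":")
        # Skip records of other types and lines too short to carry a key ID.
        if fields[0] == record_type and len(fields) > 4:
            ids.append(fields[4])
    return ids


def mark_secret_keys(public_listing, secret_listing):
    """Map each public key ID to True if a secret key with that ID exists."""
    secret = set(key_ids(secret_listing, "sec"))
    return {kid: kid in secret for kid in key_ids(public_listing, "pub")}


# Hypothetical sample listings (made-up key IDs, not real keys).
PUBLIC = (
    "pub:u:2048:1:1111222233334444:1400000000:::u:::scESC:\n"
    "pub:f:2048:1:9999AAAABBBBCCCC:1400000000:::f:::scESC:\n"
)
SECRET = "sec:u:2048:1:1111222233334444:1400000000::::::scESC:\n"

if __name__ == "__main__":
    for key_id, has_secret in mark_secret_keys(PUBLIC, SECRET).items():
        print(key_id, "secret key available" if has_secret else "public only")
```

All details such as capabilities and expiry are then taken from the
public key records; the secret listing is only used for the membership
test.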

You can't use '@' as a selector because:

 * - If the username starts with an '@', we assume it is a part of an
 *   email address

It must have got lost from the man page - sorry for that.  But did I see
a bug report?

>        #       BRE: Put --fingerprint at the front and added selectors
>        #            for the worlds MOST POPULAR LETTERS!  Yaaay!

But the output might still be totally wrong.

> First, a word on discoverability. If you ever intend to do anything with
> GnuPG, you first need to read and internalize a document aptly titled
> DETAILS, which contains a lot of the details about what's going on with

or you use GPGME ;-)

> GnuPG output. I have dutifully read, memorized chunks of, and bookmarked
> this file for posterity. It is immensely helpful. For example, it gives
[...]
> Now here comes issue the first: this is essentially a colon separated
> value (CSV!) data structure, but the data being provided is a)
> inconsistent, and b) structured.

You mean that there is no top-down design?  Right, that is how life is.

GnuPG started as a PGP 2 replacement and soon I figured that a machine
readable interface is useful to avoid problems with localization and
changing output intended for humans.  Over the years more and more
status information has been added to this interface - in a compatible
way.  This makes it a bit hard to use but it does not break existing
applications.  If you can start from scratch, you can do a nice API
design but we were not able to do that.

> Notably, the first output line says "there is a public key," and the
> line after it says "here is a fingerprint." Naively one might think that
> these are unrelated. But in fact, all of the lines from the one starting

It basically reflects the OpenPGP key structure.  You can't put
everything into one line - it would be too hard to debug and extend.

> with pub up to the next one that starts with either pub or sec are
> actually details about the nature of the public key mentioned in the pub
> line - although to make things worse, the fpr lines after the sub lines
> refer to the sub line but not the pub line. Confused yet?

Sure they do.  The fingerprint lines for the subkeys refer to the
subkeys - how could that be different?
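
A minimal sketch of how the record structure can be followed in a
parser: each "fpr" record is attached to the most recent "pub", "sec",
"sub" or "ssb" record before it.  The sample listing is made up, the
function and dictionary key names are hypothetical, and only the fields
discussed here (key ID in field 5, fingerprint or user ID in field 10)
are read.

```python
# Hypothetical sample output in the colon format (made-up key material).
SAMPLE = """\
pub:u:2048:1:1111222233334444:1400000000:::u:::scESC:
fpr:::::::::AAAAAAAAAAAAAAAAAAAAAAAA1111222233334444:
uid:u::::1400000000::0000::Alice Example <alice@example.org>:
sub:u:2048:1:5555666677778888:1400000000::::::e:
fpr:::::::::BBBBBBBBBBBBBBBBBBBBBBBB5555666677778888:
"""


def parse_listing(text):
    """Group colon-format records into primary keys with their subkeys."""
    keys = []
    current = None  # the key or subkey record the next "fpr" line describes
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split(":")
        record_type = fields[0]
        if record_type in ("pub", "sec"):
            current = {"type": record_type, "keyid": fields[4],
                       "fingerprint": None, "uids": [], "subkeys": []}
            keys.append(current)
        elif record_type in ("sub", "ssb") and keys:
            current = {"type": record_type, "keyid": fields[4],
                       "fingerprint": None}
            keys[-1]["subkeys"].append(current)
        elif record_type == "fpr" and current is not None:
            # Field 10 of an "fpr" record carries the fingerprint.
            current["fingerprint"] = fields[9]
        elif record_type == "uid" and keys:
            keys[-1]["uids"].append(fields[9])
    return keys


if __name__ == "__main__":
    for key in parse_listing(SAMPLE):
        print(key["type"], key["fingerprint"], key["uids"])
        for subkey in key["subkeys"]:
            print("   ", subkey["type"], subkey["fingerprint"])
```

Tracking a single "current record" is enough because the listing mirrors
the OpenPGP key structure: a primary key, then its user IDs and subkeys.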

> In reality, parsing this isn't too terrible, but it can only be done in

Right, it can be done robustly on the command line using awk.

> the handy DETAILS document, my first version of a parser was overly
> generic and terribly inefficient, because I kept trying to avoid
> inconsistencies.

There is a reason we put a lot of thought into the GPGME API.  Designing
an API for a complicated protocol is a tough job.  Thus we do not even
have an API for key signing - it is just too hard to come up with a
generic solution.  The idea was to look at what people are using, and
when we see the same pattern over and over we can introduce such an API.

> Some of the columns are meaningless for some of the output lines, but
> more shockingly, some of the columns are MISSING sometimes. Three of the

Missing?  They are empty!
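
A short sketch of what "empty" means for a parser: the fields are
positional, so an unused field is an empty string between two colons and
the later fields keep their numbers.  The helper names are hypothetical,
and padding short lines to 12 fields is an assumption made here for
convenience, not something the format requires.

```python
def split_record(line, width=12):
    """Split a colon-format line and pad it to at least `width` fields."""
    fields = line.rstrip("\n").split(":")
    fields += [""] * (width - len(fields))  # no-op if the line is long enough
    return fields


def field(line, number):
    """Return field `number`, counted from 1 as in the DETAILS file."""
    return split_record(line)[number - 1]


# Made-up "fpr" record: fields 2 to 9 are empty, field 10 is the fingerprint.
FPR_LINE = "fpr:::::::::0123456789ABCDEF0123456789ABCDEF01234567:"

if __name__ == "__main__":
    print(repr(field(FPR_LINE, 5)))   # empty string, not a missing column
    print(repr(field(FPR_LINE, 10)))  # the fingerprint
```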

> columns just simply evaporate if the line is an fpr-type line. On top of
> that, there's no really good reason why the fingerprint needs to be a
> separate output line rather than just being added in at the right

There is one.  Computing a fingerprint takes some time and back in 1998
we tried to avoid that overhead and required the use of --fingerprint
for including such lines.  That is actually PGP 2 design.  The
fingerprint is also long, which would make a "pub" line longer than is
good for easy debugging.

> place. According to the DETAILS file, field 10 is for "User ID" - which
> is to say, the name, e-mail address, and comment associated with the
> key. Things that the fingerprint emphatically is not.

It is a different record type and re-using a field which can be used to
search for a key should not be the worst idea.

> It this point you'll notice that field 5 contains the Key ID. And for
> added pain, the key ID is variously the last 8 or the last 16 nibbles
> (hexadecimal digits) of the fingerprint.

That is only true for v4 keys.  v3 key ids are different - you can't
derive the key id from the fingerprint.
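
For v4 keys the relation can be sketched as follows: the fingerprint is
40 hex digits, the long key ID is its last 16 digits and the short key
ID its last 8.  The function name is hypothetical and the fingerprint in
the example is made up.  Anything that is not 40 hex digits is rejected,
which includes the 32-digit fingerprints of v3 keys, where the key ID
has to come from the key itself.

```python
def v4_key_ids(fingerprint):
    """Return (long_key_id, short_key_id) for a v4 key fingerprint."""
    fpr = fingerprint.replace(" ", "").upper()
    if len(fpr) != 40:
        raise ValueError("not a v4 fingerprint: expected 40 hex digits")
    int(fpr, 16)  # raises ValueError if there are non-hex characters
    return fpr[-16:], fpr[-8:]


if __name__ == "__main__":
    long_id, short_id = v4_key_ids("0123456789ABCDEF0123456789ABCDEF01234567")
    print(long_id, short_id)
```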

Sorry, I had to stop commenting here.

Salam-Shalom,

   Werner

--
Die Gedanken sind frei.  Ausnahmen regelt ein Bundesgesetz.