S-expressions, trailing NULs, libgcrypt and gpg-agent

Mon Jul 29 21:40:50 CEST 2019

Hi GnuPG/libgcrypt folks--

i've been trying to dig more deeply into GnuPG and libgcrypt's use of
S-expressions, in particular the inclusion of non-standard trailing NUL
bytes.

https://dev.gnupg.org/T4652 is one such example in GnuPG itself -- the
S-expression returned by the agent in response to PKDECRYPT command
contains an extra NUL byte -- but it looks like a potential issue baked
deep in libgcrypt too. In particular, this line in _gcry_sexp_sprint()
in src/sexp.c seems problematic:

      *d++ = 0; /* for convenience we make a C string */

The inclusion of a trailing NUL is clearly not called for by the
traditional S-expression definitions or documentation
(https://people.csail.mit.edu/rivest/Sexp.txt) which focuses on
providing length-prefixed strings and other standardized formats that
don't rely on a trailing NUL terminator (and indeed, may embed NUL bytes
on their own).

So a consumer of an S-expression that has a trailing NUL appended needs
to do something special with it.  This causes problems for things like
nettle's sexp-conv tool:

0 dkg at alice:~$ printf '(3:abc)' | sexp-conv 
(abc)
0 dkg at alice:~$ printf '(3:abc)\0' | sexp-conv 
(abc)
Invalid token.
1 dkg at alice:~$ 

The inclusion of a trailing NUL also seems to be an invitation to
misparse the data, because legitimate S-expressions can contain embedded
NULs, so there's no way to use a standard C-string function like
strlen() on such an expression safely.

That said, some S-expression parsers in C might use functions like
strtol() or strtoul(), which are outright dangerous to use on an
arbitrary buffer if you can't guarantee that some sort of terminating
non-numeric character will be present in the stream (strtoul could read
off the end of the buffer in that case).

This is arguably a bug in strtoul(), but that's an issue with POSIX,
which we can't fix here, other than to say "don't use it for parsing
arbitrary potentially-adversarial S-expression-like-things".  But it's
more than a little ironic to fail at parsing a
canonically-length-prefixed format due to a missing delimiter :(

The right answer seems to be to demand that all S-expression parsers be
aggressively defensive when parsing S-expressions, especially from an
unknown source.

Sadly, libgcrypt explicitly documents the spurious trailing NUL byte in
its API:

 -- Function: size_t gcry_sexp_sprint (gcry_sexp_t SEXP, int MODE,
          char *BUFFER, size_t MAXLENGTH)

     Copies the S-expression object SEXP into BUFFER using the format
     specified in MODE.  MAXLENGTH must be set to the allocated length
     of BUFFER.  The function returns the actual length of valid bytes
     put into BUFFER or 0 if the provided buffer is too short.  Passing
     'NULL' for BUFFER returns the required length for BUFFER.  For
     convenience reasons an extra byte with value 0 is appended to the
     buffer.

I'm not sure what "convenience" is intended to mean here -- but it looks
to me like a hassle for users of libgcrypt.  The user that consumes such
a buffer cannot use a standard S-expression parser on the result
(because of the spurious trailing NUL), but they also can't treat it as
a NUL-terminated string.

Even weirder, gcry_sexp_sprint () returns a different length value when
invoked with BUFFER = NULL than it does when MAXLENGTH is as large as it
needs to be.  (e.g. when BUFFER == NULL, it might return 4, but when
supplying it with a large enough buffer to write in, it will return 3 --
but will also write a NUL byte to the fourth octet).

No such documentation is available for clients of gpg-agent who want to
use PKDECRYPT or PKSIGN, though.  I don't know what justifies the
inclusion of the extra trailing NUL byte there.

These idiosyncrasies make it difficult for a user of gcrypt or client of
gpg-agent to reliably consume an S-expression with any
standards-compliant S-expression tools/parsers, because they have to
know about and keep track of this odd variation.  If the goal is
interoperability with other tools, it would be great to be able to
reduce the cognitive load on adopters.

How to fix?
-----------

I'm struggling to imagine how this could be safely corrected, because
the documentation is explicit about baking in the bug.

I guess an SONAME bump for gcrypt might be appropriate, but i also can't
imagine being confident in fixing anything that relies on this behavior
because it is a subtle difference and not easy to catch in an automated
way (not like a change in the arguments to a function, or a function
rename).

A cleaner fix on the libgcrypt side would be to add a new function:

   gcry_sexp_sprint2  (gcry_sexp_t SEXP, int MODE, char *BUFFER, size_t MAXLENGTH)

which is identical to gcry_sexp_sprint () except that it does not add
the trailng NUL byte.  Then you could deprecate gcry_sexp_sprint, and
callers could convert to the new one explicitly, without requiring an
SONAME bump.  When the project is ready to fully remove
gcry_sexp_sprint(), that can happen at the next SONAME bump.

I don't really know to fix gpg-agent's PKDECRYPT or PKSIGN functions
safely.  As Andre points out in T4652, due to forwarded agents, we can't
guarantee that we're talking to the same version of the backend.  It
looks like we could add an option to PKDECRYPT to return the value
without a trailing NUL (invoked as "pkdecrypt --no-trailing-nul" or
something), but i don't know how you'd know as a client of the agent
whether that's possible or not ("getinfo cmd_has_option" appears to be
unreliable: https://dev.gnupg.org/T4661).  Perhaps an additional OPTION
would be better, so a caller can test at runtime?  But pretty much all
of my ideas here involve at least one additional roundtrip between the
agent and the client of the agent, which -- especially for high-latency
forwarded connections -- adds to the lag of an operation that is already
one of the elements that makes gpg slower than it needs to be.

Short term takeaway?
--------------------

Anyway, i think my conclusion for the short term is that anyone parsing
S-expressions from GnuPG or libgcrypt needs to assume that there will be
an extra trailing NUL that needs to be stripped off before handling.

If the overall S-expression is expected to be a list and not an atom, it
is probably safe to simply strip all trailing NUL(s?) backwards, down to
the trailing close-paren.  However, if the overall S-expression might
just be an atom, the only option is to assume that a single literal NUL
has been added to the end, and to explicitly strip ignore it, until we
get a cleaner resolution for the long term.  (otherwise you could get
into trouble when parsing a token like `4:abc\0\0' and "overstrip" the
NUL that is part of the intended literal)

Does anyone see any cleaner way out of this situation for users that
interact with either the agent or libgcrypt?

        --dkg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.gnupg.org/pipermail/gnupg-devel/attachments/20190729/b8512779/attachment.sig>