25519 usability and performance in Libgcrypt

Peter Wu peter at lekensteyn.nl
Mon Jul 30 14:54:11 CEST 2018


Hi,

I have been looking at integrating X25519 (Curve25519 ECDH) into an
application that already uses Libgcrypt. I was not impressed with the
ease of integration nor its performance characteristics and will further
elaborate this.

The manual lacks documentation on using X25519. The closest thing to
documentation is tests/t-cv25519.c. It contains two interfaces:

- a low-level MPI implementation (test_it). This is probably the best
  implementation that is available in Libgcrypt.

- a high-level implementation through gcry_pk_encrypt (test_cv). This
  performs twice as slow as necessary. Internally it computes a scalar
  multiplication, twice (given secret scalar k, it computes kdG and kG).
  One use case is to compute the public key for a given private key, so
  only kG is necessary (and kdG will be equal in this case). Another is
  to compute the shared secret kdG and thus kG is again superfluous.

In fact, the comment for test_cv claims to use gcry_pk_decrypt which it
does not do. Using gcry_pk_decrypt would be a better approach, given k
and dG, compute kdG. The interface is however rather painful. Lots of
internal details leak out of this supposedly high-level interface (this
also applies to gcry_pk_decrypt). This is best demonstrated by code:


    /*
     * Computes Q = X25519(k, 9) where k and q are both encoded in little-endian
     * (as specified by the Curve25519 paper).
     * Returns 0 on success and -1 on failure.
     */
    int x25519_base(unsigned char *q, const unsigned char *k)
    {
        unsigned char k_be[32];
        gcry_sexp_t s_pk = NULL;
        gcry_sexp_t s_bp = NULL;
        gcry_sexp_t s_result = NULL;

        // Issue 1: Gcrypt MPI library stores integers in big-endian format (MSB
        // first) while Curve25519 keys are usually little-endian. This seems
        // also visible when using the "%b" format with gcry_sexp_build.
        for (int i = 0; i < 32; i++) {
            k_be[31 - i] = k[i];
        }

        // Issue 2: unlike gcry_pk_encrypt, there is no clamping in
        // gcry_pk_decrypt. So do it manually. Note big-endian.
        k_be[0] &= 127;
        k_be[0] |= 64;
        k_be[31] &= 0xf8;

        // Build private key from a valid Curve25519 secret (after clamping).
        int r = gcry_sexp_build(&s_pk, NULL,
                                  "(private-key"
                                  " (ecc"
                                  "  (curve \"Curve25519\")"
                                  "  (flags djb-tweak)"
                                  "  (d%b)))", 32, k_be);
        assert(r == 0);

        // Issue 3: internal details leak out: why can't the public key directly
        // be passed?
        // Issue 4: trying to pass 32 bytes without prefix triggers another bug:
        // 9 is treated as +0900000000000000 due to _gcry_ecc_mont_decodepoint.
        unsigned char basepoint[33] = { 0x40, 9 };
        r = gcry_sexp_build(&s_bp, NULL, "(enc-val(ecdh(e%b)))", 33, basepoint);
        assert(r == 0);

        // Issue 4 also occurs with this alternative code:
        // gcry_mpi_t mpi_bp = gcry_mpi_set_ui(NULL, 9);
        // gcry_sexp_build(&s_bp, NULL, "(enc-val (ecdh (e%m)))", mpi_bp);

        // Finally compute the scalar multiplication.
        r = gcry_pk_decrypt(&s_result, s_bp, s_pk);
        if (r == 0) {
            // Issue 3 continued: why is this internal prefix detail exposed?
            size_t res_len = 0;
            unsigned char *buffer =
                (unsigned char*)gcry_sexp_nth_buffer(s_result, 1, &res_len);
            assert(res_len == 1 + 32);
            assert(buffer[0] == 0x40);
            for (int i = 0; i < 32; i++) {
                q[i] = buffer[1 + i];
            }
            gcry_free(buffer);
        }

        gcry_sexp_release(s_result);
        gcry_sexp_release(s_bp);
        gcry_sexp_release(s_pk);
        return r == 0 ? 0 : -1;
    }

The above x25519_base code can easily be adjusted to compute a shared
secret (adjust "basepoint" in the above example), but this should be
sufficient to illustrate some issues.

Finally I will conclude with a comparison with other implementations.
Sodium/NaCl provides a very nice, simple API. The above code could be
replaced with:

    int x25519_base(unsigned char *q, const unsigned char *k)
    {
        return crypto_scalarmult_base(q, k);
    }

As for the performance, I conducted some benchmarks on an Ivy Bridge
desktop CPU (i7-3770) with GCC 7.3.0 and Ubuntu 18.04. I ran 5×8192
iterations of the above function. Results (from fastest to slowest):

Sodium          (baseline, has optimized asm impl.)
TweetNaCl-O3    18.42x slower (with car25519 fix and -O3).
TweetNaCl-O2    37.59x slower (with car25519 fix and -O2).
Gcrypt (MPI)    38.95x slower (2.11x tweetnacl)
gcry_pk_decrypt 39.12x slower (2.12x tweetnacl)
gcry_pk_encrypt 78.48x slower (4.26x tweetnacl)

Built with libsodium23 1.0.16-2 and libgcrypt20 1.8.1-4ubuntu1.1.
TweetNaCl is a pure C implementation from https://tweetnacl.cr.yp.to/

If performance is important, I would not use Libgcrypt for X25519 since
it has no optimized routines and use Sodium instead.
Since reuse of existing dependencies (Libgcrypt) would be nice, the next
best option is the low-level MPI interface. Not gcry_pk_encrypt because
it is twice as slow. Not gcry_pk_decrypt because it still requires
manual clamping and ugly conversions.
-- 
Kind regards,
Peter Wu
https://lekensteyn.nl



More information about the Gcrypt-devel mailing list