multiple timing side channels

Sat Nov 7 23:08:27 CET 2015

This morning I had occasion to glance at the libgcrypt source code,
and to my unpleasant surprise I found a collection of independent,
obvious timing side channels in the code:

- In twisted Edwards scalar multiplication, _gcry_mpi_ec_mul_point
branches depending on whether a bit in a secret scalar is set:

(mpi/ec.c, _gcry_mpi_ec_mul_point)
  1232                _gcry_mpi_ec_add_points (&tmppnt, result, point, ctx);
  1233                if (mpi_test_bit (scalar, j))
  1234                  point_set (result, &tmppnt);

Presumably this is intended to run in constant time, because the
scalar is a secret -- there's even a comment above saying `we use
constant time operation' in that case.  But secret-dependent branches
are not that.  (Easy fix: add point_swap_cond like point_set, using
mpi_cond_swap, and use it here.)

- In conditional mpi swapping, _gcry_mpi_cond_swap branches depending
on the swap condition:

(mpi/mpiutil.c, _gcry_mpi_cond_swap)
   576    mpi_limb_t mask = ((mpi_limb_t)0) - !!swap;

Some compilers may not turn this into a branch -- but some do.
Presumably this is intended to run in constant time because it is used
in code that operates on secrets, so I suggest it be documented as
such.  (Easy fix: saturate instead of !!, e.g. iterate swap |= (swap
<< (1<<i)) | (swap >> (1<<i)) for i in 0..6.)

- In mpi bit testing, _gcry_mpi_test_bit branches depending on the
value of the limb:

(mpi/mpi-bit.c, _gcry_mpi_test_bit)
   109      return (limb & (A_LIMB_1 << bitno))? 1: 0;

Again, some compilers may use a constant-time conditional move here --
but some will use a branch.  (Easy fix: return 1 & (limb >> bitno).)

- ~All general-purpose modular reduction involves numerator- and
denominator-dependent branches, e.g.:

(mpi/mpih-div.c, _gcry_mpih_divrem)
   319              if( n0 >= dX ) {

Here n0 and dX are limbs of the numerator and the denominator.  This
includes modular reduction for elliptic curve arithmetic, in which the
numerator is secret; and modular reduction for RSA, in which the
numerator (plaintext message) or denominator (p, q) can be secret.

- Many arithmetic routines normalize their inputs and outputs, so that
secrets flow into nlimbs and thence into loop counts and memory
reference patterns.

I don't have exploits for these -- but I hope the world is at a stage
in crypto engineering where it is not necessary to demonstrate remote
exploitability of every obvious secret-dependent branch and memory
reference.  These are all I found in half an hour of code inspection,
when until I found the first two I hadn't even thought to look for
timing side channels.

I started writing a patch, but the code is so riddled with
secret-dependent branches and memory references that it's not an easy
effort.  In order to avoid a new paper year after year demonstrating a
new timing side channel exploit on GnuPG, I suggest:

1. Avoid general-purpose division.  For divisors fixed by an
algorithm, divisor-specific reduction is usually better, especially
for divisors chosen for it such as 2^255 - 19 or 2^448 - 2^224 - 1.
For a priori unknown divisors, Montgomery reduction is likely much
faster anyway -- and together with Barrett reduction, it is never
necessary to combine attacker-controlled data with secrets in a
general-purpose division.  (For RSA, you need to divide a constant,
4^k, by p and by q, every time you load a key, but that's all.)

2. Eliminate mpi normalization.  Most k-bit multiprecision integers
that libgcrypt handles are uniformly distributed in [0, 2^k), or at
least in [0, p) where 2^(k - 1) <= p <= 2^k.  It's hard to imagine
that there's much value in saving a handful of integer operations on
the last limb once in every ~2^32 cases for an mpi operation -- but
this frequency is high enough that it's not hard to imagine devising a
timing attack where you learn something after a billion messages.

3. Eliminate mpi altogether for arithmetic in fixed finite fields,
such as GF(2^255 - 19) as used in Curve25519.  There's plenty of
easy-to-use, high-quality, high-performance, constant-time code to
compute it -- faster and more safely than the generic mpi code.

4. Eliminate the generic elliptic-curve abstraction, especially for
new curves.  For modern curve design, it offers no benefits over
curve-specific code, and makes variable-time code much more tempting.
Applications don't care that there are elliptic curves or points on
them involved -- applications deal in opaque octet strings.

5. Eliminate mpi_is_secure.  This interleaves code paths that may
operate on secret or public data.  Better to statically distinguish
the code paths that operate on secrets.  While it is OK to use code
paths designed for secret data on public data, making the code paths
conditional makes it harder to audit.  Auditability is critical for
code that millions of people rely on for crypto.

6. Aggressively reject all new code, and prune old code, that uses
secret-dependent branches and memory references.  Kocher's paper was
published in 1996; it shouldn't take twenty years for the world to
learn its lesson.  I understand why the RSA code written long ago
might be vulnerable -- but twisted Edwards arithmetic was designed
from the beginning to make constant-time evaluation easy.