From grothoff at gnunet.org  Tue Aug 14 10:51:31 2018
From: grothoff at gnunet.org (Christian Grothoff)
Date: Tue, 14 Aug 2018 10:51:31 +0200
Subject: Ohhhh jeeee: mulm_25519: different sizes
Message-ID: <33eebc83-3262-4dba-c20f-79d1b977aede@gnunet.org>

Hi!

Just a quick crash report:

libgcrypt from git master hits an assertion failure on my AMD 1950X when
running GNUnet's src/util/test_crypto_ecc_dlog test:

.Ohhhh jeeee: mulm_25519: different sizes
FAIL test_crypto_ecc_dlog (exit status: 134)

With GDB:
.Ohhhh jeeee: mulm_25519: different sizes

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) ba
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7b242f1 in __GI_abort () at abort.c:79
#2  0x00007ffff7e8e587 in _gcry_logv (level=50, fmt=0x7ffff7f2c4a4 "mulm_25519: different sizes\n", arg_ptr=0x7fffffffdd18) at misc.c:142
#3  0x00007ffff7e8e94d in _gcry_log_bug (fmt=0x7ffff7f2c4a4 "mulm_25519: different sizes\n") at misc.c:229
#4  0x00007ffff7f157b1 in ec_mulm_25519 (w=0x55555555a1f0, u=0x55555555b6d0, v=0x55555555b630, ctx=0x55555555a010) at ec.c:431
#5  0x00007ffff7f1760d in add_points_edwards (result=0x55555555b8a0, p1=0x55555555b670, p2=0x55555555b750, ctx=0x55555555a010) at ec.c:1305
#6  0x00007ffff7f17cc7 in _gcry_mpi_ec_add_points (result=0x55555555b8a0, p1=0x55555555b670, p2=0x55555555b750, ctx=0x55555555a010) at ec.c:1416
#7  0x00007ffff7e8c659 in gcry_mpi_ec_add (w=0x55555555b8a0, u=0x55555555b670, v=0x55555555b750, ctx=0x55555555a000) at visibility.c:580
#8  0x00007ffff7f70fd9 in GNUNET_CRYPTO_ecc_dlog (edc=0x555555559bd0, input=0x55555555b670) at crypto_ecc_dlog.c:259
#9  0x0000555555555580 in test_dlog (edc=0x555555559bd0) at test_crypto_ecc_dlog.c:99
#10 0x0000555555555972 in main (argc=1, argv=0x7fffffffe138) at test_crypto_ecc_dlog.c:186


Note that the same test passes on the same system with Debian's 1.8.3-1
package, so this is either a problem with my build or a regression.

I used:

./configure --prefix=/home/grothoff \
  --with-libgpg-error-prefix=/home/grothoff \
  --enable-ciphers="blowfish aes twofish" \
  --enable-digests="crc md5 sha1 sha256 sha512" \
  --enable-kdfs=scrypt CFLAGS="-g -O0 -Wall"


Happy hacking!

Christian



From grothoff at gnunet.org  Tue Aug 14 10:51:19 2018
From: grothoff at gnunet.org (Christian Grothoff)
Date: Tue, 14 Aug 2018 10:51:19 +0200
Subject: libgcrypt performance issue, diagnostic & workaround
Message-ID: <fd8de722-1463-a2ab-117c-472d80d7255c@gnunet.org>

Hi!

Running the GNU Taler exchange (which uses libgcrypt) on a many-core
(32-thread) system, I noticed that the futex syscall was dominating CPU
utilization.  Digging deeper, the cause is the
SECMEM_LOCK/SECMEM_UNLOCK logic, even though I (properly) disable the
use of SECMEM via gcry_control on startup
(gnunet/src/util/crypto_random.c):


/**
 * Initialize libgcrypt.
 */
void __attribute__ ((constructor))
GNUNET_CRYPTO_random_init ()
{
  gcry_error_t rc;

  if (! gcry_check_version (NEED_LIBGCRYPT_VERSION))
  {
    FPRINTF (stderr,
             _("libgcrypt has not the expected version (version %s is required).\n"),
             NEED_LIBGCRYPT_VERSION);
    GNUNET_assert (0);
  }
  /* Disable use of secure memory */
  if ((rc = gcry_control (GCRYCTL_DISABLE_SECMEM, 0)))
    FPRINTF (stderr,
             "Failed to set libgcrypt option %s: %s\n",
             "DISABLE_SECMEM",
	     gcry_strerror (rc));
  /* Otherwise gnunet-ecc takes forever to complete, besides
     we are fine with "just" using GCRY_STRONG_RANDOM */
  if ((rc = gcry_control (GCRYCTL_ENABLE_QUICK_RANDOM, 0)))
    FPRINTF (stderr,
	     "Failed to set libgcrypt option %s: %s\n",
	     "ENABLE_QUICK_RANDOM",
	     gcry_strerror (rc));
  gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);
}


Sample stack trace:

#2  0x00007fbab01077ea in _gcry_secmem_free (a=0x7fba4000faf0) at secmem.c:753
#3  0x00007fbab01068f6 in _gcry_private_free (a=0x7fba4000faf0) at stdmem.c:238
#4  0x00007fbab0100c8d in _gcry_free (p=0x7fba4000faf0) at global.c:1033
#5  0x00007fbab01018e9 in _gcry_sexp_release (sexp=0x7fba4000faf0) at sexp.c:350
#6  0x00007fbab01025e1 in _gcry_sexp_cadr (list=0x7fba40005dc0) at sexp.c:947
#7  0x00007fbab00fbc6a in gcry_sexp_cadr (list=0x7fba40005dc0) at visibility.c:226
#8  0x00007fbab0052a1e in key_from_sexp (array=0x7fba4fffe4b0, sexp=0x7fba4001a5f0, topname=0x7fbab009244d "public-key", elems=0x7fbab00924b6 "n") at crypto_rsa.c:103
#9  0x00007fbab00534d0 in GNUNET_CRYPTO_rsa_public_key_decode (
    buf=0x7fba40003630 "(public-key \n (rsa \n  (n
#00C8B6DCDE035F1BD788CE1062F7F4D5DF64AF0C87C6B080B17E5313BEBC0FDAA1A711C03B55447778D53E70BF9CEBA198707D37E5B206C0A6B9EFD4DBBA79B6426487513F7D838EB16D1346D80A
6758C9739EA5B94C51"..., len=310) at crypto_rsa.c:380
#10 0x00007fbab0017129 in parse_rsa_public_key (cls=0x0, root=0x7fba4001dd60, spec=0x7fba4fffe680) at json_helper.c:799
#11 0x00007fbab00144f6 in GNUNET_JSON_parse (root=0x7fba4001cb40, spec=0x7fba4fffe610, error_json_name=0x7fba4fffe5c0, error_line=0x7fba4fffe5bc) at json.c:62
#12 0x000055680f9af868 in TEH_PARSE_json_data (connection=0x7fba4001e370, root=0x7fba4001cb40, spec=0x7fba4fffe610) at taler-exchange-httpd_parsing.c:190
#13 0x000055680f9a96ea in TEH_DEPOSIT_handler_deposit (rh=0x55680f9c7988 <handlers+616>, connection=0x7fba4001e370, connection_cls=0x7fba4001e3c8, upload_data=0x0, upload_data_size=0x7fba4fffeb70)


Note that 'mainpool' clearly was never set up:

(gdb) print mainpool
$4 = {next = 0x0, mem = 0x0, size = 0, okay = 0, is_mmapped = 0,
cur_alloced = 0, cur_blocks = 0}
(gdb) print no_secure_memory
$5 = 1


My understanding is that the bug is in stdmem.c::_gcry_private_free(),
which first tries to call _gcry_secmem_free() before calling free(), and
thus needlessly grabs the lock even when secmem is disabled.  A test for
whether secmem is disabled ought to be inserted there, before the lock
is taken.
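
The following is a minimal sketch of that idea, not libgcrypt's actual
code.  All names are hypothetical stand-ins (only the flag name is taken
from the gdb output above), and it assumes the flag is set once during
initialization, before any other thread calls into the library, so that
reading it without the lock is safe:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the state kept in secmem.c. */
static int no_secure_memory = 1;      /* nonzero once secmem is disabled */
static pthread_mutex_t secmem_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_count;      /* lock acquisitions, for illustration */

/* Returns 1 if P was secure memory and has been released, 0 otherwise. */
static int
sketch_secmem_free (void *p)
{
  int was_secure = 0;

  (void) p;
  /* Fast path: with secmem disabled nothing can live in the pool,
     so there is no need to take the lock at all. */
  if (no_secure_memory)
    return 0;

  pthread_mutex_lock (&secmem_lock);
  lock_count++;
  /* ... a real implementation would look P up in the pool here ... */
  pthread_mutex_unlock (&secmem_lock);
  return was_secure;
}

/* Frees P, falling back to free() when P is not secure memory. */
static void
sketch_private_free (void *p)
{
  if (!p)
    return;
  if (!sketch_secmem_free (p))
    free (p);
}
```

With the flag set, sketch_private_free() never touches the mutex, which
is the behaviour the paragraph above asks for.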


As a workaround, I've added this call to our libgcrypt initialization
sequence:

  /* set custom allocators */
  gcry_set_allocation_handler (&w_malloc, // wrapper around calloc()
                               &w_malloc, // wrapper around calloc()
                               &w_check,  // return 'false'
                               &realloc,
                               &free);
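
The wrappers themselves are not shown in this mail.  Going only by the
comments in the call above, they could look roughly like this; the
bodies are an assumption, and the parameter types follow the handler
typedefs in gcrypt.h:

```c
#include <stdlib.h>

/* Assumed shape of the allocator: zero-initialized memory from calloc().
   It is passed for both the normal and the "secure" allocation slot. */
static void *
w_malloc (size_t n)
{
  return calloc (n, 1);
}

/* Assumed shape of the secure-memory check: nothing is ever in secure
   memory, so always answer "no". */
static int
w_check (const void *p)
{
  (void) p;
  return 0;
}
```

Because the check handler always returns 0 and free() is installed
directly, the free path no longer goes through the secmem code.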

After that, the GNU Taler exchange could do > 10x the number of
operations per second on my system (might be more, but the bottleneck is
now elsewhere).


Happy hacking!

Christian



From gniibe at fsij.org  Thu Aug 23 02:49:32 2018
From: gniibe at fsij.org (NIIBE Yutaka)
Date: Thu, 23 Aug 2018 09:49:32 +0900
Subject: request for TLBleed information / non-constant-time
 vulnerabilities
In-Reply-To: <d1993fc8-c40b-9ca3-0992-72d06152f8d8@digitalocean.com>
References: <d1993fc8-c40b-9ca3-0992-72d06152f8d8@digitalocean.com>
Message-ID: <877ekhdi03.fsf@iwagami.gniibe.org>

Hello,

I should have answered earlier.  I was attending DebConf18 in Hsinchu
and was busy distributing 5x5 LED matrix boards.

The paper is great; it shows us that a coarse-grained leak, like the one
through the TLB (a shared resource), is still important.

But... from a libgcrypt developer's point of view, it's somewhat
confusing, I have to say.  I don't know whether that is intentional; if
not, I would like them to clarify it.

Please note that they have not reported to us.  If they had, I would
have complained. :-)


"Michael R. Hines" <mrhines at digitalocean.com> wrote:
> Our team is trying to get an accurate understanding of whether or not
> the most recent version of libgcrypt does or does not remain vulnerable
> to the timing attack documented here:
> https://www.vusec.net/wp-content/uploads/2018/07/tlbleed-author-preprint.pdf

The attack on libgcrypt EdDSA described in the paper does not apply to
_any_ version of libgcrypt, if "signing" refers to the use of the
function _gcry_ecc_eddsa_sign.

I mean, the scenario never occurs in real use of _gcry_ecc_eddsa_sign.
In the second paragraph of section 8, "Temporal Analysis", the paper
says, "We use the non-constant-time version in this work."  I interpret
this to mean that they changed the code so that their attack can work on
that path.  It is artificial, from my viewpoint.

In _gcry_ecc_eddsa_sign, the scalar A is allocated in secure memory.
So, it always takes the other path, which always calls
_gcry_mpi_ec_add_points.

They would have targeted the call to the point_set function in the old
version 1.6.3.  This was fixed in 2015 by using point_swap_cond
unconditionally.  Versions 1.7.0, 1.8.0 and later have this fix.
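
For readers unfamiliar with the technique, a conditional swap that is
always executed can be sketched as below.  This is only an illustration
of the general idea on plain limb arrays; the function name and layout
are hypothetical and it is not the libgcrypt point_swap_cond code:

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the NLIMBS-word arrays A and B if SWAP is 1, leave them as they
   are if SWAP is 0.  The same loads, stores and XORs are executed in
   both cases, so there is no secret-dependent branch or address. */
static void
ct_swap_cond (uint64_t *a, uint64_t *b, size_t nlimbs, unsigned int swap)
{
  /* All-ones when swap == 1, all-zeros when swap == 0. */
  uint64_t mask = (uint64_t) 0 - (uint64_t) (swap & 1);

  for (size_t i = 0; i < nlimbs; i++)
    {
      uint64_t t = mask & (a[i] ^ b[i]);
      a[i] ^= t;
      b[i] ^= t;
    }
}
```

The caller invokes it for every scalar bit, passing the bit as SWAP,
instead of copying a point only when the bit is set.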


The attack on libgcrypt RSA is likewise not a real use case, but
artificial (for 1.6, 1.7 and 1.8).

We keep the old and simple implementation of _gcry_mpi_powm, which can
be enabled with the macro USE_ALGORITHM_SIMPLE_EXPONENTIATION.  Their
target is this code, which has not been used since 1.6.0.

Well, in 1.8.0, I modified this implementation too.  So, if they attack
the one in 1.8.0, it won't work.
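
To illustrate why a simple exponentiation loop is an attractive target,
here is a toy comparison on 32-bit integers.  Both functions are
hypothetical sketches and neither is the _gcry_mpi_powm code; the second
one only removes the branch on the exponent bit, and a real
implementation also needs constant-time big-number arithmetic
underneath:

```c
#include <stdint.h>

/* Textbook left-to-right square-and-multiply.  The multiply is only
   executed for 1 bits of EXP, so the instruction and memory access
   pattern depends on the secret exponent.  MOD must be nonzero. */
static uint32_t
powm_simple (uint32_t base, uint32_t exp, uint32_t mod)
{
  uint64_t r = 1 % mod;
  uint64_t b = base % mod;

  for (int i = 31; i >= 0; i--)
    {
      r = (r * r) % mod;
      if ((exp >> i) & 1)
        r = (r * b) % mod;
    }
  return (uint32_t) r;
}

/* Same result, but the multiply is always executed and the result is
   selected with a mask instead of a branch. */
static uint32_t
powm_select (uint32_t base, uint32_t exp, uint32_t mod)
{
  uint64_t r = 1 % mod;
  uint64_t b = base % mod;

  for (int i = 31; i >= 0; i--)
    {
      uint64_t t, mask;

      r = (r * r) % mod;
      t = (r * b) % mod;
      mask = (uint64_t) 0 - (uint64_t) ((exp >> i) & 1);
      r = (t & mask) | (r & ~mask);
    }
  return (uint32_t) r;
}
```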


> In addition to that, more generally, as a cloud provider we are also 
> looking for input and guidance on whether or not there are other 
> locations that should be of concern within libgcrypt from the 
> perspective of TLBleed.

Although their current "attacks" are pretty artificial (to me), their
points make sense: there are more possibilities for side-channel leaks
as signal analysis and reverse engineering of the lower-level
architecture improve.

We need to remove any non-constant time code.
-- 


From mrhines at digitalocean.com  Thu Aug 23 03:34:18 2018
From: mrhines at digitalocean.com (Michael R. Hines)
Date: Wed, 22 Aug 2018 20:34:18 -0500
Subject: request for TLBleed information / non-constant-time
 vulnerabilities
In-Reply-To: <877ekhdi03.fsf@iwagami.gniibe.org>
References: <d1993fc8-c40b-9ca3-0992-72d06152f8d8@digitalocean.com>
 <877ekhdi03.fsf@iwagami.gniibe.org>
Message-ID: <95abb1c4-f907-0e80-7602-525c69575ddf@digitalocean.com>

NIIBE,

Thank you so much for your response. This was extremely helpful. =)

- Michael
