From borneo.antonio at gmail.com Fri Jul 1 12:09:12 2016 From: borneo.antonio at gmail.com (Antonio Borneo) Date: Fri, 1 Jul 2016 12:09:12 +0200 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5775916F.8010505@iki.fi> References: <5774CDA5.2090001@iki.fi> <5775916F.8010505@iki.fi> Message-ID: On Thu, Jun 30, 2016 at 11:38 PM, Jussi Kivilinna wrote: > On 30.06.2016 10:43, Jussi Kivilinna wrote: > >>> The second problem showed up as a bus error running tests/basic. >>> The problem is that ldm/stm don't deal with unaligned accesses even >>> on armv7 (see http://www.heyrick.co.uk/armwiki/Unaligned_data_access). >>> My workaround is to undef the gcc-defined feature symbol, but a better >>> fix would be to strip out the conditional guards, since the alignment >>> adjustments are needed on all versions. >> >> I have made a wrong assumption about unaligned accesses with ldm/stm. >> I'll make the needed changes and add proper unaligned buffer test cases >> so that these will be caught in the future. >> > > It appears that there are proper tests for unaligned buffers. However those > tests did not fail for me, since on Linux the unaligned ldm/stm exception > is caught and handled by the kernel. Jussi, on an armv7 target you can disable the kernel's unaligned-access handling and enable the alignment fault with 'echo 4 > /proc/cpu/alignment', as explained in the kernel's Documentation/arm/mem_alignment. Regards, Antonio From jussi.kivilinna at iki.fi Fri Jul 1 22:07:58 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 Jul 2016 23:07:58 +0300 Subject: [PATCH] Fix static build Message-ID: <146740367798.16457.1799426351699167255.stgit@localhost6.localdomain6> * tests/pubkey.c (_gcry_pk_util_get_nbits): Make function 'static'. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/tests/pubkey.c b/tests/pubkey.c index 3eb5b4f..1271e43 100644 --- a/tests/pubkey.c +++ b/tests/pubkey.c @@ -175,7 +175,7 @@ show_sexp (const char *prefix, gcry_sexp_t a) } /* from ../cipher/pubkey-util.c */ -gpg_err_code_t +static gpg_err_code_t _gcry_pk_util_get_nbits (gcry_sexp_t list, unsigned int *r_nbits) { char buf[50]; From cvs at cvs.gnupg.org Sun Jul 3 17:18:00 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 03 Jul 2016 17:18:00 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-15-gcb79630 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via cb79630ec567a5f2e03e5f863cda168faa7b8cc8 (commit) via 07de9858032826f5a7b08c372f6bcc73bbb503eb (commit) via a6158a01a4d81a5d862e1e0a60bfd6063443311d (commit) via a09126242a51c4ea4564b0f70b808e4f27fe5a91 (commit) via 4a983e3bef58b9d056517e25e0ab10b72d12ceba (commit) from 6965515c73632a088fb126a4a55e95121671fa98 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit cb79630ec567a5f2e03e5f863cda168faa7b8cc8 Author: Jussi Kivilinna Date: Fri Jul 1 23:07:07 2016 +0300 Fix static build * tests/pubkey.c (_gcry_pk_util_get_nbits): Make function 'static'.
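The duplicate-symbol problem this one-line change fixes is worth spelling out: tests/pubkey.c carries a verbatim copy of _gcry_pk_util_get_nbits, and in a static-only build the linker also pulls pubkey-util.o out of libgcrypt.a because it is needed for other symbols, so two external definitions of the same name collide. A stripped-down illustration of the mechanism, and of why internal linkage resolves it (generic names, not the real libgcrypt files):

    /* lib.c -- compiled into libfoo.a; the program below needs
     * other_sym(), which drags this whole object file, including its
     * get_nbits(), into the link. */
    int get_nbits (void)  { return 2048; }
    int other_sym (void)  { return 1; }

    /* main.c -- carries its own copy of the helper.  Without 'static'
     * this is a second external definition and a static link fails with
     * "multiple definition of `get_nbits'"; with 'static' the copy has
     * internal linkage, is confined to this file, and the clash is
     * gone. */
    static int get_nbits (void) { return 2048; }

    int
    main (void)
    {
      return (get_nbits () == 2048) ? other_sym () - 1 : 1;
    }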
-- Signed-off-by: Jussi Kivilinna diff --git a/tests/pubkey.c b/tests/pubkey.c index 3eb5b4f..1271e43 100644 --- a/tests/pubkey.c +++ b/tests/pubkey.c @@ -175,7 +175,7 @@ show_sexp (const char *prefix, gcry_sexp_t a) } /* from ../cipher/pubkey-util.c */ -gpg_err_code_t +static gpg_err_code_t _gcry_pk_util_get_nbits (gcry_sexp_t list, unsigned int *r_nbits) { char buf[50]; commit 07de9858032826f5a7b08c372f6bcc73bbb503eb Author: Jussi Kivilinna Date: Thu Jun 30 21:51:50 2016 +0300 Disallow encryption/decryption if key is not set * cipher/cipher.c (cipher_encrypt, cipher_decrypt): If mode is not NONE, make sure that key is set. * cipher/cipher-ccm.c (_gcry_cipher_ccm_set_nonce): Do not clear 'marks.key' when resetting state. -- Reported-by: Andreas Metzler Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-ccm.c b/cipher/cipher-ccm.c index 4d8f816..d7f14d8 100644 --- a/cipher/cipher-ccm.c +++ b/cipher/cipher-ccm.c @@ -110,6 +110,7 @@ gcry_err_code_t _gcry_cipher_ccm_set_nonce (gcry_cipher_hd_t c, const unsigned char *nonce, size_t noncelen) { + unsigned int marks_key; size_t L = 15 - noncelen; size_t L_; @@ -122,12 +123,14 @@ _gcry_cipher_ccm_set_nonce (gcry_cipher_hd_t c, const unsigned char *nonce, return GPG_ERR_INV_LENGTH; /* Reset state */ + marks_key = c->marks.key; memset (&c->u_mode, 0, sizeof(c->u_mode)); memset (&c->marks, 0, sizeof(c->marks)); memset (&c->u_iv, 0, sizeof(c->u_iv)); memset (&c->u_ctr, 0, sizeof(c->u_ctr)); memset (c->lastiv, 0, sizeof(c->lastiv)); c->unused = 0; + c->marks.key = marks_key; /* Setup CTR */ c->u_ctr.ctr[0] = L_; diff --git a/cipher/cipher.c b/cipher/cipher.c index 2b7bf21..ff3340f 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -818,6 +818,12 @@ cipher_encrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, { gcry_err_code_t rc; + if (c->mode != GCRY_CIPHER_MODE_NONE && !c->marks.key) + { + log_error ("cipher_encrypt: key not set\n"); + return GPG_ERR_MISSING_KEY; + } + switch (c->mode) { case GCRY_CIPHER_MODE_ECB: @@ -935,6 +941,12 @@ cipher_decrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, { gcry_err_code_t rc; + if (c->mode != GCRY_CIPHER_MODE_NONE && !c->marks.key) + { + log_error ("cipher_decrypt: key not set\n"); + return GPG_ERR_MISSING_KEY; + } + switch (c->mode) { case GCRY_CIPHER_MODE_ECB: commit a6158a01a4d81a5d862e1e0a60bfd6063443311d Author: Jussi Kivilinna Date: Thu Jun 30 21:34:46 2016 +0300 Avoid unaligned accesses with ARM ldm/stm instructions * cipher/rijndael-arm.S: Remove __ARM_FEATURE_UNALIGNED ifdefs, always compile with unaligned load/store code paths. * cipher/sha512-arm.S: Ditto.
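Some background on this ldm/stm change: the unaligned-access support that GCC advertises through __ARM_FEATURE_UNALIGNED covers single-word ldr/str only; ldm/stm always fault on addresses that are not word-aligned, so the run-time alignment test has to stay in even when the compiler defines that macro. The "tst %r2, #3; beq 1f" dispatch kept by this patch corresponds to the following C idiom (a sketch of the pattern, not libgcrypt's actual code):

    #include <stdint.h>
    #include <string.h>

    /* Load a 32-bit word from a possibly unaligned pointer.  The
     * aligned branch stands in for the ldm fast path; the memcpy branch
     * stands in for the byte-wise (ldrb) unaligned path in the
     * assembly. */
    static uint32_t
    load_u32 (const unsigned char *p)
    {
      uint32_t v;

      if (((uintptr_t)p & 3) == 0)
        v = *(const uint32_t *)p;  /* word-aligned: one word access */
      else
        memcpy (&v, p, 4);         /* unaligned: compiler emits safe loads */
      return v;
    }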
-- Reported-by: Michael Plass Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-arm.S b/cipher/rijndael-arm.S index 694369d..e3a91c2 100644 --- a/cipher/rijndael-arm.S +++ b/cipher/rijndael-arm.S @@ -225,7 +225,7 @@ _gcry_aes_arm_encrypt_block: push {%r4-%r11, %ip, %lr}; /* read input block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if src is unaligned */ tst %r2, #3; beq 1f; @@ -238,7 +238,6 @@ _gcry_aes_arm_encrypt_block: b 2f; .ltorg 1: -#endif /* aligned load */ ldm %r2, {RA, RB, RC, RD}; #ifndef __ARMEL__ @@ -277,7 +276,7 @@ _gcry_aes_arm_encrypt_block: add %sp, #16; /* store output block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if dst is unaligned */ tst RT0, #3; beq 1f; @@ -290,7 +289,6 @@ _gcry_aes_arm_encrypt_block: b 2f; .ltorg 1: -#endif /* aligned store */ #ifndef __ARMEL__ rev RA, RA; @@ -484,7 +482,7 @@ _gcry_aes_arm_decrypt_block: push {%r4-%r11, %ip, %lr}; /* read input block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if src is unaligned */ tst %r2, #3; beq 1f; @@ -497,7 +495,6 @@ _gcry_aes_arm_decrypt_block: b 2f; .ltorg 1: -#endif /* aligned load */ ldm %r2, {RA, RB, RC, RD}; #ifndef __ARMEL__ @@ -533,7 +530,7 @@ _gcry_aes_arm_decrypt_block: add %sp, #16; /* store output block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if dst is unaligned */ tst RT0, #3; beq 1f; @@ -546,7 +543,6 @@ _gcry_aes_arm_decrypt_block: b 2f; .ltorg 1: -#endif /* aligned store */ #ifndef __ARMEL__ rev RA, RA; diff --git a/cipher/sha512-arm.S b/cipher/sha512-arm.S index 28f156e..94ec014 100644 --- a/cipher/sha512-arm.S +++ b/cipher/sha512-arm.S @@ -323,7 +323,7 @@ _gcry_sha512_transform_arm: stm RWhi, {RT1lo,RT1hi,RT2lo,RT2hi,RT3lo,RT3hi,RT4lo,RT4hi} /* Load input to w[16] */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if data is unaligned */ tst %r1, #3; beq 1f; @@ -341,7 +341,6 @@ _gcry_sha512_transform_arm: read_be64_unaligned_4(%r1, 12 * 8, RT1lo, RT1hi, RT2lo, RT2hi, RT3lo, RT3hi, RT4lo, RT4hi, RWlo); b 2f; -#endif 1: /* aligned load */ add RWhi, %sp, #(w(0)); commit a09126242a51c4ea4564b0f70b808e4f27fe5a91 Author: Jussi Kivilinna Date: Thu Jun 30 21:23:05 2016 +0300 Fix non-PIC reference in PIC for poly1305/ARMv7-NEON * cipher/poly1305-armv7-neon.S (GET_DATA_POINTER): New. (_gcry_poly1305_armv7_neon_init_ext): Use GET_DATA_POINTER. -- Reported-by: Michael Plass Signed-off-by: Jussi Kivilinna diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index 1134e85..b1554ed 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -33,6 +33,19 @@ .fpu neon .arm +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + .text .p2align 2 @@ -52,7 +65,7 @@ _gcry_poly1305_armv7_neon_init_ext: and r2, r2, r2 moveq r14, #-1 ldmia r1!, {r2-r5} - ldr r7, =.Lpoly1305_init_constants_neon + GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 mov r9, r3, lsr #20 commit 4a983e3bef58b9d056517e25e0ab10b72d12ceba Author: Jussi Kivilinna Date: Thu Jun 30 21:17:32 2016 +0300 Fix wrong CPU feature #ifdef for SHA1/AVX * cipher/sha1-avx-amd64.S: Check for HAVE_GCC_INLINE_ASM_AVX instead of HAVE_GCC_INLINE_ASM_AVX2 & HAVE_GCC_INLINE_ASM_BMI2. 
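The guard corrected here is the compile-time half of libgcrypt's usual two-level gating: the preprocessor checks that the assembler can emit a given instruction set, and a separate run-time check consults the detected hardware features before the fast path is taken. With the wrong macros, availability of the AVX code tracked the AVX2/BMI2 assembler checks instead of the AVX one. A condensed sketch of the pattern (modelled on cipher/sha1.c, not verbatim):

    /* Compile time: can this toolchain assemble AVX at all? */
    #if defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1)
    # define USE_AVX 1
    #endif

    typedef struct { unsigned int use_avx:1; } ctx_t;

    static void
    ctx_init (ctx_t *hd, unsigned int features)
    {
      (void)features;
    #ifdef USE_AVX
      /* Run time: only take the AVX path if this CPU reports AVX. */
      hd->use_avx = (features & HWF_INTEL_AVX) != 0;
    #endif
    }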
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha1-avx-amd64.S b/cipher/sha1-avx-amd64.S index 062a45b..3b3a6d1 100644 --- a/cipher/sha1-avx-amd64.S +++ b/cipher/sha1-avx-amd64.S @@ -31,8 +31,7 @@ #if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ - defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(USE_SHA1) + defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1) #ifdef __PIC__ # define RIP (%rip) ----------------------------------------------------------------------- Summary of changes: cipher/cipher-ccm.c | 3 +++ cipher/cipher.c | 12 ++++++++++++ cipher/poly1305-armv7-neon.S | 15 ++++++++++++++- cipher/rijndael-arm.S | 12 ++++-------- cipher/sha1-avx-amd64.S | 3 +-- cipher/sha512-arm.S | 3 +-- tests/pubkey.c | 2 +- 7 files changed, 36 insertions(+), 14 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sun Jul 3 17:42:07 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:07 +0300 Subject: [PATCH 1/3] bench-slope: add unaligned buffer mode Message-ID: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> * tests/bench-slope.c (unaligned_mode): New. (do_slope_benchmark): Unalign buffer if in unaligned mode enabled. (print_help, main): Add '--unaligned' parameter. -- Patch adds --unaligned parameter to allow measurement of unaligned buffer overhead. Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/tests/bench-slope.c b/tests/bench-slope.c index d97494c..cdd0fa6 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -42,6 +42,7 @@ static int verbose; static int csv_mode; +static int unaligned_mode; static int num_measurement_repetitions; /* CPU Ghz value provided by user, allows constructing cycles/byte and other @@ -411,12 +412,14 @@ do_slope_benchmark (struct bench_obj *obj) obj->max_bufsize < 1 || obj->min_bufsize > obj->max_bufsize) goto err_free; - real_buffer = malloc (obj->max_bufsize + 128); + real_buffer = malloc (obj->max_bufsize + 128 + unaligned_mode); if (!real_buffer) goto err_free; /* Get aligned buffer */ buffer = real_buffer; buffer += 128 - ((real_buffer - (unsigned char *) 0) & (128 - 1)); + if (unaligned_mode) + buffer += unaligned_mode; /* Make buffer unaligned */ for (i = 0; i < obj->max_bufsize; i++) buffer[i] = 0x55 ^ (-i); @@ -1748,6 +1751,7 @@ print_help (void) " for benchmarking.", " --repetitions Use N repetitions (default " STR2(NUM_MEASUREMENT_REPETITIONS) ")", + " --unaligned Use unaligned input buffers.", " --csv Use CSV output format", NULL }; @@ -1832,6 +1836,12 @@ main (int argc, char **argv) argc--; argv++; } + else if (!strcmp (*argv, "--unaligned")) + { + unaligned_mode = 1; + argc--; + argv++; + } else if (!strcmp (*argv, "--disable-hwf")) { argc--; From jussi.kivilinna at iki.fi Sun Jul 3 17:42:17 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:17 +0300 Subject: [PATCH 3/3] Add ARMv8/Aarch32 Crypto Extension implementation of SHA-1 In-Reply-To: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> References: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> Message-ID: <146756053731.707.11731047954620193783.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'sha1-armv8-aarch32-ce.S'. 
* cipher/sha1-armv7-neon.S (_gcry_sha1_transform_armv7_neon): Add missing size. * cipher/sha1-armv8-aarch32-ce.S: New. * cipher/sha1.c (USE_ARM_CE): New. (sha1_init): Check features for HWF_ARM_SHA1. [USE_ARM_CE] (_gcry_sha1_transform_armv8_ce): New. (transform) [USE_ARM_CE]: Use ARMv8 CE implementation if HW supports it. * cipher/sha1.h (SHA1_CONTEXT): Add 'use_arm_ce'. * configure.ac: Add 'sha1-armv8-aarch32-ce.lo'. -- Benchmark on Cortex-A53 (1152 Mhz): Before (SHA-1 NEON): | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 6.62 ns/B 144.2 MiB/s 7.62 c/B After (~3.7x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 1.78 ns/B 535.5 MiB/s 2.05 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index f60338a..571673e 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -87,7 +87,7 @@ scrypt.c \ seed.c \ serpent.c serpent-sse2-amd64.S serpent-avx2-amd64.S serpent-armv7-neon.S \ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ - sha1-armv7-neon.S \ + sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S sha512-arm.S \ diff --git a/cipher/sha1-armv7-neon.S b/cipher/sha1-armv7-neon.S index f314d8e..61cc541 100644 --- a/cipher/sha1-armv7-neon.S +++ b/cipher/sha1-armv7-neon.S @@ -521,5 +521,6 @@ _gcry_sha1_transform_armv7_neon: .Ldo_nothing: mov r0, #0; bx lr +.size _gcry_sha1_transform_armv7_neon,.-_gcry_sha1_transform_armv7_neon; #endif diff --git a/cipher/sha1-armv8-aarch32-ce.S b/cipher/sha1-armv8-aarch32-ce.S new file mode 100644 index 0000000..055d893 --- /dev/null +++ b/cipher/sha1-armv8-aarch32-ce.S @@ -0,0 +1,219 @@ +/* sha1-armv8-aarch32-ce.S - ARM/CE accelerated SHA-1 transform function + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA1) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +#define K1 0x5A827999 +#define K2 0x6ED9EBA1 +#define K3 0x8F1BBCDC +#define K4 0xCA62C1D6 +.align 4 +gcry_sha1_aarch32_ce_K_VEC: +.LK_VEC: +.LK1: .long K1, K1, K1, K1 +.LK2: .long K2, K2, K2, K2 +.LK3: .long K3, K3, K3, K3 +.LK4: .long K4, K4, K4, K4 + + +/* Register macros */ + +#define qH4 q0 +#define sH4 s0 +#define qH0123 q1 + +#define qABCD q2 +#define qE0 q3 +#define qE1 q4 + +#define qT0 q5 +#define qT1 q6 + +#define qW0 q8 +#define qW1 q9 +#define qW2 q10 +#define qW3 q11 + +#define qK1 q12 +#define qK2 q13 +#define qK3 q14 +#define qK4 q15 + + +/* Round macros */ + +#define _(...) /*_*/ +#define do_add(dst, src0, src1) vadd.u32 dst, src0, src1; +#define do_sha1su0(w0,w1,w2) sha1su0.32 w0,w1,w2; +#define do_sha1su1(w0,w3) sha1su1.32 w0,w3; + +#define do_rounds(f, e0, e1, t, k, w0, w1, w2, w3, add_fn, sha1su0_fn, sha1su1_fn) \ + sha1h.32 e0, qABCD; \ + sha1##f.32 qABCD, e1, t; \ + add_fn( t, w2, k ); \ + sha1su1_fn( w3, w2 ); \ + sha1su0_fn( w0, w1, w2 ); + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int + * _gcry_sha1_transform_armv8_ce (void *ctx, const unsigned char *data, + * size_t nblks) + */ +.align 3 +.globl _gcry_sha1_transform_armv8_ce +.type _gcry_sha1_transform_armv8_ce,%function; +_gcry_sha1_transform_armv8_ce: + /* input: + * r0: ctx, CTX + * r1: data (64*nblks bytes) + * r2: nblks + */ + + cmp r2, #0; + push {r4,lr}; + beq .Ldo_nothing; + + vpush {q4-q7}; + + GET_DATA_POINTER(r4, .LK_VEC, lr); + + veor qH4, qH4 + vld1.32 {qH0123}, [r0] /* load h0,h1,h2,h3 */ + + vld1.32 {qK1-qK2}, [r4]! /* load K1,K2 */ + vldr sH4, [r0, #16] /* load h4 */ + vld1.32 {qK3-qK4}, [r4] /* load K3,K4 */ + + vld1.8 {qW0-qW1}, [r1]! + vmov qABCD, qH0123 + vld1.8 {qW2-qW3}, [r1]! 
+ + vrev32.8 qW0, qW0 + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_add(qT0, qW0, qK1) + vrev32.8 qW3, qW3 + do_add(qT1, qW1, qK1) + +.Loop: + do_rounds(c, qE1, qH4, qT0, qK1, qW0, qW1, qW2, qW3, do_add, do_sha1su0, _) + do_rounds(c, qE0, qE1, qT1, qK1, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE1, qE0, qT0, qK1, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE0, qE1, qT1, qK2, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE1, qE0, qT0, qK2, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + + do_rounds(p, qE0, qE1, qT1, qK2, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE1, qE0, qT0, qK2, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK2, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE1, qE0, qT0, qK3, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK3, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + + subs r2, r2, #1 + do_rounds(m, qE1, qE0, qT0, qK3, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE0, qE1, qT1, qK3, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE1, qE0, qT0, qK3, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE0, qE1, qT1, qK4, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE1, qE0, qT0, qK4, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + + do_rounds(p, qE0, qE1, qT1, qK4, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + beq .Lend + + do_rounds(p, qE1, qE0, qT0, qK4, _ , _ , qW2, qW3, do_add, _, do_sha1su1) + vld1.8 {qW0-qW1}, [r1]! /* preload */ + vld1.8 {qW2}, [r1]! + do_rounds(p, qE0, qE1, qT1, qK4, _ , _ , qW3, _ , do_add, _, _) + vrev32.8 qW0, qW0 + vld1.8 {qW3}, [r1]! + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_rounds(p, qE1, qE0, qT0, _, _, _, _, _, _, _, _) + vrev32.8 qW3, qW3 + do_rounds(p, qE0, qE1, qT1, _, _, _, _, _, _, _, _) + + do_add(qT0, qW0, qK1) + vadd.u32 qH4, qE0 + vadd.u32 qABCD, qH0123 + do_add(qT1, qW1, qK1) + + vmov qH0123, qABCD + + b .Loop + +.Lend: + do_rounds(p, qE1, qE0, qT0, qK4, _ , _ , qW2, qW3, do_add, _, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK4, _ , _ , qW3, _ , do_add, _, _) + do_rounds(p, qE1, qE0, qT0, _, _, _, _, _, _, _, _) + do_rounds(p, qE0, qE1, qT1, _, _, _, _, _, _, _, _) + + vadd.u32 qH4, qE0 + vadd.u32 qH0123, qABCD + + CLEAR_REG(qW0) + CLEAR_REG(qW1) + CLEAR_REG(qW2) + CLEAR_REG(qW3) + CLEAR_REG(qABCD) + CLEAR_REG(qE1) + CLEAR_REG(qE0) + + vstr sH4, [r0, #16] /* store h4 */ + vst1.32 {qH0123}, [r0] /* store h0,h1,h2,h3 */ + + CLEAR_REG(qH0123) + CLEAR_REG(qH4) + vpop {q4-q7} + +.Ldo_nothing: + mov r0, #0 + pop {r4,pc} +.size _gcry_sha1_transform_armv8_ce,.-_gcry_sha1_transform_armv8_ce; + +#endif diff --git a/cipher/sha1.c b/cipher/sha1.c index d15c2a2..e0b68b2 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -76,8 +76,18 @@ && defined(HAVE_GCC_INLINE_ASM_NEON) # define USE_NEON 1 # endif -#endif /*ENABLE_NEON_SUPPORT*/ +#endif +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. */ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif /* A macro to test whether P is properly aligned for an u32 type. 
Note that config.h provides a suitable replacement for uintptr_t if @@ -127,6 +137,9 @@ sha1_init (void *context, unsigned int flags) #ifdef USE_NEON hd->use_neon = (features & HWF_ARM_NEON) != 0; #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA1) != 0; +#endif (void)features; } @@ -164,13 +177,18 @@ _gcry_sha1_mixblock_init (SHA1_CONTEXT *hd) } while(0) - #ifdef USE_NEON unsigned int _gcry_sha1_transform_armv7_neon (void *state, const unsigned char *data, size_t nblks); #endif +#ifdef USE_ARM_CE +unsigned int +_gcry_sha1_transform_armv8_ce (void *state, const unsigned char *data, + size_t nblks); +#endif + /* * Transform NBLOCKS of each 64 bytes (16 32-bit words) at DATA. */ @@ -340,6 +358,10 @@ transform (void *ctx, const unsigned char *data, size_t nblks) return _gcry_sha1_transform_amd64_ssse3 (&hd->h0, data, nblks) + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif +#ifdef USE_ARM_CE + if (hd->use_arm_ce) + return _gcry_sha1_transform_armv8_ce (&hd->h0, data, nblks); +#endif #ifdef USE_NEON if (hd->use_neon) return _gcry_sha1_transform_armv7_neon (&hd->h0, data, nblks) diff --git a/cipher/sha1.h b/cipher/sha1.h index 6b87631..d448fca 100644 --- a/cipher/sha1.h +++ b/cipher/sha1.h @@ -30,6 +30,7 @@ typedef struct unsigned int use_avx:1; unsigned int use_bmi2:1; unsigned int use_neon:1; + unsigned int use_arm_ce:1; } SHA1_CONTEXT; diff --git a/configure.ac b/configure.ac index 2001596..073b7e9 100644 --- a/configure.ac +++ b/configure.ac @@ -2335,6 +2335,7 @@ case "${host}" in arm*-*-*) # Build with the assembly implementation GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha1-armv7-neon.lo" + GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha1-armv8-aarch32-ce.lo" ;; esac From jussi.kivilinna at iki.fi Sun Jul 3 17:42:12 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:12 +0300 Subject: [PATCH 2/3] Add HW feature check for ARMv8 Aarch64 and crypto extensions In-Reply-To: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> References: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> Message-ID: <146756053229.707.15669215069250836807.stgit@localhost6.localdomain6> * configure.ac: Add '--disable-arm-crypto-support'; enable hwf-arm module on 64-bit ARM. (armcryptosupport, gcry_cv_gcc_inline_aarch32_crypto) (gcry_cv_inline_asm_aarch64_neon) (gcry_cv_gcc_inline_asm_aarch64_crypto): New. * src/g10lib.h (HWF_ARM_AES, HWF_ARM_SHA1, HWF_ARM_SHA2) (HWF_ARM_PMULL): New. * src/hwf-arm.c [__aarch64__]: Enable building in Aarch64 mode. (feature_map_s): New. [__arm__] (AT_HWCAP, AT_HWCAP2, HWCAP2_AES, HWCAP2_PMULL) (HWCAP2_SHA1, HWCAP2_SHA2, arm_features): New. [__aarch64__] (AT_HWCAP, AT_HWCAP2, HWCAP_ASIMD, HWCAP_AES) (HWCAP_PMULL, HWCAP_SHA1, HWCAP_SHA2, arm_features): New. (get_hwcap): Add reading of 'AT_HWCAP2'; Change auxv use 'unsigned long'. (detect_arm_at_hwcap): Add mapping of HWCAP/HWCAP2 to HWF flags. (detect_arm_proc_cpuinfo): Add mapping of CPU features to HWF flags. (_gcry_hwf_detect_arm): Use __ARM_NEON instead of legacy __ARM_NEON__. * src/hwfeatures.c (hwflist): Add 'arm-aes', 'arm-sha1', 'arm-sha2' and 'arm-pmull'. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/configure.ac b/configure.ac index 80e64fa..2001596 100644 --- a/configure.ac +++ b/configure.ac @@ -637,6 +637,14 @@ AC_ARG_ENABLE(neon-support, neonsupport=$enableval,neonsupport=yes) AC_MSG_RESULT($neonsupport) +# Implementation of the --disable-arm-crypto-support switch. 
+AC_MSG_CHECKING([whether ARMv8 Crypto Extension support is requested]) +AC_ARG_ENABLE(arm-crypto-support, + AC_HELP_STRING([--disable-arm-crypto-support], + [Disable support for the ARMv8 Crypto Extension instructions]), + armcryptosupport=$enableval,armcryptosupport=yes) +AC_MSG_RESULT($armcryptosupport) + # Implementation of the --disable-O-flag-munging switch. AC_MSG_CHECKING([whether a -O flag munging is requested]) AC_ARG_ENABLE([O-flag-munging], @@ -1125,7 +1133,10 @@ if test "$mpi_cpu_arch" != "x86" ; then fi if test "$mpi_cpu_arch" != "arm" ; then - neonsupport="n/a" + if test "$mpi_cpu_arch" != "aarch64" ; then + neonsupport="n/a" + armcryptosupport="n/a" + fi fi @@ -1532,6 +1543,116 @@ if test "$gcry_cv_gcc_inline_asm_neon" = "yes" ; then fi +# +# Check whether GCC inline assembler supports Aarch32 Crypto Extension instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch32 Crypto Extension instructions], + [gcry_cv_gcc_inline_asm_aarch32_crypto], + [if test "$mpi_cpu_arch" != "arm" ; then + gcry_cv_gcc_inline_asm_aarch32_crypto="n/a" + else + gcry_cv_gcc_inline_asm_aarch32_crypto=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".syntax unified\n\t" + ".arm\n\t" + ".fpu crypto-neon-fp-armv8\n\t" + + "sha1h.32 q0, q0;\n\t" + "sha1c.32 q0, q0, q0;\n\t" + "sha1p.32 q0, q0, q0;\n\t" + "sha1su0.32 q0, q0, q0;\n\t" + "sha1su1.32 q0, q0;\n\t" + + "sha256h.32 q0, q0, q0;\n\t" + "sha256h2.32 q0, q0, q0;\n\t" + "sha1p.32 q0, q0, q0;\n\t" + "sha256su0.32 q0, q0;\n\t" + "sha256su1.32 q0, q0, q15;\n\t" + + "aese.8 q0, q0;\n\t" + "aesd.8 q0, q0;\n\t" + "aesmc.8 q0, q0;\n\t" + "aesimc.8 q0, q0;\n\t" + + "vmull.p64 q0, d0, d0;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch32_crypto=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch32_crypto" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO,1, + [Defined if inline assembler supports Aarch32 Crypto Extension instructions]) +fi + + +# +# Check whether GCC inline assembler supports Aarch64 NEON instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch64 NEON instructions], + [gcry_cv_gcc_inline_asm_aarch64_neon], + [if test "$mpi_cpu_arch" != "aarch64" ; then + gcry_cv_gcc_inline_asm_aarch64_neon="n/a" + else + gcry_cv_gcc_inline_asm_aarch64_neon=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".arch armv8-a\n\t" + "mov w0, \#42;\n\t" + "dup v0.8b, w0;\n\t" + "ld4 {v0.8b,v1.8b,v2.8b,v3.8b},[x0],\#32;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch64_neon=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch64_neon" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH64_NEON,1, + [Defined if inline assembler supports Aarch64 NEON instructions]) +fi + + +# +# Check whether GCC inline assembler supports Aarch64 Crypto Extension instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch64 Crypto Extension instructions], + [gcry_cv_gcc_inline_asm_aarch64_crypto], + [if test "$mpi_cpu_arch" != "aarch64" ; then + gcry_cv_gcc_inline_asm_aarch64_crypto="n/a" + else + gcry_cv_gcc_inline_asm_aarch64_crypto=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".arch armv8-a+crypto\n\t" + + "sha1h s0, s0;\n\t" + "sha1c q0, s0, v0.4s;\n\t" + "sha1p q0, s0, v0.4s;\n\t" + "sha1su0 v0.4s, v0.4s, v0.4s;\n\t" + "sha1su1 v0.4s, v0.4s;\n\t" + + "sha256h q0, q0, v0.4s;\n\t" + "sha256h2 q0, q0, v0.4s;\n\t" + "sha1p q0, s0, v0.4s;\n\t" + "sha256su0 v0.4s, v0.4s;\n\t" + "sha256su1 v0.4s, v0.4s, v31.4s;\n\t" + + "aese v0.16b, v0.16b;\n\t" + "aesd v0.16b, v0.16b;\n\t" + "aesmc v0.16b, 
v0.16b;\n\t" + "aesimc v0.16b, v0.16b;\n\t" + + "pmull v0.1q, v0.1d, v31.1d;\n\t" + "pmull2 v0.1q, v0.2d, v31.2d;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch64_crypto=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch64_crypto" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH64_CRYPTO,1, + [Defined if inline assembler supports Aarch64 Crypto Extension instructions]) +fi + + ####################################### #### Checks for library functions. #### ####################################### @@ -1758,7 +1879,16 @@ if test x"$avx2support" = xyes ; then fi if test x"$neonsupport" = xyes ; then if test "$gcry_cv_gcc_inline_asm_neon" != "yes" ; then - neonsupport="no (unsupported by compiler)" + if test "$gcry_cv_gcc_inline_asm_aarch64_neon" != "yes" ; then + neonsupport="no (unsupported by compiler)" + fi + fi +fi +if test x"$armcryptosupport" = xyes ; then + if test "$gcry_cv_gcc_inline_asm_aarch32_crypto" != "yes" ; then + if test "$gcry_cv_gcc_inline_asm_aarch64_crypto" != "yes" ; then + neonsupport="no (unsupported by compiler)" + fi fi fi @@ -1786,6 +1916,10 @@ if test x"$neonsupport" = xyes ; then AC_DEFINE(ENABLE_NEON_SUPPORT,1, [Enable support for ARM NEON instructions.]) fi +if test x"$armcryptosupport" = xyes ; then + AC_DEFINE(ENABLE_ARM_CRYPTO_SUPPORT,1, + [Enable support for ARMv8 Crypto Extension instructions.]) +fi if test x"$padlocksupport" = xyes ; then AC_DEFINE(ENABLE_PADLOCK_SUPPORT, 1, [Enable support for the PadLock engine.]) @@ -2299,6 +2433,10 @@ case "$mpi_cpu_arch" in AC_DEFINE(HAVE_CPU_ARCH_ARM, 1, [Defined for ARM platforms]) GCRYPT_HWF_MODULES="hwf-arm.lo" ;; + aarch64) + AC_DEFINE(HAVE_CPU_ARCH_ARM, 1, [Defined for ARM Aarch64 platforms]) + GCRYPT_HWF_MODULES="hwf-arm.lo" + ;; esac AC_SUBST([GCRYPT_HWF_MODULES]) @@ -2384,6 +2522,7 @@ GCRY_MSG_SHOW([Try using DRNG (RDRAND): ],[$drngsupport]) GCRY_MSG_SHOW([Try using Intel AVX: ],[$avxsupport]) GCRY_MSG_SHOW([Try using Intel AVX2: ],[$avx2support]) GCRY_MSG_SHOW([Try using ARM NEON: ],[$neonsupport]) +GCRY_MSG_SHOW([Try using ARMv8 crypto: ],[$armcryptosupport]) GCRY_MSG_SHOW([],[]) if test "x${gpg_config_script_warn}" != x; then diff --git a/src/g10lib.h b/src/g10lib.h index 170ffa1..444c868 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -211,6 +211,10 @@ char **_gcry_strtokenize (const char *string, const char *delim); #define HWF_INTEL_AVX2 (1 << 13) #define HWF_ARM_NEON (1 << 14) +#define HWF_ARM_AES (1 << 15) +#define HWF_ARM_SHA1 (1 << 16) +#define HWF_ARM_SHA2 (1 << 17) +#define HWF_ARM_PMULL (1 << 18) gpg_err_code_t _gcry_disable_hw_feature (const char *name); diff --git a/src/hwf-arm.c b/src/hwf-arm.c index 3dc050e..6f0bb95 100644 --- a/src/hwf-arm.c +++ b/src/hwf-arm.c @@ -27,7 +27,7 @@ #include "g10lib.h" #include "hwf-common.h" -#if !defined (__arm__) +#if !defined (__arm__) && !defined (__aarch64__) # error Module build for wrong CPU. 
#endif @@ -35,23 +35,81 @@ #undef HAS_PROC_CPUINFO #ifdef __linux__ +struct feature_map_s { + unsigned int hwcap_flag; + unsigned int hwcap2_flag; + const char *feature_match; + unsigned int hwf_flag; +}; + #define HAS_SYS_AT_HWCAP 1 +#define HAS_PROC_CPUINFO 1 + +#ifdef __arm__ + +#define AT_HWCAP 16 +#define AT_HWCAP2 26 + +#define HWCAP_NEON 4096 -#define AT_HWCAP 16 -#define HWCAP_NEON 4096 +#define HWCAP2_AES 1 +#define HWCAP2_PMULL 2 +#define HWCAP2_SHA1 3 +#define HWCAP2_SHA2 4 + +static const struct feature_map_s arm_features[] = + { +#ifdef ENABLE_NEON_SUPPORT + { HWCAP_NEON, 0, " neon", HWF_ARM_NEON }, +#endif +#ifdef ENABLE_ARM_CRYPTO_SUPPORT + { 0, HWCAP2_AES, " aes", HWF_ARM_AES }, + { 0, HWCAP2_SHA1," sha1", HWF_ARM_SHA1 }, + { 0, HWCAP2_SHA2, " sha2", HWF_ARM_SHA2 }, + { 0, HWCAP2_PMULL, " pmull", HWF_ARM_PMULL }, +#endif + }; + +#elif defined(__aarch64__) + +#define AT_HWCAP 16 +#define AT_HWCAP2 -1 + +#define HWCAP_ASIMD 2 +#define HWCAP_AES 8 +#define HWCAP_PMULL 16 +#define HWCAP_SHA1 32 +#define HWCAP_SHA2 64 + +static const struct feature_map_s arm_features[] = + { +#ifdef ENABLE_NEON_SUPPORT + { HWCAP_ASIMD, 0, " asimd", HWF_ARM_NEON }, +#endif +#ifdef ENABLE_ARM_CRYPTO_SUPPORT + { HWCAP_AES, 0, " aes", HWF_ARM_AES }, + { HWCAP_SHA1, 0, " sha1", HWF_ARM_SHA1 }, + { HWCAP_SHA2, 0, " sha2", HWF_ARM_SHA2 }, + { HWCAP_PMULL, 0, " pmull", HWF_ARM_PMULL }, +#endif + }; + +#endif static int -get_hwcap(unsigned int *hwcap) +get_hwcap(unsigned int *hwcap, unsigned int *hwcap2) { - struct { unsigned int a_type; unsigned int a_val; } auxv; + struct { unsigned long a_type; unsigned long a_val; } auxv; FILE *f; int err = -1; static int hwcap_initialized = 0; - static unsigned int stored_hwcap; + static unsigned int stored_hwcap = 0; + static unsigned int stored_hwcap2 = 0; if (hwcap_initialized) { *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return 0; } @@ -59,22 +117,31 @@ get_hwcap(unsigned int *hwcap) if (!f) { *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return -1; } while (fread(&auxv, sizeof(auxv), 1, f) > 0) { - if (auxv.a_type != AT_HWCAP) - continue; - - stored_hwcap = auxv.a_val; - hwcap_initialized = 1; - err = 0; - break; + if (auxv.a_type == AT_HWCAP) + { + stored_hwcap = auxv.a_val; + hwcap_initialized = 1; + } + + if (auxv.a_type == AT_HWCAP2) + { + stored_hwcap2 = auxv.a_val; + hwcap_initialized = 1; + } } + if (hwcap_initialized) + err = 0; + fclose(f); *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return err; } @@ -82,29 +149,34 @@ static unsigned int detect_arm_at_hwcap(void) { unsigned int hwcap; + unsigned int hwcap2; unsigned int features = 0; + unsigned int i; - if (get_hwcap(&hwcap) < 0) + if (get_hwcap(&hwcap, &hwcap2) < 0) return features; -#ifdef ENABLE_NEON_SUPPORT - if (hwcap & HWCAP_NEON) - features |= HWF_ARM_NEON; -#endif + for (i = 0; i < DIM(arm_features); i++) + { + if (hwcap & arm_features[i].hwcap_flag) + features |= arm_features[i].hwf_flag; + + if (hwcap2 & arm_features[i].hwcap2_flag) + features |= arm_features[i].hwf_flag; + } return features; } -#define HAS_PROC_CPUINFO 1 - static unsigned int detect_arm_proc_cpuinfo(unsigned int *broken_hwfs) { char buf[1024]; /* large enough */ - char *str_features, *str_neon; + char *str_features, *str_feat; int cpu_implementer, cpu_arch, cpu_variant, cpu_part, cpu_revision; FILE *f; int readlen, i; + size_t mlen; static int cpuinfo_initialized = 0; static unsigned int stored_cpuinfo_features; static unsigned int stored_broken_hwfs; @@ -162,7 +234,11 @@ detect_arm_proc_cpuinfo(unsigned int 
*broken_hwfs) continue; str += 2; - *cpu_entries[i].value = strtoul(str, NULL, 0); + if (strcmp(cpu_entries[i].name, "CPU architecture") == 0 + && strcmp(str, "Aarch64") == 0) + *cpu_entries[i].value = 8; + else + *cpu_entries[i].value = strtoul(str, NULL, 0); } /* Lines to strings. */ @@ -170,10 +246,19 @@ detect_arm_proc_cpuinfo(unsigned int *broken_hwfs) if (buf[i] == '\n') buf[i] = '\0'; - /* Check for NEON. */ - str_neon = strstr(str_features, " neon"); - if (str_neon && (str_neon[5] == ' ' || str_neon[5] == '\0')) - stored_cpuinfo_features |= HWF_ARM_NEON; + /* Check features. */ + for (i = 0; i < DIM(arm_features); i++) + { + str_feat = strstr(str_features, arm_features[i].feature_match); + if (str_feat) + { + mlen = strlen(arm_features[i].feature_match); + if (str_feat[mlen] == ' ' || str_feat[mlen] == '\0') + { + stored_cpuinfo_features |= arm_features[i].hwf_flag; + } + } + } /* Check for CPUs with broken NEON implementation. See * https://code.google.com/p/chromium/issues/detail?id=341598 @@ -207,7 +292,7 @@ _gcry_hwf_detect_arm (void) ret |= detect_arm_proc_cpuinfo (&broken_hwfs); #endif -#if defined(__ARM_NEON__) && defined(ENABLE_NEON_SUPPORT) +#if defined(__ARM_NEON) && defined(ENABLE_NEON_SUPPORT) ret |= HWF_ARM_NEON; #endif diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 4cafae1..07221e8 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -56,7 +56,11 @@ static struct { HWF_INTEL_RDRAND, "intel-rdrand" }, { HWF_INTEL_AVX, "intel-avx" }, { HWF_INTEL_AVX2, "intel-avx2" }, - { HWF_ARM_NEON, "arm-neon" } + { HWF_ARM_NEON, "arm-neon" }, + { HWF_ARM_AES, "arm-aes" }, + { HWF_ARM_SHA1, "arm-sha1" }, + { HWF_ARM_SHA2, "arm-sha2" }, + { HWF_ARM_PMULL, "arm-pmull" } }; /* A bit vector with the hardware features which shall not be used. From orthecreedence at gmail.com Thu Jul 7 21:49:02 2016 From: orthecreedence at gmail.com (Andrew Lyon) Date: Thu, 7 Jul 2016 12:49:02 -0700 Subject: Errors compiling libgcrypt-1.7.1 Message-ID: Hello, everyone. I hope this is the correct place to post. I am trying to compile gcrypt 1.7.1 on my machine (MSYS2, Mingw64). 
I'm doing the following: tar -xvf libgcrypt-1.7.1.tar.bz2 cd libgcrypt-1.7.1/ mkdir build && cd build/ ../configure --prefix=/usr --enable-static --disable-shared Outputs: Libgcrypt v1.7.1 has been configured as follows: Platform: W32 (x86_64-w64-mingw32) Hardware detection module: hwf-x86 Enabled cipher algorithms: arcfour blowfish cast5 des aes twofish serpent rfc2268 seed camellia idea salsa20 gost28147 chacha20 Enabled digest algorithms: crc gostr3411-94 md4 md5 rmd160 sha1 sha256 sha512 sha3 tiger whirlpool stribog Enabled kdf algorithms: s2k pkdf2 scrypt Enabled pubkey algorithms: dsa elgamal rsa ecc Random number generator: default Using linux capabilities: no Try using Padlock crypto: yes Try using AES-NI crypto: yes Try using Intel PCLMUL: yes Try using Intel SSE4.1: yes Try using DRNG (RDRAND): yes Try using Intel AVX: yes Try using Intel AVX2: yes Try using ARM NEON: n/a Then running: make The build seems to run fine, until I get to the tests, then: /bin/sh ../libtool --tag=CC --mode=link /mingw64/bin/gcc -I/usr/include -L/usr/lib -Wall -no-install -o pubkey.exe pubkey.o ../src/libgcrypt.la ../compat/libcompat.la -lgpg-error libtool: link: warning: `-no-install' is ignored for x86_64-w64-mingw32 libtool: link: warning: assuming `-no-fast-install' instead libtool: link: /mingw64/bin/gcc -I/usr/include -Wall -o pubkey.exe pubkey.o -L/usr/lib ../src/.libs/libgcrypt.a ../compat/.libs/libcompat.a /usr/lib/libgpg-error.a ../src/.libs/libgcrypt.a(pubkey-util.o):pubkey-util.c:(.text+0x6e0): multiple definition of `_gcry_pk_util_get_nbits' pubkey.o:pubkey.c:(.text+0x211): first defined here collect2.exe: error: ld returned 1 exit status make[2]: *** [Makefile:671: pubkey.exe] Error 1 make[2]: Leaving directory '/d/src/libgcrypt-1.7.1/build/tests' make[1]: *** [Makefile:477: all-recursive] Error 1 make[1]: Leaving directory '/d/src/libgcrypt-1.7.1/build' make: *** [Makefile:408: all] Error 2 Note that I tried running configure with --disable-asm / make CFLAGS+="-DNO_ASM" as well, no luck (I read somewhere that might help with build problems on Windows). I tried building v1.7.0 as well, same result, while v1.6.5 compiles just fine. Ok, so it failed while building the tests.
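One more knob that can help bisect failures like the ones shown below, complementing ./configure --disable-asm: libgcrypt's run-time hardware-feature blacklist. Whether it changes anything in this particular case is unverified; the sketch only shows the mechanism, and it assumes that GCRYCTL_DISABLE_HWF is issued before the library is initialized and that feature names are the strings from src/hwfeatures.c (e.g. "intel-aesni", "intel-avx"):

    #include <gcrypt.h>

    /* Mask out one detected hardware feature so the corresponding
     * accelerated code path is never selected, then initialize. */
    static void
    init_without_aesni (void)
    {
      gcry_control (GCRYCTL_DISABLE_HWF, "intel-aesni", NULL);
      gcry_check_version (GCRYPT_VERSION);
    }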
I figured I'd try running them to see if it works regardless: ./tests/basic selftest for CTR failed - see syslog for details pass 0, algo 7, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 11, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 11, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 11, gcry_cipher_setkey failed: Selftest failed aes-cbc-cts, gcry_cipher_setkey failed: Selftest failed cbc-mac algo 7, gcry_cipher_setkey failed: Selftest failed aes-ctr, gcry_cipher_setkey failed: Selftest failed aes-cfb, gcry_cipher_setkey failed: Selftest failed aes-ofb, gcry_cipher_setkey failed: Selftest failed cipher-ccm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed cipher-ocb, gcry_cipher_setkey failed (tv 0): Selftest failed cipher-ocb, gcry_cipher_setkey failed (tv 0): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb-splitaad, gcry_cipher_setkey failed: Selftest failed gcry_cipher_setkey failed: Selftest failed 
algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_s I tried using the lib in my app anyway (it builds fine against the static lib) but I just keep getting "Selftest failed" errors. Once again, v1.6.5 builds fine, and I get no "Selftest error" when using it in my app. Any ideas? Thanks for the help! Please let me know if I omitted any details. Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From jussi.kivilinna at iki.fi Fri Jul 8 00:28:27 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 08 Jul 2016 01:28:27 +0300 Subject: [PATCH] Fix unaligned accesses with ldm/stm in ChaCha20 and Poly1305 ARM/NEON Message-ID: <146793050732.767.4788160689520723569.stgit@localhost6.localdomain6> * cipher/chacha20-armv7-neon.S (UNALIGNED_STMIA8) (UNALIGNED_LDMIA4): New. (_gcry_chacha20_armv7_neon_blocks): Use new helper macros instead of ldm/stm instructions directly. * cipher/poly1305-armv7-neon.S (UNALIGNED_LDMIA2) (UNALIGNED_LDMIA4): New. (_gcry_poly1305_armv7_neon_init_ext, _gcry_poly1305_armv7_neon_blocks) (_gcry_poly1305_armv7_neon_finish_ext): Use new helper macros instead of ldm instruction directly. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S index 1a395ba..4d3340b 100644 --- a/cipher/chacha20-armv7-neon.S +++ b/cipher/chacha20-armv7-neon.S @@ -33,6 +33,40 @@ .fpu neon .arm +#define UNALIGNED_STMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d3}; \ + vmov s0, l0; \ + vmov s1, l1; \ + vmov s2, l2; \ + vmov s3, l3; \ + vmov s4, l4; \ + vmov s5, l5; \ + vmov s6, l6; \ + vmov s7, l7; \ + vst1.32 {d0-d3}, [ptr]; \ + add ptr, #32; \ + vpop {d0-d3}; \ + b 2f; \ + 1: stmia ptr!, {l0-l7}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + /*beq 1f;*/ \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]; \ + add ptr, #16; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .globl _gcry_chacha20_armv7_neon_blocks @@ -352,7 +386,8 @@ _gcry_chacha20_armv7_neon_blocks: add r7, r7, r11 vadd.i32 q11, q11, q14 beq .Lchacha_blocks_neon_nomessage11 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -367,7 +402,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage11: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -391,7 +427,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage12 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -406,7 +443,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 
.Lchacha_blocks_neon_nomessage12: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 beq .Lchacha_blocks_neon_nomessage13 vld1.32 {q12,q13}, [r12]! vld1.32 {q14,q15}, [r12]! @@ -613,7 +651,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 add r7, r7, r11 beq .Lchacha_blocks_neon_nomessage21 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -628,7 +667,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage21: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -652,7 +691,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage22 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -667,7 +707,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage22: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) str r12, [sp, #48] str r14, [sp, #40] ldr r3, [sp, #52] diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index b1554ed..13cb4a5 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -46,6 +46,32 @@ # define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name #endif +#define UNALIGNED_LDMIA2(ptr, l0, l1) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0}; \ + vld1.32 {d0}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vpop {d0}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l1}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .p2align 2 @@ -64,7 +90,7 @@ _gcry_poly1305_armv7_neon_init_ext: mov r14, r2 and r2, r2, r2 moveq r14, #-1 - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 @@ -175,7 +201,7 @@ _gcry_poly1305_armv7_neon_init_ext: eor r6, r6, r6 stmia r0!, {r2-r6} stmia r0!, {r2-r6} - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) stmia r0, {r2-r6} add sp, sp, #32 ldmfd sp!, {r4-r11, lr} @@ -286,7 +312,7 @@ _gcry_poly1305_armv7_neon_blocks: vmov d14, d12 vmul.i32 q6, q5, d0[0] .Lpoly1305_blocks_neon_mainloop: - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmull.u32 q0, d25, d12[0] mov r7, r2, lsr #26 vmlal.u32 q0, d24, d12[1] @@ -302,7 +328,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d13[0] - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d13[1] mov r1, r7, lsr #26 vmlal.u32 q1, d22, d14[0] @@ -344,7 +370,7 @@ _gcry_poly1305_armv7_neon_blocks: vmlal.u32 q4, d21, d11[1] vld1.64 {d21-d24}, [r14, :256]! 
vld1.64 {d25}, [r14, :64] - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmlal.u32 q0, d25, d26 mov r7, r2, lsr #26 vmlal.u32 q0, d24, d27 @@ -360,7 +386,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d28 - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d29 mov r1, r7, lsr #26 vmlal.u32 q1, d22, d20 @@ -643,7 +669,7 @@ _gcry_poly1305_armv7_neon_finish_ext: .Lpoly1305_finish_ext_neon_skip16: tst r7, #8 beq .Lpoly1305_finish_ext_neon_skip8 - ldmia r1!, {r10-r11} + UNALIGNED_LDMIA2(r1, r10, r11) stmia r9!, {r10-r11} .Lpoly1305_finish_ext_neon_skip8: tst r7, #4 From mfpnb at plass-family.net Fri Jul 1 00:53:41 2016 From: mfpnb at plass-family.net (Michael Plass) Date: Thu, 30 Jun 2016 15:53:41 -0700 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5775916F.8010505@iki.fi> References: <5774CDA5.2090001@iki.fi> <5775916F.8010505@iki.fi> Message-ID: On Jun 30, 2016, at 2:38 PM, Jussi Kivilinna wrote: > On 30.06.2016 10:43, Jussi Kivilinna wrote: > > It appears that there are proper tests for unaligned buffers. However those > tests did not fail for me, since on Linux the unaligned ldm/stm exception > is caught and handled by the kernel. > > -Jussi > > > I see - that explains why this was not noticed before. A bus error in the basic tests is what called my attention to this. Thanks, - Michael From mfpnb at plass-family.net Fri Jul 1 08:17:58 2016 From: mfpnb at plass-family.net (Michael Plass) Date: Thu, 30 Jun 2016 23:17:58 -0700 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5774CDA5.2090001@iki.fi> References: <5774CDA5.2090001@iki.fi> Message-ID: On Jun 30, 2016, at 12:43 AM, Jussi Kivilinna wrote: > I wonder if there is an automated way to check the resulting library for such > non-PIC references. If there is, such a check could be incorporated into the > build process and abort the build if found. Nick Hudson on the NetBSD lists suggests this: > > There is the ld(1) option --warn-shared-textrel and this with --fatal-warnings should > do the trick. > From cvs at cvs.gnupg.org Fri Jul 8 12:31:13 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Fri, 08 Jul 2016 12:31:13 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-17-g1111d31 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 (commit) via 496790940753226f96b731a43d950bd268acd97a (commit) from cb79630ec567a5f2e03e5f863cda168faa7b8cc8 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 Author: Jussi Kivilinna Date: Fri Jul 8 01:22:58 2016 +0300 Fix unaligned accesses with ldm/stm in ChaCha20 and Poly1305 ARM/NEON * cipher/chacha20-armv7-neon.S (UNALIGNED_STMIA8) (UNALIGNED_LDMIA4): New. (_gcry_chacha20_armv7_neon_blocks): Use new helper macros instead of ldm/stm instructions directly. * cipher/poly1305-armv7-neon.S (UNALIGNED_LDMIA2) (UNALIGNED_LDMIA4): New. (_gcry_poly1305_armv7_neon_init_ext, _gcry_poly1305_armv7_neon_blocks) (_gcry_poly1305_armv7_neon_finish_ext): Use new helper macros instead of ldm instruction directly.
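Why the new helper macros are safe: NEON vld1/vst1 element accesses without an :align qualifier accept unaligned addresses, so the macros bounce unaligned data through NEON registers (vmov to and from core registers) while word-aligned pointers keep the original ldm/stm fast path. Exercising such paths from a test only requires feeding deliberately misaligned buffers, roughly like this harness (a sketch, not the actual tests/basic code; assumes len <= 64):

    #include <stddef.h>
    #include <string.h>

    /* Run 'process' over aligned buffers and over buffers offset by one
     * byte, and check that both runs produce identical output. */
    static int
    check_unaligned (void (*process) (void *out, const void *in, size_t len),
                     const unsigned char *in, size_t len)
    {
      unsigned char a_in[64], a_out[64];          /* aligned reference run */
      unsigned char u_in[1 + 64], u_out[1 + 64];  /* misaligned by one byte */

      memcpy (a_in, in, len);
      process (a_out, a_in, len);

      memcpy (u_in + 1, in, len);
      process (u_out + 1, u_in + 1, len);

      return memcmp (a_out, u_out + 1, len) == 0;
    }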
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S index 1a395ba..4d3340b 100644 --- a/cipher/chacha20-armv7-neon.S +++ b/cipher/chacha20-armv7-neon.S @@ -33,6 +33,40 @@ .fpu neon .arm +#define UNALIGNED_STMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d3}; \ + vmov s0, l0; \ + vmov s1, l1; \ + vmov s2, l2; \ + vmov s3, l3; \ + vmov s4, l4; \ + vmov s5, l5; \ + vmov s6, l6; \ + vmov s7, l7; \ + vst1.32 {d0-d3}, [ptr]; \ + add ptr, #32; \ + vpop {d0-d3}; \ + b 2f; \ + 1: stmia ptr!, {l0-l7}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + /*beq 1f;*/ \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]; \ + add ptr, #16; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .globl _gcry_chacha20_armv7_neon_blocks @@ -352,7 +386,8 @@ _gcry_chacha20_armv7_neon_blocks: add r7, r7, r11 vadd.i32 q11, q11, q14 beq .Lchacha_blocks_neon_nomessage11 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -367,7 +402,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage11: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -391,7 +427,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage12 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -406,7 +443,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage12: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 beq .Lchacha_blocks_neon_nomessage13 vld1.32 {q12,q13}, [r12]! vld1.32 {q14,q15}, [r12]! 
@@ -613,7 +651,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 add r7, r7, r11 beq .Lchacha_blocks_neon_nomessage21 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -628,7 +667,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage21: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -652,7 +691,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage22 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -667,7 +707,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage22: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) str r12, [sp, #48] str r14, [sp, #40] ldr r3, [sp, #52] diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index b1554ed..13cb4a5 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -46,6 +46,32 @@ # define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name #endif +#define UNALIGNED_LDMIA2(ptr, l0, l1) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0}; \ + vld1.32 {d0}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vpop {d0}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l1}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .p2align 2 @@ -64,7 +90,7 @@ _gcry_poly1305_armv7_neon_init_ext: mov r14, r2 and r2, r2, r2 moveq r14, #-1 - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 @@ -175,7 +201,7 @@ _gcry_poly1305_armv7_neon_init_ext: eor r6, r6, r6 stmia r0!, {r2-r6} stmia r0!, {r2-r6} - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) stmia r0, {r2-r6} add sp, sp, #32 ldmfd sp!, {r4-r11, lr} @@ -286,7 +312,7 @@ _gcry_poly1305_armv7_neon_blocks: vmov d14, d12 vmul.i32 q6, q5, d0[0] .Lpoly1305_blocks_neon_mainloop: - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmull.u32 q0, d25, d12[0] mov r7, r2, lsr #26 vmlal.u32 q0, d24, d12[1] @@ -302,7 +328,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d13[0] - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d13[1] mov r1, r7, lsr #26 vmlal.u32 q1, d22, d14[0] @@ -344,7 +370,7 @@ _gcry_poly1305_armv7_neon_blocks: vmlal.u32 q4, d21, d11[1] vld1.64 {d21-d24}, [r14, :256]! 
 vld1.64 {d25}, [r14, :64] - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmlal.u32 q0, d25, d26 mov r7, r2, lsr #26 vmlal.u32 q0, d24, d27 @@ -360,7 +386,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d28 - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d29 mov r1, r7, lsr #26 vmlal.u32 q1, d22, d20 @@ -643,7 +669,7 @@ _gcry_poly1305_armv7_neon_finish_ext: .Lpoly1305_finish_ext_neon_skip16: tst r7, #8 beq .Lpoly1305_finish_ext_neon_skip8 - ldmia r1!, {r10-r11} + UNALIGNED_LDMIA2(r1, r10, r11) stmia r9!, {r10-r11} .Lpoly1305_finish_ext_neon_skip8: tst r7, #4 commit 496790940753226f96b731a43d950bd268acd97a Author: Jussi Kivilinna Date: Sun Jul 3 18:39:40 2016 +0300 bench-slope: add unaligned buffer mode * tests/bench-slope.c (unaligned_mode): New. (do_slope_benchmark): Unalign buffer if unaligned mode is enabled. (print_help, main): Add '--unaligned' parameter. -- Patch adds --unaligned parameter to allow measurement of unaligned buffer overhead. Signed-off-by: Jussi Kivilinna diff --git a/tests/bench-slope.c b/tests/bench-slope.c index d97494c..cdd0fa6 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -42,6 +42,7 @@ static int verbose; static int csv_mode; +static int unaligned_mode; static int num_measurement_repetitions; /* CPU Ghz value provided by user, allows constructing cycles/byte and other @@ -411,12 +412,14 @@ do_slope_benchmark (struct bench_obj *obj) obj->max_bufsize < 1 || obj->min_bufsize > obj->max_bufsize) goto err_free; - real_buffer = malloc (obj->max_bufsize + 128); + real_buffer = malloc (obj->max_bufsize + 128 + unaligned_mode); if (!real_buffer) goto err_free; /* Get aligned buffer */ buffer = real_buffer; buffer += 128 - ((real_buffer - (unsigned char *) 0) & (128 - 1)); + if (unaligned_mode) + buffer += unaligned_mode; /* Make buffer unaligned */ for (i = 0; i < obj->max_bufsize; i++) buffer[i] = 0x55 ^ (-i); @@ -1748,6 +1751,7 @@ print_help (void) " for benchmarking.", " --repetitions Use N repetitions (default " STR2(NUM_MEASUREMENT_REPETITIONS) ")", + " --unaligned Use unaligned input buffers.", " --csv Use CSV output format", NULL }; @@ -1832,6 +1836,12 @@ main (int argc, char **argv) argc--; argv++; } + else if (!strcmp (*argv, "--unaligned")) + { + unaligned_mode = 1; + argc--; + argv++; + } else if (!strcmp (*argv, "--disable-hwf")) { argc--; ----------------------------------------------------------------------- Summary of changes: cipher/chacha20-armv7-neon.S | 56 +++++++++++++++++++++++++++++++++++++------- cipher/poly1305-armv7-neon.S | 40 +++++++++++++++++++++++++------ tests/bench-slope.c | 12 +++++++++- 3 files changed, 92 insertions(+), 16 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Tue Jul 12 11:54:01 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:54:01 +0300 Subject: [PATCH 2/3] Add ARMv8/AArch32 Crypto Extension implementation of GCM In-Reply-To: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> References: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> Message-ID: <146831724159.1076.14802585401018300755.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch32-ce.S'. * cipher/cipher-gcm-armv8-aarch32-ce.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_PMULL] (_gcry_ghash_setup_armv8_ce_pmull, _gcry_ghash_armv8_ce_pmull) (ghash_setup_armv8_ce_pmull, ghash_armv8_ce_pmull): New. (setupM) [GCM_USE_ARM_PMULL]: Enable ARM PMULL implementation if HWF_ARM_PMULL HW feature flag is enabled. * cipher/cipher-internal.h (GCM_USE_ARM_PMULL): New. -- Benchmark on Cortex-A53 (1152 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte GMAC_AES | 24.10 ns/B 39.57 MiB/s 27.76 c/B After (~26x faster): | nanosecs/byte mebibytes/sec cycles/byte GMAC_AES | 0.924 ns/B 1032.2 MiB/s 1.06 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 1e97050..5d69a38 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -43,6 +43,7 @@ libcipher_la_SOURCES = \ cipher.c cipher-internal.h \ cipher-cbc.c cipher-cfb.c cipher-ofb.c cipher-ctr.c cipher-aeswrap.c \ cipher-ccm.c cipher-cmac.c cipher-gcm.c cipher-gcm-intel-pclmul.c \ + cipher-gcm-armv8-aarch32-ce.S \ cipher-poly1305.c cipher-ocb.c \ cipher-selftest.c cipher-selftest.h \ pubkey.c pubkey-internal.h pubkey-util.c \ diff --git a/cipher/cipher-gcm-armv8-aarch32-ce.S b/cipher/cipher-gcm-armv8-aarch32-ce.S new file mode 100644 index 0000000..b879fb2 --- /dev/null +++ b/cipher/cipher-gcm-armv8-aarch32-ce.S @@ -0,0 +1,235 @@ +/* cipher-gcm-armv8-aarch32-ce.S - ARM/CE accelerated GHASH + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include <config.h> + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +.align 4 +gcry_gcm_reduction_constant: +.Lrconst64: + .quad 0xc200000000000000 + + +/* Register macros */ + +#define rhash q0 +#define rhash_l d0 +#define rhash_h d1 + +#define rbuf q1 +#define rbuf_l d2 +#define rbuf_h d3 + +#define rh0 q2 +#define rh0_l d4 +#define rh0_h d5 + +#define rt0 q3 +#define rt0_l d6 +#define rt0_h d7 + +#define rr0 q8 +#define rr0_l d16 +#define rr0_h d17 + +#define rr1 q9 +#define rr1_l d18 +#define rr1_h d19 + +#define rrconst q15 +#define rrconst_l d30 +#define rrconst_h d31 + +#define ia rbuf_h +#define ib rbuf_l +#define oa rh0_l +#define ob rh0_h +#define co rrconst_l +#define ma rrconst_h + +/* GHASH macros */ + +/* See "Gouvêa, C. P. L. & López, J. Implementing GCM on ARMv8. Topics in + * Cryptology – CT-RSA 2015" for details.
+ */ + +/* Input: 'a' and 'b', Output: 'r0:r1' (low 128-bits in r0, high in r1) */ +#define PMUL_128x128(r0, r1, a, b, t, interleave_op) \ + veor t##_h, b##_l, b##_h; \ + veor t##_l, a##_l, a##_h; \ + vmull.p64 r0, a##_l, b##_l; \ + vmull.p64 r1, a##_h, b##_h; \ + vmull.p64 t, t##_h, t##_l; \ + interleave_op(); \ + veor t, r0; \ + veor t, r1; \ + veor r0##_h, t##_l; \ + veor r1##_l, t##_h; + +/* Input: 'r0:r1', Output: 'a' */ +#define REDUCTION(a, r0, r1, rconst, t, interleave_op) \ + vmull.p64 t, r0##_l, rconst; \ + veor r0##_h, t##_l; \ + veor r1##_l, t##_h; \ + interleave_op(); \ + vmull.p64 t, r0##_h, rconst; \ + veor r1, t; \ + veor a, r0, r1; + +#define _(...) /*_*/ +#define vrev_rbuf() vrev64.8 rbuf, rbuf; +#define vext_rbuf() vext.8 rbuf, rbuf, rbuf, #8; + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int _gcry_ghash_armv8_ce_pmull (void *gcm_key, byte *result, + * const byte *buf, size_t nblocks, + * void *gcm_table); + */ +.align 3 +.globl _gcry_ghash_armv8_ce_pmull +.type _gcry_ghash_armv8_ce_pmull,%function; +_gcry_ghash_armv8_ce_pmull: + /* input: + * r0: gcm_key + * r1: result/hash + * r2: buf + * r3: nblocks + * %st+0: gcm_table + */ + push {r4, lr} + + cmp r3, #0 + beq .Ldo_nothing + + GET_DATA_POINTER(lr, .Lrconst64, r4) + + subs r3, r3, #1 + vld1.64 {rhash}, [r1] + vld1.64 {rh0}, [r0] + + vrev64.8 rhash, rhash /* byte-swap */ + vld1.64 {rrconst_h}, [lr] + vext.8 rhash, rhash, rhash, #8 + + vld1.64 {rbuf}, [r2]! + + vrev64.8 rbuf, rbuf /* byte-swap */ + vext.8 rbuf, rbuf, rbuf, #8 + + veor rhash, rhash, rbuf + + beq .Lend + +.Loop: + vld1.64 {rbuf}, [r2]! + subs r3, r3, #1 + PMUL_128x128(rr0, rr1, rh0, rhash, rt0, vrev_rbuf) + REDUCTION(rhash, rr0, rr1, rrconst_h, rt0, vext_rbuf) + veor rhash, rhash, rbuf + + bne .Loop + +.Lend: + PMUL_128x128(rr0, rr1, rh0, rhash, rt0, _) + REDUCTION(rhash, rr0, rr1, rrconst_h, rt0, _) + + CLEAR_REG(rr1) + CLEAR_REG(rr0) + vrev64.8 rhash, rhash /* byte-swap */ + CLEAR_REG(rbuf) + CLEAR_REG(rt0) + vext.8 rhash, rhash, rhash, #8 + CLEAR_REG(rh0) + + vst1.64 {rhash}, [r1] + CLEAR_REG(rhash) + +.Ldo_nothing: + mov r0, #0 + pop {r4, pc} +.size _gcry_ghash_armv8_ce_pmull,.-_gcry_ghash_armv8_ce_pmull; + + +/* + * void _gcry_ghash_setup_armv8_ce_pmull (void *gcm_key, void *gcm_table); + */ +.align 3 +.globl _gcry_ghash_setup_armv8_ce_pmull +.type _gcry_ghash_setup_armv8_ce_pmull,%function; +_gcry_ghash_setup_armv8_ce_pmull: + /* input: + * r0: gcm_key + * r1: gcm_table + */ + + push {r4, lr} + + GET_DATA_POINTER(r4, .Lrconst64, lr) + + /* H <<< 1 */ + vld1.64 {ib,ia}, [r0] + vld1.64 {co}, [r4] + vrev64.8 ib, ib; + vrev64.8 ia, ia; + vshr.s64 ma, ib, #63 + vshr.u64 oa, ib, #63 + vshr.u64 ob, ia, #63 + vand ma, co + vshl.u64 ib, ib, #1 + vshl.u64 ia, ia, #1 + vorr ob, ib + vorr oa, ia + veor ob, ma + + vst1.64 {oa, ob}, [r0] + + pop {r4, pc} +.size _gcry_ghash_setup_armv8_ce_pmull,.-_gcry_ghash_setup_armv8_ce_pmull; + +#endif diff --git a/cipher/cipher-gcm.c b/cipher/cipher-gcm.c index 6e0959a..2b8b454 100644 --- a/cipher/cipher-gcm.c +++ b/cipher/cipher-gcm.c @@ -37,6 +37,30 @@ extern unsigned int _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, size_t nblocks); #endif +#ifdef GCM_USE_ARM_PMULL +extern void _gcry_ghash_setup_armv8_ce_pmull (void *gcm_key, void *gcm_table); + +extern unsigned int _gcry_ghash_armv8_ce_pmull (void *gcm_key, byte *result, + const byte *buf, size_t nblocks, + void *gcm_table); + +static void +ghash_setup_armv8_ce_pmull (gcry_cipher_hd_t c) +{ + 
_gcry_ghash_setup_armv8_ce_pmull(c->u_mode.gcm.u_ghash_key.key, + c->u_mode.gcm.gcm_table); +} + +static unsigned int +ghash_armv8_ce_pmull (gcry_cipher_hd_t c, byte *result, const byte *buf, + size_t nblocks) +{ + return _gcry_ghash_armv8_ce_pmull(c->u_mode.gcm.u_ghash_key.key, result, buf, + nblocks, c->u_mode.gcm.gcm_table); +} + +#endif + #ifdef GCM_USE_TABLES static const u16 gcmR[256] = { @@ -379,15 +403,26 @@ ghash_internal (gcry_cipher_hd_t c, byte *result, const byte *buf, static void setupM (gcry_cipher_hd_t c) { +#if defined(GCM_USE_INTEL_PCLMUL) || defined(GCM_USE_ARM_PMULL) + unsigned int features = _gcry_get_hw_features (); +#endif + if (0) ; #ifdef GCM_USE_INTEL_PCLMUL - else if (_gcry_get_hw_features () & HWF_INTEL_PCLMUL) + else if (features & HWF_INTEL_PCLMUL) { c->u_mode.gcm.ghash_fn = _gcry_ghash_intel_pclmul; _gcry_ghash_setup_intel_pclmul (c); } #endif +#ifdef GCM_USE_ARM_PMULL + else if (features & HWF_ARM_PMULL) + { + c->u_mode.gcm.ghash_fn = ghash_armv8_ce_pmull; + ghash_setup_armv8_ce_pmull (c); + } +#endif else { c->u_mode.gcm.ghash_fn = ghash_internal; diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 9fd1d91..52504f6 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -72,6 +72,16 @@ # endif #endif /* GCM_USE_INTEL_PCLMUL */ +/* GCM_USE_ARM_PMULL indicates whether to compile GCM with ARMv8 PMULL code. */ +#undef GCM_USE_ARM_PMULL +#if defined(ENABLE_ARM_CRYPTO_SUPPORT) && defined(GCM_USE_TABLES) +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define GCM_USE_ARM_PMULL 1 +# endif +#endif /* GCM_USE_ARM_PMULL */ + typedef unsigned int (*ghash_fn_t) (gcry_cipher_hd_t c, byte *result, const byte *buf, size_t nblocks); From jussi.kivilinna at iki.fi Tue Jul 12 11:53:56 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:53:56 +0300 Subject: [PATCH 1/3] Add ARMv8/AArch32 Crypto Extension implementation of SHA-256 Message-ID: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'sha256-armv8-aarch32-ce.S'. * cipher/sha256-armv8-aarch32-ce.S: New. * cipher/sha256.c (USE_ARM_CE): New. (sha256_init, sha224_init): Check features for HWF_ARM_SHA2. [USE_ARM_CE] (_gcry_sha256_transform_armv8_ce): New. (transform) [USE_ARM_CE]: Use ARMv8 CE implementation if HW supports it. (SHA256_CONTEXT): Add 'use_arm_ce'. * configure.ac: Add 'sha256-armv8-aarch32-ce.lo'.
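The sha256.c glue follows the pattern already used for the SSSE3/AVX paths: probe hardware features once in the init functions, latch the result in a context bit-field, and branch on that bit in transform(). A condensed C sketch of that shape; the context struct is simplified and 'sketch_ctx'/'transform_sketch' are illustrative names, while the extern declaration matches the new assembly entry point:

#include <stddef.h>

typedef unsigned int u32;

/* Provided by sha256-armv8-aarch32-ce.S when USE_ARM_CE is defined;
   processes num_blks 64-byte blocks and returns the stack burn depth. */
extern unsigned int _gcry_sha256_transform_armv8_ce (u32 state[8],
                                                     const void *input_data,
                                                     size_t num_blks);

typedef struct
{
  u32 h[8];                  /* simplified; the real context embeds bctx */
  unsigned int use_arm_ce:1; /* latched at init time from
                                _gcry_get_hw_features () & HWF_ARM_SHA2 */
} sketch_ctx;

unsigned int
transform_sketch (sketch_ctx *hd, const unsigned char *data, size_t nblks)
{
  if (hd->use_arm_ce)
    return _gcry_sha256_transform_armv8_ce (hd->h, data, nblks);
  /* ...otherwise fall back to transform_blk () one block at a time. */
  return 0;
}

Doing the feature test once at init keeps the per-call overhead down to a single predictable branch.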
-- Benchmark on Cortex-A53 (1152 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 17.38 ns/B 54.88 MiB/s 20.02 c/B After (~9.3x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 1.85 ns/B 515.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 571673e..1e97050 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -89,6 +89,7 @@ serpent.c serpent-sse2-amd64.S serpent-avx2-amd64.S serpent-armv7-neon.S \ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ + sha256-armv8-aarch32-ce.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S sha512-arm.S \ keccak.c keccak_permute_32.h keccak_permute_64.h keccak-armv7-neon.S \ diff --git a/cipher/sha256-armv8-aarch32-ce.S b/cipher/sha256-armv8-aarch32-ce.S new file mode 100644 index 0000000..a0dbcea --- /dev/null +++ b/cipher/sha256-armv8-aarch32-ce.S @@ -0,0 +1,231 @@ +/* sha256-armv8-aarch32-ce.S - ARM/CE accelerated SHA-256 transform function + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA256) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +.align 4 +gcry_sha256_aarch32_ce_K: +.LK: + .long 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .long 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .long 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .long 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .long 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .long 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .long 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .long 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .long 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .long 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .long 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .long 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .long 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .long 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .long 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .long 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 + + +/* Register macros */ + +#define qH0123 q0 +#define qH4567 q1 + +#define qABCD0 q2 +#define qABCD1 q3 +#define qEFGH q4 + +#define qT0 q5 +#define qT1 q6 + +#define qW0 q8 +#define qW1 q9 +#define qW2 q10 +#define qW3 q11 + +#define qK0 q12 +#define qK1 q13 +#define qK2 q14 +#define qK3 q15 + + +/* Round macros */ + +#define _(...) /*_*/ + + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int + * _gcry_sha256_transform_armv8_ce (u32 state[8], const void *input_data, + * size_t num_blks) + */ +.align 3 +.globl _gcry_sha256_transform_armv8_ce +.type _gcry_sha256_transform_armv8_ce,%function; +_gcry_sha256_transform_armv8_ce: + /* input: + * r0: ctx, CTX + * r1: data (64*nblks bytes) + * r2: nblks + */ + + cmp r2, #0; + push {r4,lr}; + beq .Ldo_nothing; + + vpush {q4-q7}; + + GET_DATA_POINTER(r4, .LK, lr); + mov lr, r4 + + vld1.32 {qH0123-qH4567}, [r0] /* load state */ + +#define do_loadk(nk0, nk1) vld1.32 {nk0-nk1},[lr]!; +#define do_add(a, b) vadd.u32 a, a, b; +#define do_sha256su0(w0, w1) sha256su0.32 w0, w1; +#define do_sha256su1(w0, w2, w3) sha256su1.32 w0, w2, w3; + +#define do_rounds(k, nk0, nk1, w0, w1, w2, w3, loadk_fn, add_fn, su0_fn, su1_fn) \ + loadk_fn( nk0, nk1 ); \ + su0_fn( w0, w1 ); \ + vmov qABCD1, qABCD0; \ + sha256h.32 qABCD0, qEFGH, k; \ + sha256h2.32 qEFGH, qABCD1, k; \ + add_fn( nk0, w2 ); \ + su1_fn( w0, w2, w3 ); \ + + vld1.8 {qW0-qW1}, [r1]! + do_loadk(qK0, qK1) + vld1.8 {qW2-qW3}, [r1]! 
+ vmov qABCD0, qH0123 + vmov qEFGH, qH4567 + + vrev32.8 qW0, qW0 + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_add(qK0, qW0) + vrev32.8 qW3, qW3 + do_add(qK1, qW1) + +.Loop: + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + subs r2,r2,#1 + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + beq .Lend + + do_rounds(qK0, qK2, qK3, qW0, _ , qW2, qW3, do_loadk, do_add, _, _) + vld1.8 {qW0}, [r1]! + mov lr, r4 + do_rounds(qK1, qK3, _ , qW1, _ , qW3, _ , _ , do_add, _, _) + vld1.8 {qW1}, [r1]! + vrev32.8 qW0, qW0 + do_rounds(qK2, qK0, qK1, qW2, _ , qW0, _ , do_loadk, do_add, _, _) + vrev32.8 qW1, qW1 + vld1.8 {qW2}, [r1]! + do_rounds(qK3, qK1, _ , qW3, _ , qW1, _ , _ , do_add, _, _) + vld1.8 {qW3}, [r1]! + + vadd.u32 qH0123, qABCD0 + vadd.u32 qH4567, qEFGH + + vrev32.8 qW2, qW2 + vmov qABCD0, qH0123 + vrev32.8 qW3, qW3 + vmov qEFGH, qH4567 + + b .Loop + +.Lend: + + do_rounds(qK0, qK2, qK3, qW0, _ , qW2, qW3, do_loadk, do_add, _, _) + do_rounds(qK1, qK3, _ , qW1, _ , qW3, _ , _ , do_add, _, _) + do_rounds(qK2, _ , _ , qW2, _ , _ , _ , _ , _, _, _) + do_rounds(qK3, _ , _ , qW3, _ , _ , _ , _ , _, _, _) + + CLEAR_REG(qW0) + CLEAR_REG(qW1) + CLEAR_REG(qW2) + CLEAR_REG(qW3) + CLEAR_REG(qK0) + CLEAR_REG(qK1) + CLEAR_REG(qK2) + CLEAR_REG(qK3) + + vadd.u32 qH0123, qABCD0 + vadd.u32 qH4567, qEFGH + + CLEAR_REG(qABCD0) + CLEAR_REG(qABCD1) + CLEAR_REG(qEFGH) + + vst1.32 {qH0123-qH4567}, [r0] /* store state */ + + CLEAR_REG(qH0123) + CLEAR_REG(qH4567) + vpop {q4-q7} + +.Ldo_nothing: + mov r0, #0 + pop {r4,pc} +.size _gcry_sha256_transform_armv8_ce,.-_gcry_sha256_transform_armv8_ce; + +#endif diff --git a/cipher/sha256.c b/cipher/sha256.c index 1b82ee7..72818ce 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -75,6 +75,17 @@ # define USE_AVX2 1 #endif +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. 
*/ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif + typedef struct { gcry_md_block_ctx_t bctx; @@ -88,6 +99,9 @@ typedef struct { #ifdef USE_AVX2 unsigned int use_avx2:1; #endif +#ifdef USE_ARM_CE + unsigned int use_arm_ce:1; +#endif } SHA256_CONTEXT; @@ -129,6 +143,9 @@ sha256_init (void *context, unsigned int flags) #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA2) != 0; +#endif (void)features; } @@ -167,6 +184,9 @@ sha224_init (void *context, unsigned int flags) #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA2) != 0; +#endif (void)features; } @@ -355,6 +375,11 @@ unsigned int _gcry_sha256_transform_amd64_avx2(const void *input_data, size_t num_blks) ASM_FUNC_ABI; #endif +#ifdef USE_ARM_CE +unsigned int _gcry_sha256_transform_armv8_ce(u32 state[8], + const void *input_data, + size_t num_blks); +#endif static unsigned int transform (void *ctx, const unsigned char *data, size_t nblks) @@ -380,6 +405,11 @@ transform (void *ctx, const unsigned char *data, size_t nblks) + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif +#ifdef USE_ARM_CE + if (hd->use_arm_ce) + return _gcry_sha256_transform_armv8_ce (&hd->h0, data, nblks); +#endif + do { burn = transform_blk (hd, data); diff --git a/configure.ac b/configure.ac index 613a3d6..91dd285 100644 --- a/configure.ac +++ b/configure.ac @@ -2256,6 +2256,10 @@ if test "$found" = "1" ; then GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-avx-amd64.lo" GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-avx2-bmi2-amd64.lo" ;; + arm*-*-*) + # Build with the assembly implementation + GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-armv8-aarch32-ce.lo" + ;; esac fi From cvs at cvs.gnupg.org Wed Jul 13 19:08:37 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Wed, 13 Jul 2016 19:08:37 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-18-ge535ea1 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via e535ea1bdc42309553007d60599d3147b8defe93 (commit) from 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit e535ea1bdc42309553007d60599d3147b8defe93 Author: Werner Koch Date: Wed Jul 13 19:05:34 2016 +0200 build: Update config.{guess,sub} to {2016-05-15,2016-06-20}. * build-aux/config.guess: Update. * build-aux/config.sub: Update. Signed-off-by: Werner Koch diff --git a/build-aux/config.guess b/build-aux/config.guess index dbfb978..c4bd827 100755 --- a/build-aux/config.guess +++ b/build-aux/config.guess @@ -1,8 +1,8 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright 1992-2015 Free Software Foundation, Inc.
+# Copyright 1992-2016 Free Software Foundation, Inc. -timestamp='2015-01-01' +timestamp='2016-05-15' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -27,7 +27,7 @@ timestamp='2015-01-01' # Originally written by Per Bothner; maintained since 2000 by Ben Elliston. # # You can get the latest version of this script from: -# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD +# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess # # Please send patches to . @@ -50,7 +50,7 @@ version="\ GNU config.guess ($timestamp) Originally written by Per Bothner. -Copyright 1992-2015 Free Software Foundation, Inc. +Copyright 1992-2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -168,19 +168,29 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in # Note: NetBSD doesn't particularly care about the vendor # portion of the name. We always set it to "unknown". sysctl="sysctl -n hw.machine_arch" - UNAME_MACHINE_ARCH=`(/sbin/$sysctl 2>/dev/null || \ - /usr/sbin/$sysctl 2>/dev/null || echo unknown)` + UNAME_MACHINE_ARCH=`(uname -p 2>/dev/null || \ + /sbin/$sysctl 2>/dev/null || \ + /usr/sbin/$sysctl 2>/dev/null || \ + echo unknown)` case "${UNAME_MACHINE_ARCH}" in armeb) machine=armeb-unknown ;; arm*) machine=arm-unknown ;; sh3el) machine=shl-unknown ;; sh3eb) machine=sh-unknown ;; sh5el) machine=sh5le-unknown ;; + earmv*) + arch=`echo ${UNAME_MACHINE_ARCH} | sed -e 's,^e\(armv[0-9]\).*$,\1,'` + endian=`echo ${UNAME_MACHINE_ARCH} | sed -ne 's,^.*\(eb\)$,\1,p'` + machine=${arch}${endian}-unknown + ;; *) machine=${UNAME_MACHINE_ARCH}-unknown ;; esac # The Operating System including object format, if it has switched - # to ELF recently, or will in the future. + # to ELF recently (or will in the future) and ABI. case "${UNAME_MACHINE_ARCH}" in + earm*) + os=netbsdelf + ;; arm*|i386|m68k|ns32k|sh3*|sparc|vax) eval $set_cc_for_build if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \ @@ -197,6 +207,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in os=netbsd ;; esac + # Determine ABI tags. + case "${UNAME_MACHINE_ARCH}" in + earm*) + expr='s/^earmv[0-9]/-eabi/;s/eb$//' + abi=`echo ${UNAME_MACHINE_ARCH} | sed -e "$expr"` + ;; + esac # The OS release # Debian GNU/NetBSD machines have a different userland, and # thus, need a distinct triplet. However, they do not need @@ -207,13 +224,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in release='-gnu' ;; *) - release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'` + release=`echo ${UNAME_RELEASE} | sed -e 's/[-_].*//' | cut -d. -f1,2` ;; esac # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM: # contains redundant information, the shorter form: # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used. 
- echo "${machine}-${os}${release}" + echo "${machine}-${os}${release}${abi}" exit ;; *:Bitrig:*:*) UNAME_MACHINE_ARCH=`arch | sed 's/Bitrig.//'` @@ -223,6 +240,10 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'` echo ${UNAME_MACHINE_ARCH}-unknown-openbsd${UNAME_RELEASE} exit ;; + *:LibertyBSD:*:*) + UNAME_MACHINE_ARCH=`arch | sed 's/^.*BSD\.//'` + echo ${UNAME_MACHINE_ARCH}-unknown-libertybsd${UNAME_RELEASE} + exit ;; *:ekkoBSD:*:*) echo ${UNAME_MACHINE}-unknown-ekkobsd${UNAME_RELEASE} exit ;; @@ -235,6 +256,9 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in *:MirBSD:*:*) echo ${UNAME_MACHINE}-unknown-mirbsd${UNAME_RELEASE} exit ;; + *:Sortix:*:*) + echo ${UNAME_MACHINE}-unknown-sortix + exit ;; alpha:OSF1:*:*) case $UNAME_RELEASE in *4.0) @@ -251,42 +275,42 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^ The alpha \(.*\) processor.*$/\1/p' | head -n 1` case "$ALPHA_CPU_TYPE" in "EV4 (21064)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "EV4.5 (21064)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "LCA4 (21066/21068)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "EV5 (21164)") - UNAME_MACHINE="alphaev5" ;; + UNAME_MACHINE=alphaev5 ;; "EV5.6 (21164A)") - UNAME_MACHINE="alphaev56" ;; + UNAME_MACHINE=alphaev56 ;; "EV5.6 (21164PC)") - UNAME_MACHINE="alphapca56" ;; + UNAME_MACHINE=alphapca56 ;; "EV5.7 (21164PC)") - UNAME_MACHINE="alphapca57" ;; + UNAME_MACHINE=alphapca57 ;; "EV6 (21264)") - UNAME_MACHINE="alphaev6" ;; + UNAME_MACHINE=alphaev6 ;; "EV6.7 (21264A)") - UNAME_MACHINE="alphaev67" ;; + UNAME_MACHINE=alphaev67 ;; "EV6.8CB (21264C)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.8AL (21264B)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.8CX (21264D)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.9A (21264/EV69A)") - UNAME_MACHINE="alphaev69" ;; + UNAME_MACHINE=alphaev69 ;; "EV7 (21364)") - UNAME_MACHINE="alphaev7" ;; + UNAME_MACHINE=alphaev7 ;; "EV7.9 (21364A)") - UNAME_MACHINE="alphaev79" ;; + UNAME_MACHINE=alphaev79 ;; esac # A Pn.n version is a patched version. # A Vn.n version is a released version. # A Tn.n version is a released field test version. # A Xn.n version is an unreleased experimental baselevel. # 1.2 uses "1.2" for uname -r. - echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` + echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` # Reset EXIT trap before exiting to avoid spurious non-zero exit code. exitcode=$? trap '' 0 @@ -359,16 +383,16 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in exit ;; i86pc:SunOS:5.*:* | i86xen:SunOS:5.*:*) eval $set_cc_for_build - SUN_ARCH="i386" + SUN_ARCH=i386 # If there is a compiler, see if it is configured for 64-bit objects. # Note that the Sun cc does not turn __LP64__ into 1 like gcc does. # This test works for both compilers. 
- if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then + if [ "$CC_FOR_BUILD" != no_compiler_found ]; then if (echo '#ifdef __amd64'; echo IS_64BIT_ARCH; echo '#endif') | \ - (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ grep IS_64BIT_ARCH >/dev/null then - SUN_ARCH="x86_64" + SUN_ARCH=x86_64 fi fi echo ${SUN_ARCH}-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` @@ -393,7 +417,7 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in exit ;; sun*:*:4.2BSD:*) UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` - test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3 + test "x${UNAME_RELEASE}" = x && UNAME_RELEASE=3 case "`/bin/arch`" in sun3) echo m68k-sun-sunos${UNAME_RELEASE} @@ -618,13 +642,13 @@ EOF sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` case "${sc_cpu_version}" in - 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0 - 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1 + 523) HP_ARCH=hppa1.0 ;; # CPU_PA_RISC1_0 + 528) HP_ARCH=hppa1.1 ;; # CPU_PA_RISC1_1 532) # CPU_PA_RISC2_0 case "${sc_kernel_bits}" in - 32) HP_ARCH="hppa2.0n" ;; - 64) HP_ARCH="hppa2.0w" ;; - '') HP_ARCH="hppa2.0" ;; # HP-UX 10.20 + 32) HP_ARCH=hppa2.0n ;; + 64) HP_ARCH=hppa2.0w ;; + '') HP_ARCH=hppa2.0 ;; # HP-UX 10.20 esac ;; esac fi @@ -663,11 +687,11 @@ EOF exit (0); } EOF - (CCOPTS= $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy` + (CCOPTS="" $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy` test -z "$HP_ARCH" && HP_ARCH=hppa fi ;; esac - if [ ${HP_ARCH} = "hppa2.0w" ] + if [ ${HP_ARCH} = hppa2.0w ] then eval $set_cc_for_build @@ -680,12 +704,12 @@ EOF # $ CC_FOR_BUILD="cc +DA2.0w" ./config.guess # => hppa64-hp-hpux11.23 - if echo __LP64__ | (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | + if echo __LP64__ | (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | grep -q __LP64__ then - HP_ARCH="hppa2.0w" + HP_ARCH=hppa2.0w else - HP_ARCH="hppa64" + HP_ARCH=hppa64 fi fi echo ${HP_ARCH}-hp-hpux${HPUX_REV} @@ -790,14 +814,14 @@ EOF echo craynv-cray-unicosmp${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit ;; F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) - FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` - FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` + FUJITSU_PROC=`uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'` echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; 5000:UNIX_System_V:4.*:*) - FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` - FUJITSU_REL=`echo ${UNAME_RELEASE} | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/ /_/'` + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` + FUJITSU_REL=`echo ${UNAME_RELEASE} | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/'` echo "sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) @@ -879,7 +903,7 @@ EOF exit ;; *:GNU/*:*:*) # other systems with GNU libc and userland - echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed 
-e 's/[-(].*//'`-${LIBC} + echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]"``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC} exit ;; i*86:Minix:*:*) echo ${UNAME_MACHINE}-pc-minix @@ -902,7 +926,7 @@ EOF EV68*) UNAME_MACHINE=alphaev68 ;; esac objdump --private-headers /bin/sh | grep -q ld.so.1 - if test "$?" = 0 ; then LIBC="gnulibc1" ; fi + if test "$?" = 0 ; then LIBC=gnulibc1 ; fi echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; arc:Linux:*:* | arceb:Linux:*:*) @@ -933,6 +957,9 @@ EOF crisv32:Linux:*:*) echo ${UNAME_MACHINE}-axis-linux-${LIBC} exit ;; + e2k:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; frv:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; @@ -945,6 +972,9 @@ EOF ia64:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; + k1om:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; m32r*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; @@ -1021,7 +1051,7 @@ EOF echo ${UNAME_MACHINE}-dec-linux-${LIBC} exit ;; x86_64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + echo ${UNAME_MACHINE}-pc-linux-${LIBC} exit ;; xtensa*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} @@ -1100,7 +1130,7 @@ EOF # uname -m prints for DJGPP always 'pc', but it prints nothing about # the processor, so we play safe by assuming i586. # Note: whatever this is, it MUST be the same as what config.sub - # prints for the "djgpp" host, or else GDB configury will decide that + # prints for the "djgpp" host, or else GDB configure will decide that # this is a cross-build. echo i586-pc-msdosdjgpp exit ;; @@ -1249,6 +1279,9 @@ EOF SX-8R:SUPER-UX:*:*) echo sx8r-nec-superux${UNAME_RELEASE} exit ;; + SX-ACE:SUPER-UX:*:*) + echo sxace-nec-superux${UNAME_RELEASE} + exit ;; Power*:Rhapsody:*:*) echo powerpc-apple-rhapsody${UNAME_RELEASE} exit ;; @@ -1262,9 +1295,9 @@ EOF UNAME_PROCESSOR=powerpc fi if test `echo "$UNAME_RELEASE" | sed -e 's/\..*//'` -le 10 ; then - if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then + if [ "$CC_FOR_BUILD" != no_compiler_found ]; then if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \ - (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ grep IS_64BIT_ARCH >/dev/null then case $UNAME_PROCESSOR in @@ -1286,7 +1319,7 @@ EOF exit ;; *:procnto*:*:* | *:QNX:[0123456789]*:*) UNAME_PROCESSOR=`uname -p` - if test "$UNAME_PROCESSOR" = "x86"; then + if test "$UNAME_PROCESSOR" = x86; then UNAME_PROCESSOR=i386 UNAME_MACHINE=pc fi @@ -1317,7 +1350,7 @@ EOF # "uname -m" is not consistent, so use $cputype instead. 386 # is converted to i386 for consistency with other x86 # operating systems. - if test "$cputype" = "386"; then + if test "$cputype" = 386; then UNAME_MACHINE=i386 else UNAME_MACHINE="$cputype" @@ -1359,7 +1392,7 @@ EOF echo i386-pc-xenix exit ;; i*86:skyos:*:*) - echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE}` | sed -e 's/ .*$//' + echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE} | sed -e 's/ .*$//'` exit ;; i*86:rdos:*:*) echo ${UNAME_MACHINE}-pc-rdos @@ -1370,23 +1403,25 @@ EOF x86_64:VMkernel:*:*) echo ${UNAME_MACHINE}-unknown-esx exit ;; + amd64:Isilon\ OneFS:*:*) + echo x86_64-unknown-onefs + exit ;; esac cat >&2 < in order to provide the needed -information to handle your system. 
+If $0 has already been updated, send the following data and any +information you think might be pertinent to config-patches at gnu.org to +provide the necessary information to handle your system. config.guess timestamp = $timestamp diff --git a/build-aux/config.sub b/build-aux/config.sub index 6d2e94c..9feb73b 100755 --- a/build-aux/config.sub +++ b/build-aux/config.sub @@ -1,8 +1,8 @@ #! /bin/sh # Configuration validation subroutine script. -# Copyright 1992-2015 Free Software Foundation, Inc. +# Copyright 1992-2016 Free Software Foundation, Inc. -timestamp='2015-01-01' +timestamp='2016-06-20' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -33,7 +33,7 @@ timestamp='2015-01-01' # Otherwise, we print the canonical config type on stdout and succeed. # You can get the latest version of this script from: -# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD +# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub # This file is supposed to be the same for all GNU packages # and recognize all the CPU types, system types and aliases @@ -53,8 +53,7 @@ timestamp='2015-01-01' me=`echo "$0" | sed -e 's,.*/,,'` usage="\ -Usage: $0 [OPTION] CPU-MFR-OPSYS - $0 [OPTION] ALIAS +Usage: $0 [OPTION] CPU-MFR-OPSYS or ALIAS Canonicalize a configuration name. @@ -68,7 +67,7 @@ Report bugs and patches to ." version="\ GNU config.sub ($timestamp) -Copyright 1992-2015 Free Software Foundation, Inc. +Copyright 1992-2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -117,7 +116,7 @@ maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'` case $maybe_os in nto-qnx* | linux-gnu* | linux-android* | linux-dietlibc | linux-newlib* | \ linux-musl* | linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | kfreebsd*-gnu* | \ - knetbsd*-gnu* | netbsd*-gnu* | \ + knetbsd*-gnu* | netbsd*-gnu* | netbsd*-eabi* | \ kopensolaris*-gnu* | \ storm-chaos* | os2-emx* | rtmk-nova*) os=-$maybe_os @@ -255,11 +254,12 @@ case $basic_machine in | arc | arceb \ | arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \ | avr | avr32 \ + | ba \ | be32 | be64 \ | bfin \ | c4x | c8051 | clipper \ | d10v | d30v | dlx | dsp16xx \ - | epiphany \ + | e2k | epiphany \ | fido | fr30 | frv | ft32 \ | h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ | hexagon \ @@ -305,7 +305,7 @@ case $basic_machine in | riscv32 | riscv64 \ | rl78 | rx \ | score \ - | sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[34]eb | sheb | shbe | shle | sh[1234]le | sh3ele \ + | sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[234]eb | sheb | shbe | shle | sh[1234]le | sh3ele \ | sh64 | sh64le \ | sparc | sparc64 | sparc64b | sparc64v | sparc86x | sparclet | sparclite \ | sparcv8 | sparcv9 | sparcv9b | sparcv9v \ @@ -376,12 +376,13 @@ case $basic_machine in | alphapca5[67]-* | alpha64pca5[67]-* | arc-* | arceb-* \ | arm-* | armbe-* | armle-* | armeb-* | armv*-* \ | avr-* | avr32-* \ + | ba-* \ | be32-* | be64-* \ | bfin-* | bs2000-* \ | c[123]* | c30-* | [cjt]90-* | c4x-* \ | c8051-* | clipper-* | craynv-* | cydra-* \ | d10v-* | d30v-* | dlx-* \ - | elxsi-* \ + | e2k-* | elxsi-* \ | f30[01]-* | f700-* | fido-* | fr30-* | frv-* | fx80-* \ | h8300-* | h8500-* \ | hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \ @@ -428,12 +429,13 @@ case $basic_machine 
in | pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \ | powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* \ | pyramid-* \ + | riscv32-* | riscv64-* \ | rl78-* | romp-* | rs6000-* | rx-* \ | sh-* | sh[1234]-* | sh[24]a-* | sh[24]aeb-* | sh[23]e-* | sh[34]eb-* | sheb-* | shbe-* \ | shle-* | sh[1234]le-* | sh3ele-* | sh64-* | sh64le-* \ | sparc-* | sparc64-* | sparc64b-* | sparc64v-* | sparc86x-* | sparclet-* \ | sparclite-* \ - | sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx?-* \ + | sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx*-* \ | tahoe-* \ | tic30-* | tic4x-* | tic54x-* | tic55x-* | tic6x-* | tic80-* \ | tile*-* \ @@ -518,6 +520,9 @@ case $basic_machine in basic_machine=i386-pc os=-aros ;; + asmjs) + basic_machine=asmjs-unknown + ;; aux) basic_machine=m68k-apple os=-aux @@ -638,6 +643,14 @@ case $basic_machine in basic_machine=m68k-bull os=-sysv3 ;; + e500v[12]) + basic_machine=powerpc-unknown + os=$os"spe" + ;; + e500v[12]-*) + basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'` + os=$os"spe" + ;; ebmon29k) basic_machine=a29k-amd os=-ebmon @@ -1373,18 +1386,18 @@ case $os in | -hpux* | -unos* | -osf* | -luna* | -dgux* | -auroraux* | -solaris* \ | -sym* | -kopensolaris* | -plan9* \ | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \ - | -aos* | -aros* \ + | -aos* | -aros* | -cloudabi* | -sortix* \ | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \ | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \ | -hiux* | -386bsd* | -knetbsd* | -mirbsd* | -netbsd* \ - | -bitrig* | -openbsd* | -solidbsd* \ + | -bitrig* | -openbsd* | -solidbsd* | -libertybsd* \ | -ekkobsd* | -kfreebsd* | -freebsd* | -riscix* | -lynxos* \ | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \ | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \ | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \ | -chorusos* | -chorusrdb* | -cegcc* \ | -cygwin* | -msys* | -pe* | -psos* | -moss* | -proelf* | -rtems* \ - | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \ + | -midipix* | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \ | -linux-newlib* | -linux-musl* | -linux-uclibc* \ | -uxpv* | -beos* | -mpeix* | -udk* | -moxiebox* \ | -interix* | -uwin* | -mks* | -rhapsody* | -darwin* | -opened* \ @@ -1393,7 +1406,8 @@ case $os in | -os2* | -vos* | -palmos* | -uclinux* | -nucleus* \ | -morphos* | -superux* | -rtmk* | -rtmk-nova* | -windiss* \ | -powermax* | -dnix* | -nx6 | -nx7 | -sei* | -dragonfly* \ - | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* | -tirtos*) + | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* \ + | -onefs* | -tirtos* | -phoenix*) # Remember, each alternative MUST END IN *, to match a version number. 
;; -qnx*) @@ -1525,6 +1539,8 @@ case $os in ;; -nacl*) ;; + -ios) + ;; -none) ;; *) ----------------------------------------------------------------------- Summary of changes: build-aux/config.guess | 159 ++++++++++++++++++++++++++++++------------------- build-aux/config.sub | 46 +++++++++----- 2 files changed, 128 insertions(+), 77 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Tue Jul 12 11:54:06 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:54:06 +0300 Subject: [PATCH 3/3] Add ARMv8/AArch32 Crypto Extension implementation of AES In-Reply-To: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> References: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> Message-ID: <146831724662.1076.2155032204748217085.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and 'rijndael-armv8-aarch32-ce.S'. * cipher/rijndael-armv8-aarch32-ce.S: New. * cipher/rijndael-armv8-ce.c: New. * cipher/rijndael-internal.h (USE_ARM_CE): New. (RIJNDAEL_context_s): Add 'use_arm_ce'. * cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey) (_gcry_aes_armv8_ce_prepare_decryption) (_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt) (_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc) (_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec) (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt) (_gcry_aes_armv8_ce_ocb_auth): New. (do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key setup for ARM CE. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec) (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add ARM CE support. * configure.ac: Add 'rijndael-armv8-ce.lo' and 'rijndael-armv8-aarch32-ce.lo'.
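The rijndael.c glue in this patch has the same shape as the SHA-256 one: do_setkey() latches the HWF_ARM_AES feature test in a context bit, and each bulk helper then routes to the corresponding CE entry point. A minimal C sketch; the context layout is simplified and 'cbc_enc_sketch' is an illustrative name, while the extern declaration matches the assembly function below:

#include <stddef.h>

typedef unsigned char byte;

/* Entry point provided by rijndael-armv8-aarch32-ce.S. */
extern void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, byte *outbuf,
                                        const byte *inbuf, byte *iv,
                                        size_t nblocks, int cbc_mac,
                                        unsigned int nrounds);

typedef struct
{
  unsigned int rounds;
  unsigned int use_arm_ce:1; /* latched in do_setkey () from
                                _gcry_get_hw_features () & HWF_ARM_AES */
  /* ...key schedules... */
} sketch_rijndael_ctx;

/* Sketch of the dispatch added to _gcry_aes_cbc_enc(); the CFB/CTR/OCB
   helpers gain the same kind of branch. */
void
cbc_enc_sketch (const sketch_rijndael_ctx *ctx, const void *keysched,
                byte *outbuf, const byte *inbuf, byte *iv,
                size_t nblocks, int cbc_mac)
{
  if (ctx->use_arm_ce)
    _gcry_aes_cbc_enc_armv8_ce (keysched, outbuf, inbuf, iv, nblocks,
                                cbc_mac, ctx->rounds);
  /* ...else run the existing C or ARM-assembly path block by block. */
}

Handling the whole buffer in one assembly call is part of what makes the 11-15x CBC-enc speedups in the table below possible: the IV chain stays in NEON registers across blocks instead of bouncing through memory.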
-- Improvement vs ARM assembly on Cortex-A53: AES-128 AES-192 AES-256 CBC enc: 14.8x 12.8x 11.4x CBC dec: 21.4x 20.5x 19.4x CFB enc: 16.2x 13.6x 11.6x CFB dec: 21.6x 20.5x 19.4x CTR: 19.1x 18.6x 17.8x OCB enc: 16.0x 16.2x 16.1x OCB dec: 15.6x 15.9x 15.8x OCB auth: 18.3x 18.4x 18.0x Benchmark on Cortex-A53 (1152 Mhz): Before: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 24.42 ns/B 39.06 MiB/s 28.13 c/B ECB dec | 25.07 ns/B 38.05 MiB/s 28.88 c/B CBC enc | 21.05 ns/B 45.30 MiB/s 24.25 c/B CBC dec | 21.16 ns/B 45.07 MiB/s 24.38 c/B CFB enc | 21.05 ns/B 45.31 MiB/s 24.25 c/B CFB dec | 21.38 ns/B 44.61 MiB/s 24.62 c/B OFB enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B OFB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B CTR enc | 21.17 ns/B 45.06 MiB/s 24.38 c/B CTR dec | 21.16 ns/B 45.06 MiB/s 24.38 c/B CCM enc | 42.32 ns/B 22.53 MiB/s 48.75 c/B CCM dec | 42.32 ns/B 22.53 MiB/s 48.75 c/B CCM auth | 21.17 ns/B 45.06 MiB/s 24.38 c/B GCM enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B GCM dec | 22.08 ns/B 43.18 MiB/s 25.44 c/B GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B OCB enc | 26.20 ns/B 36.40 MiB/s 30.18 c/B OCB dec | 25.97 ns/B 36.73 MiB/s 29.91 c/B OCB auth | 24.52 ns/B 38.90 MiB/s 28.24 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 27.83 ns/B 34.26 MiB/s 32.06 c/B ECB dec | 28.54 ns/B 33.42 MiB/s 32.88 c/B CBC enc | 24.47 ns/B 38.97 MiB/s 28.19 c/B CBC dec | 25.27 ns/B 37.74 MiB/s 29.11 c/B CFB enc | 25.08 ns/B 38.02 MiB/s 28.89 c/B CFB dec | 25.31 ns/B 37.68 MiB/s 29.16 c/B OFB enc | 29.57 ns/B 32.25 MiB/s 34.06 c/B OFB dec | 29.57 ns/B 32.25 MiB/s 34.06 c/B CTR enc | 25.24 ns/B 37.78 MiB/s 29.08 c/B CTR dec | 25.24 ns/B 37.79 MiB/s 29.08 c/B CCM enc | 49.81 ns/B 19.15 MiB/s 57.38 c/B CCM dec | 49.80 ns/B 19.15 MiB/s 57.37 c/B CCM auth | 24.58 ns/B 38.80 MiB/s 28.32 c/B GCM enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B GCM dec | 26.11 ns/B 36.52 MiB/s 30.08 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 29.59 ns/B 32.23 MiB/s 34.09 c/B OCB dec | 29.42 ns/B 32.42 MiB/s 33.89 c/B OCB auth | 27.92 ns/B 34.16 MiB/s 32.16 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 31.20 ns/B 30.57 MiB/s 35.94 c/B ECB dec | 31.80 ns/B 29.99 MiB/s 36.63 c/B CBC enc | 27.83 ns/B 34.27 MiB/s 32.06 c/B CBC dec | 27.87 ns/B 34.21 MiB/s 32.11 c/B CFB enc | 27.88 ns/B 34.20 MiB/s 32.12 c/B CFB dec | 28.16 ns/B 33.87 MiB/s 32.44 c/B OFB enc | 32.93 ns/B 28.96 MiB/s 37.94 c/B OFB dec | 32.93 ns/B 28.96 MiB/s 37.94 c/B CTR enc | 27.95 ns/B 34.13 MiB/s 32.19 c/B CTR dec | 27.95 ns/B 34.12 MiB/s 32.20 c/B CCM enc | 55.88 ns/B 17.07 MiB/s 64.38 c/B CCM dec | 55.88 ns/B 17.07 MiB/s 64.38 c/B CCM auth | 27.95 ns/B 34.12 MiB/s 32.20 c/B GCM enc | 28.86 ns/B 33.05 MiB/s 33.25 c/B GCM dec | 28.87 ns/B 33.04 MiB/s 33.25 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 32.96 ns/B 28.94 MiB/s 37.97 c/B OCB dec | 32.73 ns/B 29.14 MiB/s 37.70 c/B OCB auth | 31.29 ns/B 30.48 MiB/s 36.04 c/B After: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.10 ns/B 187.0 MiB/s 5.88 c/B ECB dec | 5.27 ns/B 181.0 MiB/s 6.07 c/B CBC enc | 1.41 ns/B 675.8 MiB/s 1.63 c/B CBC dec | 0.992 ns/B 961.7 MiB/s 1.14 c/B CFB enc | 1.30 ns/B 732.4 MiB/s 1.50 c/B CFB dec | 0.991 ns/B 962.7 MiB/s 1.14 c/B OFB enc | 7.05 ns/B 135.2 MiB/s 8.13 c/B OFB dec | 7.05 ns/B 135.2 MiB/s 8.13 c/B CTR enc | 1.11 ns/B 856.9 MiB/s 1.28 c/B CTR dec | 1.11 ns/B 857.0 MiB/s 1.28 c/B CCM enc | 2.58 ns/B 369.8 MiB/s 2.97 c/B CCM dec | 2.58 ns/B 369.5 MiB/s 2.97 c/B CCM auth | 1.58 ns/B 605.2 MiB/s 1.82 c/B GCM enc | 2.04 ns/B 467.9 
MiB/s 2.35 c/B GCM dec | 2.04 ns/B 466.6 MiB/s 2.35 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 1.64 ns/B 579.8 MiB/s 1.89 c/B OCB dec | 1.66 ns/B 574.5 MiB/s 1.91 c/B OCB auth | 1.33 ns/B 715.5 MiB/s 1.54 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.64 ns/B 169.0 MiB/s 6.50 c/B ECB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B CBC enc | 1.90 ns/B 502.1 MiB/s 2.19 c/B CBC dec | 1.24 ns/B 771.7 MiB/s 1.42 c/B CFB enc | 1.84 ns/B 517.1 MiB/s 2.12 c/B CFB dec | 1.23 ns/B 772.5 MiB/s 1.42 c/B OFB enc | 7.60 ns/B 125.5 MiB/s 8.75 c/B OFB dec | 7.60 ns/B 125.6 MiB/s 8.75 c/B CTR enc | 1.36 ns/B 702.7 MiB/s 1.56 c/B CTR dec | 1.36 ns/B 702.5 MiB/s 1.56 c/B CCM enc | 3.31 ns/B 287.8 MiB/s 3.82 c/B CCM dec | 3.31 ns/B 288.0 MiB/s 3.81 c/B CCM auth | 2.06 ns/B 462.1 MiB/s 2.38 c/B GCM enc | 2.28 ns/B 418.4 MiB/s 2.63 c/B GCM dec | 2.28 ns/B 418.0 MiB/s 2.63 c/B GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B OCB enc | 1.83 ns/B 520.1 MiB/s 2.11 c/B OCB dec | 1.84 ns/B 517.8 MiB/s 2.12 c/B OCB auth | 1.52 ns/B 626.1 MiB/s 1.75 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.86 ns/B 162.7 MiB/s 6.75 c/B ECB dec | 6.02 ns/B 158.3 MiB/s 6.94 c/B CBC enc | 2.44 ns/B 390.5 MiB/s 2.81 c/B CBC dec | 1.45 ns/B 656.4 MiB/s 1.67 c/B CFB enc | 2.39 ns/B 399.5 MiB/s 2.75 c/B CFB dec | 1.45 ns/B 656.8 MiB/s 1.67 c/B OFB enc | 7.81 ns/B 122.1 MiB/s 9.00 c/B OFB dec | 7.81 ns/B 122.1 MiB/s 9.00 c/B CTR enc | 1.57 ns/B 605.8 MiB/s 1.81 c/B CTR dec | 1.57 ns/B 605.9 MiB/s 1.81 c/B CCM enc | 4.07 ns/B 234.3 MiB/s 4.69 c/B CCM dec | 4.07 ns/B 234.1 MiB/s 4.69 c/B CCM auth | 2.61 ns/B 365.7 MiB/s 3.00 c/B GCM enc | 2.50 ns/B 381.9 MiB/s 2.88 c/B GCM dec | 2.49 ns/B 382.3 MiB/s 2.87 c/B GCM auth | 0.926 ns/B 1029.7 MiB/s 1.07 c/B OCB enc | 2.05 ns/B 465.6 MiB/s 2.36 c/B OCB dec | 2.06 ns/B 462.0 MiB/s 2.38 c/B OCB auth | 1.74 ns/B 548.4 MiB/s 2.00 c/B --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 5d69a38..de619fe 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -81,6 +81,7 @@ md5.c \ poly1305-sse2-amd64.S poly1305-avx2-amd64.S poly1305-armv7-neon.S \ rijndael.c rijndael-internal.h rijndael-tables.h rijndael-aesni.c \ rijndael-padlock.c rijndael-amd64.S rijndael-arm.S rijndael-ssse3-amd64.c \ + rijndael-armv8-ce.c rijndael-armv8-aarch32-ce.S \ rmd160.c \ rsa.c \ salsa20.c salsa20-amd64.S salsa20-armv7-neon.S \ diff --git a/cipher/rijndael-armv8-aarch32-ce.S b/cipher/rijndael-armv8-aarch32-ce.S new file mode 100644 index 0000000..f3b5400 --- /dev/null +++ b/cipher/rijndael-armv8-aarch32-ce.S @@ -0,0 +1,1483 @@ +/* ARMv8 CE accelerated AES + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* AES macros */ + +#define aes_preload_keys(keysched, rekeysched) \ + vldmia keysched!, {q5-q7}; \ + mov rekeysched, keysched; \ + vldmialo keysched!, {q8-q15}; /* 128-bit */ \ + addeq keysched, #(2*16); \ + vldmiaeq keysched!, {q10-q15}; /* 192-bit */ \ + addhi keysched, #(4*16); \ + vldmiahi keysched!, {q12-q15}; /* 256-bit */ \ + +#define do_aes_one128(ed, mcimc, qo, qb) \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + veor qo, qb, q15; + +#define do_aes_one128re(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldm rekeysched, {q8-q9}; \ + do_aes_one128(ed, mcimc, qo, qb); + +#define do_aes_one192(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldm rekeysched!, {q8}; \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + vldm rekeysched, {q9}; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q8}; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + sub rekeysched, #(1*16); \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + vldm keysched, {q9}; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + sub keysched, #16; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q15; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + veor qo, qb, q9; \ + +#define do_aes_one256(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldmia rekeysched!, {q8}; \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + vldmia rekeysched!, {q9}; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + vldmia rekeysched!, {q10}; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + vldm rekeysched, {q11}; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q8}; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q9}; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + sub rekeysched, #(3*16); \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q10}; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + aes##mcimc.8 qb, qb; \ + vldm keysched, {q11}; \ + aes##ed.8 qb, q15; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + veor qo, qb, q11; \ + sub keysched, #(3*16); \ + +#define aes_round_4(ed, mcimc, b0, b1, b2, b3, key) \ + aes##ed.8 b0, key; \ + aes##mcimc.8 b0, b0; \ + aes##ed.8 b1, key; \ + aes##mcimc.8 b1, b1; \ + aes##ed.8 b2, key; \ + aes##mcimc.8 b2, b2; \ + aes##ed.8 b3, 
key; \ + aes##mcimc.8 b3, b3; + +#define do_aes_4_128(ed, mcimc, b0, b1, b2, b3) \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes##ed.8 b0, q14; \ + veor b0, b0, q15; \ + aes##ed.8 b1, q14; \ + veor b1, b1, q15; \ + aes##ed.8 b2, q14; \ + veor b2, b2, q15; \ + aes##ed.8 b3, q14; \ + veor b3, b3, q15; + +#define do_aes_4_128re(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldm rekeysched, {q8-q9}; \ + do_aes_4_128(ed, mcimc, b0, b1, b2, b3); + +#define do_aes_4_192(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldm rekeysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + vldm rekeysched, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + vldmia keysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + sub rekeysched, #(1*16); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + vldm keysched, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + sub keysched, #16; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q14); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q15); \ + aes##ed.8 b0, q8; \ + veor b0, b0, q9; \ + aes##ed.8 b1, q8; \ + veor b1, b1, q9; \ + aes##ed.8 b2, q8; \ + veor b2, b2, q9; \ + aes##ed.8 b3, q8; \ + veor b3, b3, q9; + +#define do_aes_4_256(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldmia rekeysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + vldmia rekeysched!, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + vldmia rekeysched!, {q10}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + vldm rekeysched, {q11}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + vldmia keysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + vldmia keysched!, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + sub rekeysched, #(3*16); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + vldmia keysched!, {q10}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q14); \ + vldm keysched, {q11}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q15); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + sub keysched, #(3*16); \ + aes##ed.8 b0, q10; \ + veor b0, b0, q11; \ + aes##ed.8 b1, q10; \ + veor b1, b1, q11; \ + aes##ed.8 b2, q10; \ + veor b2, b2, q11; \ + aes##ed.8 b3, q10; \ + veor b3, b3, q11; + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int _gcry_aes_enc_armv8_ce(void *keysched, byte *dst, + * const byte *src, + * unsigned int nrounds); + */ +.align 3 +.globl _gcry_aes_enc_armv8_ce +.type _gcry_aes_enc_armv8_ce,%function; +_gcry_aes_enc_armv8_ce: + /* input: + * r0: keysched + * r1: dst + * r2: src + * r3: nrounds + */ + + vldmia r0!, {q1-q3} /* load 3 round keys */ + + cmp r3, #12 + + vld1.8 {q0}, [r2] + + bhi .Lenc1_256 + beq .Lenc1_192 + +.Lenc1_128: + +.Lenc1_tail: + vldmia r0, {q8-q15} /* load 8 round keys */ + + aese.8 q0, 
q1 + aesmc.8 q0, q0 + CLEAR_REG(q1) + + aese.8 q0, q2 + aesmc.8 q0, q0 + CLEAR_REG(q2) + + aese.8 q0, q3 + aesmc.8 q0, q0 + CLEAR_REG(q3) + + aese.8 q0, q8 + aesmc.8 q0, q0 + CLEAR_REG(q8) + + aese.8 q0, q9 + aesmc.8 q0, q0 + CLEAR_REG(q9) + + aese.8 q0, q10 + aesmc.8 q0, q0 + CLEAR_REG(q10) + + aese.8 q0, q11 + aesmc.8 q0, q0 + CLEAR_REG(q11) + + aese.8 q0, q12 + aesmc.8 q0, q0 + CLEAR_REG(q12) + + aese.8 q0, q13 + aesmc.8 q0, q0 + CLEAR_REG(q13) + + aese.8 q0, q14 + veor q0, q15 + CLEAR_REG(q14) + CLEAR_REG(q15) + + vst1.8 {q0}, [r1] + CLEAR_REG(q0) + + mov r0, #0 + bx lr + +.Lenc1_192: + aese.8 q0, q1 + aesmc.8 q0, q0 + vmov q1, q3 + + aese.8 q0, q2 + aesmc.8 q0, q0 + vldm r0!, {q2-q3} /* load 3 round keys */ + + b .Lenc1_tail + +.Lenc1_256: + vldm r0!, {q15} /* load 1 round key */ + aese.8 q0, q1 + aesmc.8 q0, q0 + + aese.8 q0, q2 + aesmc.8 q0, q0 + + aese.8 q0, q3 + aesmc.8 q0, q0 + vldm r0!, {q1-q3} /* load 3 round keys */ + + aese.8 q0, q15 + aesmc.8 q0, q0 + + b .Lenc1_tail +.size _gcry_aes_enc_armv8_ce,.-_gcry_aes_enc_armv8_ce; + + +/* + * unsigned int _gcry_aes_dec_armv8_ce(void *keysched, byte *dst, + * const byte *src, + * unsigned int nrounds); + */ +.align 3 +.globl _gcry_aes_dec_armv8_ce +.type _gcry_aes_dec_armv8_ce,%function; +_gcry_aes_dec_armv8_ce: + /* input: + * r0: keysched + * r1: dst + * r2: src + * r3: nrounds + */ + + vldmia r0!, {q1-q3} /* load 3 round keys */ + + cmp r3, #12 + + vld1.8 {q0}, [r2] + + bhi .Ldec1_256 + beq .Ldec1_192 + +.Ldec1_128: + +.Ldec1_tail: + vldmia r0, {q8-q15} /* load 8 round keys */ + + aesd.8 q0, q1 + aesimc.8 q0, q0 + CLEAR_REG(q1) + + aesd.8 q0, q2 + aesimc.8 q0, q0 + CLEAR_REG(q2) + + aesd.8 q0, q3 + aesimc.8 q0, q0 + CLEAR_REG(q3) + + aesd.8 q0, q8 + aesimc.8 q0, q0 + CLEAR_REG(q8) + + aesd.8 q0, q9 + aesimc.8 q0, q0 + CLEAR_REG(q9) + + aesd.8 q0, q10 + aesimc.8 q0, q0 + CLEAR_REG(q10) + + aesd.8 q0, q11 + aesimc.8 q0, q0 + CLEAR_REG(q11) + + aesd.8 q0, q12 + aesimc.8 q0, q0 + CLEAR_REG(q12) + + aesd.8 q0, q13 + aesimc.8 q0, q0 + CLEAR_REG(q13) + + aesd.8 q0, q14 + veor q0, q15 + CLEAR_REG(q14) + CLEAR_REG(q15) + + vst1.8 {q0}, [r1] + CLEAR_REG(q0) + + mov r0, #0 + bx lr + +.Ldec1_192: + aesd.8 q0, q1 + aesimc.8 q0, q0 + vmov q1, q3 + + aesd.8 q0, q2 + aesimc.8 q0, q0 + vldm r0!, {q2-q3} /* load 3 round keys */ + + b .Ldec1_tail + +.Ldec1_256: + vldm r0!, {q15} /* load 1 round key */ + aesd.8 q0, q1 + aesimc.8 q0, q0 + + aesd.8 q0, q2 + aesimc.8 q0, q0 + + aesd.8 q0, q3 + aesimc.8 q0, q0 + vldm r0!, {q1-q3} /* load 3 round keys */ + + aesd.8 q0, q15 + aesimc.8 q0, q0 + + b .Ldec1_tail +.size _gcry_aes_dec_armv8_ce,.-_gcry_aes_dec_armv8_ce; + + +/* + * void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, size_t nblocks, + * int cbc_mac, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cbc_enc_armv8_ce +.type _gcry_aes_cbc_enc_armv8_ce,%function; +_gcry_aes_cbc_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: cbc_mac => r5 + * %st+8: nrounds => r6 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + ldr r6, [sp, #(16+8)] + beq .Lcbc_enc_skip + cmp r5, #0 + vpush {q4-q7} + moveq r5, #16 + movne r5, #0 + + cmp r6, #12 + vld1.8 {q1}, [r3] /* load IV */ + + aes_preload_keys(r0, lr); + + beq .Lcbc_enc_loop192 + bhi .Lcbc_enc_loop256 + +#define CBC_ENC(bits, ...) 
\ + .Lcbc_enc_loop##bits: \ + vld1.8 {q0}, [r2]!; /* load plaintext */ \ + veor q1, q0, q1; \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + vst1.8 {q1}, [r1], r5; /* store ciphertext */ \ + \ + bne .Lcbc_enc_loop##bits; \ + b .Lcbc_enc_done; + + CBC_ENC(128) + CBC_ENC(192, r0, lr) + CBC_ENC(256, r0, lr) + +#undef CBC_ENC + +.Lcbc_enc_done: + vst1.8 {q1}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcbc_enc_skip: + pop {r4-r6,pc} +.size _gcry_aes_cbc_enc_armv8_ce,.-_gcry_aes_cbc_enc_armv8_ce; + + +/* + * void _gcry_aes_cbc_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cbc_dec_armv8_ce +.type _gcry_aes_cbc_dec_armv8_ce,%function; +_gcry_aes_cbc_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcbc_dec_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + aes_preload_keys(r0, r6); + + beq .Lcbc_dec_entry_192 + bhi .Lcbc_dec_entry_256 + +#define CBC_DEC(bits, ...) \ + .Lcbc_dec_entry_##bits: \ + cmp r4, #4; \ + blo .Lcbc_dec_loop_##bits; \ + \ + .Lcbc_dec_loop4_##bits: \ + \ + vld1.8 {q1-q2}, [r2]!; /* load ciphertext */ \ + sub r4, r4, #4; \ + vld1.8 {q3-q4}, [r2]; /* load ciphertext */ \ + cmp r4, #4; \ + sub r2, #32; \ + \ + do_aes_4_##bits(d, imc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + veor q2, q2, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + vst1.8 {q1-q2}, [r1]!; /* store plaintext */ \ + veor q3, q3, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + veor q4, q4, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lcbc_dec_loop4_##bits; \ + cmp r4, #0; \ + beq .Lcbc_dec_done; \ + \ + .Lcbc_dec_loop_##bits: \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + subs r4, r4, #1; \ + vmov q2, q1; \ + \ + do_aes_one##bits(d, imc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vmov q0, q2; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + \ + bne .Lcbc_dec_loop_##bits; \ + b .Lcbc_dec_done; + + CBC_DEC(128) + CBC_DEC(192, r0, r6) + CBC_DEC(256, r0, r6) + +#undef CBC_DEC + +.Lcbc_dec_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcbc_dec_skip: + pop {r4-r6,pc} +.size _gcry_aes_cbc_dec_armv8_ce,.-_gcry_aes_cbc_dec_armv8_ce; + + +/* + * void _gcry_aes_cfb_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cfb_enc_armv8_ce +.type _gcry_aes_cfb_enc_armv8_ce,%function; +_gcry_aes_cfb_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcfb_enc_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + 
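/* cache the round-key schedule in NEON registers (r6: scratch GPR) */ + 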
aes_preload_keys(r0, r6); + + beq .Lcfb_enc_entry_192 + bhi .Lcfb_enc_entry_256 + +#define CFB_ENC(bits, ...) \ + .Lcfb_enc_entry_##bits: \ + .Lcfb_enc_loop_##bits: \ + vld1.8 {q1}, [r2]!; /* load plaintext */ \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q0, q0, ##__VA_ARGS__); \ + \ + veor q0, q1, q0; \ + vst1.8 {q0}, [r1]!; /* store ciphertext */ \ + \ + bne .Lcfb_enc_loop_##bits; \ + b .Lcfb_enc_done; + + CFB_ENC(128) + CFB_ENC(192, r0, r6) + CFB_ENC(256, r0, r6) + +#undef CFB_ENC + +.Lcfb_enc_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcfb_enc_skip: + pop {r4-r6,pc} +.size _gcry_aes_cfb_enc_armv8_ce,.-_gcry_aes_cfb_enc_armv8_ce; + + +/* + * void _gcry_aes_cfb_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cfb_dec_armv8_ce +.type _gcry_aes_cfb_dec_armv8_ce,%function; +_gcry_aes_cfb_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcfb_dec_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + aes_preload_keys(r0, r6); + + beq .Lcfb_dec_entry_192 + bhi .Lcfb_dec_entry_256 + +#define CFB_DEC(bits, ...) \ + .Lcfb_dec_entry_##bits: \ + cmp r4, #4; \ + blo .Lcfb_dec_loop_##bits; \ + \ + .Lcfb_dec_loop4_##bits: \ + \ + vld1.8 {q2-q3}, [r2]!; /* load ciphertext */ \ + vmov q1, q0; \ + sub r4, r4, #4; \ + vld1.8 {q4}, [r2]; /* load ciphertext */ \ + sub r2, #32; \ + cmp r4, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + veor q2, q2, q0; \ + vst1.8 {q1-q2}, [r1]!; /* store plaintext */ \ + vld1.8 {q0}, [r2]!; \ + veor q3, q3, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV / ciphertext */ \ + veor q4, q4, q0; \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lcfb_dec_loop4_##bits; \ + cmp r4, #0; \ + beq .Lcfb_dec_done; \ + \ + .Lcfb_dec_loop_##bits: \ + \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q0, q0, ##__VA_ARGS__); \ + \ + veor q2, q1, q0; \ + vmov q0, q1; \ + vst1.8 {q2}, [r1]!; /* store plaintext */ \ + \ + bne .Lcfb_dec_loop_##bits; \ + b .Lcfb_dec_done; + + CFB_DEC(128) + CFB_DEC(192, r0, r6) + CFB_DEC(256, r0, r6) + +#undef CFB_DEC + +.Lcfb_dec_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcfb_dec_skip: + pop {r4-r6,pc} +.size _gcry_aes_cfb_dec_armv8_ce,.-_gcry_aes_cfb_dec_armv8_ce; + + +/* + * void _gcry_aes_ctr_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ctr_enc_armv8_ce +.type _gcry_aes_ctr_enc_armv8_ce,%function; +_gcry_aes_ctr_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 
= 104b */ + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + cmp r4, #0 + beq .Lctr_enc_skip + + cmp r5, #12 + ldm r3, {r7-r10} + vld1.8 {q0}, [r3] /* load IV */ + rev r7, r7 + rev r8, r8 + rev r9, r9 + rev r10, r10 + + aes_preload_keys(r0, r6); + + beq .Lctr_enc_entry_192 + bhi .Lctr_enc_entry_256 + +#define CTR_ENC(bits, ...) \ + .Lctr_enc_entry_##bits: \ + cmp r4, #4; \ + blo .Lctr_enc_loop_##bits; \ + \ + .Lctr_enc_loop4_##bits: \ + cmp r10, #0xfffffffc; \ + sub r4, r4, #4; \ + blo .Lctr_enc_loop4_##bits##_nocarry; \ + cmp r9, #0xffffffff; \ + bne .Lctr_enc_loop4_##bits##_nocarry; \ + \ + adds r10, #1; \ + vmov q1, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q2, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q3, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q4, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + b .Lctr_enc_loop4_##bits##_store_ctr; \ + \ + .Lctr_enc_loop4_##bits##_nocarry: \ + \ + veor q2, q2; \ + vrev64.8 q1, q0; \ + vceq.u32 d5, d5; \ + vadd.u64 q3, q2, q2; \ + vadd.u64 q4, q3, q2; \ + vadd.u64 q0, q3, q3; \ + vsub.u64 q2, q1, q2; \ + vsub.u64 q3, q1, q3; \ + vsub.u64 q4, q1, q4; \ + vsub.u64 q0, q1, q0; \ + vrev64.8 q1, q1; \ + vrev64.8 q2, q2; \ + vrev64.8 q3, q3; \ + vrev64.8 q0, q0; \ + vrev64.8 q4, q4; \ + add r10, #4; \ + \ + .Lctr_enc_loop4_##bits##_store_ctr: \ + \ + vst1.8 {q0}, [r3]; \ + cmp r4, #4; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + veor q2, q2, q0; \ + veor q3, q3, q1; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + vst1.8 {q2}, [r1]!; /* store plaintext */ \ + veor q4, q4, q0; \ + vld1.8 {q0}, [r3]; /* reload IV */ \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lctr_enc_loop4_##bits; \ + cmp r4, #0; \ + beq .Lctr_enc_done; \ + \ + .Lctr_enc_loop_##bits: \ + \ + adds r10, #1; \ + vmov q1, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + subs r4, r4, #1; \ + vld1.8 {q2}, [r2]!; /* load ciphertext */ \ + vmov.32 d1[1], r11; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q2, q1; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + \ + bne .Lctr_enc_loop_##bits; \ + b .Lctr_enc_done; + + CTR_ENC(128) + CTR_ENC(192, r0, r6) + CTR_ENC(256, r0, r6) + +#undef CTR_ENC + +.Lctr_enc_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lctr_enc_skip: + pop {r4-r12,lr} + vpop {q4-q7} + bx lr + +.Lctr_overflow_one: + adcs r9, #0 + adcs r8, #0 + adc r7, #0 + rev r11, r9 + rev r12, r8 + vmov.32 d1[0], r11 + rev r11, r7 + vmov.32 d0[1], r12 + vmov.32 d0[0], r11 + bx lr +.size _gcry_aes_ctr_enc_armv8_ce,.-_gcry_aes_ctr_enc_armv8_ce; + + +/* + * void _gcry_aes_ocb_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *offset, + * unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_enc_armv8_ce +.type _gcry_aes_ocb_enc_armv8_ce,%function; +_gcry_aes_ocb_enc_armv8_ce: + /* input: + 
* r0: keysched + * r1: outbuf + * r2: inbuf + * r3: offset + * %st+0: checksum => r4 + * %st+4: Ls => r5 + * %st+8: nblocks => r6 (0 < nblocks <= 32) + * %st+12: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+12)] + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + ldr r6, [sp, #(104+8)] + + cmp r7, #12 + vld1.8 {q0}, [r3] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_enc_entry_192 + bhi .Locb_enc_entry_256 + +#define OCB_ENC(bits, ...) \ + .Locb_enc_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_enc_loop_##bits; \ + \ + .Locb_enc_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r2]!; /* load P_i+<0-1> */ \ + vld1.8 {q8}, [r4]; /* load Checksum_{i-1} */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q8, q8, q1; /* Checksum_i+0 */ \ + veor q1, q1, q0; /* P_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r2]!; /* load P_i+<2-3> */ \ + vst1.8 {q0}, [r1]!; /* store Offset_i+0 */\ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q8, q8, q2; /* Checksum_i+1 */ \ + veor q2, q2, q0; /* P_i+1 xor Offset_i+1 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q8, q8, q3; /* Checksum_i+2 */ \ + veor q3, q3, q0; /* P_i+2 xor Offset_i+2 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q8, q8, q4; /* Checksum_i+3 */ \ + veor q4, q4, q0; /* P_i+3 xor Offset_i+3 */\ + vst1.8 {q0}, [r1]; /* store Offset_i+3 */\ + sub r1, #(3*16); \ + vst1.8 {q8}, [r4]; /* store Checksum_i+3 */\ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + mov r8, r1; \ + vld1.8 {q8-q9}, [r1]!; \ + veor q1, q1, q8; \ + veor q2, q2, q9; \ + vld1.8 {q8-q9}, [r1]!; \ + vst1.8 {q1-q2}, [r8]!; \ + veor q3, q3, q8; \ + veor q4, q4, q9; \ + vst1.8 {q3-q4}, [r8]; \ + \ + bhs .Locb_enc_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_enc_done; \ + \ + .Locb_enc_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q1}, [r2]!; /* load plaintext */ \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q3}, [r4]; /* load checksum */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + veor q3, q3, q1; \ + veor q1, q1, q0; \ + vst1.8 {q3}, [r4]; /* store checksum */ \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vst1.8 {q1}, [r1]!; /* store ciphertext */ \ + \ + bne .Locb_enc_loop_##bits; \ + b .Locb_enc_done; + + OCB_ENC(128re, r0, r12) + OCB_ENC(192, r0, r12) + OCB_ENC(256, r0, r12) + +#undef OCB_ENC + +.Locb_enc_done: + vst1.8 {q0}, [r3] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_enc_armv8_ce,.-_gcry_aes_ocb_enc_armv8_ce; + + +/* + * void _gcry_aes_ocb_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *offset, + * 
unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_dec_armv8_ce +.type _gcry_aes_ocb_dec_armv8_ce,%function; +_gcry_aes_ocb_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: offset + * %st+0: checksum => r4 + * %st+4: Ls => r5 + * %st+8: nblocks => r6 (0 < nblocks <= 32) + * %st+12: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+12)] + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + ldr r6, [sp, #(104+8)] + + cmp r7, #12 + vld1.8 {q0}, [r3] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_dec_entry_192 + bhi .Locb_dec_entry_256 + +#define OCB_DEC(bits, ...) \ + .Locb_dec_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_dec_loop_##bits; \ + \ + .Locb_dec_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* P_i = Offset_i xor DECIPHER(K, C_i xor Offset_i) */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r2]!; /* load P_i+<0-1> */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q1, q1, q0; /* P_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r2]!; /* load P_i+<2-3> */ \ + vst1.8 {q0}, [r1]!; /* store Offset_i+0 */\ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q2, q2, q0; /* P_i+1 xor Offset_i+1 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q3, q3, q0; /* P_i+2 xor Offset_i+2 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q4, q4, q0; /* P_i+3 xor Offset_i+3 */\ + vst1.8 {q0}, [r1]; /* store Offset_i+3 */\ + sub r1, #(3*16); \ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(d, imc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + mov r8, r1; \ + vld1.8 {q8-q9}, [r1]!; \ + veor q1, q1, q8; \ + veor q2, q2, q9; \ + vld1.8 {q8-q9}, [r1]!; \ + vst1.8 {q1-q2}, [r8]!; \ + veor q1, q1, q2; \ + vld1.8 {q2}, [r4]; /* load Checksum_{i-1} */ \ + veor q3, q3, q8; \ + veor q1, q1, q3; \ + veor q4, q4, q9; \ + veor q1, q1, q4; \ + vst1.8 {q3-q4}, [r8]; \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r4]; /* store Checksum_i+3 */ \ + \ + bhs .Locb_dec_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_dec_done; \ + \ + .Locb_dec_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* P_i = Offset_i xor DECIPHER(K, C_i xor Offset_i) */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + veor q1, q1, q0; \ + \ + do_aes_one##bits(d, imc, q1, q1, ##__VA_ARGS__) \ + \ + vld1.8 {q2}, [r4]; /* load checksum */ \ + veor q1, q1, q0; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r4]; /* store checksum */ \ + \ + bne .Locb_dec_loop_##bits; \ + b .Locb_dec_done; + + OCB_DEC(128re, r0, r12) + OCB_DEC(192, r0, r12) + OCB_DEC(256, r0, r12) + +#undef OCB_DEC + +.Locb_dec_done: + vst1.8 {q0}, [r3] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_dec_armv8_ce,.-_gcry_aes_ocb_dec_armv8_ce; + + +/* + * 
void _gcry_aes_ocb_auth_armv8_ce (const void *keysched, + * const unsigned char *abuf, + * unsigned char *offset, + * unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_auth_armv8_ce +.type _gcry_aes_ocb_auth_armv8_ce,%function; +_gcry_aes_ocb_auth_armv8_ce: + /* input: + * r0: keysched + * r1: abuf + * r2: offset + * r3: checksum + * %st+0: Ls => r5 + * %st+4: nblocks => r6 (0 < nblocks <= 32) + * %st+8: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+8)] + ldr r5, [sp, #(104+0)] + ldr r6, [sp, #(104+4)] + + cmp r7, #12 + vld1.8 {q0}, [r2] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_auth_entry_192 + bhi .Locb_auth_entry_256 + +#define OCB_AUTH(bits, ...) \ + .Locb_auth_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_auth_loop_##bits; \ + \ + .Locb_auth_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r1]!; /* load A_i+<0-1> */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q1, q1, q0; /* A_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r1]!; /* load A_i+<2-3> */ \ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q2, q2, q0; /* A_i+1 xor Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q3, q3, q0; /* A_i+2 xor Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q4, q4, q0; /* A_i+3 xor Offset_i+3 */\ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q2; \ + veor q3, q3, q4; \ + vld1.8 {q2}, [r3]; \ + veor q1, q1, q3; \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r3]; \ + \ + bhs .Locb_auth_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_auth_done; \ + \ + .Locb_auth_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q1}, [r1]!; /* load aadtext */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + vld1.8 {q2}, [r3]; /* load checksum */ \ + veor q1, q1, q0; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__) \ + \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r3]; /* store checksum */ \ + \ + bne .Locb_auth_loop_##bits; \ + b .Locb_auth_done; + + OCB_AUTH(128re, r0, r12) + OCB_AUTH(192, r0, r12) + OCB_AUTH(256, r0, r12) + +#undef OCB_AUTH + +.Locb_auth_done: + vst1.8 {q0}, [r2] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_auth_armv8_ce,.-_gcry_aes_ocb_auth_armv8_ce; + + +/* + * u32 _gcry_aes_sbox4_armv8_ce(u32 in4b); + */ +.align 3 +.globl _gcry_aes_sbox4_armv8_ce +.type _gcry_aes_sbox4_armv8_ce,%function; +_gcry_aes_sbox4_armv8_ce: + /* See "Gouvêa, C. P. L. & López, J. Implementing GCM on ARMv8. Topics in + * Cryptology – CT-RSA 2015" for details. 
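+ * In brief: 'aese' with an all-zero round key reduces to SubBytes and + * ShiftRows. The input word sits in the first column and the rest of + * the state is filled with 0x52, which SubBytes maps to 0x00; the + * veor/vpadd below then fold the four S-boxed bytes, scattered by + * ShiftRows, back into a single 32-bit word.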
+ */ + vmov.i8 q0, #0x52 + vmov.i8 q1, #0 + vmov s0, r0 + aese.8 q0, q1 + veor d0, d1 + vpadd.i32 d0, d0, d1 + vmov r0, s0 + CLEAR_REG(q0) + bx lr +.size _gcry_aes_sbox4_armv8_ce,.-_gcry_aes_sbox4_armv8_ce; + + +/* + * void _gcry_aes_invmixcol_armv8_ce(void *dst, const void *src); + */ +.align 3 +.globl _gcry_aes_invmixcol_armv8_ce +.type _gcry_aes_invmixcol_armv8_ce,%function; +_gcry_aes_invmixcol_armv8_ce: + vld1.8 {q0}, [r1] + aesimc.8 q0, q0 + vst1.8 {q0}, [r0] + CLEAR_REG(q0) + bx lr +.size _gcry_aes_invmixcol_armv8_ce,.-_gcry_aes_invmixcol_armv8_ce; + +#endif diff --git a/cipher/rijndael-armv8-ce.c b/cipher/rijndael-armv8-ce.c new file mode 100644 index 0000000..bed4066 --- /dev/null +++ b/cipher/rijndael-armv8-ce.c @@ -0,0 +1,469 @@ +/* ARMv8 Crypto Extension AES for Libgcrypt + * Copyright (C) 2016 Jussi Kivilinna <jussi.kivilinna@iki.fi> + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see <http://www.gnu.org/licenses/>. + * + */ + +#include <config.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> /* for memcmp() */ + +#include "types.h" /* for byte and u32 typedefs */ +#include "g10lib.h" +#include "cipher.h" +#include "bufhelp.h" +#include "cipher-selftest.h" +#include "rijndael-internal.h" +#include "./cipher-internal.h" + + +#ifdef USE_ARM_CE + + +typedef struct u128_s { u32 a, b, c, d; } u128_t; + +extern u32 _gcry_aes_sbox4_armv8_ce(u32 in4b); +extern void _gcry_aes_invmixcol_armv8_ce(u128_t *dst, const u128_t *src); + +extern unsigned int _gcry_aes_enc_armv8_ce(const void *keysched, byte *dst, + const byte *src, + unsigned int nrounds); +extern unsigned int _gcry_aes_dec_armv8_ce(const void *keysched, byte *dst, + const byte *src, + unsigned int nrounds); + +extern void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + int cbc_mac, unsigned int nrounds); +extern void _gcry_aes_cbc_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_cfb_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_cfb_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_ctr_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_ocb_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, + unsigned char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_ocb_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, + unsigned 
char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_ocb_auth_armv8_ce (const void *keysched, + const unsigned char *abuf, + unsigned char *offset, + unsigned char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); + +typedef void (*ocb_crypt_fn_t) (const void *keysched, unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, unsigned char *checksum, + void **Ls, size_t nblocks, + unsigned int nrounds); + +void +_gcry_aes_armv8_ce_setkey (RIJNDAEL_context *ctx, const byte *key) +{ + union + { + PROPERLY_ALIGNED_TYPE dummy; + byte data[MAXKC][4]; + u32 data32[MAXKC]; + } tkk[2]; + unsigned int rounds = ctx->rounds; + int KC = rounds - 6; + unsigned int keylen = KC * 4; + unsigned int i, r, t; + byte rcon = 1; + int j; +#define k tkk[0].data +#define k_u32 tkk[0].data32 +#define tk tkk[1].data +#define tk_u32 tkk[1].data32 +#define W (ctx->keyschenc) +#define W_u32 (ctx->keyschenc32) + + for (i = 0; i < keylen; i++) + { + k[i >> 2][i & 3] = key[i]; + } + + for (j = KC-1; j >= 0; j--) + { + tk_u32[j] = k_u32[j]; + } + r = 0; + t = 0; + /* Copy values into round key array. */ + for (j = 0; (j < KC) && (r < rounds + 1); ) + { + for (; (j < KC) && (t < 4); j++, t++) + { + W_u32[r][t] = le_bswap32(tk_u32[j]); + } + if (t == 4) + { + r++; + t = 0; + } + } + + while (r < rounds + 1) + { + tk_u32[0] ^= _gcry_aes_sbox4_armv8_ce(rol(tk_u32[KC - 1], 24)) ^ rcon; + + if (KC != 8) + { + for (j = 1; j < KC; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + } + else + { + for (j = 1; j < KC/2; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + + tk_u32[KC/2] ^= _gcry_aes_sbox4_armv8_ce(tk_u32[KC/2 - 1]); + + for (j = KC/2 + 1; j < KC; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + } + + /* Copy values into round key array. */ + for (j = 0; (j < KC) && (r < rounds + 1); ) + { + for (; (j < KC) && (t < 4); j++, t++) + { + W_u32[r][t] = le_bswap32(tk_u32[j]); + } + if (t == 4) + { + r++; + t = 0; + } + } + + rcon = (rcon << 1) ^ ((rcon >> 7) * 0x1b); + } + +#undef W +#undef tk +#undef k +#undef W_u32 +#undef tk_u32 +#undef k_u32 + wipememory(&tkk, sizeof(tkk)); +} + +/* Make a decryption key from an encryption key. 
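Per the AES "equivalent inverse cipher", the decryption schedule is the encryption round keys in reverse order, with InvMixColumns (via _gcry_aes_invmixcol_armv8_ce) applied to every round key except the first and last. 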
*/ +void +_gcry_aes_armv8_ce_prepare_decryption (RIJNDAEL_context *ctx) +{ + u128_t *ekey = (u128_t *)(void *)ctx->keyschenc; + u128_t *dkey = (u128_t *)(void *)ctx->keyschdec; + int rounds = ctx->rounds; + int rr; + int r; + +#define DO_AESIMC() _gcry_aes_invmixcol_armv8_ce(&dkey[r], &ekey[rr]) + + dkey[0] = ekey[rounds]; + r = 1; + rr = rounds-1; + DO_AESIMC(); r++; rr--; /* round 1 */ + DO_AESIMC(); r++; rr--; /* round 2 */ + DO_AESIMC(); r++; rr--; /* round 3 */ + DO_AESIMC(); r++; rr--; /* round 4 */ + DO_AESIMC(); r++; rr--; /* round 5 */ + DO_AESIMC(); r++; rr--; /* round 6 */ + DO_AESIMC(); r++; rr--; /* round 7 */ + DO_AESIMC(); r++; rr--; /* round 8 */ + DO_AESIMC(); r++; rr--; /* round 9 */ + if (rounds >= 12) + { + if (rounds > 12) + { + DO_AESIMC(); r++; rr--; /* round 10 */ + DO_AESIMC(); r++; rr--; /* round 11 */ + } + + DO_AESIMC(); r++; rr--; /* round 12 / 10 */ + DO_AESIMC(); r++; rr--; /* round 13 / 11 */ + } + + dkey[r] = ekey[0]; + +#undef DO_AESIMC +} + +unsigned int +_gcry_aes_armv8_ce_encrypt (const RIJNDAEL_context *ctx, unsigned char *dst, + const unsigned char *src) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + return _gcry_aes_enc_armv8_ce(keysched, dst, src, nrounds); +} + +unsigned int +_gcry_aes_armv8_ce_decrypt (const RIJNDAEL_context *ctx, unsigned char *dst, + const unsigned char *src) +{ + const void *keysched = ctx->keyschdec32; + unsigned int nrounds = ctx->rounds; + + return _gcry_aes_dec_armv8_ce(keysched, dst, src, nrounds); +} + +void +_gcry_aes_armv8_ce_cbc_enc (const RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks, int cbc_mac) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cbc_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, cbc_mac, + nrounds); +} + +void +_gcry_aes_armv8_ce_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschdec32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cbc_dec_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_cfb_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cfb_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cfb_dec_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_ctr_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, + const void *inbuf_arg, size_t nblocks, + int encrypt) +{ + RIJNDAEL_context *ctx = (void *)&c->context.c; + const void *keysched = encrypt ? ctx->keyschenc32 : ctx->keyschdec32; + ocb_crypt_fn_t crypt_fn = encrypt ? 
_gcry_aes_ocb_enc_armv8_ce + : _gcry_aes_ocb_dec_armv8_ce; + unsigned char *outbuf = outbuf_arg; + const unsigned char *inbuf = inbuf_arg; + unsigned int nrounds = ctx->rounds; + u64 blkn = c->u_mode.ocb.data_nblocks; + u64 blkn_offs = blkn - blkn % 32; + unsigned int n = 32 - blkn % 32; + unsigned char l_tmp[16]; + void *Ls[32]; + void **l; + size_t i; + + c->u_mode.ocb.data_nblocks = blkn + nblocks; + + if (nblocks >= 32) + { + for (i = 0; i < 32; i += 8) + { + Ls[(i + 0 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + l = &Ls[(31 + n) % 32]; + + /* Process data in 32 block chunks. */ + while (nblocks >= 32) + { + /* l_tmp will be used only every 65536-th block. */ + blkn_offs += 32; + *l = (void *)ocb_get_l(c, l_tmp, blkn_offs); + + crypt_fn(keysched, outbuf, inbuf, c->u_iv.iv, c->u_ctr.ctr, Ls, 32, + nrounds); + + nblocks -= 32; + outbuf += 32 * 16; + inbuf += 32 * 16; + } + + if (nblocks && l < &Ls[nblocks]) + { + *l = (void *)ocb_get_l(c, l_tmp, 32 + blkn_offs); + } + } + else + { + for (i = 0; i < nblocks; i++) + Ls[i] = (void *)ocb_get_l(c, l_tmp, ++blkn); + } + + if (nblocks) + { + crypt_fn(keysched, outbuf, inbuf, c->u_iv.iv, c->u_ctr.ctr, Ls, nblocks, + nrounds); + } + + wipememory(&l_tmp, sizeof(l_tmp)); +} + +void +_gcry_aes_armv8_ce_ocb_auth (gcry_cipher_hd_t c, void *abuf_arg, + size_t nblocks) +{ + RIJNDAEL_context *ctx = (void *)&c->context.c; + const void *keysched = ctx->keyschenc32; + const unsigned char *abuf = abuf_arg; + unsigned int nrounds = ctx->rounds; + u64 blkn = c->u_mode.ocb.aad_nblocks; + u64 blkn_offs = blkn - blkn % 32; + unsigned int n = 32 - blkn % 32; + unsigned char l_tmp[16]; + void *Ls[32]; + void **l; + size_t i; + + c->u_mode.ocb.aad_nblocks = blkn + nblocks; + + if (nblocks >= 32) + { + for (i = 0; i < 32; i += 8) + { + Ls[(i + 0 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + l = &Ls[(31 + n) % 32]; + + /* Process data in 32 block chunks. */ + while (nblocks >= 32) + { + /* l_tmp will be used only every 65536-th block. 
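(ocb_get_l() otherwise returns a pointer into the precomputed c->u_mode.ocb.L[] table; it computes into l_tmp only when ntz of the block number is beyond that table, i.e. at multiples of 2^16.) 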
*/ + blkn_offs += 32; + *l = (void *)ocb_get_l(c, l_tmp, blkn_offs); + + _gcry_aes_ocb_auth_armv8_ce(keysched, abuf, c->u_mode.ocb.aad_offset, + c->u_mode.ocb.aad_sum, Ls, 32, nrounds); + + nblocks -= 32; + abuf += 32 * 16; + } + + if (nblocks && l < &Ls[nblocks]) + { + *l = (void *)ocb_get_l(c, l_tmp, 32 + blkn_offs); + } + } + else + { + for (i = 0; i < nblocks; i++) + Ls[i] = (void *)ocb_get_l(c, l_tmp, ++blkn); + } + + if (nblocks) + { + _gcry_aes_ocb_auth_armv8_ce(keysched, abuf, c->u_mode.ocb.aad_offset, + c->u_mode.ocb.aad_sum, Ls, nblocks, nrounds); + } + + wipememory(&l_tmp, sizeof(l_tmp)); +} + +#endif /* USE_ARM_CE */ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 6641728..7544fa0 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -82,6 +82,17 @@ # endif #endif /* ENABLE_AESNI_SUPPORT */ +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. */ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif /* ENABLE_ARM_CRYPTO_SUPPORT */ + struct RIJNDAEL_context_s; typedef unsigned int (*rijndael_cryptfn_t)(const struct RIJNDAEL_context_s *ctx, @@ -127,6 +138,9 @@ typedef struct RIJNDAEL_context_s #ifdef USE_SSSE3 unsigned int use_ssse3:1; /* SSSE3 shall be used. */ #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + unsigned int use_arm_ce:1; /* ARMv8 CE shall be used. */ +#endif /*USE_ARM_CE*/ rijndael_cryptfn_t encrypt_fn; rijndael_cryptfn_t decrypt_fn; rijndael_prefetchfn_t prefetch_enc_fn; diff --git a/cipher/rijndael.c b/cipher/rijndael.c index 0130924..cc6a722 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -168,6 +168,46 @@ extern unsigned int _gcry_aes_arm_decrypt_block(const void *keysched_dec, const void *decT); #endif /*USE_ARM_ASM*/ +#ifdef USE_ARM_CE +/* ARMv8 Crypto Extension implementations of AES */ +extern void _gcry_aes_armv8_ce_setkey(RIJNDAEL_context *ctx, const byte *key); +extern void _gcry_aes_armv8_ce_prepare_decryption(RIJNDAEL_context *ctx); + +extern unsigned int _gcry_aes_armv8_ce_encrypt(const RIJNDAEL_context *ctx, + unsigned char *dst, + const unsigned char *src); +extern unsigned int _gcry_aes_armv8_ce_decrypt(const RIJNDAEL_context *ctx, + unsigned char *dst, + const unsigned char *src); + +extern void _gcry_aes_armv8_ce_cfb_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_cbc_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + int cbc_mac); +extern void _gcry_aes_armv8_ce_ctr_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *ctr, size_t nblocks); +extern void _gcry_aes_armv8_ce_cfb_dec (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_cbc_dec (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, + const void *inbuf_arg, size_t nblocks, + int encrypt); +extern void _gcry_aes_armv8_ce_ocb_auth (gcry_cipher_hd_t c, + const void *abuf_arg, size_t nblocks); +#endif /*USE_ARM_ASM*/ + static unsigned int do_encrypt (const 
RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax); static unsigned int do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, @@ -223,11 +263,12 @@ static gcry_err_code_t do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) { static int initialized = 0; - static const char *selftest_failed=0; + static const char *selftest_failed = 0; int rounds; int i,j, r, t, rconpointer = 0; int KC; -#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) +#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) \ + || defined(USE_ARM_CE) unsigned int hwfeatures; #endif @@ -268,7 +309,8 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) ctx->rounds = rounds; -#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) +#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) \ + || defined(USE_ARM_CE) hwfeatures = _gcry_get_hw_features (); #endif @@ -282,6 +324,9 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) #ifdef USE_SSSE3 ctx->use_ssse3 = 0; #endif +#ifdef USE_ARM_CE + ctx->use_arm_ce = 0; +#endif if (0) { @@ -318,6 +363,16 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) ctx->use_ssse3 = 1; } #endif +#ifdef USE_ARM_CE + else if (hwfeatures & HWF_ARM_AES) + { + ctx->encrypt_fn = _gcry_aes_armv8_ce_encrypt; + ctx->decrypt_fn = _gcry_aes_armv8_ce_decrypt; + ctx->prefetch_enc_fn = NULL; + ctx->prefetch_dec_fn = NULL; + ctx->use_arm_ce = 1; + } +#endif else { ctx->encrypt_fn = do_encrypt; @@ -340,6 +395,10 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) else if (ctx->use_ssse3) _gcry_aes_ssse3_do_setkey (ctx, key); #endif +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + _gcry_aes_armv8_ce_setkey (ctx, key); +#endif else { const byte *sbox = ((const byte *)encT) + 1; @@ -471,6 +530,12 @@ prepare_decryption( RIJNDAEL_context *ctx ) _gcry_aes_ssse3_prepare_decryption (ctx); } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_prepare_decryption (ctx); + } +#endif /*USE_SSSE3*/ #ifdef USE_PADLOCK else if (ctx->use_padlock) { @@ -744,6 +809,13 @@ _gcry_aes_cfb_enc (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cfb_enc (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -798,6 +870,13 @@ _gcry_aes_cbc_enc (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cbc_enc (ctx, outbuf, inbuf, iv, nblocks, cbc_mac); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -860,6 +939,13 @@ _gcry_aes_ctr_enc (void *context, unsigned char *ctr, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ctr_enc (ctx, outbuf, inbuf, ctr, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } tmp; @@ -1120,6 +1206,13 @@ _gcry_aes_cfb_dec (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cfb_dec (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -1173,6 +1266,13 @@ 
_gcry_aes_cbc_dec (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cbc_dec (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { unsigned char savebuf[BLOCKSIZE] ATTR_ALIGNED_16; @@ -1238,6 +1338,13 @@ _gcry_aes_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ocb_crypt (c, outbuf, inbuf, nblocks, encrypt); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else if (encrypt) { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } l_tmp; @@ -1323,6 +1430,13 @@ _gcry_aes_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks) burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ocb_auth (c, abuf, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } l_tmp; diff --git a/configure.ac b/configure.ac index 91dd285..915888a 100644 --- a/configure.ac +++ b/configure.ac @@ -2009,6 +2009,10 @@ if test "$found" = "1" ; then arm*-*-*) # Build with the assembly implementation GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-arm.lo" + + # Build with the ARMv8/AArch32 CE implementation + GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-armv8-ce.lo" + GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-armv8-aarch32-ce.lo" ;; esac From cvs at cvs.gnupg.org Thu Jul 14 11:17:49 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 14 Jul 2016 11:17:49 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-19-g2b26de6 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 2b26de65e61dc42c64120c463a33e944bf413e28 (commit) from e535ea1bdc42309553007d60599d3147b8defe93 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 2b26de65e61dc42c64120c463a33e944bf413e28 Author: Werner Koch Date: Thu Jul 14 11:15:38 2016 +0200 build: Update NEWS. -- diff --git a/NEWS b/NEWS index be5e084..498c9da 100644 --- a/NEWS +++ b/NEWS @@ -1,6 +1,24 @@ Noteworthy changes in version 1.7.2 (unreleased) [C21/A1/R_] ------------------------------------------------ + * Bug fixes: + + - Fix setting of the ECC cofactor if parameters are specified. + + - Fix memory leak in the ECC code. + + - Remove debug message about unsupported getrandom syscall. + + - Fix build problems related to AVX use + + - Fix bus errors on ARM for Poly1305, ChaCha20, AES, and SHA-512. + + * Internal chnages: + + - Improved fatal error message for wrong use of gcry_md_read. + + - Disallow symmetric encryption/decryption if key is not set. 
+ Noteworthy changes in version 1.7.1 (2016-06-15) [C21/A1/R1] ------------------------------------------------ ----------------------------------------------------------------------- Summary of changes: NEWS | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Thu Jul 14 11:42:39 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 14 Jul 2016 11:42:39 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.2-1-g62642c4 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 62642c4be0653a94fdec0c0b1f9d38673250a156 (commit) via be0bec7d9208b2f2d2ffce9cc2ca6154853e7e59 (commit) via 5a4cbc5256e493563eb82a9bb73f22fe4d413579 (commit) via b0b70e7fe37b1bf13ec0bfc8effcb5c7f5db6b7d (commit) from 2b26de65e61dc42c64120c463a33e944bf413e28 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- ----------------------------------------------------------------------- Summary of changes: Makefile.am | 15 +++++++++++++++ NEWS | 10 +++++++--- configure.ac | 4 ++-- 3 files changed, 24 insertions(+), 5 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From wk at gnupg.org Mon Jul 18 20:43:27 2016 From: wk at gnupg.org (Werner Koch) Date: Mon, 18 Jul 2016 20:43:27 +0200 Subject: Asm portability problems on OS X and Solaris Message-ID: <87y44y7po0.fsf@wheatstone.g10code.de> Hi, I got reports about two portability bug reports for 1.7.2. The first is on an OS X 10.7.5: CFLAGS=-m32 ./configure --disable-asm && make all check ... libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o .libs/rijndael-aesni.o rijndael-aesni.c: In function 'do_aesni_ctr_4': rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints asm volatile (/* detect if 8-bit carry handling is needed */ ^ So, --disable-asm is not honored in this case. A similar problem exists on OpenIndiana with gcc 4.9: crc-intel-pclmul.c: In function 'crc32_less_than_16': crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints asm volatile ("movd %[crc], %%xmm0\n\t" using the Solaris compiler we get this error: "../src/cipher-proto.h", line 27: warning: useless declaration "md.c", line 912: identifier redeclared: _gcry_md_extract current : function(pointer to struct gcry_md_handle {pointer to struct gcry_md_context {..} ctx, ... long output ... previous: function(pointer to struct gcry_md_handle {pointer to struct gcry_md_context {..} ctx, int bufpos, int bufsize, array[1] of unsigned char buf}, int, pointer to void, unsigned int) returning unsigned int : "../src/gcrypt-int.h", line 132 cc: acomp failed for md.c Salam-Shalom, Werner -- Die Gedanken sind frei. 
Ausnahmen regelt ein Bundesgesetz. /* Join us at OpenPGP.conf */ From jussi.kivilinna at iki.fi Tue Jul 19 10:43:02 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 11:43:02 +0300 Subject: Asm portability problems on OS X and Solaris In-Reply-To: <87y44y7po0.fsf@wheatstone.g10code.de> References: <87y44y7po0.fsf@wheatstone.g10code.de> Message-ID: <578DE816.9060004@iki.fi> On 18.07.2016 21:43, Werner Koch wrote: > Hi, > > I got reports about two portability bug reports for 1.7.2. > > The first is on an OS X 10.7.5: > > CFLAGS=-m32 ./configure --disable-asm && make all check > ... > libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src > -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF > .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o > .libs/rijndael-aesni.o > rijndael-aesni.c: In function 'do_aesni_ctr_4': > rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints > asm volatile (/* detect if 8-bit carry handling is needed */ > ^ > > So, --disable-asm is not honored in this case. Currently --disable-asm only disables MPI assembly modules. Configure options --disable-aesni-support and --disable-pclmul-support can be used to disable AESNI and PCLMUL assembly parts. > A similar problem exists > on OpenIndiana with gcc 4.9: > > crc-intel-pclmul.c: In function 'crc32_less_than_16': > crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints > asm volatile ("movd %[crc], %%xmm0\n\t" GCC is failing to allocate register constraints for inline assembly blocks in these cases. In these two asm blocks, quite many memory operands are used and with PIC enabled, these might be requiring extra registers. So, these blocks need be split to ease register pressure. > > using the Solaris compiler we get this error: > > "../src/cipher-proto.h", line 27: warning: useless declaration > "md.c", line 912: identifier redeclared: _gcry_md_extract > current : function(pointer to struct gcry_md_handle {pointer to > struct gcry_md_context {..} ctx, > ... long output ... This does not seem to be full 'current' declaration, but truncated. -Jussi > previous: function(pointer to struct gcry_md_handle {pointer to > struct gcry_md_context {..} ctx, int bufpos, int bufsize, array[1] of > unsigned char buf}, int, pointer to void, unsigned int) returning unsigned > int : "../src/gcrypt-int.h", line 132 > cc: acomp failed for md.c > > > > Salam-Shalom, > > Werner > > From jussi.kivilinna at iki.fi Tue Jul 19 12:48:57 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:48:57 +0300 Subject: Asm portability problems on OS X and Solaris In-Reply-To: <578DE816.9060004@iki.fi> References: <87y44y7po0.fsf@wheatstone.g10code.de> <578DE816.9060004@iki.fi> Message-ID: <578E0599.1000801@iki.fi> On 19.07.2016 11:43, Jussi Kivilinna wrote: > On 18.07.2016 21:43, Werner Koch wrote: >> Hi, >> >> I got reports about two portability bug reports for 1.7.2. >> >> The first is on an OS X 10.7.5: >> >> CFLAGS=-m32 ./configure --disable-asm && make all check >> ... >> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. 
-I../src -I../src >> -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF >> .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o >> .libs/rijndael-aesni.o >> rijndael-aesni.c: In function 'do_aesni_ctr_4': >> rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints >> asm volatile (/* detect if 8-bit carry handling is needed */ >> ^ >> >> So, --disable-asm is not honored in this case. > > Currently --disable-asm only disables MPI assembly modules. > > Configure options --disable-aesni-support and --disable-pclmul-support can > be used to disable AESNI and PCLMUL assembly parts. > >> A similar problem exists >> on OpenIndiana with gcc 4.9: >> >> crc-intel-pclmul.c: In function 'crc32_less_than_16': >> crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints >> asm volatile ("movd %[crc], %%xmm0\n\t" > > GCC is failing to allocate register constraints for inline assembly blocks in > these cases. In these two asm blocks, quite many memory operands are used and > with PIC enabled, these might be requiring extra registers. So, these > blocks need be split to ease register pressure. > I managed to reproduce rijndael-aesni.c case on linux/gcc/i386 with GCC 4.7 and 4.8. GCC 4.9 and later are not affected. I'll post fix soon. Bug tracker had issue for this on gentoo: https://bugs.gnupg.org/gnupg/issue2325 I also made similar change for crc-intel-pclmul.c but was unable to reproduce the issue (on linux). -Jussi From jussi.kivilinna at iki.fi Tue Jul 19 12:51:34 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:51:34 +0300 Subject: [PATCH 1/2] rijndael-aesni: split assembly block to ease register pressure Message-ID: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> * cipher/rijndael-aesni.c (do_aesni_ctr_4): Use single register constraint for passing 'bige_addb' to assembly block; split first inline assembly block into two parts. -- Fixes compiling on i386 with GCC-4.8 and older. 
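As a minimal sketch of the failure mode and of the fix pattern (hypothetical toy code, not taken from rijndael-aesni.c): under -m32 -fPIC older GCC reserves %ebx for the GOT pointer, so one large asm block that names many distinct memory operands and additionally clobbers %esi can leave the allocator unable to satisfy all constraints. Splitting the block and carrying intermediate state in fixed xmm registers — deliberately not listed as clobbers, so %xmm0 survives into the second block, as in the patched code — keeps fewer operands live per block:

static unsigned int
xor_fold (const unsigned int *in, const unsigned int *key)
{
  unsigned int out;

  /* First block: only two input operands are live here; the result
     is left in %xmm0 for the next block. */
  asm volatile ("movd %[in], %%xmm0\n\t"
                "movd %[key], %%xmm1\n\t"
                "pxor %%xmm1, %%xmm0\n\t"
                :
                : [in] "m" (*in), [key] "m" (*key)
                : "cc" );

  /* Second block: consume %xmm0 with a single output operand. */
  asm volatile ("movd %%xmm0, %[out]\n\t"
                : [out] "=m" (out)
                :
                : "cc" );

  return out;
}

PATCH 2/2 applies the same split to crc32_less_than_16.
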
Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 97e0ad0..8b28b3a 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -794,6 +794,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4 } }; + const void *bige_addb = bige_addb_const; #define aesenc_xmm1_xmm0 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xc1\n\t" #define aesenc_xmm1_xmm2 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd1\n\t" #define aesenc_xmm1_xmm3 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd9\n\t" @@ -819,16 +820,15 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, "ja .Ladd32bit%=\n\t" "movdqa %%xmm5, %%xmm0\n\t" /* xmm0 := CTR (xmm5) */ - "movdqa %[addb_1], %%xmm2\n\t" /* xmm2 := be(1) */ - "movdqa %[addb_2], %%xmm3\n\t" /* xmm3 := be(2) */ - "movdqa %[addb_3], %%xmm4\n\t" /* xmm4 := be(3) */ - "movdqa %[addb_4], %%xmm5\n\t" /* xmm5 := be(4) */ + "movdqa 0*16(%[addb]), %%xmm2\n\t" /* xmm2 := be(1) */ + "movdqa 1*16(%[addb]), %%xmm3\n\t" /* xmm3 := be(2) */ + "movdqa 2*16(%[addb]), %%xmm4\n\t" /* xmm4 := be(3) */ + "movdqa 3*16(%[addb]), %%xmm5\n\t" /* xmm5 := be(4) */ "paddb %%xmm0, %%xmm2\n\t" /* xmm2 := be(1) + CTR (xmm0) */ "paddb %%xmm0, %%xmm3\n\t" /* xmm3 := be(2) + CTR (xmm0) */ "paddb %%xmm0, %%xmm4\n\t" /* xmm4 := be(3) + CTR (xmm0) */ "paddb %%xmm0, %%xmm5\n\t" /* xmm5 := be(4) + CTR (xmm0) */ "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "jmp .Lstore_ctr%=\n\t" ".Ladd32bit%=:\n\t" @@ -871,7 +871,6 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lno_carry%=:\n\t" "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := be(xmm2) */ "pshufb %%xmm6, %%xmm3\n\t" /* xmm3 := be(xmm3) */ @@ -880,8 +879,13 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lstore_ctr%=:\n\t" "movdqa %%xmm5, (%[ctr])\n\t" /* Update CTR (mem). */ + : + : [ctr] "r" (ctr), + [key] "r" (ctx->keyschenc), + [addb] "r" (bige_addb) + : "%esi", "cc", "memory"); - "pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ + asm volatile ("pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ "pxor %%xmm1, %%xmm2\n\t" /* xmm2 ^= key[0] */ "pxor %%xmm1, %%xmm3\n\t" /* xmm3 ^= key[0] */ "pxor %%xmm1, %%xmm4\n\t" /* xmm4 ^= key[0] */ @@ -931,7 +935,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xa0(%[key]), %%xmm1\n\t" - "cmpl $10, %%esi\n\t" + "cmpl $10, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -943,7 +947,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xc0(%[key]), %%xmm1\n\t" - "cmpl $12, %%esi\n\t" + "cmpl $12, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -962,14 +966,9 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenclast_xmm1_xmm3 aesenclast_xmm1_xmm4 : - : [ctr] "r" (ctr), - [key] "r" (ctx->keyschenc), - [rounds] "g" (ctx->rounds), - [addb_1] "m" (bige_addb_const[0][0]), - [addb_2] "m" (bige_addb_const[1][0]), - [addb_3] "m" (bige_addb_const[2][0]), - [addb_4] "m" (bige_addb_const[3][0]) - : "%esi", "cc", "memory"); + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); asm volatile ("movdqu (%[src]), %%xmm1\n\t" /* Get block 1. 
*/ "pxor %%xmm1, %%xmm0\n\t" /* EncCTR-1 ^= input */ From jussi.kivilinna at iki.fi Tue Jul 19 12:51:39 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:51:39 +0300 Subject: [PATCH 2/2] crc-intel-pclmul: split assembly block to ease register pressure In-Reply-To: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> References: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> Message-ID: <146892549926.1140.16666077431608890245.stgit@localhost6.localdomain6> * cipher/crc-intel-pclmul.c (crc32_less_than_16): Split inline assembly block handling 4 byte input into multiple blocks. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 2972fb4..7a344e2 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -747,22 +747,28 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, asm volatile ("movd %[crc], %%xmm0\n\t" "movd %[in], %%xmm1\n\t" "movdqa %[my_p], %%xmm5\n\t" - "pxor %%xmm1, %%xmm0\n\t" + : + : [in] "m" (*inbuf), + [crc] "m" (*pcrc), + [my_p] "m" (consts->my_p[0]) + : "cc" ); + + asm volatile ("pxor %%xmm1, %%xmm0\n\t" "pshufb %[bswap], %%xmm0\n\t" /* [xx][00][00][00] */ "pclmulqdq $0x01, %%xmm5, %%xmm0\n\t" /* [00][xx][xx][00] */ "pclmulqdq $0x11, %%xmm5, %%xmm0\n\t" /* [00][00][xx][xx] */ + : + : [bswap] "m" (*crc32_bswap_shuf) + : "cc" ); - /* store CRC in input endian */ + asm volatile (/* store CRC in input endian */ "movd %%xmm0, %%eax\n\t" "bswapl %%eax\n\t" "movl %%eax, %[out]\n\t" : [out] "=m" (*pcrc) - : [in] "m" (*inbuf), - [crc] "m" (*pcrc), - [my_p] "m" (consts->my_p[0]), - [bswap] "m" (*crc32_bswap_shuf) - : "eax" ); + : + : "eax", "cc" ); } else { From cvs at cvs.gnupg.org Wed Jul 20 12:33:09 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Wed, 20 Jul 2016 12:33:09 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.2-8-gf38199d Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via f38199dbc290003898a1799adc367265267784c2 (commit) via a4d1595a2638db63ac4c73e722c8ba95fdd85ff7 (commit) from 05a4cecae0c02d2b4ee1cadd9c08115beae3a94a (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit f38199dbc290003898a1799adc367265267784c2 Author: Jussi Kivilinna Date: Tue Jul 19 13:20:53 2016 +0300 crc-intel-pclmul: split assembly block to ease register pressure * cipher/crc-intel-pclmul.c (crc32_less_than_16): Split inline assembly block handling 4 byte input into multiple blocks. 
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 2972fb4..7a344e2 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -747,22 +747,28 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, asm volatile ("movd %[crc], %%xmm0\n\t" "movd %[in], %%xmm1\n\t" "movdqa %[my_p], %%xmm5\n\t" - "pxor %%xmm1, %%xmm0\n\t" + : + : [in] "m" (*inbuf), + [crc] "m" (*pcrc), + [my_p] "m" (consts->my_p[0]) + : "cc" ); + + asm volatile ("pxor %%xmm1, %%xmm0\n\t" "pshufb %[bswap], %%xmm0\n\t" /* [xx][00][00][00] */ "pclmulqdq $0x01, %%xmm5, %%xmm0\n\t" /* [00][xx][xx][00] */ "pclmulqdq $0x11, %%xmm5, %%xmm0\n\t" /* [00][00][xx][xx] */ + : + : [bswap] "m" (*crc32_bswap_shuf) + : "cc" ); - /* store CRC in input endian */ + asm volatile (/* store CRC in input endian */ "movd %%xmm0, %%eax\n\t" "bswapl %%eax\n\t" "movl %%eax, %[out]\n\t" : [out] "=m" (*pcrc) - : [in] "m" (*inbuf), - [crc] "m" (*pcrc), - [my_p] "m" (consts->my_p[0]), - [bswap] "m" (*crc32_bswap_shuf) - : "eax" ); + : + : "eax", "cc" ); } else { commit a4d1595a2638db63ac4c73e722c8ba95fdd85ff7 Author: Jussi Kivilinna Date: Tue Jul 19 13:20:13 2016 +0300 rijndael-aesni: split assembly block to ease register pressure * cipher/rijndael-aesni.c (do_aesni_ctr_4): Use single register constraint for passing 'bige_addb' to assembly block; split first inline assembly block into two parts. -- Fixes compiling on i386 with GCC-4.8 and older. Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 97e0ad0..8b28b3a 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -794,6 +794,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4 } }; + const void *bige_addb = bige_addb_const; #define aesenc_xmm1_xmm0 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xc1\n\t" #define aesenc_xmm1_xmm2 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd1\n\t" #define aesenc_xmm1_xmm3 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd9\n\t" @@ -819,16 +820,15 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, "ja .Ladd32bit%=\n\t" "movdqa %%xmm5, %%xmm0\n\t" /* xmm0 := CTR (xmm5) */ - "movdqa %[addb_1], %%xmm2\n\t" /* xmm2 := be(1) */ - "movdqa %[addb_2], %%xmm3\n\t" /* xmm3 := be(2) */ - "movdqa %[addb_3], %%xmm4\n\t" /* xmm4 := be(3) */ - "movdqa %[addb_4], %%xmm5\n\t" /* xmm5 := be(4) */ + "movdqa 0*16(%[addb]), %%xmm2\n\t" /* xmm2 := be(1) */ + "movdqa 1*16(%[addb]), %%xmm3\n\t" /* xmm3 := be(2) */ + "movdqa 2*16(%[addb]), %%xmm4\n\t" /* xmm4 := be(3) */ + "movdqa 3*16(%[addb]), %%xmm5\n\t" /* xmm5 := be(4) */ "paddb %%xmm0, %%xmm2\n\t" /* xmm2 := be(1) + CTR (xmm0) */ "paddb %%xmm0, %%xmm3\n\t" /* xmm3 := be(2) + CTR (xmm0) */ "paddb %%xmm0, %%xmm4\n\t" /* xmm4 := be(3) + CTR (xmm0) */ "paddb %%xmm0, %%xmm5\n\t" /* xmm5 := be(4) + CTR (xmm0) */ "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "jmp .Lstore_ctr%=\n\t" ".Ladd32bit%=:\n\t" @@ -871,7 +871,6 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lno_carry%=:\n\t" "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := be(xmm2) */ "pshufb %%xmm6, %%xmm3\n\t" /* xmm3 := be(xmm3) */ @@ -880,8 +879,13 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lstore_ctr%=:\n\t" "movdqa %%xmm5, (%[ctr])\n\t" /* Update CTR (mem). 
*/ + : + : [ctr] "r" (ctr), + [key] "r" (ctx->keyschenc), + [addb] "r" (bige_addb) + : "%esi", "cc", "memory"); - "pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ + asm volatile ("pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ "pxor %%xmm1, %%xmm2\n\t" /* xmm2 ^= key[0] */ "pxor %%xmm1, %%xmm3\n\t" /* xmm3 ^= key[0] */ "pxor %%xmm1, %%xmm4\n\t" /* xmm4 ^= key[0] */ @@ -931,7 +935,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xa0(%[key]), %%xmm1\n\t" - "cmpl $10, %%esi\n\t" + "cmpl $10, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -943,7 +947,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xc0(%[key]), %%xmm1\n\t" - "cmpl $12, %%esi\n\t" + "cmpl $12, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -962,14 +966,9 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenclast_xmm1_xmm3 aesenclast_xmm1_xmm4 : - : [ctr] "r" (ctr), - [key] "r" (ctx->keyschenc), - [rounds] "g" (ctx->rounds), - [addb_1] "m" (bige_addb_const[0][0]), - [addb_2] "m" (bige_addb_const[1][0]), - [addb_3] "m" (bige_addb_const[2][0]), - [addb_4] "m" (bige_addb_const[3][0]) - : "%esi", "cc", "memory"); + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); asm volatile ("movdqu (%[src]), %%xmm1\n\t" /* Get block 1. */ "pxor %%xmm1, %%xmm0\n\t" /* EncCTR-1 ^= input */ ----------------------------------------------------------------------- Summary of changes: cipher/crc-intel-pclmul.c | 20 +++++++++++++------- cipher/rijndael-aesni.c | 33 ++++++++++++++++----------------- 2 files changed, 29 insertions(+), 24 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From ff at octo.it Wed Jul 27 14:09:00 2016 From: ff at octo.it (Florian Forster) Date: Wed, 27 Jul 2016 14:09:00 +0200 Subject: [PATCH] doc: Improve example gcry_control usage. Message-ID: <1469621341-28825-1-git-send-email-ff@octo.it> * doc/gcrypt.texi: Change example code to check return value of gcry_control(GCRYCTL_INIT_SECMEM, ?). --- doc/gcrypt.texi | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index c2c39ad..9486400 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -424,7 +424,10 @@ and freed memory, you need to initialize Libgcrypt this way: /* Allocate a pool of 16k secure memory. This make the secure memory available and also drops privileges where needed. */ - gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0); + if (gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0)) { + fputs ("initializing secure memory failed\n", stderr); + exit (EXIT_FAILURE); + } @anchor{sample-use-resume-secmem} /* It is now okay to let Libgcrypt complain when there was/is @@ -687,7 +690,8 @@ enabling the use of secure memory. It also drops all extra privileges the process has (i.e. if it is run as setuid (root)). If the argument @var{nbytes} is 0, secure memory will be disabled. The minimum amount of secure memory allocated is currently 16384 bytes; you may thus use a -value of 1 to request that default size. +value of 1 to request that default size. Returns zero on success and +non-zero on failure. @item GCRYCTL_TERM_SECMEM; Arguments: none This command zeroises the secure memory and destroys the handler. The -- 2.8.0.rc3.226.g39d4020
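As context for the patch above, a complete minimal initialization
sequence with the suggested return-value check applied could look as
follows. This is a sketch following the sample in doc/gcrypt.texi; the
surrounding program is invented.

#include <stdio.h>
#include <stdlib.h>
#include <gcrypt.h>

int
main (void)
{
  /* The version check must be the first call into Libgcrypt.  */
  if (!gcry_check_version (GCRYPT_VERSION))
    {
      fputs ("libgcrypt version mismatch\n", stderr);
      exit (EXIT_FAILURE);
    }

  /* Suspend warnings about unavailable secure memory while it is
     being set up.  */
  gcry_control (GCRYCTL_SUSPEND_SECMEM_WARN);

  /* Allocate a pool of 16k secure memory and, per the patch, check
     the return value instead of assuming success.  */
  if (gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0))
    {
      fputs ("initializing secure memory failed\n", stderr);
      exit (EXIT_FAILURE);
    }

  gcry_control (GCRYCTL_RESUME_SECMEM_WARN);

  /* Signal that initialization is complete.  */
  gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);

  /* ... use Libgcrypt here ... */
  return 0;
}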