From borneo.antonio at gmail.com Fri Jul 1 12:09:12 2016 From: borneo.antonio at gmail.com (Antonio Borneo) Date: Fri, 1 Jul 2016 12:09:12 +0200 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5775916F.8010505@iki.fi> References: <5774CDA5.2090001@iki.fi> <5775916F.8010505@iki.fi> Message-ID: On Thu, Jun 30, 2016 at 11:38 PM, Jussi Kivilinna wrote: > On 30.06.2016 10:43, Jussi Kivilinna wrote: > >>> The second problem showed up as a bus error running tests/basic. >>> The problem is that ldm/stm don't deal with unaligned accesses even >>> on armv7 (see http://www.heyrick.co.uk/armwiki/Unaligned_data_access). >>> My workaround is to undef the gcc-defined feature symbol, but a better >>> fix would be to strip out the conditional guards, since the alignment >>> adjustments are needed on all versions. >> >> I have made a wrong assumption about unaligned accesses with ldm/stm. >> I'll make the needed changes and add proper unaligned buffer test cases >> so that these will be caught in the future. >> > > It appears that there are proper tests for unaligned buffers. However those > tests did not fail for me, since on Linux the unaligned ldm/stm exception > is caught and handled by the kernel. Jussi, on an armv7 target you can disable the kernel's unaligned-access handling and enable the alignment fault with 'echo 4 > /proc/cpu/alignment', as explained in the kernel's Documentation/arm/mem_alignment. Regards, Antonio From jussi.kivilinna at iki.fi Fri Jul 1 22:07:58 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 Jul 2016 23:07:58 +0300 Subject: [PATCH] Fix static build Message-ID: <146740367798.16457.1799426351699167255.stgit@localhost6.localdomain6> * tests/pubkey.c (_gcry_pk_util_get_nbits): Make function 'static'. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/tests/pubkey.c b/tests/pubkey.c index 3eb5b4f..1271e43 100644 --- a/tests/pubkey.c +++ b/tests/pubkey.c @@ -175,7 +175,7 @@ show_sexp (const char *prefix, gcry_sexp_t a) } /* from ../cipher/pubkey-util.c */ -gpg_err_code_t +static gpg_err_code_t _gcry_pk_util_get_nbits (gcry_sexp_t list, unsigned int *r_nbits) { char buf[50]; From cvs at cvs.gnupg.org Sun Jul 3 17:18:00 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 03 Jul 2016 17:18:00 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-15-gcb79630 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via cb79630ec567a5f2e03e5f863cda168faa7b8cc8 (commit) via 07de9858032826f5a7b08c372f6bcc73bbb503eb (commit) via a6158a01a4d81a5d862e1e0a60bfd6063443311d (commit) via a09126242a51c4ea4564b0f70b808e4f27fe5a91 (commit) via 4a983e3bef58b9d056517e25e0ab10b72d12ceba (commit) from 6965515c73632a088fb126a4a55e95121671fa98 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit cb79630ec567a5f2e03e5f863cda168faa7b8cc8 Author: Jussi Kivilinna Date: Fri Jul 1 23:07:07 2016 +0300 Fix static build * tests/pubkey.c (_gcry_pk_util_get_nbits): Make function 'static'.
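The duplicate-symbol problem this one-line change fixes is worth spelling out: tests/pubkey.c carries a verbatim copy of _gcry_pk_util_get_nbits, and in a static-only build the linker also pulls pubkey-util.o out of libgcrypt.a because it is needed for other symbols, so two external definitions of the same name collide. A stripped-down illustration of the mechanism, and of why internal linkage resolves it (generic names, not the real libgcrypt files):

    /* lib.c -- compiled into libfoo.a; the program below needs
     * other_sym(), which drags this whole object file, including its
     * get_nbits(), into the link. */
    int get_nbits (void)  { return 2048; }
    int other_sym (void)  { return 1; }

    /* main.c -- carries its own copy of the helper.  Without 'static'
     * this is a second external definition and a static link fails with
     * "multiple definition of `get_nbits'"; with 'static' the copy has
     * internal linkage, is confined to this file, and the clash is
     * gone. */
    static int get_nbits (void) { return 2048; }

    int
    main (void)
    {
      return (get_nbits () == 2048) ? other_sym () - 1 : 1;
    }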
-- Signed-off-by: Jussi Kivilinna diff --git a/tests/pubkey.c b/tests/pubkey.c index 3eb5b4f..1271e43 100644 --- a/tests/pubkey.c +++ b/tests/pubkey.c @@ -175,7 +175,7 @@ show_sexp (const char *prefix, gcry_sexp_t a) } /* from ../cipher/pubkey-util.c */ -gpg_err_code_t +static gpg_err_code_t _gcry_pk_util_get_nbits (gcry_sexp_t list, unsigned int *r_nbits) { char buf[50]; commit 07de9858032826f5a7b08c372f6bcc73bbb503eb Author: Jussi Kivilinna Date: Thu Jun 30 21:51:50 2016 +0300 Disallow encryption/decryption if key is not set * cipher/cipher.c (cipher_encrypt, cipher_decrypt): If mode is not NONE, make sure that key is set. * cipher/cipher-ccm.c (_gcry_cipher_ccm_set_nonce): Do not clear 'marks.key' when resetting state. -- Reported-by: Andreas Metzler Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-ccm.c b/cipher/cipher-ccm.c index 4d8f816..d7f14d8 100644 --- a/cipher/cipher-ccm.c +++ b/cipher/cipher-ccm.c @@ -110,6 +110,7 @@ gcry_err_code_t _gcry_cipher_ccm_set_nonce (gcry_cipher_hd_t c, const unsigned char *nonce, size_t noncelen) { + unsigned int marks_key; size_t L = 15 - noncelen; size_t L_; @@ -122,12 +123,14 @@ _gcry_cipher_ccm_set_nonce (gcry_cipher_hd_t c, const unsigned char *nonce, return GPG_ERR_INV_LENGTH; /* Reset state */ + marks_key = c->marks.key; memset (&c->u_mode, 0, sizeof(c->u_mode)); memset (&c->marks, 0, sizeof(c->marks)); memset (&c->u_iv, 0, sizeof(c->u_iv)); memset (&c->u_ctr, 0, sizeof(c->u_ctr)); memset (c->lastiv, 0, sizeof(c->lastiv)); c->unused = 0; + c->marks.key = marks_key; /* Setup CTR */ c->u_ctr.ctr[0] = L_; diff --git a/cipher/cipher.c b/cipher/cipher.c index 2b7bf21..ff3340f 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -818,6 +818,12 @@ cipher_encrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, { gcry_err_code_t rc; + if (c->mode != GCRY_CIPHER_MODE_NONE && !c->marks.key) + { + log_error ("cipher_encrypt: key not set\n"); + return GPG_ERR_MISSING_KEY; + } + switch (c->mode) { case GCRY_CIPHER_MODE_ECB: @@ -935,6 +941,12 @@ cipher_decrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, { gcry_err_code_t rc; + if (c->mode != GCRY_CIPHER_MODE_NONE && !c->marks.key) + { + log_error ("cipher_decrypt: key not set\n"); + return GPG_ERR_MISSING_KEY; + } + switch (c->mode) { case GCRY_CIPHER_MODE_ECB: commit a6158a01a4d81a5d862e1e0a60bfd6063443311d Author: Jussi Kivilinna Date: Thu Jun 30 21:34:46 2016 +0300 Avoid unaligned accesses with ARM ldm/stm instructions * cipher/rijndael-arm.S: Remove __ARM_FEATURE_UNALIGNED ifdefs, always compile with unaligned load/store code paths. * cipher/sha512-arm.S: Ditto.
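Some background on this ldm/stm change: the unaligned-access support that GCC advertises through __ARM_FEATURE_UNALIGNED covers single-word ldr/str only; ldm/stm always fault on addresses that are not word-aligned, so the run-time alignment test has to stay in even when the compiler defines that macro. The "tst %r2, #3; beq 1f" dispatch kept by this patch corresponds to the following C idiom (a sketch of the pattern, not libgcrypt's actual code):

    #include <stdint.h>
    #include <string.h>

    /* Load a 32-bit word from a possibly unaligned pointer.  The
     * aligned branch stands in for the ldm fast path; the memcpy branch
     * stands in for the byte-wise (ldrb) unaligned path in the
     * assembly. */
    static uint32_t
    load_u32 (const unsigned char *p)
    {
      uint32_t v;

      if (((uintptr_t)p & 3) == 0)
        v = *(const uint32_t *)p;  /* word-aligned: one word access */
      else
        memcpy (&v, p, 4);         /* unaligned: compiler emits safe loads */
      return v;
    }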
-- Reported-by: Michael Plass Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-arm.S b/cipher/rijndael-arm.S index 694369d..e3a91c2 100644 --- a/cipher/rijndael-arm.S +++ b/cipher/rijndael-arm.S @@ -225,7 +225,7 @@ _gcry_aes_arm_encrypt_block: push {%r4-%r11, %ip, %lr}; /* read input block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if src is unaligned */ tst %r2, #3; beq 1f; @@ -238,7 +238,6 @@ _gcry_aes_arm_encrypt_block: b 2f; .ltorg 1: -#endif /* aligned load */ ldm %r2, {RA, RB, RC, RD}; #ifndef __ARMEL__ @@ -277,7 +276,7 @@ _gcry_aes_arm_encrypt_block: add %sp, #16; /* store output block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if dst is unaligned */ tst RT0, #3; beq 1f; @@ -290,7 +289,6 @@ _gcry_aes_arm_encrypt_block: b 2f; .ltorg 1: -#endif /* aligned store */ #ifndef __ARMEL__ rev RA, RA; @@ -484,7 +482,7 @@ _gcry_aes_arm_decrypt_block: push {%r4-%r11, %ip, %lr}; /* read input block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if src is unaligned */ tst %r2, #3; beq 1f; @@ -497,7 +495,6 @@ _gcry_aes_arm_decrypt_block: b 2f; .ltorg 1: -#endif /* aligned load */ ldm %r2, {RA, RB, RC, RD}; #ifndef __ARMEL__ @@ -533,7 +530,7 @@ _gcry_aes_arm_decrypt_block: add %sp, #16; /* store output block */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if dst is unaligned */ tst RT0, #3; beq 1f; @@ -546,7 +543,6 @@ _gcry_aes_arm_decrypt_block: b 2f; .ltorg 1: -#endif /* aligned store */ #ifndef __ARMEL__ rev RA, RA; diff --git a/cipher/sha512-arm.S b/cipher/sha512-arm.S index 28f156e..94ec014 100644 --- a/cipher/sha512-arm.S +++ b/cipher/sha512-arm.S @@ -323,7 +323,7 @@ _gcry_sha512_transform_arm: stm RWhi, {RT1lo,RT1hi,RT2lo,RT2hi,RT3lo,RT3hi,RT4lo,RT4hi} /* Load input to w[16] */ -#ifndef __ARM_FEATURE_UNALIGNED + /* test if data is unaligned */ tst %r1, #3; beq 1f; @@ -341,7 +341,6 @@ _gcry_sha512_transform_arm: read_be64_unaligned_4(%r1, 12 * 8, RT1lo, RT1hi, RT2lo, RT2hi, RT3lo, RT3hi, RT4lo, RT4hi, RWlo); b 2f; -#endif 1: /* aligned load */ add RWhi, %sp, #(w(0)); commit a09126242a51c4ea4564b0f70b808e4f27fe5a91 Author: Jussi Kivilinna Date: Thu Jun 30 21:23:05 2016 +0300 Fix non-PIC reference in PIC for poly1305/ARMv7-NEON * cipher/poly1305-armv7-neon.S (GET_DATA_POINTER): New. (_gcry_poly1305_armv7_neon_init_ext): Use GET_DATA_POINTER. -- Reported-by: Michael Plass Signed-off-by: Jussi Kivilinna diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index 1134e85..b1554ed 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -33,6 +33,19 @@ .fpu neon .arm +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + .text .p2align 2 @@ -52,7 +65,7 @@ _gcry_poly1305_armv7_neon_init_ext: and r2, r2, r2 moveq r14, #-1 ldmia r1!, {r2-r5} - ldr r7, =.Lpoly1305_init_constants_neon + GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 mov r9, r3, lsr #20 commit 4a983e3bef58b9d056517e25e0ab10b72d12ceba Author: Jussi Kivilinna Date: Thu Jun 30 21:17:32 2016 +0300 Fix wrong CPU feature #ifdef for SHA1/AVX * cipher/sha1-avx-amd64.S: Check for HAVE_GCC_INLINE_ASM_AVX instead of HAVE_GCC_INLINE_ASM_AVX2 & HAVE_GCC_INLINE_ASM_BMI2. 
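The guard corrected here is the compile-time half of libgcrypt's usual two-level gating: the preprocessor checks that the assembler can emit a given instruction set, and a separate run-time check consults the detected hardware features before the fast path is taken. With the wrong macros, availability of the AVX code tracked the AVX2/BMI2 assembler checks instead of the AVX one. A condensed sketch of the pattern (modelled on cipher/sha1.c, not verbatim):

    /* Compile time: can this toolchain assemble AVX at all? */
    #if defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1)
    # define USE_AVX 1
    #endif

    typedef struct { unsigned int use_avx:1; } ctx_t;

    static void
    ctx_init (ctx_t *hd, unsigned int features)
    {
      (void)features;
    #ifdef USE_AVX
      /* Run time: only take the AVX path if this CPU reports AVX. */
      hd->use_avx = (features & HWF_INTEL_AVX) != 0;
    #endif
    }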
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha1-avx-amd64.S b/cipher/sha1-avx-amd64.S index 062a45b..3b3a6d1 100644 --- a/cipher/sha1-avx-amd64.S +++ b/cipher/sha1-avx-amd64.S @@ -31,8 +31,7 @@ #if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ - defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(USE_SHA1) + defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1) #ifdef __PIC__ # define RIP (%rip) ----------------------------------------------------------------------- Summary of changes: cipher/cipher-ccm.c | 3 +++ cipher/cipher.c | 12 ++++++++++++ cipher/poly1305-armv7-neon.S | 15 ++++++++++++++- cipher/rijndael-arm.S | 12 ++++-------- cipher/sha1-avx-amd64.S | 3 +-- cipher/sha512-arm.S | 3 +-- tests/pubkey.c | 2 +- 7 files changed, 36 insertions(+), 14 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sun Jul 3 17:42:07 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:07 +0300 Subject: [PATCH 1/3] bench-slope: add unaligned buffer mode Message-ID: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> * tests/bench-slope.c (unaligned_mode): New. (do_slope_benchmark): Unalign buffer if in unaligned mode enabled. (print_help, main): Add '--unaligned' parameter. -- Patch adds --unaligned parameter to allow measurement of unaligned buffer overhead. Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/tests/bench-slope.c b/tests/bench-slope.c index d97494c..cdd0fa6 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -42,6 +42,7 @@ static int verbose; static int csv_mode; +static int unaligned_mode; static int num_measurement_repetitions; /* CPU Ghz value provided by user, allows constructing cycles/byte and other @@ -411,12 +412,14 @@ do_slope_benchmark (struct bench_obj *obj) obj->max_bufsize < 1 || obj->min_bufsize > obj->max_bufsize) goto err_free; - real_buffer = malloc (obj->max_bufsize + 128); + real_buffer = malloc (obj->max_bufsize + 128 + unaligned_mode); if (!real_buffer) goto err_free; /* Get aligned buffer */ buffer = real_buffer; buffer += 128 - ((real_buffer - (unsigned char *) 0) & (128 - 1)); + if (unaligned_mode) + buffer += unaligned_mode; /* Make buffer unaligned */ for (i = 0; i < obj->max_bufsize; i++) buffer[i] = 0x55 ^ (-i); @@ -1748,6 +1751,7 @@ print_help (void) " for benchmarking.", " --repetitions Use N repetitions (default " STR2(NUM_MEASUREMENT_REPETITIONS) ")", + " --unaligned Use unaligned input buffers.", " --csv Use CSV output format", NULL }; @@ -1832,6 +1836,12 @@ main (int argc, char **argv) argc--; argv++; } + else if (!strcmp (*argv, "--unaligned")) + { + unaligned_mode = 1; + argc--; + argv++; + } else if (!strcmp (*argv, "--disable-hwf")) { argc--; From jussi.kivilinna at iki.fi Sun Jul 3 17:42:17 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:17 +0300 Subject: [PATCH 3/3] Add ARMv8/Aarch32 Crypto Extension implementation of SHA-1 In-Reply-To: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> References: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> Message-ID: <146756053731.707.11731047954620193783.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'sha1-armv8-aarch32-ce.S'. 
* cipher/sha1-armv7-neon.S (_gcry_sha1_transform_armv7_neon): Add missing size. * cipher/sha1-armv8-aarch32-ce.S: New. * cipher/sha1.c (USE_ARM_CE): New. (sha1_init): Check features for HWF_ARM_SHA1. [USE_ARM_CE] (_gcry_sha1_transform_armv8_ce): New. (transform) [USE_ARM_CE]: Use ARMv8 CE implementation if HW supports it. * cipher/sha1.h (SHA1_CONTEXT): Add 'use_arm_ce'. * configure.ac: Add 'sha1-armv8-aarch32-ce.lo'. -- Benchmark on Cortex-A53 (1152 Mhz): Before (SHA-1 NEON): | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 6.62 ns/B 144.2 MiB/s 7.62 c/B After (~3.7x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 1.78 ns/B 535.5 MiB/s 2.05 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index f60338a..571673e 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -87,7 +87,7 @@ scrypt.c \ seed.c \ serpent.c serpent-sse2-amd64.S serpent-avx2-amd64.S serpent-armv7-neon.S \ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ - sha1-armv7-neon.S \ + sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S sha512-arm.S \ diff --git a/cipher/sha1-armv7-neon.S b/cipher/sha1-armv7-neon.S index f314d8e..61cc541 100644 --- a/cipher/sha1-armv7-neon.S +++ b/cipher/sha1-armv7-neon.S @@ -521,5 +521,6 @@ _gcry_sha1_transform_armv7_neon: .Ldo_nothing: mov r0, #0; bx lr +.size _gcry_sha1_transform_armv7_neon,.-_gcry_sha1_transform_armv7_neon; #endif diff --git a/cipher/sha1-armv8-aarch32-ce.S b/cipher/sha1-armv8-aarch32-ce.S new file mode 100644 index 0000000..055d893 --- /dev/null +++ b/cipher/sha1-armv8-aarch32-ce.S @@ -0,0 +1,219 @@ +/* sha1-armv8-aarch32-ce.S - ARM/CE accelerated SHA-1 transform function + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA1) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +#define K1 0x5A827999 +#define K2 0x6ED9EBA1 +#define K3 0x8F1BBCDC +#define K4 0xCA62C1D6 +.align 4 +gcry_sha1_aarch32_ce_K_VEC: +.LK_VEC: +.LK1: .long K1, K1, K1, K1 +.LK2: .long K2, K2, K2, K2 +.LK3: .long K3, K3, K3, K3 +.LK4: .long K4, K4, K4, K4 + + +/* Register macros */ + +#define qH4 q0 +#define sH4 s0 +#define qH0123 q1 + +#define qABCD q2 +#define qE0 q3 +#define qE1 q4 + +#define qT0 q5 +#define qT1 q6 + +#define qW0 q8 +#define qW1 q9 +#define qW2 q10 +#define qW3 q11 + +#define qK1 q12 +#define qK2 q13 +#define qK3 q14 +#define qK4 q15 + + +/* Round macros */ + +#define _(...) /*_*/ +#define do_add(dst, src0, src1) vadd.u32 dst, src0, src1; +#define do_sha1su0(w0,w1,w2) sha1su0.32 w0,w1,w2; +#define do_sha1su1(w0,w3) sha1su1.32 w0,w3; + +#define do_rounds(f, e0, e1, t, k, w0, w1, w2, w3, add_fn, sha1su0_fn, sha1su1_fn) \ + sha1h.32 e0, qABCD; \ + sha1##f.32 qABCD, e1, t; \ + add_fn( t, w2, k ); \ + sha1su1_fn( w3, w2 ); \ + sha1su0_fn( w0, w1, w2 ); + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int + * _gcry_sha1_transform_armv8_ce (void *ctx, const unsigned char *data, + * size_t nblks) + */ +.align 3 +.globl _gcry_sha1_transform_armv8_ce +.type _gcry_sha1_transform_armv8_ce,%function; +_gcry_sha1_transform_armv8_ce: + /* input: + * r0: ctx, CTX + * r1: data (64*nblks bytes) + * r2: nblks + */ + + cmp r2, #0; + push {r4,lr}; + beq .Ldo_nothing; + + vpush {q4-q7}; + + GET_DATA_POINTER(r4, .LK_VEC, lr); + + veor qH4, qH4 + vld1.32 {qH0123}, [r0] /* load h0,h1,h2,h3 */ + + vld1.32 {qK1-qK2}, [r4]! /* load K1,K2 */ + vldr sH4, [r0, #16] /* load h4 */ + vld1.32 {qK3-qK4}, [r4] /* load K3,K4 */ + + vld1.8 {qW0-qW1}, [r1]! + vmov qABCD, qH0123 + vld1.8 {qW2-qW3}, [r1]! 
+ + vrev32.8 qW0, qW0 + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_add(qT0, qW0, qK1) + vrev32.8 qW3, qW3 + do_add(qT1, qW1, qK1) + +.Loop: + do_rounds(c, qE1, qH4, qT0, qK1, qW0, qW1, qW2, qW3, do_add, do_sha1su0, _) + do_rounds(c, qE0, qE1, qT1, qK1, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE1, qE0, qT0, qK1, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE0, qE1, qT1, qK2, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(c, qE1, qE0, qT0, qK2, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + + do_rounds(p, qE0, qE1, qT1, qK2, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE1, qE0, qT0, qK2, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK2, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE1, qE0, qT0, qK3, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK3, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + + subs r2, r2, #1 + do_rounds(m, qE1, qE0, qT0, qK3, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE0, qE1, qT1, qK3, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE1, qE0, qT0, qK3, qW0, qW1, qW2, qW3, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE0, qE1, qT1, qK4, qW1, qW2, qW3, qW0, do_add, do_sha1su0, do_sha1su1) + do_rounds(m, qE1, qE0, qT0, qK4, qW2, qW3, qW0, qW1, do_add, do_sha1su0, do_sha1su1) + + do_rounds(p, qE0, qE1, qT1, qK4, qW3, qW0, qW1, qW2, do_add, do_sha1su0, do_sha1su1) + beq .Lend + + do_rounds(p, qE1, qE0, qT0, qK4, _ , _ , qW2, qW3, do_add, _, do_sha1su1) + vld1.8 {qW0-qW1}, [r1]! /* preload */ + vld1.8 {qW2}, [r1]! + do_rounds(p, qE0, qE1, qT1, qK4, _ , _ , qW3, _ , do_add, _, _) + vrev32.8 qW0, qW0 + vld1.8 {qW3}, [r1]! + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_rounds(p, qE1, qE0, qT0, _, _, _, _, _, _, _, _) + vrev32.8 qW3, qW3 + do_rounds(p, qE0, qE1, qT1, _, _, _, _, _, _, _, _) + + do_add(qT0, qW0, qK1) + vadd.u32 qH4, qE0 + vadd.u32 qABCD, qH0123 + do_add(qT1, qW1, qK1) + + vmov qH0123, qABCD + + b .Loop + +.Lend: + do_rounds(p, qE1, qE0, qT0, qK4, _ , _ , qW2, qW3, do_add, _, do_sha1su1) + do_rounds(p, qE0, qE1, qT1, qK4, _ , _ , qW3, _ , do_add, _, _) + do_rounds(p, qE1, qE0, qT0, _, _, _, _, _, _, _, _) + do_rounds(p, qE0, qE1, qT1, _, _, _, _, _, _, _, _) + + vadd.u32 qH4, qE0 + vadd.u32 qH0123, qABCD + + CLEAR_REG(qW0) + CLEAR_REG(qW1) + CLEAR_REG(qW2) + CLEAR_REG(qW3) + CLEAR_REG(qABCD) + CLEAR_REG(qE1) + CLEAR_REG(qE0) + + vstr sH4, [r0, #16] /* store h4 */ + vst1.32 {qH0123}, [r0] /* store h0,h1,h2,h3 */ + + CLEAR_REG(qH0123) + CLEAR_REG(qH4) + vpop {q4-q7} + +.Ldo_nothing: + mov r0, #0 + pop {r4,pc} +.size _gcry_sha1_transform_armv8_ce,.-_gcry_sha1_transform_armv8_ce; + +#endif diff --git a/cipher/sha1.c b/cipher/sha1.c index d15c2a2..e0b68b2 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -76,8 +76,18 @@ && defined(HAVE_GCC_INLINE_ASM_NEON) # define USE_NEON 1 # endif -#endif /*ENABLE_NEON_SUPPORT*/ +#endif +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. */ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif /* A macro to test whether P is properly aligned for an u32 type. 
Note that config.h provides a suitable replacement for uintptr_t if @@ -127,6 +137,9 @@ sha1_init (void *context, unsigned int flags) #ifdef USE_NEON hd->use_neon = (features & HWF_ARM_NEON) != 0; #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA1) != 0; +#endif (void)features; } @@ -164,13 +177,18 @@ _gcry_sha1_mixblock_init (SHA1_CONTEXT *hd) } while(0) - #ifdef USE_NEON unsigned int _gcry_sha1_transform_armv7_neon (void *state, const unsigned char *data, size_t nblks); #endif +#ifdef USE_ARM_CE +unsigned int +_gcry_sha1_transform_armv8_ce (void *state, const unsigned char *data, + size_t nblks); +#endif + /* * Transform NBLOCKS of each 64 bytes (16 32-bit words) at DATA. */ @@ -340,6 +358,10 @@ transform (void *ctx, const unsigned char *data, size_t nblks) return _gcry_sha1_transform_amd64_ssse3 (&hd->h0, data, nblks) + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif +#ifdef USE_ARM_CE + if (hd->use_arm_ce) + return _gcry_sha1_transform_armv8_ce (&hd->h0, data, nblks); +#endif #ifdef USE_NEON if (hd->use_neon) return _gcry_sha1_transform_armv7_neon (&hd->h0, data, nblks) diff --git a/cipher/sha1.h b/cipher/sha1.h index 6b87631..d448fca 100644 --- a/cipher/sha1.h +++ b/cipher/sha1.h @@ -30,6 +30,7 @@ typedef struct unsigned int use_avx:1; unsigned int use_bmi2:1; unsigned int use_neon:1; + unsigned int use_arm_ce:1; } SHA1_CONTEXT; diff --git a/configure.ac b/configure.ac index 2001596..073b7e9 100644 --- a/configure.ac +++ b/configure.ac @@ -2335,6 +2335,7 @@ case "${host}" in arm*-*-*) # Build with the assembly implementation GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha1-armv7-neon.lo" + GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha1-armv8-aarch32-ce.lo" ;; esac From jussi.kivilinna at iki.fi Sun Jul 3 17:42:12 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 Jul 2016 18:42:12 +0300 Subject: [PATCH 2/3] Add HW feature check for ARMv8 Aarch64 and crypto extensions In-Reply-To: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> References: <146756052722.707.3663593201783876022.stgit@localhost6.localdomain6> Message-ID: <146756053229.707.15669215069250836807.stgit@localhost6.localdomain6> * configure.ac: Add '--disable-arm-crypto-support'; enable hwf-arm module on 64-bit ARM. (armcryptosupport, gcry_cv_gcc_inline_aarch32_crypto) (gcry_cv_inline_asm_aarch64_neon) (gcry_cv_gcc_inline_asm_aarch64_crypto): New. * src/g10lib.h (HWF_ARM_AES, HWF_ARM_SHA1, HWF_ARM_SHA2) (HWF_ARM_PMULL): New. * src/hwf-arm.c [__aarch64__]: Enable building in Aarch64 mode. (feature_map_s): New. [__arm__] (AT_HWCAP, AT_HWCAP2, HWCAP2_AES, HWCAP2_PMULL) (HWCAP2_SHA1, HWCAP2_SHA2, arm_features): New. [__aarch64__] (AT_HWCAP, AT_HWCAP2, HWCAP_ASIMD, HWCAP_AES) (HWCAP_PMULL, HWCAP_SHA1, HWCAP_SHA2, arm_features): New. (get_hwcap): Add reading of 'AT_HWCAP2'; Change auxv use 'unsigned long'. (detect_arm_at_hwcap): Add mapping of HWCAP/HWCAP2 to HWF flags. (detect_arm_proc_cpuinfo): Add mapping of CPU features to HWF flags. (_gcry_hwf_detect_arm): Use __ARM_NEON instead of legacy __ARM_NEON__. * src/hwfeatures.c (hwflist): Add 'arm-aes', 'arm-sha1', 'arm-sha2' and 'arm-pmull'. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/configure.ac b/configure.ac index 80e64fa..2001596 100644 --- a/configure.ac +++ b/configure.ac @@ -637,6 +637,14 @@ AC_ARG_ENABLE(neon-support, neonsupport=$enableval,neonsupport=yes) AC_MSG_RESULT($neonsupport) +# Implementation of the --disable-arm-crypto-support switch. 
+AC_MSG_CHECKING([whether ARMv8 Crypto Extension support is requested]) +AC_ARG_ENABLE(arm-crypto-support, + AC_HELP_STRING([--disable-arm-crypto-support], + [Disable support for the ARMv8 Crypto Extension instructions]), + armcryptosupport=$enableval,armcryptosupport=yes) +AC_MSG_RESULT($armcryptosupport) + # Implementation of the --disable-O-flag-munging switch. AC_MSG_CHECKING([whether a -O flag munging is requested]) AC_ARG_ENABLE([O-flag-munging], @@ -1125,7 +1133,10 @@ if test "$mpi_cpu_arch" != "x86" ; then fi if test "$mpi_cpu_arch" != "arm" ; then - neonsupport="n/a" + if test "$mpi_cpu_arch" != "aarch64" ; then + neonsupport="n/a" + armcryptosupport="n/a" + fi fi @@ -1532,6 +1543,116 @@ if test "$gcry_cv_gcc_inline_asm_neon" = "yes" ; then fi +# +# Check whether GCC inline assembler supports Aarch32 Crypto Extension instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch32 Crypto Extension instructions], + [gcry_cv_gcc_inline_asm_aarch32_crypto], + [if test "$mpi_cpu_arch" != "arm" ; then + gcry_cv_gcc_inline_asm_aarch32_crypto="n/a" + else + gcry_cv_gcc_inline_asm_aarch32_crypto=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".syntax unified\n\t" + ".arm\n\t" + ".fpu crypto-neon-fp-armv8\n\t" + + "sha1h.32 q0, q0;\n\t" + "sha1c.32 q0, q0, q0;\n\t" + "sha1p.32 q0, q0, q0;\n\t" + "sha1su0.32 q0, q0, q0;\n\t" + "sha1su1.32 q0, q0;\n\t" + + "sha256h.32 q0, q0, q0;\n\t" + "sha256h2.32 q0, q0, q0;\n\t" + "sha1p.32 q0, q0, q0;\n\t" + "sha256su0.32 q0, q0;\n\t" + "sha256su1.32 q0, q0, q15;\n\t" + + "aese.8 q0, q0;\n\t" + "aesd.8 q0, q0;\n\t" + "aesmc.8 q0, q0;\n\t" + "aesimc.8 q0, q0;\n\t" + + "vmull.p64 q0, d0, d0;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch32_crypto=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch32_crypto" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO,1, + [Defined if inline assembler supports Aarch32 Crypto Extension instructions]) +fi + + +# +# Check whether GCC inline assembler supports Aarch64 NEON instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch64 NEON instructions], + [gcry_cv_gcc_inline_asm_aarch64_neon], + [if test "$mpi_cpu_arch" != "aarch64" ; then + gcry_cv_gcc_inline_asm_aarch64_neon="n/a" + else + gcry_cv_gcc_inline_asm_aarch64_neon=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".arch armv8-a\n\t" + "mov w0, \#42;\n\t" + "dup v0.8b, w0;\n\t" + "ld4 {v0.8b,v1.8b,v2.8b,v3.8b},[x0],\#32;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch64_neon=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch64_neon" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH64_NEON,1, + [Defined if inline assembler supports Aarch64 NEON instructions]) +fi + + +# +# Check whether GCC inline assembler supports Aarch64 Crypto Extension instructions +# +AC_CACHE_CHECK([whether GCC inline assembler supports Aarch64 Crypto Extension instructions], + [gcry_cv_gcc_inline_asm_aarch64_crypto], + [if test "$mpi_cpu_arch" != "aarch64" ; then + gcry_cv_gcc_inline_asm_aarch64_crypto="n/a" + else + gcry_cv_gcc_inline_asm_aarch64_crypto=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".arch armv8-a+crypto\n\t" + + "sha1h s0, s0;\n\t" + "sha1c q0, s0, v0.4s;\n\t" + "sha1p q0, s0, v0.4s;\n\t" + "sha1su0 v0.4s, v0.4s, v0.4s;\n\t" + "sha1su1 v0.4s, v0.4s;\n\t" + + "sha256h q0, q0, v0.4s;\n\t" + "sha256h2 q0, q0, v0.4s;\n\t" + "sha1p q0, s0, v0.4s;\n\t" + "sha256su0 v0.4s, v0.4s;\n\t" + "sha256su1 v0.4s, v0.4s, v31.4s;\n\t" + + "aese v0.16b, v0.16b;\n\t" + "aesd v0.16b, v0.16b;\n\t" + "aesmc v0.16b, 
v0.16b;\n\t" + "aesimc v0.16b, v0.16b;\n\t" + + "pmull v0.1q, v0.1d, v31.1d;\n\t" + "pmull2 v0.1q, v0.2d, v31.2d;\n\t" + ); + ]])], + [gcry_cv_gcc_inline_asm_aarch64_crypto=yes]) + fi]) +if test "$gcry_cv_gcc_inline_asm_aarch64_crypto" = "yes" ; then + AC_DEFINE(HAVE_GCC_INLINE_ASM_AARCH64_CRYPTO,1, + [Defined if inline assembler supports Aarch64 Crypto Extension instructions]) +fi + + ####################################### #### Checks for library functions. #### ####################################### @@ -1758,7 +1879,16 @@ if test x"$avx2support" = xyes ; then fi if test x"$neonsupport" = xyes ; then if test "$gcry_cv_gcc_inline_asm_neon" != "yes" ; then - neonsupport="no (unsupported by compiler)" + if test "$gcry_cv_gcc_inline_asm_aarch64_neon" != "yes" ; then + neonsupport="no (unsupported by compiler)" + fi + fi +fi +if test x"$armcryptosupport" = xyes ; then + if test "$gcry_cv_gcc_inline_asm_aarch32_crypto" != "yes" ; then + if test "$gcry_cv_gcc_inline_asm_aarch64_crypto" != "yes" ; then + neonsupport="no (unsupported by compiler)" + fi fi fi @@ -1786,6 +1916,10 @@ if test x"$neonsupport" = xyes ; then AC_DEFINE(ENABLE_NEON_SUPPORT,1, [Enable support for ARM NEON instructions.]) fi +if test x"$armcryptosupport" = xyes ; then + AC_DEFINE(ENABLE_ARM_CRYPTO_SUPPORT,1, + [Enable support for ARMv8 Crypto Extension instructions.]) +fi if test x"$padlocksupport" = xyes ; then AC_DEFINE(ENABLE_PADLOCK_SUPPORT, 1, [Enable support for the PadLock engine.]) @@ -2299,6 +2433,10 @@ case "$mpi_cpu_arch" in AC_DEFINE(HAVE_CPU_ARCH_ARM, 1, [Defined for ARM platforms]) GCRYPT_HWF_MODULES="hwf-arm.lo" ;; + aarch64) + AC_DEFINE(HAVE_CPU_ARCH_ARM, 1, [Defined for ARM Aarch64 platforms]) + GCRYPT_HWF_MODULES="hwf-arm.lo" + ;; esac AC_SUBST([GCRYPT_HWF_MODULES]) @@ -2384,6 +2522,7 @@ GCRY_MSG_SHOW([Try using DRNG (RDRAND): ],[$drngsupport]) GCRY_MSG_SHOW([Try using Intel AVX: ],[$avxsupport]) GCRY_MSG_SHOW([Try using Intel AVX2: ],[$avx2support]) GCRY_MSG_SHOW([Try using ARM NEON: ],[$neonsupport]) +GCRY_MSG_SHOW([Try using ARMv8 crypto: ],[$armcryptosupport]) GCRY_MSG_SHOW([],[]) if test "x${gpg_config_script_warn}" != x; then diff --git a/src/g10lib.h b/src/g10lib.h index 170ffa1..444c868 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -211,6 +211,10 @@ char **_gcry_strtokenize (const char *string, const char *delim); #define HWF_INTEL_AVX2 (1 << 13) #define HWF_ARM_NEON (1 << 14) +#define HWF_ARM_AES (1 << 15) +#define HWF_ARM_SHA1 (1 << 16) +#define HWF_ARM_SHA2 (1 << 17) +#define HWF_ARM_PMULL (1 << 18) gpg_err_code_t _gcry_disable_hw_feature (const char *name); diff --git a/src/hwf-arm.c b/src/hwf-arm.c index 3dc050e..6f0bb95 100644 --- a/src/hwf-arm.c +++ b/src/hwf-arm.c @@ -27,7 +27,7 @@ #include "g10lib.h" #include "hwf-common.h" -#if !defined (__arm__) +#if !defined (__arm__) && !defined (__aarch64__) # error Module build for wrong CPU. 
#endif @@ -35,23 +35,81 @@ #undef HAS_PROC_CPUINFO #ifdef __linux__ +struct feature_map_s { + unsigned int hwcap_flag; + unsigned int hwcap2_flag; + const char *feature_match; + unsigned int hwf_flag; +}; + #define HAS_SYS_AT_HWCAP 1 +#define HAS_PROC_CPUINFO 1 + +#ifdef __arm__ + +#define AT_HWCAP 16 +#define AT_HWCAP2 26 + +#define HWCAP_NEON 4096 -#define AT_HWCAP 16 -#define HWCAP_NEON 4096 +#define HWCAP2_AES 1 +#define HWCAP2_PMULL 2 +#define HWCAP2_SHA1 3 +#define HWCAP2_SHA2 4 + +static const struct feature_map_s arm_features[] = + { +#ifdef ENABLE_NEON_SUPPORT + { HWCAP_NEON, 0, " neon", HWF_ARM_NEON }, +#endif +#ifdef ENABLE_ARM_CRYPTO_SUPPORT + { 0, HWCAP2_AES, " aes", HWF_ARM_AES }, + { 0, HWCAP2_SHA1," sha1", HWF_ARM_SHA1 }, + { 0, HWCAP2_SHA2, " sha2", HWF_ARM_SHA2 }, + { 0, HWCAP2_PMULL, " pmull", HWF_ARM_PMULL }, +#endif + }; + +#elif defined(__aarch64__) + +#define AT_HWCAP 16 +#define AT_HWCAP2 -1 + +#define HWCAP_ASIMD 2 +#define HWCAP_AES 8 +#define HWCAP_PMULL 16 +#define HWCAP_SHA1 32 +#define HWCAP_SHA2 64 + +static const struct feature_map_s arm_features[] = + { +#ifdef ENABLE_NEON_SUPPORT + { HWCAP_ASIMD, 0, " asimd", HWF_ARM_NEON }, +#endif +#ifdef ENABLE_ARM_CRYPTO_SUPPORT + { HWCAP_AES, 0, " aes", HWF_ARM_AES }, + { HWCAP_SHA1, 0, " sha1", HWF_ARM_SHA1 }, + { HWCAP_SHA2, 0, " sha2", HWF_ARM_SHA2 }, + { HWCAP_PMULL, 0, " pmull", HWF_ARM_PMULL }, +#endif + }; + +#endif static int -get_hwcap(unsigned int *hwcap) +get_hwcap(unsigned int *hwcap, unsigned int *hwcap2) { - struct { unsigned int a_type; unsigned int a_val; } auxv; + struct { unsigned long a_type; unsigned long a_val; } auxv; FILE *f; int err = -1; static int hwcap_initialized = 0; - static unsigned int stored_hwcap; + static unsigned int stored_hwcap = 0; + static unsigned int stored_hwcap2 = 0; if (hwcap_initialized) { *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return 0; } @@ -59,22 +117,31 @@ get_hwcap(unsigned int *hwcap) if (!f) { *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return -1; } while (fread(&auxv, sizeof(auxv), 1, f) > 0) { - if (auxv.a_type != AT_HWCAP) - continue; - - stored_hwcap = auxv.a_val; - hwcap_initialized = 1; - err = 0; - break; + if (auxv.a_type == AT_HWCAP) + { + stored_hwcap = auxv.a_val; + hwcap_initialized = 1; + } + + if (auxv.a_type == AT_HWCAP2) + { + stored_hwcap2 = auxv.a_val; + hwcap_initialized = 1; + } } + if (hwcap_initialized) + err = 0; + fclose(f); *hwcap = stored_hwcap; + *hwcap2 = stored_hwcap2; return err; } @@ -82,29 +149,34 @@ static unsigned int detect_arm_at_hwcap(void) { unsigned int hwcap; + unsigned int hwcap2; unsigned int features = 0; + unsigned int i; - if (get_hwcap(&hwcap) < 0) + if (get_hwcap(&hwcap, &hwcap2) < 0) return features; -#ifdef ENABLE_NEON_SUPPORT - if (hwcap & HWCAP_NEON) - features |= HWF_ARM_NEON; -#endif + for (i = 0; i < DIM(arm_features); i++) + { + if (hwcap & arm_features[i].hwcap_flag) + features |= arm_features[i].hwf_flag; + + if (hwcap2 & arm_features[i].hwcap2_flag) + features |= arm_features[i].hwf_flag; + } return features; } -#define HAS_PROC_CPUINFO 1 - static unsigned int detect_arm_proc_cpuinfo(unsigned int *broken_hwfs) { char buf[1024]; /* large enough */ - char *str_features, *str_neon; + char *str_features, *str_feat; int cpu_implementer, cpu_arch, cpu_variant, cpu_part, cpu_revision; FILE *f; int readlen, i; + size_t mlen; static int cpuinfo_initialized = 0; static unsigned int stored_cpuinfo_features; static unsigned int stored_broken_hwfs; @@ -162,7 +234,11 @@ detect_arm_proc_cpuinfo(unsigned int 
*broken_hwfs) continue; str += 2; - *cpu_entries[i].value = strtoul(str, NULL, 0); + if (strcmp(cpu_entries[i].name, "CPU architecture") == 0 + && strcmp(str, "Aarch64") == 0) + *cpu_entries[i].value = 8; + else + *cpu_entries[i].value = strtoul(str, NULL, 0); } /* Lines to strings. */ @@ -170,10 +246,19 @@ detect_arm_proc_cpuinfo(unsigned int *broken_hwfs) if (buf[i] == '\n') buf[i] = '\0'; - /* Check for NEON. */ - str_neon = strstr(str_features, " neon"); - if (str_neon && (str_neon[5] == ' ' || str_neon[5] == '\0')) - stored_cpuinfo_features |= HWF_ARM_NEON; + /* Check features. */ + for (i = 0; i < DIM(arm_features); i++) + { + str_feat = strstr(str_features, arm_features[i].feature_match); + if (str_feat) + { + mlen = strlen(arm_features[i].feature_match); + if (str_feat[mlen] == ' ' || str_feat[mlen] == '\0') + { + stored_cpuinfo_features |= arm_features[i].hwf_flag; + } + } + } /* Check for CPUs with broken NEON implementation. See * https://code.google.com/p/chromium/issues/detail?id=341598 @@ -207,7 +292,7 @@ _gcry_hwf_detect_arm (void) ret |= detect_arm_proc_cpuinfo (&broken_hwfs); #endif -#if defined(__ARM_NEON__) && defined(ENABLE_NEON_SUPPORT) +#if defined(__ARM_NEON) && defined(ENABLE_NEON_SUPPORT) ret |= HWF_ARM_NEON; #endif diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 4cafae1..07221e8 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -56,7 +56,11 @@ static struct { HWF_INTEL_RDRAND, "intel-rdrand" }, { HWF_INTEL_AVX, "intel-avx" }, { HWF_INTEL_AVX2, "intel-avx2" }, - { HWF_ARM_NEON, "arm-neon" } + { HWF_ARM_NEON, "arm-neon" }, + { HWF_ARM_AES, "arm-aes" }, + { HWF_ARM_SHA1, "arm-sha1" }, + { HWF_ARM_SHA2, "arm-sha2" }, + { HWF_ARM_PMULL, "arm-pmull" } }; /* A bit vector with the hardware features which shall not be used. From orthecreedence at gmail.com Thu Jul 7 21:49:02 2016 From: orthecreedence at gmail.com (Andrew Lyon) Date: Thu, 7 Jul 2016 12:49:02 -0700 Subject: Errors compiling libgcrypt-1.7.1 Message-ID: Hello, everyone. I hope this is the correct place to post. I am trying to compile gcrypt 1.7.1 on my machine (MSYS2, Mingw64). 
I'm doing the following: tar -xvf libgcrypt-1.7.1.tar.bz2 cd libgcrypt-1.7.1/ mkdir build && cd build/ ../configure --prefix=/usr --enable-static --disable-shared Outputs: Libgcrypt v1.7.1 has been configured as follows: Platform: W32 (x86_64-w64-mingw32) Hardware detection module: hwf-x86 Enabled cipher algorithms: arcfour blowfish cast5 des aes twofish serpent rfc2268 seed camellia idea salsa20 gost28147 chacha20 Enabled digest algorithms: crc gostr3411-94 md4 md5 rmd160 sha1 sha256 sha512 sha3 tiger whirlpool stribog Enabled kdf algorithms: s2k pkdf2 scrypt Enabled pubkey algorithms: dsa elgamal rsa ecc Random number generator: default Using linux capabilities: no Try using Padlock crypto: yes Try using AES-NI crypto: yes Try using Intel PCLMUL: yes Try using Intel SSE4.1: yes Try using DRNG (RDRAND): yes Try using Intel AVX: yes Try using Intel AVX2: yes Try using ARM NEON: n/a Then running: make The build seems to run fine, until I get to the tests, then: /bin/sh ../libtool --tag=CC --mode=link /mingw64/bin/gcc -I/usr/include -L/usr/lib -Wall -no-install -o pubkey.exe pubkey.o ../src/libgcrypt.la ../compat/libcompat.la -lgpg-error libtool: link: warning: `-no-install' is ignored for x86_64-w64-mingw32 libtool: link: warning: assuming `-no-fast-install' instead libtool: link: /mingw64/bin/gcc -I/usr/include -Wall -o pubkey.exe pubkey.o -L/usr/lib ../src/.libs/libgcrypt.a ../compat/.libs/libcompat.a /usr/lib/libgpg-error.a ../src/.libs/libgcrypt.a(pubkey-util.o):pubkey-util.c:(.text+0x6e0): multiple definition of `_gcry_pk_util_get_nbits' pubkey.o:pubkey.c:(.text+0x211): first defined here collect2.exe: error: ld returned 1 exit status make[2]: *** [Makefile:671: pubkey.exe] Error 1 make[2]: Leaving directory '/d/src/libgcrypt-1.7.1/build/tests' make[1]: *** [Makefile:477: all-recursive] Error 1 make[1]: Leaving directory '/d/src/libgcrypt-1.7.1/build' make: *** [Makefile:408: all] Error 2 Note that I tried running configure with --disable-asm / make CFLAGS+="-DNO_ASM" as well, no luck (I read somewhere that might help with build problems on Windows). I tried building v1.7.0 as well, same result, while v1.6.5 compiles just fine. Ok, so it failed while building the tests.
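One more knob that can help bisect failures like the ones shown below, complementing ./configure --disable-asm: libgcrypt's run-time hardware-feature blacklist. Whether it changes anything in this particular case is unverified; the sketch only shows the mechanism, and it assumes that GCRYCTL_DISABLE_HWF is issued before the library is initialized and that feature names are the strings from src/hwfeatures.c (e.g. "intel-aesni", "intel-avx"):

    #include <gcrypt.h>

    /* Mask out one detected hardware feature so the corresponding
     * accelerated code path is never selected, then initialize. */
    static void
    init_without_aesni (void)
    {
      gcry_control (GCRYCTL_DISABLE_HWF, "intel-aesni", NULL);
      gcry_check_version (GCRYPT_VERSION);
    }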
I figured I'd try running them to see if it works regardless: ./tests/basic selftest for CTR failed - see syslog for details pass 0, algo 7, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 7, mode 11, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 8, mode 11, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 1, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 2, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 5, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 3, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 6, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 8, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 9, gcry_cipher_setkey failed: Selftest failed pass 0, algo 9, mode 11, gcry_cipher_setkey failed: Selftest failed aes-cbc-cts, gcry_cipher_setkey failed: Selftest failed cbc-mac algo 7, gcry_cipher_setkey failed: Selftest failed aes-ctr, gcry_cipher_setkey failed: Selftest failed aes-cfb, gcry_cipher_setkey failed: Selftest failed aes-ofb, gcry_cipher_setkey failed: Selftest failed cipher-ccm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed aes-gcm, gcry_cipher_setkey failed: Selftest failed cipher-ocb, gcry_cipher_setkey failed (tv 0): Selftest failed cipher-ocb, gcry_cipher_setkey failed (tv 0): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 7): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb, gcry_cipher_setkey failed (large, algo 9): Selftest failed cipher-ocb-splitaad, gcry_cipher_setkey failed: Selftest failed gcry_cipher_setkey failed: Selftest failed 
algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_setkey failed: Selftest failed algo 201, mac gcry_mac_s I tried using the lib in my app anyway (it builds fine against the static lib) but I just keep getting "Selftest failed" errors. Once again, v1.6.5 builds fine, and I get no "Selftest error" when using it in my app. Any ideas? Thanks for the help! Please let me know if I omitted any details. Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From jussi.kivilinna at iki.fi Fri Jul 8 00:28:27 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 08 Jul 2016 01:28:27 +0300 Subject: [PATCH] Fix unaligned accesses with ldm/stm in ChaCha20 and Poly1305 ARM/NEON Message-ID: <146793050732.767.4788160689520723569.stgit@localhost6.localdomain6> * cipher/chacha20-armv7-neon.S (UNALIGNED_STMIA8) (UNALIGNED_LDMIA4): New. (_gcry_chacha20_armv7_neon_blocks): Use new helper macros instead of ldm/stm instructions directly. * cipher/poly1305-armv7-neon.S (UNALIGNED_LDMIA2) (UNALIGNED_LDMIA4): New. (_gcry_poly1305_armv7_neon_init_ext, _gcry_poly1305_armv7_neon_blocks) (_gcry_poly1305_armv7_neon_finish_ext): Use new helper macros instead of ldm instruction directly. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S index 1a395ba..4d3340b 100644 --- a/cipher/chacha20-armv7-neon.S +++ b/cipher/chacha20-armv7-neon.S @@ -33,6 +33,40 @@ .fpu neon .arm +#define UNALIGNED_STMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d3}; \ + vmov s0, l0; \ + vmov s1, l1; \ + vmov s2, l2; \ + vmov s3, l3; \ + vmov s4, l4; \ + vmov s5, l5; \ + vmov s6, l6; \ + vmov s7, l7; \ + vst1.32 {d0-d3}, [ptr]; \ + add ptr, #32; \ + vpop {d0-d3}; \ + b 2f; \ + 1: stmia ptr!, {l0-l7}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + /*beq 1f;*/ \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]; \ + add ptr, #16; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .globl _gcry_chacha20_armv7_neon_blocks @@ -352,7 +386,8 @@ _gcry_chacha20_armv7_neon_blocks: add r7, r7, r11 vadd.i32 q11, q11, q14 beq .Lchacha_blocks_neon_nomessage11 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -367,7 +402,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage11: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -391,7 +427,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage12 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -406,7 +443,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 
.Lchacha_blocks_neon_nomessage12: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 beq .Lchacha_blocks_neon_nomessage13 vld1.32 {q12,q13}, [r12]! vld1.32 {q14,q15}, [r12]! @@ -613,7 +651,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 add r7, r7, r11 beq .Lchacha_blocks_neon_nomessage21 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -628,7 +667,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage21: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -652,7 +691,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage22 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -667,7 +707,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage22: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) str r12, [sp, #48] str r14, [sp, #40] ldr r3, [sp, #52] diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index b1554ed..13cb4a5 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -46,6 +46,32 @@ # define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name #endif +#define UNALIGNED_LDMIA2(ptr, l0, l1) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0}; \ + vld1.32 {d0}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vpop {d0}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l1}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .p2align 2 @@ -64,7 +90,7 @@ _gcry_poly1305_armv7_neon_init_ext: mov r14, r2 and r2, r2, r2 moveq r14, #-1 - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 @@ -175,7 +201,7 @@ _gcry_poly1305_armv7_neon_init_ext: eor r6, r6, r6 stmia r0!, {r2-r6} stmia r0!, {r2-r6} - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) stmia r0, {r2-r6} add sp, sp, #32 ldmfd sp!, {r4-r11, lr} @@ -286,7 +312,7 @@ _gcry_poly1305_armv7_neon_blocks: vmov d14, d12 vmul.i32 q6, q5, d0[0] .Lpoly1305_blocks_neon_mainloop: - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmull.u32 q0, d25, d12[0] mov r7, r2, lsr #26 vmlal.u32 q0, d24, d12[1] @@ -302,7 +328,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d13[0] - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d13[1] mov r1, r7, lsr #26 vmlal.u32 q1, d22, d14[0] @@ -344,7 +370,7 @@ _gcry_poly1305_armv7_neon_blocks: vmlal.u32 q4, d21, d11[1] vld1.64 {d21-d24}, [r14, :256]! 
vld1.64 {d25}, [r14, :64] - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmlal.u32 q0, d25, d26 mov r7, r2, lsr #26 vmlal.u32 q0, d24, d27 @@ -360,7 +386,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d28 - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d29 mov r1, r7, lsr #26 vmlal.u32 q1, d22, d20 @@ -643,7 +669,7 @@ _gcry_poly1305_armv7_neon_finish_ext: .Lpoly1305_finish_ext_neon_skip16: tst r7, #8 beq .Lpoly1305_finish_ext_neon_skip8 - ldmia r1!, {r10-r11} + UNALIGNED_LDMIA2(r1, r10, r11) stmia r9!, {r10-r11} .Lpoly1305_finish_ext_neon_skip8: tst r7, #4 From mfpnb at plass-family.net Fri Jul 1 00:53:41 2016 From: mfpnb at plass-family.net (Michael Plass) Date: Thu, 30 Jun 2016 15:53:41 -0700 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5775916F.8010505@iki.fi> References: <5774CDA5.2090001@iki.fi> <5775916F.8010505@iki.fi> Message-ID: On Jun 30, 2016, at 2:38 PM, Jussi Kivilinna wrote: > On 30.06.2016 10:43, Jussi Kivilinna wrote: > > It appears that there are proper tests for unaligned buffers. However those > tests did not fail for me, since on Linux the unaligned ldm/stm exception > is caught and handled by the kernel. > > -Jussi > > > I see - that explains why this was not noticed before. A bus error in the basic tests is what called my attention to this. Thanks, - Michael From mfpnb at plass-family.net Fri Jul 1 08:17:58 2016 From: mfpnb at plass-family.net (Michael Plass) Date: Thu, 30 Jun 2016 23:17:58 -0700 Subject: PIC, alignment problems with libcrypt on armv7 In-Reply-To: <5774CDA5.2090001@iki.fi> References: <5774CDA5.2090001@iki.fi> Message-ID: On Jun 30, 2016, at 12:43 AM, Jussi Kivilinna wrote: > I wonder if there is an automated way to check the resulting library for such > non-PIC references. If there is, such a check could be incorporated into the > build process and abort the build if found. Nick Hudson on the NetBSD lists suggests this: > > There is the ld(1) option --warn-shared-textrel and this with --fatal-warnings should > do the trick. > From cvs at cvs.gnupg.org Fri Jul 8 12:31:13 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Fri, 08 Jul 2016 12:31:13 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-17-g1111d31 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 (commit) via 496790940753226f96b731a43d950bd268acd97a (commit) from cb79630ec567a5f2e03e5f863cda168faa7b8cc8 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 Author: Jussi Kivilinna Date: Fri Jul 8 01:22:58 2016 +0300 Fix unaligned accesses with ldm/stm in ChaCha20 and Poly1305 ARM/NEON * cipher/chacha20-armv7-neon.S (UNALIGNED_STMIA8) (UNALIGNED_LDMIA4): New. (_gcry_chacha20_armv7_neon_blocks): Use new helper macros instead of ldm/stm instructions directly. * cipher/poly1305-armv7-neon.S (UNALIGNED_LDMIA2) (UNALIGNED_LDMIA4): New. (_gcry_poly1305_armv7_neon_init_ext, _gcry_poly1305_armv7_neon_blocks) (_gcry_poly1305_armv7_neon_finish_ext): Use new helper macros instead of ldm instruction directly.
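Why the new helper macros are safe: NEON vld1/vst1 element accesses without an :align qualifier accept unaligned addresses, so the macros bounce unaligned data through NEON registers (vmov to and from core registers) while word-aligned pointers keep the original ldm/stm fast path. Exercising such paths from a test only requires feeding deliberately misaligned buffers, roughly like this harness (a sketch, not the actual tests/basic code; assumes len <= 64):

    #include <stddef.h>
    #include <string.h>

    /* Run 'process' over aligned buffers and over buffers offset by one
     * byte, and check that both runs produce identical output. */
    static int
    check_unaligned (void (*process) (void *out, const void *in, size_t len),
                     const unsigned char *in, size_t len)
    {
      unsigned char a_in[64], a_out[64];          /* aligned reference run */
      unsigned char u_in[1 + 64], u_out[1 + 64];  /* misaligned by one byte */

      memcpy (a_in, in, len);
      process (a_out, a_in, len);

      memcpy (u_in + 1, in, len);
      process (u_out + 1, u_in + 1, len);

      return memcmp (a_out, u_out + 1, len) == 0;
    }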
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S index 1a395ba..4d3340b 100644 --- a/cipher/chacha20-armv7-neon.S +++ b/cipher/chacha20-armv7-neon.S @@ -33,6 +33,40 @@ .fpu neon .arm +#define UNALIGNED_STMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d3}; \ + vmov s0, l0; \ + vmov s1, l1; \ + vmov s2, l2; \ + vmov s3, l3; \ + vmov s4, l4; \ + vmov s5, l5; \ + vmov s6, l6; \ + vmov s7, l7; \ + vst1.32 {d0-d3}, [ptr]; \ + add ptr, #32; \ + vpop {d0-d3}; \ + b 2f; \ + 1: stmia ptr!, {l0-l7}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + /*beq 1f;*/ \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]; \ + add ptr, #16; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .globl _gcry_chacha20_armv7_neon_blocks @@ -352,7 +386,8 @@ _gcry_chacha20_armv7_neon_blocks: add r7, r7, r11 vadd.i32 q11, q11, q14 beq .Lchacha_blocks_neon_nomessage11 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -367,7 +402,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage11: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -391,7 +427,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage12 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -406,7 +443,8 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage12: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) + tst r12, r12 beq .Lchacha_blocks_neon_nomessage13 vld1.32 {q12,q13}, [r12]! vld1.32 {q14,q15}, [r12]! 
@@ -613,7 +651,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 add r7, r7, r11 beq .Lchacha_blocks_neon_nomessage21 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -628,7 +667,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage21: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) ldm sp, {r0-r7} ldr r8, [sp, #(64 +32)] ldr r9, [sp, #(64 +36)] @@ -652,7 +691,8 @@ _gcry_chacha20_armv7_neon_blocks: tst r12, r12 str r9, [sp, #(64 +52)] beq .Lchacha_blocks_neon_nomessage22 - ldmia r12!, {r8-r11} + UNALIGNED_LDMIA4(r12, r8, r9, r10, r11) + tst r12, r12 eor r0, r0, r8 eor r1, r1, r9 eor r2, r2, r10 @@ -667,7 +707,7 @@ _gcry_chacha20_armv7_neon_blocks: add r12, r12, #16 eor r7, r7, r11 .Lchacha_blocks_neon_nomessage22: - stmia r14!, {r0-r7} + UNALIGNED_STMIA8(r14, r0, r1, r2, r3, r4, r5, r6, r7) str r12, [sp, #48] str r14, [sp, #40] ldr r3, [sp, #52] diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S index b1554ed..13cb4a5 100644 --- a/cipher/poly1305-armv7-neon.S +++ b/cipher/poly1305-armv7-neon.S @@ -46,6 +46,32 @@ # define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name #endif +#define UNALIGNED_LDMIA2(ptr, l0, l1) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0}; \ + vld1.32 {d0}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vpop {d0}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l1}; \ + 2: ; + +#define UNALIGNED_LDMIA4(ptr, l0, l1, l2, l3) \ + tst ptr, #3; \ + beq 1f; \ + vpush {d0-d1}; \ + vld1.32 {d0-d1}, [ptr]!; \ + vmov l0, s0; \ + vmov l1, s1; \ + vmov l2, s2; \ + vmov l3, s3; \ + vpop {d0-d1}; \ + b 2f; \ + 1: ldmia ptr!, {l0-l3}; \ + 2: ; + .text .p2align 2 @@ -64,7 +90,7 @@ _gcry_poly1305_armv7_neon_init_ext: mov r14, r2 and r2, r2, r2 moveq r14, #-1 - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) GET_DATA_POINTER(r7,.Lpoly1305_init_constants_neon,r8) mov r6, r2 mov r8, r2, lsr #26 @@ -175,7 +201,7 @@ _gcry_poly1305_armv7_neon_init_ext: eor r6, r6, r6 stmia r0!, {r2-r6} stmia r0!, {r2-r6} - ldmia r1!, {r2-r5} + UNALIGNED_LDMIA4(r1, r2, r3, r4, r5) stmia r0, {r2-r6} add sp, sp, #32 ldmfd sp!, {r4-r11, lr} @@ -286,7 +312,7 @@ _gcry_poly1305_armv7_neon_blocks: vmov d14, d12 vmul.i32 q6, q5, d0[0] .Lpoly1305_blocks_neon_mainloop: - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmull.u32 q0, d25, d12[0] mov r7, r2, lsr #26 vmlal.u32 q0, d24, d12[1] @@ -302,7 +328,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d13[0] - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d13[1] mov r1, r7, lsr #26 vmlal.u32 q1, d22, d14[0] @@ -344,7 +370,7 @@ _gcry_poly1305_armv7_neon_blocks: vmlal.u32 q4, d21, d11[1] vld1.64 {d21-d24}, [r14, :256]! 
 vld1.64 {d25}, [r14, :64] - ldmia r0!, {r2-r5} + UNALIGNED_LDMIA4(r0, r2, r3, r4, r5) vmlal.u32 q0, d25, d26 mov r7, r2, lsr #26 vmlal.u32 q0, d24, d27 @@ -360,7 +386,7 @@ _gcry_poly1305_armv7_neon_blocks: orr r4, r8, r4, lsl #12 orr r5, r9, r5, lsl #18 vmlal.u32 q1, d24, d28 - ldmia r0!, {r7-r10} + UNALIGNED_LDMIA4(r0, r7, r8, r9, r10) vmlal.u32 q1, d23, d29 mov r1, r7, lsr #26 vmlal.u32 q1, d22, d20 @@ -643,7 +669,7 @@ _gcry_poly1305_armv7_neon_finish_ext: .Lpoly1305_finish_ext_neon_skip16: tst r7, #8 beq .Lpoly1305_finish_ext_neon_skip8 - ldmia r1!, {r10-r11} + UNALIGNED_LDMIA2(r1, r10, r11) stmia r9!, {r10-r11} .Lpoly1305_finish_ext_neon_skip8: tst r7, #4 commit 496790940753226f96b731a43d950bd268acd97a Author: Jussi Kivilinna Date: Sun Jul 3 18:39:40 2016 +0300 bench-slope: add unaligned buffer mode * tests/bench-slope.c (unaligned_mode): New. (do_slope_benchmark): Unalign buffer if unaligned mode is enabled. (print_help, main): Add '--unaligned' parameter. -- Patch adds --unaligned parameter to allow measurement of unaligned buffer overhead. Signed-off-by: Jussi Kivilinna diff --git a/tests/bench-slope.c b/tests/bench-slope.c index d97494c..cdd0fa6 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -42,6 +42,7 @@ static int verbose; static int csv_mode; +static int unaligned_mode; static int num_measurement_repetitions; /* CPU Ghz value provided by user, allows constructing cycles/byte and other @@ -411,12 +412,14 @@ do_slope_benchmark (struct bench_obj *obj) obj->max_bufsize < 1 || obj->min_bufsize > obj->max_bufsize) goto err_free; - real_buffer = malloc (obj->max_bufsize + 128); + real_buffer = malloc (obj->max_bufsize + 128 + unaligned_mode); if (!real_buffer) goto err_free; /* Get aligned buffer */ buffer = real_buffer; buffer += 128 - ((real_buffer - (unsigned char *) 0) & (128 - 1)); + if (unaligned_mode) + buffer += unaligned_mode; /* Make buffer unaligned */ for (i = 0; i < obj->max_bufsize; i++) buffer[i] = 0x55 ^ (-i); @@ -1748,6 +1751,7 @@ print_help (void) " for benchmarking.", " --repetitions Use N repetitions (default " STR2(NUM_MEASUREMENT_REPETITIONS) ")", + " --unaligned Use unaligned input buffers.", " --csv Use CSV output format", NULL }; @@ -1832,6 +1836,12 @@ main (int argc, char **argv) argc--; argv++; } + else if (!strcmp (*argv, "--unaligned")) + { + unaligned_mode = 1; + argc--; + argv++; + } else if (!strcmp (*argv, "--disable-hwf")) { argc--; ----------------------------------------------------------------------- Summary of changes: cipher/chacha20-armv7-neon.S | 56 +++++++++++++++++++++++++++++++++++++------- cipher/poly1305-armv7-neon.S | 40 +++++++++++++++++++++++++------ tests/bench-slope.c | 12 +++++++++- 3 files changed, 92 insertions(+), 16 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Tue Jul 12 11:54:01 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:54:01 +0300 Subject: [PATCH 2/3] Add ARMv8/AArch32 Crypto Extension implementation of GCM In-Reply-To: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> References: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> Message-ID: <146831724159.1076.14802585401018300755.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch32-ce.S'. * cipher/cipher-gcm-armv8-aarch32-ce.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_PMULL] (_gcry_ghash_setup_armv8_ce_pmull, _gcry_ghash_armv8_ce_pmull) (ghash_setup_armv8_ce_pmull, ghash_armv8_ce_pmull): New. (setupM) [GCM_USE_ARM_PMULL]: Enable ARM PMULL implementation if HWF_ARM_PMULL HW feature flag is enabled. * cipher/cipher-internal.h (GCM_USE_ARM_PMULL): New. -- Benchmark on Cortex-A53 (1152 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte GMAC_AES | 24.10 ns/B 39.57 MiB/s 27.76 c/B After (~26x faster): | nanosecs/byte mebibytes/sec cycles/byte GMAC_AES | 0.924 ns/B 1032.2 MiB/s 1.06 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 1e97050..5d69a38 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -43,6 +43,7 @@ libcipher_la_SOURCES = \ cipher.c cipher-internal.h \ cipher-cbc.c cipher-cfb.c cipher-ofb.c cipher-ctr.c cipher-aeswrap.c \ cipher-ccm.c cipher-cmac.c cipher-gcm.c cipher-gcm-intel-pclmul.c \ + cipher-gcm-armv8-aarch32-ce.S \ cipher-poly1305.c cipher-ocb.c \ cipher-selftest.c cipher-selftest.h \ pubkey.c pubkey-internal.h pubkey-util.c \ diff --git a/cipher/cipher-gcm-armv8-aarch32-ce.S b/cipher/cipher-gcm-armv8-aarch32-ce.S new file mode 100644 index 0000000..b879fb2 --- /dev/null +++ b/cipher/cipher-gcm-armv8-aarch32-ce.S @@ -0,0 +1,235 @@ +/* cipher-gcm-armv8-aarch32-ce.S - ARM/CE accelerated GHASH + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include <config.h> + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +.align 4 +gcry_gcm_reduction_constant: +.Lrconst64: + .quad 0xc200000000000000 + + +/* Register macros */ + +#define rhash q0 +#define rhash_l d0 +#define rhash_h d1 + +#define rbuf q1 +#define rbuf_l d2 +#define rbuf_h d3 + +#define rh0 q2 +#define rh0_l d4 +#define rh0_h d5 + +#define rt0 q3 +#define rt0_l d6 +#define rt0_h d7 + +#define rr0 q8 +#define rr0_l d16 +#define rr0_h d17 + +#define rr1 q9 +#define rr1_l d18 +#define rr1_h d19 + +#define rrconst q15 +#define rrconst_l d30 +#define rrconst_h d31 + +#define ia rbuf_h +#define ib rbuf_l +#define oa rh0_l +#define ob rh0_h +#define co rrconst_l +#define ma rrconst_h + +/* GHASH macros */ + +/* See "Gouvêa, C. P. L. & López, J. Implementing GCM on ARMv8. Topics in + * Cryptology – CT-RSA 2015" for details.
+ */ + +/* Input: 'a' and 'b', Output: 'r0:r1' (low 128-bits in r0, high in r1) */ +#define PMUL_128x128(r0, r1, a, b, t, interleave_op) \ + veor t##_h, b##_l, b##_h; \ + veor t##_l, a##_l, a##_h; \ + vmull.p64 r0, a##_l, b##_l; \ + vmull.p64 r1, a##_h, b##_h; \ + vmull.p64 t, t##_h, t##_l; \ + interleave_op(); \ + veor t, r0; \ + veor t, r1; \ + veor r0##_h, t##_l; \ + veor r1##_l, t##_h; + +/* Input: 'r0:r1', Output: 'a' */ +#define REDUCTION(a, r0, r1, rconst, t, interleave_op) \ + vmull.p64 t, r0##_l, rconst; \ + veor r0##_h, t##_l; \ + veor r1##_l, t##_h; \ + interleave_op(); \ + vmull.p64 t, r0##_h, rconst; \ + veor r1, t; \ + veor a, r0, r1; + +#define _(...) /*_*/ +#define vrev_rbuf() vrev64.8 rbuf, rbuf; +#define vext_rbuf() vext.8 rbuf, rbuf, rbuf, #8; + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int _gcry_ghash_armv8_ce_pmull (void *gcm_key, byte *result, + * const byte *buf, size_t nblocks, + * void *gcm_table); + */ +.align 3 +.globl _gcry_ghash_armv8_ce_pmull +.type _gcry_ghash_armv8_ce_pmull,%function; +_gcry_ghash_armv8_ce_pmull: + /* input: + * r0: gcm_key + * r1: result/hash + * r2: buf + * r3: nblocks + * %st+0: gcm_table + */ + push {r4, lr} + + cmp r3, #0 + beq .Ldo_nothing + + GET_DATA_POINTER(lr, .Lrconst64, r4) + + subs r3, r3, #1 + vld1.64 {rhash}, [r1] + vld1.64 {rh0}, [r0] + + vrev64.8 rhash, rhash /* byte-swap */ + vld1.64 {rrconst_h}, [lr] + vext.8 rhash, rhash, rhash, #8 + + vld1.64 {rbuf}, [r2]! + + vrev64.8 rbuf, rbuf /* byte-swap */ + vext.8 rbuf, rbuf, rbuf, #8 + + veor rhash, rhash, rbuf + + beq .Lend + +.Loop: + vld1.64 {rbuf}, [r2]! + subs r3, r3, #1 + PMUL_128x128(rr0, rr1, rh0, rhash, rt0, vrev_rbuf) + REDUCTION(rhash, rr0, rr1, rrconst_h, rt0, vext_rbuf) + veor rhash, rhash, rbuf + + bne .Loop + +.Lend: + PMUL_128x128(rr0, rr1, rh0, rhash, rt0, _) + REDUCTION(rhash, rr0, rr1, rrconst_h, rt0, _) + + CLEAR_REG(rr1) + CLEAR_REG(rr0) + vrev64.8 rhash, rhash /* byte-swap */ + CLEAR_REG(rbuf) + CLEAR_REG(rt0) + vext.8 rhash, rhash, rhash, #8 + CLEAR_REG(rh0) + + vst1.64 {rhash}, [r1] + CLEAR_REG(rhash) + +.Ldo_nothing: + mov r0, #0 + pop {r4, pc} +.size _gcry_ghash_armv8_ce_pmull,.-_gcry_ghash_armv8_ce_pmull; + + +/* + * void _gcry_ghash_setup_armv8_ce_pmull (void *gcm_key, void *gcm_table); + */ +.align 3 +.globl _gcry_ghash_setup_armv8_ce_pmull +.type _gcry_ghash_setup_armv8_ce_pmull,%function; +_gcry_ghash_setup_armv8_ce_pmull: + /* input: + * r0: gcm_key + * r1: gcm_table + */ + + push {r4, lr} + + GET_DATA_POINTER(r4, .Lrconst64, lr) + + /* H <<< 1 */ + vld1.64 {ib,ia}, [r0] + vld1.64 {co}, [r4] + vrev64.8 ib, ib; + vrev64.8 ia, ia; + vshr.s64 ma, ib, #63 + vshr.u64 oa, ib, #63 + vshr.u64 ob, ia, #63 + vand ma, co + vshl.u64 ib, ib, #1 + vshl.u64 ia, ia, #1 + vorr ob, ib + vorr oa, ia + veor ob, ma + + vst1.64 {oa, ob}, [r0] + + pop {r4, pc} +.size _gcry_ghash_setup_armv8_ce_pmull,.-_gcry_ghash_setup_armv8_ce_pmull; + +#endif diff --git a/cipher/cipher-gcm.c b/cipher/cipher-gcm.c index 6e0959a..2b8b454 100644 --- a/cipher/cipher-gcm.c +++ b/cipher/cipher-gcm.c @@ -37,6 +37,30 @@ extern unsigned int _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, size_t nblocks); #endif +#ifdef GCM_USE_ARM_PMULL +extern void _gcry_ghash_setup_armv8_ce_pmull (void *gcm_key, void *gcm_table); + +extern unsigned int _gcry_ghash_armv8_ce_pmull (void *gcm_key, byte *result, + const byte *buf, size_t nblocks, + void *gcm_table); + +static void +ghash_setup_armv8_ce_pmull (gcry_cipher_hd_t c) +{ + 
_gcry_ghash_setup_armv8_ce_pmull(c->u_mode.gcm.u_ghash_key.key, + c->u_mode.gcm.gcm_table); +} + +static unsigned int +ghash_armv8_ce_pmull (gcry_cipher_hd_t c, byte *result, const byte *buf, + size_t nblocks) +{ + return _gcry_ghash_armv8_ce_pmull(c->u_mode.gcm.u_ghash_key.key, result, buf, + nblocks, c->u_mode.gcm.gcm_table); +} + +#endif + #ifdef GCM_USE_TABLES static const u16 gcmR[256] = { @@ -379,15 +403,26 @@ ghash_internal (gcry_cipher_hd_t c, byte *result, const byte *buf, static void setupM (gcry_cipher_hd_t c) { +#if defined(GCM_USE_INTEL_PCLMUL) || defined(GCM_USE_ARM_PMULL) + unsigned int features = _gcry_get_hw_features (); +#endif + if (0) ; #ifdef GCM_USE_INTEL_PCLMUL - else if (_gcry_get_hw_features () & HWF_INTEL_PCLMUL) + else if (features & HWF_INTEL_PCLMUL) { c->u_mode.gcm.ghash_fn = _gcry_ghash_intel_pclmul; _gcry_ghash_setup_intel_pclmul (c); } #endif +#ifdef GCM_USE_ARM_PMULL + else if (features & HWF_ARM_PMULL) + { + c->u_mode.gcm.ghash_fn = ghash_armv8_ce_pmull; + ghash_setup_armv8_ce_pmull (c); + } +#endif else { c->u_mode.gcm.ghash_fn = ghash_internal; diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 9fd1d91..52504f6 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -72,6 +72,16 @@ # endif #endif /* GCM_USE_INTEL_PCLMUL */ +/* GCM_USE_ARM_PMULL indicates whether to compile GCM with ARMv8 PMULL code. */ +#undef GCM_USE_ARM_PMULL +#if defined(ENABLE_ARM_CRYPTO_SUPPORT) && defined(GCM_USE_TABLES) +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define GCM_USE_ARM_PMULL 1 +# endif +#endif /* GCM_USE_ARM_PMULL */ + typedef unsigned int (*ghash_fn_t) (gcry_cipher_hd_t c, byte *result, const byte *buf, size_t nblocks); From jussi.kivilinna at iki.fi Tue Jul 12 11:53:56 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:53:56 +0300 Subject: [PATCH 1/3] Add ARMv8/AArch32 Crypto Extension implementation of SHA-256 Message-ID: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'sha256-armv8-aarch32-ce.S'. * cipher/sha256-armv8-aarch32-ce.S: New. * cipher/sha256.c (USE_ARM_CE): New. (sha256_init, sha224_init): Check features for HWF_ARM_SHA2. [USE_ARM_CE] (_gcry_sha256_transform_armv8_ce): New. (transform) [USE_ARM_CE]: Use ARMv8 CE implementation if HW supports it. (SHA256_CONTEXT): Add 'use_arm_ce'. * configure.ac: Add 'sha256-armv8-aarch32-ce.lo'.
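The sha256.c glue follows the pattern already used for the SSSE3/AVX paths: probe hardware features once in the init functions, latch the result in a context bit-field, and branch on that bit in transform(). A condensed C sketch of that shape; the context struct is simplified and 'sketch_ctx'/'transform_sketch' are illustrative names, while the extern declaration matches the new assembly entry point:

#include <stddef.h>

typedef unsigned int u32;

/* Provided by sha256-armv8-aarch32-ce.S when USE_ARM_CE is defined;
   processes num_blks 64-byte blocks and returns the stack burn depth. */
extern unsigned int _gcry_sha256_transform_armv8_ce (u32 state[8],
                                                     const void *input_data,
                                                     size_t num_blks);

typedef struct
{
  u32 h[8];                  /* simplified; the real context embeds bctx */
  unsigned int use_arm_ce:1; /* latched at init time from
                                _gcry_get_hw_features () & HWF_ARM_SHA2 */
} sketch_ctx;

unsigned int
transform_sketch (sketch_ctx *hd, const unsigned char *data, size_t nblks)
{
  if (hd->use_arm_ce)
    return _gcry_sha256_transform_armv8_ce (hd->h, data, nblks);
  /* ...otherwise fall back to transform_blk () one block at a time. */
  return 0;
}

Doing the feature test once at init keeps the per-call overhead down to a single predictable branch.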
-- Benchmark on Cortex-A53 (1152 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 17.38 ns/B 54.88 MiB/s 20.02 c/B After (~9.3x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 1.85 ns/B 515.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 571673e..1e97050 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -89,6 +89,7 @@ serpent.c serpent-sse2-amd64.S serpent-avx2-amd64.S serpent-armv7-neon.S \ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ + sha256-armv8-aarch32-ce.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S sha512-arm.S \ keccak.c keccak_permute_32.h keccak_permute_64.h keccak-armv7-neon.S \ diff --git a/cipher/sha256-armv8-aarch32-ce.S b/cipher/sha256-armv8-aarch32-ce.S new file mode 100644 index 0000000..a0dbcea --- /dev/null +++ b/cipher/sha256-armv8-aarch32-ce.S @@ -0,0 +1,231 @@ +/* sha256-armv8-aarch32-ce.S - ARM/CE accelerated SHA-256 transform function + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA256) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* Constants */ + +.align 4 +gcry_sha256_aarch32_ce_K: +.LK: + .long 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .long 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .long 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .long 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .long 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .long 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .long 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .long 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .long 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .long 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .long 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .long 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .long 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .long 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .long 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .long 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 + + +/* Register macros */ + +#define qH0123 q0 +#define qH4567 q1 + +#define qABCD0 q2 +#define qABCD1 q3 +#define qEFGH q4 + +#define qT0 q5 +#define qT1 q6 + +#define qW0 q8 +#define qW1 q9 +#define qW2 q10 +#define qW3 q11 + +#define qK0 q12 +#define qK1 q13 +#define qK2 q14 +#define qK3 q15 + + +/* Round macros */ + +#define _(...) /*_*/ + + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int + * _gcry_sha256_transform_armv8_ce (u32 state[8], const void *input_data, + * size_t num_blks) + */ +.align 3 +.globl _gcry_sha256_transform_armv8_ce +.type _gcry_sha256_transform_armv8_ce,%function; +_gcry_sha256_transform_armv8_ce: + /* input: + * r0: ctx, CTX + * r1: data (64*nblks bytes) + * r2: nblks + */ + + cmp r2, #0; + push {r4,lr}; + beq .Ldo_nothing; + + vpush {q4-q7}; + + GET_DATA_POINTER(r4, .LK, lr); + mov lr, r4 + + vld1.32 {qH0123-qH4567}, [r0] /* load state */ + +#define do_loadk(nk0, nk1) vld1.32 {nk0-nk1},[lr]!; +#define do_add(a, b) vadd.u32 a, a, b; +#define do_sha256su0(w0, w1) sha256su0.32 w0, w1; +#define do_sha256su1(w0, w2, w3) sha256su1.32 w0, w2, w3; + +#define do_rounds(k, nk0, nk1, w0, w1, w2, w3, loadk_fn, add_fn, su0_fn, su1_fn) \ + loadk_fn( nk0, nk1 ); \ + su0_fn( w0, w1 ); \ + vmov qABCD1, qABCD0; \ + sha256h.32 qABCD0, qEFGH, k; \ + sha256h2.32 qEFGH, qABCD1, k; \ + add_fn( nk0, w2 ); \ + su1_fn( w0, w2, w3 ); \ + + vld1.8 {qW0-qW1}, [r1]! + do_loadk(qK0, qK1) + vld1.8 {qW2-qW3}, [r1]! 
+ vmov qABCD0, qH0123 + vmov qEFGH, qH4567 + + vrev32.8 qW0, qW0 + vrev32.8 qW1, qW1 + vrev32.8 qW2, qW2 + do_add(qK0, qW0) + vrev32.8 qW3, qW3 + do_add(qK1, qW1) + +.Loop: + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + subs r2,r2,#1 + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + do_rounds(qK0, qK2, qK3, qW0, qW1, qW2, qW3, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK1, qK3, _ , qW1, qW2, qW3, qW0, _ , do_add, do_sha256su0, do_sha256su1) + do_rounds(qK2, qK0, qK1, qW2, qW3, qW0, qW1, do_loadk, do_add, do_sha256su0, do_sha256su1) + do_rounds(qK3, qK1, _ , qW3, qW0, qW1, qW2, _ , do_add, do_sha256su0, do_sha256su1) + + beq .Lend + + do_rounds(qK0, qK2, qK3, qW0, _ , qW2, qW3, do_loadk, do_add, _, _) + vld1.8 {qW0}, [r1]! + mov lr, r4 + do_rounds(qK1, qK3, _ , qW1, _ , qW3, _ , _ , do_add, _, _) + vld1.8 {qW1}, [r1]! + vrev32.8 qW0, qW0 + do_rounds(qK2, qK0, qK1, qW2, _ , qW0, _ , do_loadk, do_add, _, _) + vrev32.8 qW1, qW1 + vld1.8 {qW2}, [r1]! + do_rounds(qK3, qK1, _ , qW3, _ , qW1, _ , _ , do_add, _, _) + vld1.8 {qW3}, [r1]! + + vadd.u32 qH0123, qABCD0 + vadd.u32 qH4567, qEFGH + + vrev32.8 qW2, qW2 + vmov qABCD0, qH0123 + vrev32.8 qW3, qW3 + vmov qEFGH, qH4567 + + b .Loop + +.Lend: + + do_rounds(qK0, qK2, qK3, qW0, _ , qW2, qW3, do_loadk, do_add, _, _) + do_rounds(qK1, qK3, _ , qW1, _ , qW3, _ , _ , do_add, _, _) + do_rounds(qK2, _ , _ , qW2, _ , _ , _ , _ , _, _, _) + do_rounds(qK3, _ , _ , qW3, _ , _ , _ , _ , _, _, _) + + CLEAR_REG(qW0) + CLEAR_REG(qW1) + CLEAR_REG(qW2) + CLEAR_REG(qW3) + CLEAR_REG(qK0) + CLEAR_REG(qK1) + CLEAR_REG(qK2) + CLEAR_REG(qK3) + + vadd.u32 qH0123, qABCD0 + vadd.u32 qH4567, qEFGH + + CLEAR_REG(qABCD0) + CLEAR_REG(qABCD1) + CLEAR_REG(qEFGH) + + vst1.32 {qH0123-qH4567}, [r0] /* store state */ + + CLEAR_REG(qH0123) + CLEAR_REG(qH4567) + vpop {q4-q7} + +.Ldo_nothing: + mov r0, #0 + pop {r4,pc} +.size _gcry_sha256_transform_armv8_ce,.-_gcry_sha256_transform_armv8_ce; + +#endif diff --git a/cipher/sha256.c b/cipher/sha256.c index 1b82ee7..72818ce 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -75,6 +75,17 @@ # define USE_AVX2 1 #endif +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. 
*/ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif + typedef struct { gcry_md_block_ctx_t bctx; @@ -88,6 +99,9 @@ typedef struct { #ifdef USE_AVX2 unsigned int use_avx2:1; #endif +#ifdef USE_ARM_CE + unsigned int use_arm_ce:1; +#endif } SHA256_CONTEXT; @@ -129,6 +143,9 @@ sha256_init (void *context, unsigned int flags) #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA2) != 0; +#endif (void)features; } @@ -167,6 +184,9 @@ sha224_init (void *context, unsigned int flags) #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); #endif +#ifdef USE_ARM_CE + hd->use_arm_ce = (features & HWF_ARM_SHA2) != 0; +#endif (void)features; } @@ -355,6 +375,11 @@ unsigned int _gcry_sha256_transform_amd64_avx2(const void *input_data, size_t num_blks) ASM_FUNC_ABI; #endif +#ifdef USE_ARM_CE +unsigned int _gcry_sha256_transform_armv8_ce(u32 state[8], + const void *input_data, + size_t num_blks); +#endif static unsigned int transform (void *ctx, const unsigned char *data, size_t nblks) @@ -380,6 +405,11 @@ transform (void *ctx, const unsigned char *data, size_t nblks) + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif +#ifdef USE_ARM_CE + if (hd->use_arm_ce) + return _gcry_sha256_transform_armv8_ce (&hd->h0, data, nblks); +#endif + do { burn = transform_blk (hd, data); diff --git a/configure.ac b/configure.ac index 613a3d6..91dd285 100644 --- a/configure.ac +++ b/configure.ac @@ -2256,6 +2256,10 @@ if test "$found" = "1" ; then GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-avx-amd64.lo" GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-avx2-bmi2-amd64.lo" ;; + arm*-*-*) + # Build with the assembly implementation + GCRYPT_DIGESTS="$GCRYPT_DIGESTS sha256-armv8-aarch32-ce.lo" + ;; esac fi From cvs at cvs.gnupg.org Wed Jul 13 19:08:37 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Wed, 13 Jul 2016 19:08:37 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-18-ge535ea1 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via e535ea1bdc42309553007d60599d3147b8defe93 (commit) from 1111d311fd6452abd4080d1072c75ddb1b5a3dd1 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit e535ea1bdc42309553007d60599d3147b8defe93 Author: Werner Koch Date: Wed Jul 13 19:05:34 2016 +0200 build: Update config.{guess,sub} to {2016-05-15,2016-06-20}. * build-aux/config.guess: Update. * build-aux/config.sub: Update. Signed-off-by: Werner Koch diff --git a/build-aux/config.guess b/build-aux/config.guess index dbfb978..c4bd827 100755 --- a/build-aux/config.guess +++ b/build-aux/config.guess @@ -1,8 +1,8 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright 1992-2015 Free Software Foundation, Inc.
+# Copyright 1992-2016 Free Software Foundation, Inc. -timestamp='2015-01-01' +timestamp='2016-05-15' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -27,7 +27,7 @@ timestamp='2015-01-01' # Originally written by Per Bothner; maintained since 2000 by Ben Elliston. # # You can get the latest version of this script from: -# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD +# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess # # Please send patches to . @@ -50,7 +50,7 @@ version="\ GNU config.guess ($timestamp) Originally written by Per Bothner. -Copyright 1992-2015 Free Software Foundation, Inc. +Copyright 1992-2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -168,19 +168,29 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in # Note: NetBSD doesn't particularly care about the vendor # portion of the name. We always set it to "unknown". sysctl="sysctl -n hw.machine_arch" - UNAME_MACHINE_ARCH=`(/sbin/$sysctl 2>/dev/null || \ - /usr/sbin/$sysctl 2>/dev/null || echo unknown)` + UNAME_MACHINE_ARCH=`(uname -p 2>/dev/null || \ + /sbin/$sysctl 2>/dev/null || \ + /usr/sbin/$sysctl 2>/dev/null || \ + echo unknown)` case "${UNAME_MACHINE_ARCH}" in armeb) machine=armeb-unknown ;; arm*) machine=arm-unknown ;; sh3el) machine=shl-unknown ;; sh3eb) machine=sh-unknown ;; sh5el) machine=sh5le-unknown ;; + earmv*) + arch=`echo ${UNAME_MACHINE_ARCH} | sed -e 's,^e\(armv[0-9]\).*$,\1,'` + endian=`echo ${UNAME_MACHINE_ARCH} | sed -ne 's,^.*\(eb\)$,\1,p'` + machine=${arch}${endian}-unknown + ;; *) machine=${UNAME_MACHINE_ARCH}-unknown ;; esac # The Operating System including object format, if it has switched - # to ELF recently, or will in the future. + # to ELF recently (or will in the future) and ABI. case "${UNAME_MACHINE_ARCH}" in + earm*) + os=netbsdelf + ;; arm*|i386|m68k|ns32k|sh3*|sparc|vax) eval $set_cc_for_build if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \ @@ -197,6 +207,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in os=netbsd ;; esac + # Determine ABI tags. + case "${UNAME_MACHINE_ARCH}" in + earm*) + expr='s/^earmv[0-9]/-eabi/;s/eb$//' + abi=`echo ${UNAME_MACHINE_ARCH} | sed -e "$expr"` + ;; + esac # The OS release # Debian GNU/NetBSD machines have a different userland, and # thus, need a distinct triplet. However, they do not need @@ -207,13 +224,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in release='-gnu' ;; *) - release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'` + release=`echo ${UNAME_RELEASE} | sed -e 's/[-_].*//' | cut -d. -f1,2` ;; esac # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM: # contains redundant information, the shorter form: # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used. 
- echo "${machine}-${os}${release}" + echo "${machine}-${os}${release}${abi}" exit ;; *:Bitrig:*:*) UNAME_MACHINE_ARCH=`arch | sed 's/Bitrig.//'` @@ -223,6 +240,10 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'` echo ${UNAME_MACHINE_ARCH}-unknown-openbsd${UNAME_RELEASE} exit ;; + *:LibertyBSD:*:*) + UNAME_MACHINE_ARCH=`arch | sed 's/^.*BSD\.//'` + echo ${UNAME_MACHINE_ARCH}-unknown-libertybsd${UNAME_RELEASE} + exit ;; *:ekkoBSD:*:*) echo ${UNAME_MACHINE}-unknown-ekkobsd${UNAME_RELEASE} exit ;; @@ -235,6 +256,9 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in *:MirBSD:*:*) echo ${UNAME_MACHINE}-unknown-mirbsd${UNAME_RELEASE} exit ;; + *:Sortix:*:*) + echo ${UNAME_MACHINE}-unknown-sortix + exit ;; alpha:OSF1:*:*) case $UNAME_RELEASE in *4.0) @@ -251,42 +275,42 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^ The alpha \(.*\) processor.*$/\1/p' | head -n 1` case "$ALPHA_CPU_TYPE" in "EV4 (21064)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "EV4.5 (21064)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "LCA4 (21066/21068)") - UNAME_MACHINE="alpha" ;; + UNAME_MACHINE=alpha ;; "EV5 (21164)") - UNAME_MACHINE="alphaev5" ;; + UNAME_MACHINE=alphaev5 ;; "EV5.6 (21164A)") - UNAME_MACHINE="alphaev56" ;; + UNAME_MACHINE=alphaev56 ;; "EV5.6 (21164PC)") - UNAME_MACHINE="alphapca56" ;; + UNAME_MACHINE=alphapca56 ;; "EV5.7 (21164PC)") - UNAME_MACHINE="alphapca57" ;; + UNAME_MACHINE=alphapca57 ;; "EV6 (21264)") - UNAME_MACHINE="alphaev6" ;; + UNAME_MACHINE=alphaev6 ;; "EV6.7 (21264A)") - UNAME_MACHINE="alphaev67" ;; + UNAME_MACHINE=alphaev67 ;; "EV6.8CB (21264C)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.8AL (21264B)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.8CX (21264D)") - UNAME_MACHINE="alphaev68" ;; + UNAME_MACHINE=alphaev68 ;; "EV6.9A (21264/EV69A)") - UNAME_MACHINE="alphaev69" ;; + UNAME_MACHINE=alphaev69 ;; "EV7 (21364)") - UNAME_MACHINE="alphaev7" ;; + UNAME_MACHINE=alphaev7 ;; "EV7.9 (21364A)") - UNAME_MACHINE="alphaev79" ;; + UNAME_MACHINE=alphaev79 ;; esac # A Pn.n version is a patched version. # A Vn.n version is a released version. # A Tn.n version is a released field test version. # A Xn.n version is an unreleased experimental baselevel. # 1.2 uses "1.2" for uname -r. - echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` + echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` # Reset EXIT trap before exiting to avoid spurious non-zero exit code. exitcode=$? trap '' 0 @@ -359,16 +383,16 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in exit ;; i86pc:SunOS:5.*:* | i86xen:SunOS:5.*:*) eval $set_cc_for_build - SUN_ARCH="i386" + SUN_ARCH=i386 # If there is a compiler, see if it is configured for 64-bit objects. # Note that the Sun cc does not turn __LP64__ into 1 like gcc does. # This test works for both compilers. 
- if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then + if [ "$CC_FOR_BUILD" != no_compiler_found ]; then if (echo '#ifdef __amd64'; echo IS_64BIT_ARCH; echo '#endif') | \ - (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ grep IS_64BIT_ARCH >/dev/null then - SUN_ARCH="x86_64" + SUN_ARCH=x86_64 fi fi echo ${SUN_ARCH}-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` @@ -393,7 +417,7 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in exit ;; sun*:*:4.2BSD:*) UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` - test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3 + test "x${UNAME_RELEASE}" = x && UNAME_RELEASE=3 case "`/bin/arch`" in sun3) echo m68k-sun-sunos${UNAME_RELEASE} @@ -618,13 +642,13 @@ EOF sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` case "${sc_cpu_version}" in - 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0 - 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1 + 523) HP_ARCH=hppa1.0 ;; # CPU_PA_RISC1_0 + 528) HP_ARCH=hppa1.1 ;; # CPU_PA_RISC1_1 532) # CPU_PA_RISC2_0 case "${sc_kernel_bits}" in - 32) HP_ARCH="hppa2.0n" ;; - 64) HP_ARCH="hppa2.0w" ;; - '') HP_ARCH="hppa2.0" ;; # HP-UX 10.20 + 32) HP_ARCH=hppa2.0n ;; + 64) HP_ARCH=hppa2.0w ;; + '') HP_ARCH=hppa2.0 ;; # HP-UX 10.20 esac ;; esac fi @@ -663,11 +687,11 @@ EOF exit (0); } EOF - (CCOPTS= $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy` + (CCOPTS="" $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy` test -z "$HP_ARCH" && HP_ARCH=hppa fi ;; esac - if [ ${HP_ARCH} = "hppa2.0w" ] + if [ ${HP_ARCH} = hppa2.0w ] then eval $set_cc_for_build @@ -680,12 +704,12 @@ EOF # $ CC_FOR_BUILD="cc +DA2.0w" ./config.guess # => hppa64-hp-hpux11.23 - if echo __LP64__ | (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | + if echo __LP64__ | (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | grep -q __LP64__ then - HP_ARCH="hppa2.0w" + HP_ARCH=hppa2.0w else - HP_ARCH="hppa64" + HP_ARCH=hppa64 fi fi echo ${HP_ARCH}-hp-hpux${HPUX_REV} @@ -790,14 +814,14 @@ EOF echo craynv-cray-unicosmp${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit ;; F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) - FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` - FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` + FUJITSU_PROC=`uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'` echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; 5000:UNIX_System_V:4.*:*) - FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` - FUJITSU_REL=`echo ${UNAME_RELEASE} | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/ /_/'` + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` + FUJITSU_REL=`echo ${UNAME_RELEASE} | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/'` echo "sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) @@ -879,7 +903,7 @@ EOF exit ;; *:GNU/*:*:*) # other systems with GNU libc and userland - echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed 
-e 's/[-(].*//'`-${LIBC} + echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]"``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC} exit ;; i*86:Minix:*:*) echo ${UNAME_MACHINE}-pc-minix @@ -902,7 +926,7 @@ EOF EV68*) UNAME_MACHINE=alphaev68 ;; esac objdump --private-headers /bin/sh | grep -q ld.so.1 - if test "$?" = 0 ; then LIBC="gnulibc1" ; fi + if test "$?" = 0 ; then LIBC=gnulibc1 ; fi echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; arc:Linux:*:* | arceb:Linux:*:*) @@ -933,6 +957,9 @@ EOF crisv32:Linux:*:*) echo ${UNAME_MACHINE}-axis-linux-${LIBC} exit ;; + e2k:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; frv:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; @@ -945,6 +972,9 @@ EOF ia64:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; + k1om:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; m32r*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; @@ -1021,7 +1051,7 @@ EOF echo ${UNAME_MACHINE}-dec-linux-${LIBC} exit ;; x86_64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + echo ${UNAME_MACHINE}-pc-linux-${LIBC} exit ;; xtensa*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-${LIBC} @@ -1100,7 +1130,7 @@ EOF # uname -m prints for DJGPP always 'pc', but it prints nothing about # the processor, so we play safe by assuming i586. # Note: whatever this is, it MUST be the same as what config.sub - # prints for the "djgpp" host, or else GDB configury will decide that + # prints for the "djgpp" host, or else GDB configure will decide that # this is a cross-build. echo i586-pc-msdosdjgpp exit ;; @@ -1249,6 +1279,9 @@ EOF SX-8R:SUPER-UX:*:*) echo sx8r-nec-superux${UNAME_RELEASE} exit ;; + SX-ACE:SUPER-UX:*:*) + echo sxace-nec-superux${UNAME_RELEASE} + exit ;; Power*:Rhapsody:*:*) echo powerpc-apple-rhapsody${UNAME_RELEASE} exit ;; @@ -1262,9 +1295,9 @@ EOF UNAME_PROCESSOR=powerpc fi if test `echo "$UNAME_RELEASE" | sed -e 's/\..*//'` -le 10 ; then - if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then + if [ "$CC_FOR_BUILD" != no_compiler_found ]; then if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \ - (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ grep IS_64BIT_ARCH >/dev/null then case $UNAME_PROCESSOR in @@ -1286,7 +1319,7 @@ EOF exit ;; *:procnto*:*:* | *:QNX:[0123456789]*:*) UNAME_PROCESSOR=`uname -p` - if test "$UNAME_PROCESSOR" = "x86"; then + if test "$UNAME_PROCESSOR" = x86; then UNAME_PROCESSOR=i386 UNAME_MACHINE=pc fi @@ -1317,7 +1350,7 @@ EOF # "uname -m" is not consistent, so use $cputype instead. 386 # is converted to i386 for consistency with other x86 # operating systems. - if test "$cputype" = "386"; then + if test "$cputype" = 386; then UNAME_MACHINE=i386 else UNAME_MACHINE="$cputype" @@ -1359,7 +1392,7 @@ EOF echo i386-pc-xenix exit ;; i*86:skyos:*:*) - echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE}` | sed -e 's/ .*$//' + echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE} | sed -e 's/ .*$//'` exit ;; i*86:rdos:*:*) echo ${UNAME_MACHINE}-pc-rdos @@ -1370,23 +1403,25 @@ EOF x86_64:VMkernel:*:*) echo ${UNAME_MACHINE}-unknown-esx exit ;; + amd64:Isilon\ OneFS:*:*) + echo x86_64-unknown-onefs + exit ;; esac cat >&2 < in order to provide the needed -information to handle your system. 
+If $0 has already been updated, send the following data and any +information you think might be pertinent to config-patches at gnu.org to +provide the necessary information to handle your system. config.guess timestamp = $timestamp diff --git a/build-aux/config.sub b/build-aux/config.sub index 6d2e94c..9feb73b 100755 --- a/build-aux/config.sub +++ b/build-aux/config.sub @@ -1,8 +1,8 @@ #! /bin/sh # Configuration validation subroutine script. -# Copyright 1992-2015 Free Software Foundation, Inc. +# Copyright 1992-2016 Free Software Foundation, Inc. -timestamp='2015-01-01' +timestamp='2016-06-20' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -33,7 +33,7 @@ timestamp='2015-01-01' # Otherwise, we print the canonical config type on stdout and succeed. # You can get the latest version of this script from: -# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD +# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub # This file is supposed to be the same for all GNU packages # and recognize all the CPU types, system types and aliases @@ -53,8 +53,7 @@ timestamp='2015-01-01' me=`echo "$0" | sed -e 's,.*/,,'` usage="\ -Usage: $0 [OPTION] CPU-MFR-OPSYS - $0 [OPTION] ALIAS +Usage: $0 [OPTION] CPU-MFR-OPSYS or ALIAS Canonicalize a configuration name. @@ -68,7 +67,7 @@ Report bugs and patches to ." version="\ GNU config.sub ($timestamp) -Copyright 1992-2015 Free Software Foundation, Inc. +Copyright 1992-2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -117,7 +116,7 @@ maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'` case $maybe_os in nto-qnx* | linux-gnu* | linux-android* | linux-dietlibc | linux-newlib* | \ linux-musl* | linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | kfreebsd*-gnu* | \ - knetbsd*-gnu* | netbsd*-gnu* | \ + knetbsd*-gnu* | netbsd*-gnu* | netbsd*-eabi* | \ kopensolaris*-gnu* | \ storm-chaos* | os2-emx* | rtmk-nova*) os=-$maybe_os @@ -255,11 +254,12 @@ case $basic_machine in | arc | arceb \ | arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \ | avr | avr32 \ + | ba \ | be32 | be64 \ | bfin \ | c4x | c8051 | clipper \ | d10v | d30v | dlx | dsp16xx \ - | epiphany \ + | e2k | epiphany \ | fido | fr30 | frv | ft32 \ | h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ | hexagon \ @@ -305,7 +305,7 @@ case $basic_machine in | riscv32 | riscv64 \ | rl78 | rx \ | score \ - | sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[34]eb | sheb | shbe | shle | sh[1234]le | sh3ele \ + | sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[234]eb | sheb | shbe | shle | sh[1234]le | sh3ele \ | sh64 | sh64le \ | sparc | sparc64 | sparc64b | sparc64v | sparc86x | sparclet | sparclite \ | sparcv8 | sparcv9 | sparcv9b | sparcv9v \ @@ -376,12 +376,13 @@ case $basic_machine in | alphapca5[67]-* | alpha64pca5[67]-* | arc-* | arceb-* \ | arm-* | armbe-* | armle-* | armeb-* | armv*-* \ | avr-* | avr32-* \ + | ba-* \ | be32-* | be64-* \ | bfin-* | bs2000-* \ | c[123]* | c30-* | [cjt]90-* | c4x-* \ | c8051-* | clipper-* | craynv-* | cydra-* \ | d10v-* | d30v-* | dlx-* \ - | elxsi-* \ + | e2k-* | elxsi-* \ | f30[01]-* | f700-* | fido-* | fr30-* | frv-* | fx80-* \ | h8300-* | h8500-* \ | hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \ @@ -428,12 +429,13 @@ case $basic_machine 
in | pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \ | powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* \ | pyramid-* \ + | riscv32-* | riscv64-* \ | rl78-* | romp-* | rs6000-* | rx-* \ | sh-* | sh[1234]-* | sh[24]a-* | sh[24]aeb-* | sh[23]e-* | sh[34]eb-* | sheb-* | shbe-* \ | shle-* | sh[1234]le-* | sh3ele-* | sh64-* | sh64le-* \ | sparc-* | sparc64-* | sparc64b-* | sparc64v-* | sparc86x-* | sparclet-* \ | sparclite-* \ - | sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx?-* \ + | sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx*-* \ | tahoe-* \ | tic30-* | tic4x-* | tic54x-* | tic55x-* | tic6x-* | tic80-* \ | tile*-* \ @@ -518,6 +520,9 @@ case $basic_machine in basic_machine=i386-pc os=-aros ;; + asmjs) + basic_machine=asmjs-unknown + ;; aux) basic_machine=m68k-apple os=-aux @@ -638,6 +643,14 @@ case $basic_machine in basic_machine=m68k-bull os=-sysv3 ;; + e500v[12]) + basic_machine=powerpc-unknown + os=$os"spe" + ;; + e500v[12]-*) + basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'` + os=$os"spe" + ;; ebmon29k) basic_machine=a29k-amd os=-ebmon @@ -1373,18 +1386,18 @@ case $os in | -hpux* | -unos* | -osf* | -luna* | -dgux* | -auroraux* | -solaris* \ | -sym* | -kopensolaris* | -plan9* \ | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \ - | -aos* | -aros* \ + | -aos* | -aros* | -cloudabi* | -sortix* \ | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \ | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \ | -hiux* | -386bsd* | -knetbsd* | -mirbsd* | -netbsd* \ - | -bitrig* | -openbsd* | -solidbsd* \ + | -bitrig* | -openbsd* | -solidbsd* | -libertybsd* \ | -ekkobsd* | -kfreebsd* | -freebsd* | -riscix* | -lynxos* \ | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \ | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \ | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \ | -chorusos* | -chorusrdb* | -cegcc* \ | -cygwin* | -msys* | -pe* | -psos* | -moss* | -proelf* | -rtems* \ - | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \ + | -midipix* | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \ | -linux-newlib* | -linux-musl* | -linux-uclibc* \ | -uxpv* | -beos* | -mpeix* | -udk* | -moxiebox* \ | -interix* | -uwin* | -mks* | -rhapsody* | -darwin* | -opened* \ @@ -1393,7 +1406,8 @@ case $os in | -os2* | -vos* | -palmos* | -uclinux* | -nucleus* \ | -morphos* | -superux* | -rtmk* | -rtmk-nova* | -windiss* \ | -powermax* | -dnix* | -nx6 | -nx7 | -sei* | -dragonfly* \ - | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* | -tirtos*) + | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* \ + | -onefs* | -tirtos* | -phoenix*) # Remember, each alternative MUST END IN *, to match a version number. 
;; -qnx*) @@ -1525,6 +1539,8 @@ case $os in ;; -nacl*) ;; + -ios) + ;; -none) ;; *) ----------------------------------------------------------------------- Summary of changes: build-aux/config.guess | 159 ++++++++++++++++++++++++++++++------------------- build-aux/config.sub | 46 +++++++++----- 2 files changed, 128 insertions(+), 77 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Tue Jul 12 11:54:06 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 Jul 2016 12:54:06 +0300 Subject: [PATCH 3/3] Add ARMv8/AArch32 Crypto Extension implementation of AES In-Reply-To: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> References: <146831723654.1076.3194185258807123808.stgit@localhost6.localdomain6> Message-ID: <146831724662.1076.2155032204748217085.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and 'rijndael-armv8-aarch32-ce.S'. * cipher/rijndael-armv8-aarch32-ce.S: New. * cipher/rijndael-armv8-ce.c: New. * cipher/rijndael-internal.h (USE_ARM_CE): New. (RIJNDAEL_context_s): Add 'use_arm_ce'. * cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey) (_gcry_aes_armv8_ce_prepare_decryption) (_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt) (_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc) (_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec) (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt) (_gcry_aes_armv8_ce_ocb_auth): New. (do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key setup for ARM CE. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec) (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add ARM CE support. * configure.ac: Add 'rijndael-armv8-ce.lo' and 'rijndael-armv8-aarch32-ce.lo'.
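The rijndael.c glue in this patch has the same shape as the SHA-256 one: do_setkey() latches the HWF_ARM_AES feature test in a context bit, and each bulk helper then routes to the corresponding CE entry point. A minimal C sketch; the context layout is simplified and 'cbc_enc_sketch' is an illustrative name, while the extern declaration matches the assembly function below:

#include <stddef.h>

typedef unsigned char byte;

/* Entry point provided by rijndael-armv8-aarch32-ce.S. */
extern void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, byte *outbuf,
                                        const byte *inbuf, byte *iv,
                                        size_t nblocks, int cbc_mac,
                                        unsigned int nrounds);

typedef struct
{
  unsigned int rounds;
  unsigned int use_arm_ce:1; /* latched in do_setkey () from
                                _gcry_get_hw_features () & HWF_ARM_AES */
  /* ...key schedules... */
} sketch_rijndael_ctx;

/* Sketch of the dispatch added to _gcry_aes_cbc_enc(); the CFB/CTR/OCB
   helpers gain the same kind of branch. */
void
cbc_enc_sketch (const sketch_rijndael_ctx *ctx, const void *keysched,
                byte *outbuf, const byte *inbuf, byte *iv,
                size_t nblocks, int cbc_mac)
{
  if (ctx->use_arm_ce)
    _gcry_aes_cbc_enc_armv8_ce (keysched, outbuf, inbuf, iv, nblocks,
                                cbc_mac, ctx->rounds);
  /* ...else run the existing C or ARM-assembly path block by block. */
}

Handling the whole buffer in one assembly call is part of what makes the 11-15x CBC-enc speedups in the table below possible: the IV chain stays in NEON registers across blocks instead of bouncing through memory.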
-- Improvement vs ARM assembly on Cortex-A53: AES-128 AES-192 AES-256 CBC enc: 14.8x 12.8x 11.4x CBC dec: 21.4x 20.5x 19.4x CFB enc: 16.2x 13.6x 11.6x CFB dec: 21.6x 20.5x 19.4x CTR: 19.1x 18.6x 17.8x OCB enc: 16.0x 16.2x 16.1x OCB dec: 15.6x 15.9x 15.8x OCB auth: 18.3x 18.4x 18.0x Benchmark on Cortex-A53 (1152 Mhz): Before: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 24.42 ns/B 39.06 MiB/s 28.13 c/B ECB dec | 25.07 ns/B 38.05 MiB/s 28.88 c/B CBC enc | 21.05 ns/B 45.30 MiB/s 24.25 c/B CBC dec | 21.16 ns/B 45.07 MiB/s 24.38 c/B CFB enc | 21.05 ns/B 45.31 MiB/s 24.25 c/B CFB dec | 21.38 ns/B 44.61 MiB/s 24.62 c/B OFB enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B OFB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B CTR enc | 21.17 ns/B 45.06 MiB/s 24.38 c/B CTR dec | 21.16 ns/B 45.06 MiB/s 24.38 c/B CCM enc | 42.32 ns/B 22.53 MiB/s 48.75 c/B CCM dec | 42.32 ns/B 22.53 MiB/s 48.75 c/B CCM auth | 21.17 ns/B 45.06 MiB/s 24.38 c/B GCM enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B GCM dec | 22.08 ns/B 43.18 MiB/s 25.44 c/B GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B OCB enc | 26.20 ns/B 36.40 MiB/s 30.18 c/B OCB dec | 25.97 ns/B 36.73 MiB/s 29.91 c/B OCB auth | 24.52 ns/B 38.90 MiB/s 28.24 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 27.83 ns/B 34.26 MiB/s 32.06 c/B ECB dec | 28.54 ns/B 33.42 MiB/s 32.88 c/B CBC enc | 24.47 ns/B 38.97 MiB/s 28.19 c/B CBC dec | 25.27 ns/B 37.74 MiB/s 29.11 c/B CFB enc | 25.08 ns/B 38.02 MiB/s 28.89 c/B CFB dec | 25.31 ns/B 37.68 MiB/s 29.16 c/B OFB enc | 29.57 ns/B 32.25 MiB/s 34.06 c/B OFB dec | 29.57 ns/B 32.25 MiB/s 34.06 c/B CTR enc | 25.24 ns/B 37.78 MiB/s 29.08 c/B CTR dec | 25.24 ns/B 37.79 MiB/s 29.08 c/B CCM enc | 49.81 ns/B 19.15 MiB/s 57.38 c/B CCM dec | 49.80 ns/B 19.15 MiB/s 57.37 c/B CCM auth | 24.58 ns/B 38.80 MiB/s 28.32 c/B GCM enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B GCM dec | 26.11 ns/B 36.52 MiB/s 30.08 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 29.59 ns/B 32.23 MiB/s 34.09 c/B OCB dec | 29.42 ns/B 32.42 MiB/s 33.89 c/B OCB auth | 27.92 ns/B 34.16 MiB/s 32.16 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 31.20 ns/B 30.57 MiB/s 35.94 c/B ECB dec | 31.80 ns/B 29.99 MiB/s 36.63 c/B CBC enc | 27.83 ns/B 34.27 MiB/s 32.06 c/B CBC dec | 27.87 ns/B 34.21 MiB/s 32.11 c/B CFB enc | 27.88 ns/B 34.20 MiB/s 32.12 c/B CFB dec | 28.16 ns/B 33.87 MiB/s 32.44 c/B OFB enc | 32.93 ns/B 28.96 MiB/s 37.94 c/B OFB dec | 32.93 ns/B 28.96 MiB/s 37.94 c/B CTR enc | 27.95 ns/B 34.13 MiB/s 32.19 c/B CTR dec | 27.95 ns/B 34.12 MiB/s 32.20 c/B CCM enc | 55.88 ns/B 17.07 MiB/s 64.38 c/B CCM dec | 55.88 ns/B 17.07 MiB/s 64.38 c/B CCM auth | 27.95 ns/B 34.12 MiB/s 32.20 c/B GCM enc | 28.86 ns/B 33.05 MiB/s 33.25 c/B GCM dec | 28.87 ns/B 33.04 MiB/s 33.25 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 32.96 ns/B 28.94 MiB/s 37.97 c/B OCB dec | 32.73 ns/B 29.14 MiB/s 37.70 c/B OCB auth | 31.29 ns/B 30.48 MiB/s 36.04 c/B After: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.10 ns/B 187.0 MiB/s 5.88 c/B ECB dec | 5.27 ns/B 181.0 MiB/s 6.07 c/B CBC enc | 1.41 ns/B 675.8 MiB/s 1.63 c/B CBC dec | 0.992 ns/B 961.7 MiB/s 1.14 c/B CFB enc | 1.30 ns/B 732.4 MiB/s 1.50 c/B CFB dec | 0.991 ns/B 962.7 MiB/s 1.14 c/B OFB enc | 7.05 ns/B 135.2 MiB/s 8.13 c/B OFB dec | 7.05 ns/B 135.2 MiB/s 8.13 c/B CTR enc | 1.11 ns/B 856.9 MiB/s 1.28 c/B CTR dec | 1.11 ns/B 857.0 MiB/s 1.28 c/B CCM enc | 2.58 ns/B 369.8 MiB/s 2.97 c/B CCM dec | 2.58 ns/B 369.5 MiB/s 2.97 c/B CCM auth | 1.58 ns/B 605.2 MiB/s 1.82 c/B GCM enc | 2.04 ns/B 467.9 
MiB/s 2.35 c/B GCM dec | 2.04 ns/B 466.6 MiB/s 2.35 c/B GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B OCB enc | 1.64 ns/B 579.8 MiB/s 1.89 c/B OCB dec | 1.66 ns/B 574.5 MiB/s 1.91 c/B OCB auth | 1.33 ns/B 715.5 MiB/s 1.54 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.64 ns/B 169.0 MiB/s 6.50 c/B ECB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B CBC enc | 1.90 ns/B 502.1 MiB/s 2.19 c/B CBC dec | 1.24 ns/B 771.7 MiB/s 1.42 c/B CFB enc | 1.84 ns/B 517.1 MiB/s 2.12 c/B CFB dec | 1.23 ns/B 772.5 MiB/s 1.42 c/B OFB enc | 7.60 ns/B 125.5 MiB/s 8.75 c/B OFB dec | 7.60 ns/B 125.6 MiB/s 8.75 c/B CTR enc | 1.36 ns/B 702.7 MiB/s 1.56 c/B CTR dec | 1.36 ns/B 702.5 MiB/s 1.56 c/B CCM enc | 3.31 ns/B 287.8 MiB/s 3.82 c/B CCM dec | 3.31 ns/B 288.0 MiB/s 3.81 c/B CCM auth | 2.06 ns/B 462.1 MiB/s 2.38 c/B GCM enc | 2.28 ns/B 418.4 MiB/s 2.63 c/B GCM dec | 2.28 ns/B 418.0 MiB/s 2.63 c/B GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B OCB enc | 1.83 ns/B 520.1 MiB/s 2.11 c/B OCB dec | 1.84 ns/B 517.8 MiB/s 2.12 c/B OCB auth | 1.52 ns/B 626.1 MiB/s 1.75 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.86 ns/B 162.7 MiB/s 6.75 c/B ECB dec | 6.02 ns/B 158.3 MiB/s 6.94 c/B CBC enc | 2.44 ns/B 390.5 MiB/s 2.81 c/B CBC dec | 1.45 ns/B 656.4 MiB/s 1.67 c/B CFB enc | 2.39 ns/B 399.5 MiB/s 2.75 c/B CFB dec | 1.45 ns/B 656.8 MiB/s 1.67 c/B OFB enc | 7.81 ns/B 122.1 MiB/s 9.00 c/B OFB dec | 7.81 ns/B 122.1 MiB/s 9.00 c/B CTR enc | 1.57 ns/B 605.8 MiB/s 1.81 c/B CTR dec | 1.57 ns/B 605.9 MiB/s 1.81 c/B CCM enc | 4.07 ns/B 234.3 MiB/s 4.69 c/B CCM dec | 4.07 ns/B 234.1 MiB/s 4.69 c/B CCM auth | 2.61 ns/B 365.7 MiB/s 3.00 c/B GCM enc | 2.50 ns/B 381.9 MiB/s 2.88 c/B GCM dec | 2.49 ns/B 382.3 MiB/s 2.87 c/B GCM auth | 0.926 ns/B 1029.7 MiB/s 1.07 c/B OCB enc | 2.05 ns/B 465.6 MiB/s 2.36 c/B OCB dec | 2.06 ns/B 462.0 MiB/s 2.38 c/B OCB auth | 1.74 ns/B 548.4 MiB/s 2.00 c/B --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 5d69a38..de619fe 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -81,6 +81,7 @@ md5.c \ poly1305-sse2-amd64.S poly1305-avx2-amd64.S poly1305-armv7-neon.S \ rijndael.c rijndael-internal.h rijndael-tables.h rijndael-aesni.c \ rijndael-padlock.c rijndael-amd64.S rijndael-arm.S rijndael-ssse3-amd64.c \ + rijndael-armv8-ce.c rijndael-armv8-aarch32-ce.S \ rmd160.c \ rsa.c \ salsa20.c salsa20-amd64.S salsa20-armv7-neon.S \ diff --git a/cipher/rijndael-armv8-aarch32-ce.S b/cipher/rijndael-armv8-aarch32-ce.S new file mode 100644 index 0000000..f3b5400 --- /dev/null +++ b/cipher/rijndael-armv8-aarch32-ce.S @@ -0,0 +1,1483 @@ +/* ARMv8 CE accelerated AES + * Copyright (C) 2016 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . 
+ */ + +#include + +#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \ + defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) + +.syntax unified +.fpu crypto-neon-fp-armv8 +.arm + +.text + +#ifdef __PIC__ +# define GET_DATA_POINTER(reg, name, rtmp) \ + ldr reg, 1f; \ + ldr rtmp, 2f; \ + b 3f; \ + 1: .word _GLOBAL_OFFSET_TABLE_-(3f+8); \ + 2: .word name(GOT); \ + 3: add reg, pc, reg; \ + ldr reg, [reg, rtmp]; +#else +# define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name +#endif + + +/* AES macros */ + +#define aes_preload_keys(keysched, rekeysched) \ + vldmia keysched!, {q5-q7}; \ + mov rekeysched, keysched; \ + vldmialo keysched!, {q8-q15}; /* 128-bit */ \ + addeq keysched, #(2*16); \ + vldmiaeq keysched!, {q10-q15}; /* 192-bit */ \ + addhi keysched, #(4*16); \ + vldmiahi keysched!, {q12-q15}; /* 256-bit */ \ + +#define do_aes_one128(ed, mcimc, qo, qb) \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + veor qo, qb, q15; + +#define do_aes_one128re(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldm rekeysched, {q8-q9}; \ + do_aes_one128(ed, mcimc, qo, qb); + +#define do_aes_one192(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldm rekeysched!, {q8}; \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + vldm rekeysched, {q9}; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q8}; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + sub rekeysched, #(1*16); \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + vldm keysched, {q9}; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + sub keysched, #16; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q15; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + veor qo, qb, q9; \ + +#define do_aes_one256(ed, mcimc, qo, qb, keysched, rekeysched) \ + vldmia rekeysched!, {q8}; \ + aes##ed.8 qb, q5; \ + aes##mcimc.8 qb, qb; \ + vldmia rekeysched!, {q9}; \ + aes##ed.8 qb, q6; \ + aes##mcimc.8 qb, qb; \ + vldmia rekeysched!, {q10}; \ + aes##ed.8 qb, q7; \ + aes##mcimc.8 qb, qb; \ + vldm rekeysched, {q11}; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q8}; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q9}; \ + aes##ed.8 qb, q11; \ + aes##mcimc.8 qb, qb; \ + sub rekeysched, #(3*16); \ + aes##ed.8 qb, q12; \ + aes##mcimc.8 qb, qb; \ + vldmia keysched!, {q10}; \ + aes##ed.8 qb, q13; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q14; \ + aes##mcimc.8 qb, qb; \ + vldm keysched, {q11}; \ + aes##ed.8 qb, q15; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q8; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q9; \ + aes##mcimc.8 qb, qb; \ + aes##ed.8 qb, q10; \ + veor qo, qb, q11; \ + sub keysched, #(3*16); \ + +#define aes_round_4(ed, mcimc, b0, b1, b2, b3, key) \ + aes##ed.8 b0, key; \ + aes##mcimc.8 b0, b0; \ + aes##ed.8 b1, key; \ + aes##mcimc.8 b1, b1; \ + aes##ed.8 b2, key; \ + aes##mcimc.8 b2, b2; \ + aes##ed.8 b3, 
key; \ + aes##mcimc.8 b3, b3; + +#define do_aes_4_128(ed, mcimc, b0, b1, b2, b3) \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes##ed.8 b0, q14; \ + veor b0, b0, q15; \ + aes##ed.8 b1, q14; \ + veor b1, b1, q15; \ + aes##ed.8 b2, q14; \ + veor b2, b2, q15; \ + aes##ed.8 b3, q14; \ + veor b3, b3, q15; + +#define do_aes_4_128re(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldm rekeysched, {q8-q9}; \ + do_aes_4_128(ed, mcimc, b0, b1, b2, b3); + +#define do_aes_4_192(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldm rekeysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + vldm rekeysched, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + vldmia keysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + sub rekeysched, #(1*16); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + vldm keysched, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + sub keysched, #16; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q14); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q15); \ + aes##ed.8 b0, q8; \ + veor b0, b0, q9; \ + aes##ed.8 b1, q8; \ + veor b1, b1, q9; \ + aes##ed.8 b2, q8; \ + veor b2, b2, q9; \ + aes##ed.8 b3, q8; \ + veor b3, b3, q9; + +#define do_aes_4_256(ed, mcimc, b0, b1, b2, b3, keysched, rekeysched) \ + vldmia rekeysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q5); \ + vldmia rekeysched!, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q6); \ + vldmia rekeysched!, {q10}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q7); \ + vldm rekeysched, {q11}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + vldmia keysched!, {q8}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q10); \ + vldmia keysched!, {q9}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q11); \ + sub rekeysched, #(3*16); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q12); \ + vldmia keysched!, {q10}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q13); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q14); \ + vldm keysched, {q11}; \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q15); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q8); \ + aes_round_4(ed, mcimc, b0, b1, b2, b3, q9); \ + sub keysched, #(3*16); \ + aes##ed.8 b0, q10; \ + veor b0, b0, q11; \ + aes##ed.8 b1, q10; \ + veor b1, b1, q11; \ + aes##ed.8 b2, q10; \ + veor b2, b2, q11; \ + aes##ed.8 b3, q10; \ + veor b3, b3, q11; + + +/* Other functional macros */ + +#define CLEAR_REG(reg) veor reg, reg; + + +/* + * unsigned int _gcry_aes_enc_armv8_ce(void *keysched, byte *dst, + * const byte *src, + * unsigned int nrounds); + */ +.align 3 +.globl _gcry_aes_enc_armv8_ce +.type _gcry_aes_enc_armv8_ce,%function; +_gcry_aes_enc_armv8_ce: + /* input: + * r0: keysched + * r1: dst + * r2: src + * r3: nrounds + */ + + vldmia r0!, {q1-q3} /* load 3 round keys */ + + cmp r3, #12 + + vld1.8 {q0}, [r2] + + bhi .Lenc1_256 + beq .Lenc1_192 + +.Lenc1_128: + +.Lenc1_tail: + vldmia r0, {q8-q15} /* load 8 round keys */ + + aese.8 q0, 
q1 + aesmc.8 q0, q0 + CLEAR_REG(q1) + + aese.8 q0, q2 + aesmc.8 q0, q0 + CLEAR_REG(q2) + + aese.8 q0, q3 + aesmc.8 q0, q0 + CLEAR_REG(q3) + + aese.8 q0, q8 + aesmc.8 q0, q0 + CLEAR_REG(q8) + + aese.8 q0, q9 + aesmc.8 q0, q0 + CLEAR_REG(q9) + + aese.8 q0, q10 + aesmc.8 q0, q0 + CLEAR_REG(q10) + + aese.8 q0, q11 + aesmc.8 q0, q0 + CLEAR_REG(q11) + + aese.8 q0, q12 + aesmc.8 q0, q0 + CLEAR_REG(q12) + + aese.8 q0, q13 + aesmc.8 q0, q0 + CLEAR_REG(q13) + + aese.8 q0, q14 + veor q0, q15 + CLEAR_REG(q14) + CLEAR_REG(q15) + + vst1.8 {q0}, [r1] + CLEAR_REG(q0) + + mov r0, #0 + bx lr + +.Lenc1_192: + aese.8 q0, q1 + aesmc.8 q0, q0 + vmov q1, q3 + + aese.8 q0, q2 + aesmc.8 q0, q0 + vldm r0!, {q2-q3} /* load 3 round keys */ + + b .Lenc1_tail + +.Lenc1_256: + vldm r0!, {q15} /* load 1 round key */ + aese.8 q0, q1 + aesmc.8 q0, q0 + + aese.8 q0, q2 + aesmc.8 q0, q0 + + aese.8 q0, q3 + aesmc.8 q0, q0 + vldm r0!, {q1-q3} /* load 3 round keys */ + + aese.8 q0, q15 + aesmc.8 q0, q0 + + b .Lenc1_tail +.size _gcry_aes_enc_armv8_ce,.-_gcry_aes_enc_armv8_ce; + + +/* + * unsigned int _gcry_aes_dec_armv8_ce(void *keysched, byte *dst, + * const byte *src, + * unsigned int nrounds); + */ +.align 3 +.globl _gcry_aes_dec_armv8_ce +.type _gcry_aes_dec_armv8_ce,%function; +_gcry_aes_dec_armv8_ce: + /* input: + * r0: keysched + * r1: dst + * r2: src + * r3: nrounds + */ + + vldmia r0!, {q1-q3} /* load 3 round keys */ + + cmp r3, #12 + + vld1.8 {q0}, [r2] + + bhi .Ldec1_256 + beq .Ldec1_192 + +.Ldec1_128: + +.Ldec1_tail: + vldmia r0, {q8-q15} /* load 8 round keys */ + + aesd.8 q0, q1 + aesimc.8 q0, q0 + CLEAR_REG(q1) + + aesd.8 q0, q2 + aesimc.8 q0, q0 + CLEAR_REG(q2) + + aesd.8 q0, q3 + aesimc.8 q0, q0 + CLEAR_REG(q3) + + aesd.8 q0, q8 + aesimc.8 q0, q0 + CLEAR_REG(q8) + + aesd.8 q0, q9 + aesimc.8 q0, q0 + CLEAR_REG(q9) + + aesd.8 q0, q10 + aesimc.8 q0, q0 + CLEAR_REG(q10) + + aesd.8 q0, q11 + aesimc.8 q0, q0 + CLEAR_REG(q11) + + aesd.8 q0, q12 + aesimc.8 q0, q0 + CLEAR_REG(q12) + + aesd.8 q0, q13 + aesimc.8 q0, q0 + CLEAR_REG(q13) + + aesd.8 q0, q14 + veor q0, q15 + CLEAR_REG(q14) + CLEAR_REG(q15) + + vst1.8 {q0}, [r1] + CLEAR_REG(q0) + + mov r0, #0 + bx lr + +.Ldec1_192: + aesd.8 q0, q1 + aesimc.8 q0, q0 + vmov q1, q3 + + aesd.8 q0, q2 + aesimc.8 q0, q0 + vldm r0!, {q2-q3} /* load 3 round keys */ + + b .Ldec1_tail + +.Ldec1_256: + vldm r0!, {q15} /* load 1 round key */ + aesd.8 q0, q1 + aesimc.8 q0, q0 + + aesd.8 q0, q2 + aesimc.8 q0, q0 + + aesd.8 q0, q3 + aesimc.8 q0, q0 + vldm r0!, {q1-q3} /* load 3 round keys */ + + aesd.8 q0, q15 + aesimc.8 q0, q0 + + b .Ldec1_tail +.size _gcry_aes_dec_armv8_ce,.-_gcry_aes_dec_armv8_ce; + + +/* + * void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, size_t nblocks, + * int cbc_mac, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cbc_enc_armv8_ce +.type _gcry_aes_cbc_enc_armv8_ce,%function; +_gcry_aes_cbc_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: cbc_mac => r5 + * %st+8: nrounds => r6 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + ldr r6, [sp, #(16+8)] + beq .Lcbc_enc_skip + cmp r5, #0 + vpush {q4-q7} + moveq r5, #16 + movne r5, #0 + + cmp r6, #12 + vld1.8 {q1}, [r3] /* load IV */ + + aes_preload_keys(r0, lr); + + beq .Lcbc_enc_loop192 + bhi .Lcbc_enc_loop256 + +#define CBC_ENC(bits, ...) 
\ + .Lcbc_enc_loop##bits: \ + vld1.8 {q0}, [r2]!; /* load plaintext */ \ + veor q1, q0, q1; \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + vst1.8 {q1}, [r1], r5; /* store ciphertext */ \ + \ + bne .Lcbc_enc_loop##bits; \ + b .Lcbc_enc_done; + + CBC_ENC(128) + CBC_ENC(192, r0, lr) + CBC_ENC(256, r0, lr) + +#undef CBC_ENC + +.Lcbc_enc_done: + vst1.8 {q1}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcbc_enc_skip: + pop {r4-r6,pc} +.size _gcry_aes_cbc_enc_armv8_ce,.-_gcry_aes_cbc_enc_armv8_ce; + + +/* + * void _gcry_aes_cbc_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cbc_dec_armv8_ce +.type _gcry_aes_cbc_dec_armv8_ce,%function; +_gcry_aes_cbc_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcbc_dec_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + aes_preload_keys(r0, r6); + + beq .Lcbc_dec_entry_192 + bhi .Lcbc_dec_entry_256 + +#define CBC_DEC(bits, ...) \ + .Lcbc_dec_entry_##bits: \ + cmp r4, #4; \ + blo .Lcbc_dec_loop_##bits; \ + \ + .Lcbc_dec_loop4_##bits: \ + \ + vld1.8 {q1-q2}, [r2]!; /* load ciphertext */ \ + sub r4, r4, #4; \ + vld1.8 {q3-q4}, [r2]; /* load ciphertext */ \ + cmp r4, #4; \ + sub r2, #32; \ + \ + do_aes_4_##bits(d, imc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + veor q2, q2, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + vst1.8 {q1-q2}, [r1]!; /* store plaintext */ \ + veor q3, q3, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + veor q4, q4, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV */ \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lcbc_dec_loop4_##bits; \ + cmp r4, #0; \ + beq .Lcbc_dec_done; \ + \ + .Lcbc_dec_loop_##bits: \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + subs r4, r4, #1; \ + vmov q2, q1; \ + \ + do_aes_one##bits(d, imc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vmov q0, q2; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + \ + bne .Lcbc_dec_loop_##bits; \ + b .Lcbc_dec_done; + + CBC_DEC(128) + CBC_DEC(192, r0, r6) + CBC_DEC(256, r0, r6) + +#undef CBC_DEC + +.Lcbc_dec_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcbc_dec_skip: + pop {r4-r6,pc} +.size _gcry_aes_cbc_dec_armv8_ce,.-_gcry_aes_cbc_dec_armv8_ce; + + +/* + * void _gcry_aes_cfb_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cfb_enc_armv8_ce +.type _gcry_aes_cfb_enc_armv8_ce,%function; +_gcry_aes_cfb_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcfb_enc_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + 
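/* cache the round-key schedule in NEON registers (r6: scratch GPR) */ + 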
aes_preload_keys(r0, r6); + + beq .Lcfb_enc_entry_192 + bhi .Lcfb_enc_entry_256 + +#define CFB_ENC(bits, ...) \ + .Lcfb_enc_entry_##bits: \ + .Lcfb_enc_loop_##bits: \ + vld1.8 {q1}, [r2]!; /* load plaintext */ \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q0, q0, ##__VA_ARGS__); \ + \ + veor q0, q1, q0; \ + vst1.8 {q0}, [r1]!; /* store ciphertext */ \ + \ + bne .Lcfb_enc_loop_##bits; \ + b .Lcfb_enc_done; + + CFB_ENC(128) + CFB_ENC(192, r0, r6) + CFB_ENC(256, r0, r6) + +#undef CFB_ENC + +.Lcfb_enc_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcfb_enc_skip: + pop {r4-r6,pc} +.size _gcry_aes_cfb_enc_armv8_ce,.-_gcry_aes_cfb_enc_armv8_ce; + + +/* + * void _gcry_aes_cfb_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_cfb_dec_armv8_ce +.type _gcry_aes_cfb_dec_armv8_ce,%function; +_gcry_aes_cfb_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + push {r4-r6,lr} /* 4*4 = 16b */ + ldr r4, [sp, #(16+0)] + ldr r5, [sp, #(16+4)] + cmp r4, #0 + beq .Lcfb_dec_skip + vpush {q4-q7} + + cmp r5, #12 + vld1.8 {q0}, [r3] /* load IV */ + + aes_preload_keys(r0, r6); + + beq .Lcfb_dec_entry_192 + bhi .Lcfb_dec_entry_256 + +#define CFB_DEC(bits, ...) \ + .Lcfb_dec_entry_##bits: \ + cmp r4, #4; \ + blo .Lcfb_dec_loop_##bits; \ + \ + .Lcfb_dec_loop4_##bits: \ + \ + vld1.8 {q2-q3}, [r2]!; /* load ciphertext */ \ + vmov q1, q0; \ + sub r4, r4, #4; \ + vld1.8 {q4}, [r2]; /* load ciphertext */ \ + sub r2, #32; \ + cmp r4, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + veor q2, q2, q0; \ + vst1.8 {q1-q2}, [r1]!; /* store plaintext */ \ + vld1.8 {q0}, [r2]!; \ + veor q3, q3, q0; \ + vld1.8 {q0}, [r2]!; /* load next IV / ciphertext */ \ + veor q4, q4, q0; \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lcfb_dec_loop4_##bits; \ + cmp r4, #0; \ + beq .Lcfb_dec_done; \ + \ + .Lcfb_dec_loop_##bits: \ + \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + \ + subs r4, r4, #1; \ + \ + do_aes_one##bits(e, mc, q0, q0, ##__VA_ARGS__); \ + \ + veor q2, q1, q0; \ + vmov q0, q1; \ + vst1.8 {q2}, [r1]!; /* store plaintext */ \ + \ + bne .Lcfb_dec_loop_##bits; \ + b .Lcfb_dec_done; + + CFB_DEC(128) + CFB_DEC(192, r0, r6) + CFB_DEC(256, r0, r6) + +#undef CFB_DEC + +.Lcfb_dec_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + vpop {q4-q7} + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lcfb_dec_skip: + pop {r4-r6,pc} +.size _gcry_aes_cfb_dec_armv8_ce,.-_gcry_aes_cfb_dec_armv8_ce; + + +/* + * void _gcry_aes_ctr_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *iv, unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ctr_enc_armv8_ce +.type _gcry_aes_ctr_enc_armv8_ce,%function; +_gcry_aes_ctr_enc_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: iv + * %st+0: nblocks => r4 + * %st+4: nrounds => r5 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 
= 104b */ + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + cmp r4, #0 + beq .Lctr_enc_skip + + cmp r5, #12 + ldm r3, {r7-r10} + vld1.8 {q0}, [r3] /* load IV */ + rev r7, r7 + rev r8, r8 + rev r9, r9 + rev r10, r10 + + aes_preload_keys(r0, r6); + + beq .Lctr_enc_entry_192 + bhi .Lctr_enc_entry_256 + +#define CTR_ENC(bits, ...) \ + .Lctr_enc_entry_##bits: \ + cmp r4, #4; \ + blo .Lctr_enc_loop_##bits; \ + \ + .Lctr_enc_loop4_##bits: \ + cmp r10, #0xfffffffc; \ + sub r4, r4, #4; \ + blo .Lctr_enc_loop4_##bits##_nocarry; \ + cmp r9, #0xffffffff; \ + bne .Lctr_enc_loop4_##bits##_nocarry; \ + \ + adds r10, #1; \ + vmov q1, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q2, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q3, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + adds r10, #1; \ + vmov q4, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + vmov.32 d1[1], r11; \ + \ + b .Lctr_enc_loop4_##bits##_store_ctr; \ + \ + .Lctr_enc_loop4_##bits##_nocarry: \ + \ + veor q2, q2; \ + vrev64.8 q1, q0; \ + vceq.u32 d5, d5; \ + vadd.u64 q3, q2, q2; \ + vadd.u64 q4, q3, q2; \ + vadd.u64 q0, q3, q3; \ + vsub.u64 q2, q1, q2; \ + vsub.u64 q3, q1, q3; \ + vsub.u64 q4, q1, q4; \ + vsub.u64 q0, q1, q0; \ + vrev64.8 q1, q1; \ + vrev64.8 q2, q2; \ + vrev64.8 q3, q3; \ + vrev64.8 q0, q0; \ + vrev64.8 q4, q4; \ + add r10, #4; \ + \ + .Lctr_enc_loop4_##bits##_store_ctr: \ + \ + vst1.8 {q0}, [r3]; \ + cmp r4, #4; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + veor q2, q2, q0; \ + veor q3, q3, q1; \ + vld1.8 {q0}, [r2]!; /* load ciphertext */ \ + vst1.8 {q2}, [r1]!; /* store plaintext */ \ + veor q4, q4, q0; \ + vld1.8 {q0}, [r3]; /* reload IV */ \ + vst1.8 {q3-q4}, [r1]!; /* store plaintext */ \ + \ + bhs .Lctr_enc_loop4_##bits; \ + cmp r4, #0; \ + beq .Lctr_enc_done; \ + \ + .Lctr_enc_loop_##bits: \ + \ + adds r10, #1; \ + vmov q1, q0; \ + blcs .Lctr_overflow_one; \ + rev r11, r10; \ + subs r4, r4, #1; \ + vld1.8 {q2}, [r2]!; /* load ciphertext */ \ + vmov.32 d1[1], r11; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q2, q1; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + \ + bne .Lctr_enc_loop_##bits; \ + b .Lctr_enc_done; + + CTR_ENC(128) + CTR_ENC(192, r0, r6) + CTR_ENC(256, r0, r6) + +#undef CTR_ENC + +.Lctr_enc_done: + vst1.8 {q0}, [r3] /* store IV */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + +.Lctr_enc_skip: + pop {r4-r12,lr} + vpop {q4-q7} + bx lr + +.Lctr_overflow_one: + adcs r9, #0 + adcs r8, #0 + adc r7, #0 + rev r11, r9 + rev r12, r8 + vmov.32 d1[0], r11 + rev r11, r7 + vmov.32 d0[1], r12 + vmov.32 d0[0], r11 + bx lr +.size _gcry_aes_ctr_enc_armv8_ce,.-_gcry_aes_ctr_enc_armv8_ce; + + +/* + * void _gcry_aes_ocb_enc_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *offset, + * unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_enc_armv8_ce +.type _gcry_aes_ocb_enc_armv8_ce,%function; +_gcry_aes_ocb_enc_armv8_ce: + /* input: + 
* r0: keysched + * r1: outbuf + * r2: inbuf + * r3: offset + * %st+0: checksum => r4 + * %st+4: Ls => r5 + * %st+8: nblocks => r6 (0 < nblocks <= 32) + * %st+12: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+12)] + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + ldr r6, [sp, #(104+8)] + + cmp r7, #12 + vld1.8 {q0}, [r3] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_enc_entry_192 + bhi .Locb_enc_entry_256 + +#define OCB_ENC(bits, ...) \ + .Locb_enc_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_enc_loop_##bits; \ + \ + .Locb_enc_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r2]!; /* load P_i+<0-1> */ \ + vld1.8 {q8}, [r4]; /* load Checksum_{i-1} */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q8, q8, q1; /* Checksum_i+0 */ \ + veor q1, q1, q0; /* P_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r2]!; /* load P_i+<2-3> */ \ + vst1.8 {q0}, [r1]!; /* store Offset_i+0 */\ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q8, q8, q2; /* Checksum_i+1 */ \ + veor q2, q2, q0; /* P_i+1 xor Offset_i+1 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q8, q8, q3; /* Checksum_i+2 */ \ + veor q3, q3, q0; /* P_i+2 xor Offset_i+2 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q8, q8, q4; /* Checksum_i+3 */ \ + veor q4, q4, q0; /* P_i+3 xor Offset_i+3 */\ + vst1.8 {q0}, [r1]; /* store Offset_i+3 */\ + sub r1, #(3*16); \ + vst1.8 {q8}, [r4]; /* store Checksum_i+3 */\ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + mov r8, r1; \ + vld1.8 {q8-q9}, [r1]!; \ + veor q1, q1, q8; \ + veor q2, q2, q9; \ + vld1.8 {q8-q9}, [r1]!; \ + vst1.8 {q1-q2}, [r8]!; \ + veor q3, q3, q8; \ + veor q4, q4, q9; \ + vst1.8 {q3-q4}, [r8]; \ + \ + bhs .Locb_enc_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_enc_done; \ + \ + .Locb_enc_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q1}, [r2]!; /* load plaintext */ \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q3}, [r4]; /* load checksum */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + veor q3, q3, q1; \ + veor q1, q1, q0; \ + vst1.8 {q3}, [r4]; /* store checksum */ \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__); \ + \ + veor q1, q1, q0; \ + vst1.8 {q1}, [r1]!; /* store ciphertext */ \ + \ + bne .Locb_enc_loop_##bits; \ + b .Locb_enc_done; + + OCB_ENC(128re, r0, r12) + OCB_ENC(192, r0, r12) + OCB_ENC(256, r0, r12) + +#undef OCB_ENC + +.Locb_enc_done: + vst1.8 {q0}, [r3] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_enc_armv8_ce,.-_gcry_aes_ocb_enc_armv8_ce; + + +/* + * void _gcry_aes_ocb_dec_armv8_ce (const void *keysched, + * unsigned char *outbuf, + * const unsigned char *inbuf, + * unsigned char *offset, + * 
unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_dec_armv8_ce +.type _gcry_aes_ocb_dec_armv8_ce,%function; +_gcry_aes_ocb_dec_armv8_ce: + /* input: + * r0: keysched + * r1: outbuf + * r2: inbuf + * r3: offset + * %st+0: checksum => r4 + * %st+4: Ls => r5 + * %st+8: nblocks => r6 (0 < nblocks <= 32) + * %st+12: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+12)] + ldr r4, [sp, #(104+0)] + ldr r5, [sp, #(104+4)] + ldr r6, [sp, #(104+8)] + + cmp r7, #12 + vld1.8 {q0}, [r3] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_dec_entry_192 + bhi .Locb_dec_entry_256 + +#define OCB_DEC(bits, ...) \ + .Locb_dec_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_dec_loop_##bits; \ + \ + .Locb_dec_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* P_i = Offset_i xor DECIPHER(K, C_i xor Offset_i) */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r2]!; /* load P_i+<0-1> */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q1, q1, q0; /* P_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r2]!; /* load P_i+<2-3> */ \ + vst1.8 {q0}, [r1]!; /* store Offset_i+0 */\ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q2, q2, q0; /* P_i+1 xor Offset_i+1 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q3, q3, q0; /* P_i+2 xor Offset_i+2 */\ + vst1.8 {q0}, [r1]!; /* store Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q4, q4, q0; /* P_i+3 xor Offset_i+3 */\ + vst1.8 {q0}, [r1]; /* store Offset_i+3 */\ + sub r1, #(3*16); \ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(d, imc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + mov r8, r1; \ + vld1.8 {q8-q9}, [r1]!; \ + veor q1, q1, q8; \ + veor q2, q2, q9; \ + vld1.8 {q8-q9}, [r1]!; \ + vst1.8 {q1-q2}, [r8]!; \ + veor q1, q1, q2; \ + vld1.8 {q2}, [r4]; /* load Checksum_{i-1} */ \ + veor q3, q3, q8; \ + veor q1, q1, q3; \ + veor q4, q4, q9; \ + veor q1, q1, q4; \ + vst1.8 {q3-q4}, [r8]; \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r4]; /* store Checksum_i+3 */ \ + \ + bhs .Locb_dec_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_dec_done; \ + \ + .Locb_dec_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* P_i = Offset_i xor DECIPHER(K, C_i xor Offset_i) */ \ + /* Checksum_i = Checksum_{i-1} xor P_i */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q1}, [r2]!; /* load ciphertext */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + veor q1, q1, q0; \ + \ + do_aes_one##bits(d, imc, q1, q1, ##__VA_ARGS__) \ + \ + vld1.8 {q2}, [r4]; /* load checksum */ \ + veor q1, q1, q0; \ + vst1.8 {q1}, [r1]!; /* store plaintext */ \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r4]; /* store checksum */ \ + \ + bne .Locb_dec_loop_##bits; \ + b .Locb_dec_done; + + OCB_DEC(128re, r0, r12) + OCB_DEC(192, r0, r12) + OCB_DEC(256, r0, r12) + +#undef OCB_DEC + +.Locb_dec_done: + vst1.8 {q0}, [r3] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_dec_armv8_ce,.-_gcry_aes_ocb_dec_armv8_ce; + + +/* + * 
void _gcry_aes_ocb_auth_armv8_ce (const void *keysched, + * const unsigned char *abuf, + * unsigned char *offset, + * unsigned char *checksum, + * void **Ls, + * size_t nblocks, + * unsigned int nrounds); + */ + +.align 3 +.globl _gcry_aes_ocb_auth_armv8_ce +.type _gcry_aes_ocb_auth_armv8_ce,%function; +_gcry_aes_ocb_auth_armv8_ce: + /* input: + * r0: keysched + * r1: abuf + * r2: offset + * r3: checksum + * %st+0: Ls => r5 + * %st+4: nblocks => r6 (0 < nblocks <= 32) + * %st+8: nrounds => r7 + */ + + vpush {q4-q7} + push {r4-r12,lr} /* 4*16 + 4*10 = 104b */ + ldr r7, [sp, #(104+8)] + ldr r5, [sp, #(104+0)] + ldr r6, [sp, #(104+4)] + + cmp r7, #12 + vld1.8 {q0}, [r2] /* load offset */ + + aes_preload_keys(r0, r12); + + beq .Locb_auth_entry_192 + bhi .Locb_auth_entry_256 + +#define OCB_AUTH(bits, ...) \ + .Locb_auth_entry_##bits: \ + cmp r6, #4; \ + blo .Locb_auth_loop_##bits; \ + \ + .Locb_auth_loop4_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ \ + \ + ldm r5!, {r8, r9, r10, r11}; \ + sub r6, #4; \ + \ + vld1.8 {q9}, [r8]; /* load L_{ntz(i+0)} */ \ + vld1.8 {q1-q2}, [r1]!; /* load A_i+<0-1> */ \ + veor q0, q0, q9; /* Offset_i+0 */ \ + vld1.8 {q9}, [r9]; /* load L_{ntz(i+1)} */ \ + veor q1, q1, q0; /* A_i+0 xor Offset_i+0 */\ + vld1.8 {q3-q4}, [r1]!; /* load A_i+<2-3> */ \ + veor q0, q0, q9; /* Offset_i+1 */ \ + vld1.8 {q9}, [r10]; /* load L_{ntz(i+2)} */ \ + veor q2, q2, q0; /* A_i+1 xor Offset_i+1 */\ + veor q0, q0, q9; /* Offset_i+2 */ \ + vld1.8 {q9}, [r11]; /* load L_{ntz(i+3)} */ \ + veor q3, q3, q0; /* A_i+2 xor Offset_i+2 */\ + veor q0, q0, q9; /* Offset_i+3 */ \ + veor q4, q4, q0; /* A_i+3 xor Offset_i+3 */\ + \ + cmp r6, #4; \ + \ + do_aes_4_##bits(e, mc, q1, q2, q3, q4, ##__VA_ARGS__); \ + \ + veor q1, q1, q2; \ + veor q3, q3, q4; \ + vld1.8 {q2}, [r3]; \ + veor q1, q1, q3; \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r3]; \ + \ + bhs .Locb_auth_loop4_##bits; \ + cmp r6, #0; \ + beq .Locb_auth_done; \ + \ + .Locb_auth_loop_##bits: \ + \ + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ \ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ \ + \ + ldr r8, [r5], #4; \ + vld1.8 {q2}, [r8]; /* load L_{ntz(i)} */ \ + vld1.8 {q1}, [r1]!; /* load aadtext */ \ + subs r6, #1; \ + veor q0, q0, q2; \ + vld1.8 {q2}, [r3]; /* load checksum */ \ + veor q1, q1, q0; \ + \ + do_aes_one##bits(e, mc, q1, q1, ##__VA_ARGS__) \ + \ + veor q2, q2, q1; \ + vst1.8 {q2}, [r3]; /* store checksum */ \ + \ + bne .Locb_auth_loop_##bits; \ + b .Locb_auth_done; + + OCB_AUTH(128re, r0, r12) + OCB_AUTH(192, r0, r12) + OCB_AUTH(256, r0, r12) + +#undef OCB_AUTH + +.Locb_auth_done: + vst1.8 {q0}, [r2] /* store offset */ + + CLEAR_REG(q0) + CLEAR_REG(q1) + CLEAR_REG(q2) + CLEAR_REG(q3) + CLEAR_REG(q8) + CLEAR_REG(q9) + CLEAR_REG(q10) + CLEAR_REG(q11) + CLEAR_REG(q12) + CLEAR_REG(q13) + CLEAR_REG(q14) + + pop {r4-r12,lr} + vpop {q4-q7} + bx lr +.size _gcry_aes_ocb_auth_armv8_ce,.-_gcry_aes_ocb_auth_armv8_ce; + + +/* + * u32 _gcry_aes_sbox4_armv8_ce(u32 in4b); + */ +.align 3 +.globl _gcry_aes_sbox4_armv8_ce +.type _gcry_aes_sbox4_armv8_ce,%function; +_gcry_aes_sbox4_armv8_ce: + /* See "Gouvêa, C. P. L. & López, J. Implementing GCM on ARMv8. Topics in + * Cryptology – CT-RSA 2015" for details. 
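+ * In brief: 'aese' with an all-zero round key reduces to SubBytes and + * ShiftRows. The input word sits in the first column and the rest of + * the state is filled with 0x52, which SubBytes maps to 0x00; the + * veor/vpadd below then fold the four S-boxed bytes, scattered by + * ShiftRows, back into a single 32-bit word.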
+ */ + vmov.i8 q0, #0x52 + vmov.i8 q1, #0 + vmov s0, r0 + aese.8 q0, q1 + veor d0, d1 + vpadd.i32 d0, d0, d1 + vmov r0, s0 + CLEAR_REG(q0) + bx lr +.size _gcry_aes_sbox4_armv8_ce,.-_gcry_aes_sbox4_armv8_ce; + + +/* + * void _gcry_aes_invmixcol_armv8_ce(void *dst, const void *src); + */ +.align 3 +.globl _gcry_aes_invmixcol_armv8_ce +.type _gcry_aes_invmixcol_armv8_ce,%function; +_gcry_aes_invmixcol_armv8_ce: + vld1.8 {q0}, [r1] + aesimc.8 q0, q0 + vst1.8 {q0}, [r0] + CLEAR_REG(q0) + bx lr +.size _gcry_aes_invmixcol_armv8_ce,.-_gcry_aes_invmixcol_armv8_ce; + +#endif diff --git a/cipher/rijndael-armv8-ce.c b/cipher/rijndael-armv8-ce.c new file mode 100644 index 0000000..bed4066 --- /dev/null +++ b/cipher/rijndael-armv8-ce.c @@ -0,0 +1,469 @@ +/* ARMv8 Crypto Extension AES for Libgcrypt + * Copyright (C) 2016 Jussi Kivilinna <jussi.kivilinna@iki.fi> + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see <http://www.gnu.org/licenses/>. + * + */ + +#include <config.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> /* for memcmp() */ + +#include "types.h" /* for byte and u32 typedefs */ +#include "g10lib.h" +#include "cipher.h" +#include "bufhelp.h" +#include "cipher-selftest.h" +#include "rijndael-internal.h" +#include "./cipher-internal.h" + + +#ifdef USE_ARM_CE + + +typedef struct u128_s { u32 a, b, c, d; } u128_t; + +extern u32 _gcry_aes_sbox4_armv8_ce(u32 in4b); +extern void _gcry_aes_invmixcol_armv8_ce(u128_t *dst, const u128_t *src); + +extern unsigned int _gcry_aes_enc_armv8_ce(const void *keysched, byte *dst, + const byte *src, + unsigned int nrounds); +extern unsigned int _gcry_aes_dec_armv8_ce(const void *keysched, byte *dst, + const byte *src, + unsigned int nrounds); + +extern void _gcry_aes_cbc_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + int cbc_mac, unsigned int nrounds); +extern void _gcry_aes_cbc_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_cfb_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_cfb_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_ctr_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + unsigned int nrounds); + +extern void _gcry_aes_ocb_enc_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, + unsigned char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_ocb_dec_armv8_ce (const void *keysched, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, + unsigned 
char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); +extern void _gcry_aes_ocb_auth_armv8_ce (const void *keysched, + const unsigned char *abuf, + unsigned char *offset, + unsigned char *checksum, + void **Ls, + size_t nblocks, + unsigned int nrounds); + +typedef void (*ocb_crypt_fn_t) (const void *keysched, unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *offset, unsigned char *checksum, + void **Ls, size_t nblocks, + unsigned int nrounds); + +void +_gcry_aes_armv8_ce_setkey (RIJNDAEL_context *ctx, const byte *key) +{ + union + { + PROPERLY_ALIGNED_TYPE dummy; + byte data[MAXKC][4]; + u32 data32[MAXKC]; + } tkk[2]; + unsigned int rounds = ctx->rounds; + int KC = rounds - 6; + unsigned int keylen = KC * 4; + unsigned int i, r, t; + byte rcon = 1; + int j; +#define k tkk[0].data +#define k_u32 tkk[0].data32 +#define tk tkk[1].data +#define tk_u32 tkk[1].data32 +#define W (ctx->keyschenc) +#define W_u32 (ctx->keyschenc32) + + for (i = 0; i < keylen; i++) + { + k[i >> 2][i & 3] = key[i]; + } + + for (j = KC-1; j >= 0; j--) + { + tk_u32[j] = k_u32[j]; + } + r = 0; + t = 0; + /* Copy values into round key array. */ + for (j = 0; (j < KC) && (r < rounds + 1); ) + { + for (; (j < KC) && (t < 4); j++, t++) + { + W_u32[r][t] = le_bswap32(tk_u32[j]); + } + if (t == 4) + { + r++; + t = 0; + } + } + + while (r < rounds + 1) + { + tk_u32[0] ^= _gcry_aes_sbox4_armv8_ce(rol(tk_u32[KC - 1], 24)) ^ rcon; + + if (KC != 8) + { + for (j = 1; j < KC; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + } + else + { + for (j = 1; j < KC/2; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + + tk_u32[KC/2] ^= _gcry_aes_sbox4_armv8_ce(tk_u32[KC/2 - 1]); + + for (j = KC/2 + 1; j < KC; j++) + { + tk_u32[j] ^= tk_u32[j-1]; + } + } + + /* Copy values into round key array. */ + for (j = 0; (j < KC) && (r < rounds + 1); ) + { + for (; (j < KC) && (t < 4); j++, t++) + { + W_u32[r][t] = le_bswap32(tk_u32[j]); + } + if (t == 4) + { + r++; + t = 0; + } + } + + rcon = (rcon << 1) ^ ((rcon >> 7) * 0x1b); + } + +#undef W +#undef tk +#undef k +#undef W_u32 +#undef tk_u32 +#undef k_u32 + wipememory(&tkk, sizeof(tkk)); +} + +/* Make a decryption key from an encryption key. 
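Per the AES "equivalent inverse cipher", the decryption schedule is the encryption round keys in reverse order, with InvMixColumns (via _gcry_aes_invmixcol_armv8_ce) applied to every round key except the first and last. 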
*/ +void +_gcry_aes_armv8_ce_prepare_decryption (RIJNDAEL_context *ctx) +{ + u128_t *ekey = (u128_t *)(void *)ctx->keyschenc; + u128_t *dkey = (u128_t *)(void *)ctx->keyschdec; + int rounds = ctx->rounds; + int rr; + int r; + +#define DO_AESIMC() _gcry_aes_invmixcol_armv8_ce(&dkey[r], &ekey[rr]) + + dkey[0] = ekey[rounds]; + r = 1; + rr = rounds-1; + DO_AESIMC(); r++; rr--; /* round 1 */ + DO_AESIMC(); r++; rr--; /* round 2 */ + DO_AESIMC(); r++; rr--; /* round 3 */ + DO_AESIMC(); r++; rr--; /* round 4 */ + DO_AESIMC(); r++; rr--; /* round 5 */ + DO_AESIMC(); r++; rr--; /* round 6 */ + DO_AESIMC(); r++; rr--; /* round 7 */ + DO_AESIMC(); r++; rr--; /* round 8 */ + DO_AESIMC(); r++; rr--; /* round 9 */ + if (rounds >= 12) + { + if (rounds > 12) + { + DO_AESIMC(); r++; rr--; /* round 10 */ + DO_AESIMC(); r++; rr--; /* round 11 */ + } + + DO_AESIMC(); r++; rr--; /* round 12 / 10 */ + DO_AESIMC(); r++; rr--; /* round 13 / 11 */ + } + + dkey[r] = ekey[0]; + +#undef DO_AESIMC +} + +unsigned int +_gcry_aes_armv8_ce_encrypt (const RIJNDAEL_context *ctx, unsigned char *dst, + const unsigned char *src) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + return _gcry_aes_enc_armv8_ce(keysched, dst, src, nrounds); +} + +unsigned int +_gcry_aes_armv8_ce_decrypt (const RIJNDAEL_context *ctx, unsigned char *dst, + const unsigned char *src) +{ + const void *keysched = ctx->keyschdec32; + unsigned int nrounds = ctx->rounds; + + return _gcry_aes_dec_armv8_ce(keysched, dst, src, nrounds); +} + +void +_gcry_aes_armv8_ce_cbc_enc (const RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks, int cbc_mac) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cbc_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, cbc_mac, + nrounds); +} + +void +_gcry_aes_armv8_ce_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschdec32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cbc_dec_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_cfb_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cfb_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_cfb_dec_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, + const unsigned char *inbuf, unsigned char *iv, + size_t nblocks) +{ + const void *keysched = ctx->keyschenc32; + unsigned int nrounds = ctx->rounds; + + _gcry_aes_ctr_enc_armv8_ce(keysched, outbuf, inbuf, iv, nblocks, nrounds); +} + +void +_gcry_aes_armv8_ce_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, + const void *inbuf_arg, size_t nblocks, + int encrypt) +{ + RIJNDAEL_context *ctx = (void *)&c->context.c; + const void *keysched = encrypt ? ctx->keyschenc32 : ctx->keyschdec32; + ocb_crypt_fn_t crypt_fn = encrypt ? 
_gcry_aes_ocb_enc_armv8_ce + : _gcry_aes_ocb_dec_armv8_ce; + unsigned char *outbuf = outbuf_arg; + const unsigned char *inbuf = inbuf_arg; + unsigned int nrounds = ctx->rounds; + u64 blkn = c->u_mode.ocb.data_nblocks; + u64 blkn_offs = blkn - blkn % 32; + unsigned int n = 32 - blkn % 32; + unsigned char l_tmp[16]; + void *Ls[32]; + void **l; + size_t i; + + c->u_mode.ocb.data_nblocks = blkn + nblocks; + + if (nblocks >= 32) + { + for (i = 0; i < 32; i += 8) + { + Ls[(i + 0 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + l = &Ls[(31 + n) % 32]; + + /* Process data in 32 block chunks. */ + while (nblocks >= 32) + { + /* l_tmp will be used only every 65536-th block. */ + blkn_offs += 32; + *l = (void *)ocb_get_l(c, l_tmp, blkn_offs); + + crypt_fn(keysched, outbuf, inbuf, c->u_iv.iv, c->u_ctr.ctr, Ls, 32, + nrounds); + + nblocks -= 32; + outbuf += 32 * 16; + inbuf += 32 * 16; + } + + if (nblocks && l < &Ls[nblocks]) + { + *l = (void *)ocb_get_l(c, l_tmp, 32 + blkn_offs); + } + } + else + { + for (i = 0; i < nblocks; i++) + Ls[i] = (void *)ocb_get_l(c, l_tmp, ++blkn); + } + + if (nblocks) + { + crypt_fn(keysched, outbuf, inbuf, c->u_iv.iv, c->u_ctr.ctr, Ls, nblocks, + nrounds); + } + + wipememory(&l_tmp, sizeof(l_tmp)); +} + +void +_gcry_aes_armv8_ce_ocb_auth (gcry_cipher_hd_t c, void *abuf_arg, + size_t nblocks) +{ + RIJNDAEL_context *ctx = (void *)&c->context.c; + const void *keysched = ctx->keyschenc32; + const unsigned char *abuf = abuf_arg; + unsigned int nrounds = ctx->rounds; + u64 blkn = c->u_mode.ocb.aad_nblocks; + u64 blkn_offs = blkn - blkn % 32; + unsigned int n = 32 - blkn % 32; + unsigned char l_tmp[16]; + void *Ls[32]; + void **l; + size_t i; + + c->u_mode.ocb.aad_nblocks = blkn + nblocks; + + if (nblocks >= 32) + { + for (i = 0; i < 32; i += 8) + { + Ls[(i + 0 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (void *)c->u_mode.ocb.L[3]; + l = &Ls[(31 + n) % 32]; + + /* Process data in 32 block chunks. */ + while (nblocks >= 32) + { + /* l_tmp will be used only every 65536-th block. 
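(ocb_get_l() otherwise returns a pointer into the precomputed c->u_mode.ocb.L[] table; it computes into l_tmp only when ntz of the block number is beyond that table, i.e. at multiples of 2^16.) 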
*/ + blkn_offs += 32; + *l = (void *)ocb_get_l(c, l_tmp, blkn_offs); + + _gcry_aes_ocb_auth_armv8_ce(keysched, abuf, c->u_mode.ocb.aad_offset, + c->u_mode.ocb.aad_sum, Ls, 32, nrounds); + + nblocks -= 32; + abuf += 32 * 16; + } + + if (nblocks && l < &Ls[nblocks]) + { + *l = (void *)ocb_get_l(c, l_tmp, 32 + blkn_offs); + } + } + else + { + for (i = 0; i < nblocks; i++) + Ls[i] = (void *)ocb_get_l(c, l_tmp, ++blkn); + } + + if (nblocks) + { + _gcry_aes_ocb_auth_armv8_ce(keysched, abuf, c->u_mode.ocb.aad_offset, + c->u_mode.ocb.aad_sum, Ls, nblocks, nrounds); + } + + wipememory(&l_tmp, sizeof(l_tmp)); +} + +#endif /* USE_ARM_CE */ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 6641728..7544fa0 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -82,6 +82,17 @@ # endif #endif /* ENABLE_AESNI_SUPPORT */ +/* USE_ARM_CE indicates whether to enable ARMv8 Crypto Extension assembly + * code. */ +#undef USE_ARM_CE +#ifdef ENABLE_ARM_CRYPTO_SUPPORT +# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \ + && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) +# define USE_ARM_CE 1 +# endif +#endif /* ENABLE_ARM_CRYPTO_SUPPORT */ + struct RIJNDAEL_context_s; typedef unsigned int (*rijndael_cryptfn_t)(const struct RIJNDAEL_context_s *ctx, @@ -127,6 +138,9 @@ typedef struct RIJNDAEL_context_s #ifdef USE_SSSE3 unsigned int use_ssse3:1; /* SSSE3 shall be used. */ #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + unsigned int use_arm_ce:1; /* ARMv8 CE shall be used. */ +#endif /*USE_ARM_CE*/ rijndael_cryptfn_t encrypt_fn; rijndael_cryptfn_t decrypt_fn; rijndael_prefetchfn_t prefetch_enc_fn; diff --git a/cipher/rijndael.c b/cipher/rijndael.c index 0130924..cc6a722 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -168,6 +168,46 @@ extern unsigned int _gcry_aes_arm_decrypt_block(const void *keysched_dec, const void *decT); #endif /*USE_ARM_ASM*/ +#ifdef USE_ARM_CE +/* ARMv8 Crypto Extension implementations of AES */ +extern void _gcry_aes_armv8_ce_setkey(RIJNDAEL_context *ctx, const byte *key); +extern void _gcry_aes_armv8_ce_prepare_decryption(RIJNDAEL_context *ctx); + +extern unsigned int _gcry_aes_armv8_ce_encrypt(const RIJNDAEL_context *ctx, + unsigned char *dst, + const unsigned char *src); +extern unsigned int _gcry_aes_armv8_ce_decrypt(const RIJNDAEL_context *ctx, + unsigned char *dst, + const unsigned char *src); + +extern void _gcry_aes_armv8_ce_cfb_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_cbc_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks, + int cbc_mac); +extern void _gcry_aes_armv8_ce_ctr_enc (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *ctr, size_t nblocks); +extern void _gcry_aes_armv8_ce_cfb_dec (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_cbc_dec (RIJNDAEL_context *ctx, + unsigned char *outbuf, + const unsigned char *inbuf, + unsigned char *iv, size_t nblocks); +extern void _gcry_aes_armv8_ce_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, + const void *inbuf_arg, size_t nblocks, + int encrypt); +extern void _gcry_aes_armv8_ce_ocb_auth (gcry_cipher_hd_t c, + const void *abuf_arg, size_t nblocks); +#endif /*USE_ARM_ASM*/ + static unsigned int do_encrypt (const 
RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax); static unsigned int do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, @@ -223,11 +263,12 @@ static gcry_err_code_t do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) { static int initialized = 0; - static const char *selftest_failed=0; + static const char *selftest_failed = 0; int rounds; int i,j, r, t, rconpointer = 0; int KC; -#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) +#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) \ + || defined(USE_ARM_CE) unsigned int hwfeatures; #endif @@ -268,7 +309,8 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) ctx->rounds = rounds; -#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) +#if defined(USE_AESNI) || defined(USE_PADLOCK) || defined(USE_SSSE3) \ + || defined(USE_ARM_CE) hwfeatures = _gcry_get_hw_features (); #endif @@ -282,6 +324,9 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) #ifdef USE_SSSE3 ctx->use_ssse3 = 0; #endif +#ifdef USE_ARM_CE + ctx->use_arm_ce = 0; +#endif if (0) { @@ -318,6 +363,16 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) ctx->use_ssse3 = 1; } #endif +#ifdef USE_ARM_CE + else if (hwfeatures & HWF_ARM_AES) + { + ctx->encrypt_fn = _gcry_aes_armv8_ce_encrypt; + ctx->decrypt_fn = _gcry_aes_armv8_ce_decrypt; + ctx->prefetch_enc_fn = NULL; + ctx->prefetch_dec_fn = NULL; + ctx->use_arm_ce = 1; + } +#endif else { ctx->encrypt_fn = do_encrypt; @@ -340,6 +395,10 @@ do_setkey (RIJNDAEL_context *ctx, const byte *key, const unsigned keylen) else if (ctx->use_ssse3) _gcry_aes_ssse3_do_setkey (ctx, key); #endif +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + _gcry_aes_armv8_ce_setkey (ctx, key); +#endif else { const byte *sbox = ((const byte *)encT) + 1; @@ -471,6 +530,12 @@ prepare_decryption( RIJNDAEL_context *ctx ) _gcry_aes_ssse3_prepare_decryption (ctx); } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_prepare_decryption (ctx); + } +#endif /*USE_SSSE3*/ #ifdef USE_PADLOCK else if (ctx->use_padlock) { @@ -744,6 +809,13 @@ _gcry_aes_cfb_enc (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cfb_enc (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -798,6 +870,13 @@ _gcry_aes_cbc_enc (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cbc_enc (ctx, outbuf, inbuf, iv, nblocks, cbc_mac); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -860,6 +939,13 @@ _gcry_aes_ctr_enc (void *context, unsigned char *ctr, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ctr_enc (ctx, outbuf, inbuf, ctr, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } tmp; @@ -1120,6 +1206,13 @@ _gcry_aes_cfb_dec (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cfb_dec (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { rijndael_cryptfn_t encrypt_fn = ctx->encrypt_fn; @@ -1173,6 +1266,13 @@ 
_gcry_aes_cbc_dec (void *context, unsigned char *iv, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_cbc_dec (ctx, outbuf, inbuf, iv, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { unsigned char savebuf[BLOCKSIZE] ATTR_ALIGNED_16; @@ -1238,6 +1338,13 @@ _gcry_aes_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ocb_crypt (c, outbuf, inbuf, nblocks, encrypt); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else if (encrypt) { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } l_tmp; @@ -1323,6 +1430,13 @@ _gcry_aes_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks) burn_depth = 0; } #endif /*USE_SSSE3*/ +#ifdef USE_ARM_CE + else if (ctx->use_arm_ce) + { + _gcry_aes_armv8_ce_ocb_auth (c, abuf, nblocks); + burn_depth = 0; + } +#endif /*USE_ARM_CE*/ else { union { unsigned char x1[16] ATTR_ALIGNED_16; u32 x32[4]; } l_tmp; diff --git a/configure.ac b/configure.ac index 91dd285..915888a 100644 --- a/configure.ac +++ b/configure.ac @@ -2009,6 +2009,10 @@ if test "$found" = "1" ; then arm*-*-*) # Build with the assembly implementation GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-arm.lo" + + # Build with the ARMv8/AArch32 CE implementation + GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-armv8-ce.lo" + GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-armv8-aarch32-ce.lo" ;; esac From cvs at cvs.gnupg.org Thu Jul 14 11:17:49 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 14 Jul 2016 11:17:49 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.1-19-g2b26de6 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 2b26de65e61dc42c64120c463a33e944bf413e28 (commit) from e535ea1bdc42309553007d60599d3147b8defe93 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 2b26de65e61dc42c64120c463a33e944bf413e28 Author: Werner Koch Date: Thu Jul 14 11:15:38 2016 +0200 build: Update NEWS. -- diff --git a/NEWS b/NEWS index be5e084..498c9da 100644 --- a/NEWS +++ b/NEWS @@ -1,6 +1,24 @@ Noteworthy changes in version 1.7.2 (unreleased) [C21/A1/R_] ------------------------------------------------ + * Bug fixes: + + - Fix setting of the ECC cofactor if parameters are specified. + + - Fix memory leak in the ECC code. + + - Remove debug message about unsupported getrandom syscall. + + - Fix build problems related to AVX use + + - Fix bus errors on ARM for Poly1305, ChaCha20, AES, and SHA-512. + + * Internal chnages: + + - Improved fatal error message for wrong use of gcry_md_read. + + - Disallow symmetric encryption/decryption if key is not set. 
+ Noteworthy changes in version 1.7.1 (2016-06-15) [C21/A1/R1] ------------------------------------------------ ----------------------------------------------------------------------- Summary of changes: NEWS | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Thu Jul 14 11:42:39 2016 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 14 Jul 2016 11:42:39 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.2-1-g62642c4 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 62642c4be0653a94fdec0c0b1f9d38673250a156 (commit) via be0bec7d9208b2f2d2ffce9cc2ca6154853e7e59 (commit) via 5a4cbc5256e493563eb82a9bb73f22fe4d413579 (commit) via b0b70e7fe37b1bf13ec0bfc8effcb5c7f5db6b7d (commit) from 2b26de65e61dc42c64120c463a33e944bf413e28 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- ----------------------------------------------------------------------- Summary of changes: Makefile.am | 15 +++++++++++++++ NEWS | 10 +++++++--- configure.ac | 4 ++-- 3 files changed, 24 insertions(+), 5 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From wk at gnupg.org Mon Jul 18 20:43:27 2016 From: wk at gnupg.org (Werner Koch) Date: Mon, 18 Jul 2016 20:43:27 +0200 Subject: Asm portability problems on OS X and Solaris Message-ID: <87y44y7po0.fsf@wheatstone.g10code.de> Hi, I got reports about two portability bug reports for 1.7.2. The first is on an OS X 10.7.5: CFLAGS=-m32 ./configure --disable-asm && make all check ... libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o .libs/rijndael-aesni.o rijndael-aesni.c: In function 'do_aesni_ctr_4': rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints asm volatile (/* detect if 8-bit carry handling is needed */ ^ So, --disable-asm is not honored in this case. A similar problem exists on OpenIndiana with gcc 4.9: crc-intel-pclmul.c: In function 'crc32_less_than_16': crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints asm volatile ("movd %[crc], %%xmm0\n\t" using the Solaris compiler we get this error: "../src/cipher-proto.h", line 27: warning: useless declaration "md.c", line 912: identifier redeclared: _gcry_md_extract current : function(pointer to struct gcry_md_handle {pointer to struct gcry_md_context {..} ctx, ... long output ... previous: function(pointer to struct gcry_md_handle {pointer to struct gcry_md_context {..} ctx, int bufpos, int bufsize, array[1] of unsigned char buf}, int, pointer to void, unsigned int) returning unsigned int : "../src/gcrypt-int.h", line 132 cc: acomp failed for md.c Salam-Shalom, Werner -- Die Gedanken sind frei. 
Ausnahmen regelt ein Bundesgesetz. /* Join us at OpenPGP.conf */ From jussi.kivilinna at iki.fi Tue Jul 19 10:43:02 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 11:43:02 +0300 Subject: Asm portability problems on OS X and Solaris In-Reply-To: <87y44y7po0.fsf@wheatstone.g10code.de> References: <87y44y7po0.fsf@wheatstone.g10code.de> Message-ID: <578DE816.9060004@iki.fi> On 18.07.2016 21:43, Werner Koch wrote: > Hi, > > I got reports about two portability bug reports for 1.7.2. > > The first is on an OS X 10.7.5: > > CFLAGS=-m32 ./configure --disable-asm && make all check > ... > libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src > -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF > .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o > .libs/rijndael-aesni.o > rijndael-aesni.c: In function 'do_aesni_ctr_4': > rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints > asm volatile (/* detect if 8-bit carry handling is needed */ > ^ > > So, --disable-asm is not honored in this case. Currently --disable-asm only disables MPI assembly modules. Configure options --disable-aesni-support and --disable-pclmul-support can be used to disable AESNI and PCLMUL assembly parts. > A similar problem exists > on OpenIndiana with gcc 4.9: > > crc-intel-pclmul.c: In function 'crc32_less_than_16': > crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints > asm volatile ("movd %[crc], %%xmm0\n\t" GCC is failing to allocate register constraints for inline assembly blocks in these cases. In these two asm blocks, quite many memory operands are used and with PIC enabled, these might be requiring extra registers. So, these blocks need be split to ease register pressure. > > using the Solaris compiler we get this error: > > "../src/cipher-proto.h", line 27: warning: useless declaration > "md.c", line 912: identifier redeclared: _gcry_md_extract > current : function(pointer to struct gcry_md_handle {pointer to > struct gcry_md_context {..} ctx, > ... long output ... This does not seem to be full 'current' declaration, but truncated. -Jussi > previous: function(pointer to struct gcry_md_handle {pointer to > struct gcry_md_context {..} ctx, int bufpos, int bufsize, array[1] of > unsigned char buf}, int, pointer to void, unsigned int) returning unsigned > int : "../src/gcrypt-int.h", line 132 > cc: acomp failed for md.c > > > > Salam-Shalom, > > Werner > > From jussi.kivilinna at iki.fi Tue Jul 19 12:48:57 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:48:57 +0300 Subject: Asm portability problems on OS X and Solaris In-Reply-To: <578DE816.9060004@iki.fi> References: <87y44y7po0.fsf@wheatstone.g10code.de> <578DE816.9060004@iki.fi> Message-ID: <578E0599.1000801@iki.fi> On 19.07.2016 11:43, Jussi Kivilinna wrote: > On 18.07.2016 21:43, Werner Koch wrote: >> Hi, >> >> I got reports about two portability bug reports for 1.7.2. >> >> The first is on an OS X 10.7.5: >> >> CFLAGS=-m32 ./configure --disable-asm && make all check >> ... >> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. 
-I../src -I../src >> -I/usr/local/include -m32 -Wall -MT rijndael-aesni.lo -MD -MP -MF >> .deps/rijndael-aesni.Tpo -c rijndael-aesni.c -fno-common -DPIC -o >> .libs/rijndael-aesni.o >> rijndael-aesni.c: In function 'do_aesni_ctr_4': >> rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints >> asm volatile (/* detect if 8-bit carry handling is needed */ >> ^ >> >> So, --disable-asm is not honored in this case. > > Currently --disable-asm only disables MPI assembly modules. > > Configure options --disable-aesni-support and --disable-pclmul-support can > be used to disable AESNI and PCLMUL assembly parts. > >> A similar problem exists >> on OpenIndiana with gcc 4.9: >> >> crc-intel-pclmul.c: In function 'crc32_less_than_16': >> crc-intel-pclmul.c:747:7: error: 'asm' operand has impossible constraints >> asm volatile ("movd %[crc], %%xmm0\n\t" > > GCC is failing to allocate register constraints for inline assembly blocks in > these cases. In these two asm blocks, quite many memory operands are used and > with PIC enabled, these might be requiring extra registers. So, these > blocks need be split to ease register pressure. > I managed to reproduce rijndael-aesni.c case on linux/gcc/i386 with GCC 4.7 and 4.8. GCC 4.9 and later are not affected. I'll post fix soon. Bug tracker had issue for this on gentoo: https://bugs.gnupg.org/gnupg/issue2325 I also made similar change for crc-intel-pclmul.c but was unable to reproduce the issue (on linux). -Jussi From jussi.kivilinna at iki.fi Tue Jul 19 12:51:34 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:51:34 +0300 Subject: [PATCH 1/2] rijndael-aesni: split assembly block to ease register pressure Message-ID: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> * cipher/rijndael-aesni.c (do_aesni_ctr_4): Use single register constraint for passing 'bige_addb' to assembly block; split first inline assembly block into two parts. -- Fixes compiling on i386 with GCC-4.8 and older. 
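As a minimal sketch of the failure mode and of the fix pattern (hypothetical toy code, not taken from rijndael-aesni.c): under -m32 -fPIC older GCC reserves %ebx for the GOT pointer, so one large asm block that names many distinct memory operands and additionally clobbers %esi can leave the allocator unable to satisfy all constraints. Splitting the block and carrying intermediate state in fixed xmm registers — deliberately not listed as clobbers, so %xmm0 survives into the second block, as in the patched code — keeps fewer operands live per block:

static unsigned int
xor_fold (const unsigned int *in, const unsigned int *key)
{
  unsigned int out;

  /* First block: only two input operands are live here; the result
     is left in %xmm0 for the next block. */
  asm volatile ("movd %[in], %%xmm0\n\t"
                "movd %[key], %%xmm1\n\t"
                "pxor %%xmm1, %%xmm0\n\t"
                :
                : [in] "m" (*in), [key] "m" (*key)
                : "cc" );

  /* Second block: consume %xmm0 with a single output operand. */
  asm volatile ("movd %%xmm0, %[out]\n\t"
                : [out] "=m" (out)
                :
                : "cc" );

  return out;
}

PATCH 2/2 applies the same split to crc32_less_than_16.
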
Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 97e0ad0..8b28b3a 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -794,6 +794,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4 } }; + const void *bige_addb = bige_addb_const; #define aesenc_xmm1_xmm0 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xc1\n\t" #define aesenc_xmm1_xmm2 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd1\n\t" #define aesenc_xmm1_xmm3 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd9\n\t" @@ -819,16 +820,15 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, "ja .Ladd32bit%=\n\t" "movdqa %%xmm5, %%xmm0\n\t" /* xmm0 := CTR (xmm5) */ - "movdqa %[addb_1], %%xmm2\n\t" /* xmm2 := be(1) */ - "movdqa %[addb_2], %%xmm3\n\t" /* xmm3 := be(2) */ - "movdqa %[addb_3], %%xmm4\n\t" /* xmm4 := be(3) */ - "movdqa %[addb_4], %%xmm5\n\t" /* xmm5 := be(4) */ + "movdqa 0*16(%[addb]), %%xmm2\n\t" /* xmm2 := be(1) */ + "movdqa 1*16(%[addb]), %%xmm3\n\t" /* xmm3 := be(2) */ + "movdqa 2*16(%[addb]), %%xmm4\n\t" /* xmm4 := be(3) */ + "movdqa 3*16(%[addb]), %%xmm5\n\t" /* xmm5 := be(4) */ "paddb %%xmm0, %%xmm2\n\t" /* xmm2 := be(1) + CTR (xmm0) */ "paddb %%xmm0, %%xmm3\n\t" /* xmm3 := be(2) + CTR (xmm0) */ "paddb %%xmm0, %%xmm4\n\t" /* xmm4 := be(3) + CTR (xmm0) */ "paddb %%xmm0, %%xmm5\n\t" /* xmm5 := be(4) + CTR (xmm0) */ "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "jmp .Lstore_ctr%=\n\t" ".Ladd32bit%=:\n\t" @@ -871,7 +871,6 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lno_carry%=:\n\t" "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := be(xmm2) */ "pshufb %%xmm6, %%xmm3\n\t" /* xmm3 := be(xmm3) */ @@ -880,8 +879,13 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lstore_ctr%=:\n\t" "movdqa %%xmm5, (%[ctr])\n\t" /* Update CTR (mem). */ + : + : [ctr] "r" (ctr), + [key] "r" (ctx->keyschenc), + [addb] "r" (bige_addb) + : "%esi", "cc", "memory"); - "pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ + asm volatile ("pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ "pxor %%xmm1, %%xmm2\n\t" /* xmm2 ^= key[0] */ "pxor %%xmm1, %%xmm3\n\t" /* xmm3 ^= key[0] */ "pxor %%xmm1, %%xmm4\n\t" /* xmm4 ^= key[0] */ @@ -931,7 +935,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xa0(%[key]), %%xmm1\n\t" - "cmpl $10, %%esi\n\t" + "cmpl $10, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -943,7 +947,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xc0(%[key]), %%xmm1\n\t" - "cmpl $12, %%esi\n\t" + "cmpl $12, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -962,14 +966,9 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenclast_xmm1_xmm3 aesenclast_xmm1_xmm4 : - : [ctr] "r" (ctr), - [key] "r" (ctx->keyschenc), - [rounds] "g" (ctx->rounds), - [addb_1] "m" (bige_addb_const[0][0]), - [addb_2] "m" (bige_addb_const[1][0]), - [addb_3] "m" (bige_addb_const[2][0]), - [addb_4] "m" (bige_addb_const[3][0]) - : "%esi", "cc", "memory"); + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); asm volatile ("movdqu (%[src]), %%xmm1\n\t" /* Get block 1. 
*/ "pxor %%xmm1, %%xmm0\n\t" /* EncCTR-1 ^= input */ From jussi.kivilinna at iki.fi Tue Jul 19 12:51:39 2016 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 19 Jul 2016 13:51:39 +0300 Subject: [PATCH 2/2] crc-intel-pclmul: split assembly block to ease register pressure In-Reply-To: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> References: <146892549420.1140.11952264062807070282.stgit@localhost6.localdomain6> Message-ID: <146892549926.1140.16666077431608890245.stgit@localhost6.localdomain6> * cipher/crc-intel-pclmul.c (crc32_less_than_16): Split inline assembly block handling 4 byte input into multiple blocks. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 2972fb4..7a344e2 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -747,22 +747,28 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, asm volatile ("movd %[crc], %%xmm0\n\t" "movd %[in], %%xmm1\n\t" "movdqa %[my_p], %%xmm5\n\t" - "pxor %%xmm1, %%xmm0\n\t" + : + : [in] "m" (*inbuf), + [crc] "m" (*pcrc), + [my_p] "m" (consts->my_p[0]) + : "cc" ); + + asm volatile ("pxor %%xmm1, %%xmm0\n\t" "pshufb %[bswap], %%xmm0\n\t" /* [xx][00][00][00] */ "pclmulqdq $0x01, %%xmm5, %%xmm0\n\t" /* [00][xx][xx][00] */ "pclmulqdq $0x11, %%xmm5, %%xmm0\n\t" /* [00][00][xx][xx] */ + : + : [bswap] "m" (*crc32_bswap_shuf) + : "cc" ); - /* store CRC in input endian */ + asm volatile (/* store CRC in input endian */ "movd %%xmm0, %%eax\n\t" "bswapl %%eax\n\t" "movl %%eax, %[out]\n\t" : [out] "=m" (*pcrc) - : [in] "m" (*inbuf), - [crc] "m" (*pcrc), - [my_p] "m" (consts->my_p[0]), - [bswap] "m" (*crc32_bswap_shuf) - : "eax" ); + : + : "eax", "cc" ); } else { From cvs at cvs.gnupg.org Wed Jul 20 12:33:09 2016 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Wed, 20 Jul 2016 12:33:09 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.2-8-gf38199d Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via f38199dbc290003898a1799adc367265267784c2 (commit) via a4d1595a2638db63ac4c73e722c8ba95fdd85ff7 (commit) from 05a4cecae0c02d2b4ee1cadd9c08115beae3a94a (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit f38199dbc290003898a1799adc367265267784c2 Author: Jussi Kivilinna Date: Tue Jul 19 13:20:53 2016 +0300 crc-intel-pclmul: split assembly block to ease register pressure * cipher/crc-intel-pclmul.c (crc32_less_than_16): Split inline assembly block handling 4 byte input into multiple blocks. 
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 2972fb4..7a344e2 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -747,22 +747,28 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, asm volatile ("movd %[crc], %%xmm0\n\t" "movd %[in], %%xmm1\n\t" "movdqa %[my_p], %%xmm5\n\t" - "pxor %%xmm1, %%xmm0\n\t" + : + : [in] "m" (*inbuf), + [crc] "m" (*pcrc), + [my_p] "m" (consts->my_p[0]) + : "cc" ); + + asm volatile ("pxor %%xmm1, %%xmm0\n\t" "pshufb %[bswap], %%xmm0\n\t" /* [xx][00][00][00] */ "pclmulqdq $0x01, %%xmm5, %%xmm0\n\t" /* [00][xx][xx][00] */ "pclmulqdq $0x11, %%xmm5, %%xmm0\n\t" /* [00][00][xx][xx] */ + : + : [bswap] "m" (*crc32_bswap_shuf) + : "cc" ); - /* store CRC in input endian */ + asm volatile (/* store CRC in input endian */ "movd %%xmm0, %%eax\n\t" "bswapl %%eax\n\t" "movl %%eax, %[out]\n\t" : [out] "=m" (*pcrc) - : [in] "m" (*inbuf), - [crc] "m" (*pcrc), - [my_p] "m" (consts->my_p[0]), - [bswap] "m" (*crc32_bswap_shuf) - : "eax" ); + : + : "eax", "cc" ); } else { commit a4d1595a2638db63ac4c73e722c8ba95fdd85ff7 Author: Jussi Kivilinna Date: Tue Jul 19 13:20:13 2016 +0300 rijndael-aesni: split assembly block to ease register pressure * cipher/rijndael-aesni.c (do_aesni_ctr_4): Use single register constraint for passing 'bige_addb' to assembly block; split first inline assembly block into two parts. -- Fixes compiling on i386 with GCC-4.8 and older. Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 97e0ad0..8b28b3a 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -794,6 +794,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4 } }; + const void *bige_addb = bige_addb_const; #define aesenc_xmm1_xmm0 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xc1\n\t" #define aesenc_xmm1_xmm2 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd1\n\t" #define aesenc_xmm1_xmm3 ".byte 0x66, 0x0f, 0x38, 0xdc, 0xd9\n\t" @@ -819,16 +820,15 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, "ja .Ladd32bit%=\n\t" "movdqa %%xmm5, %%xmm0\n\t" /* xmm0 := CTR (xmm5) */ - "movdqa %[addb_1], %%xmm2\n\t" /* xmm2 := be(1) */ - "movdqa %[addb_2], %%xmm3\n\t" /* xmm3 := be(2) */ - "movdqa %[addb_3], %%xmm4\n\t" /* xmm4 := be(3) */ - "movdqa %[addb_4], %%xmm5\n\t" /* xmm5 := be(4) */ + "movdqa 0*16(%[addb]), %%xmm2\n\t" /* xmm2 := be(1) */ + "movdqa 1*16(%[addb]), %%xmm3\n\t" /* xmm3 := be(2) */ + "movdqa 2*16(%[addb]), %%xmm4\n\t" /* xmm4 := be(3) */ + "movdqa 3*16(%[addb]), %%xmm5\n\t" /* xmm5 := be(4) */ "paddb %%xmm0, %%xmm2\n\t" /* xmm2 := be(1) + CTR (xmm0) */ "paddb %%xmm0, %%xmm3\n\t" /* xmm3 := be(2) + CTR (xmm0) */ "paddb %%xmm0, %%xmm4\n\t" /* xmm4 := be(3) + CTR (xmm0) */ "paddb %%xmm0, %%xmm5\n\t" /* xmm5 := be(4) + CTR (xmm0) */ "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "jmp .Lstore_ctr%=\n\t" ".Ladd32bit%=:\n\t" @@ -871,7 +871,6 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lno_carry%=:\n\t" "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ - "movl %[rounds], %%esi\n\t" "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := be(xmm2) */ "pshufb %%xmm6, %%xmm3\n\t" /* xmm3 := be(xmm3) */ @@ -880,8 +879,13 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, ".Lstore_ctr%=:\n\t" "movdqa %%xmm5, (%[ctr])\n\t" /* Update CTR (mem). 
*/ + : + : [ctr] "r" (ctr), + [key] "r" (ctx->keyschenc), + [addb] "r" (bige_addb) + : "%esi", "cc", "memory"); - "pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ + asm volatile ("pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ "pxor %%xmm1, %%xmm2\n\t" /* xmm2 ^= key[0] */ "pxor %%xmm1, %%xmm3\n\t" /* xmm3 ^= key[0] */ "pxor %%xmm1, %%xmm4\n\t" /* xmm4 ^= key[0] */ @@ -931,7 +935,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xa0(%[key]), %%xmm1\n\t" - "cmpl $10, %%esi\n\t" + "cmpl $10, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -943,7 +947,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenc_xmm1_xmm3 aesenc_xmm1_xmm4 "movdqa 0xc0(%[key]), %%xmm1\n\t" - "cmpl $12, %%esi\n\t" + "cmpl $12, %[rounds]\n\t" "jz .Lenclast%=\n\t" aesenc_xmm1_xmm0 aesenc_xmm1_xmm2 @@ -962,14 +966,9 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, aesenclast_xmm1_xmm3 aesenclast_xmm1_xmm4 : - : [ctr] "r" (ctr), - [key] "r" (ctx->keyschenc), - [rounds] "g" (ctx->rounds), - [addb_1] "m" (bige_addb_const[0][0]), - [addb_2] "m" (bige_addb_const[1][0]), - [addb_3] "m" (bige_addb_const[2][0]), - [addb_4] "m" (bige_addb_const[3][0]) - : "%esi", "cc", "memory"); + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); asm volatile ("movdqu (%[src]), %%xmm1\n\t" /* Get block 1. */ "pxor %%xmm1, %%xmm0\n\t" /* EncCTR-1 ^= input */ ----------------------------------------------------------------------- Summary of changes: cipher/crc-intel-pclmul.c | 20 +++++++++++++------- cipher/rijndael-aesni.c | 33 ++++++++++++++++----------------- 2 files changed, 29 insertions(+), 24 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From ff at octo.it Wed Jul 27 14:09:00 2016 From: ff at octo.it (Florian Forster) Date: Wed, 27 Jul 2016 14:09:00 +0200 Subject: [PATCH] doc: Improve example gcry_control usage. Message-ID: <1469621341-28825-1-git-send-email-ff@octo.it> * doc/gcrypt.texi: Change example code to check return value of gcry_control(GCRYCTL_INIT_SECMEM, ?). --- doc/gcrypt.texi | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index c2c39ad..9486400 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -424,7 +424,10 @@ and freed memory, you need to initialize Libgcrypt this way: /* Allocate a pool of 16k secure memory. This make the secure memory available and also drops privileges where needed. */ - gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0); + if (gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0)) { + fputs ("initializing secure memory failed\n", stderr); + exit (EXIT_FAILURE); + } @anchor{sample-use-resume-secmem} /* It is now okay to let Libgcrypt complain when there was/is @@ -687,7 +690,8 @@ enabling the use of secure memory. It also drops all extra privileges the process has (i.e. if it is run as setuid (root)). If the argument @var{nbytes} is 0, secure memory will be disabled. The minimum amount of secure memory allocated is currently 16384 bytes; you may thus use a -value of 1 to request that default size. +value of 1 to request that default size. Returns zero on success and +non-zero on failure. @item GCRYCTL_TERM_SECMEM; Arguments: none This command zeroises the secure memory and destroys the handler. The -- 2.8.0.rc3.226.g39d4020
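As context for the patch above, a complete minimal initialization
sequence with the suggested return-value check applied could look as
follows. This is a sketch following the sample in doc/gcrypt.texi; the
surrounding program is invented.

#include <stdio.h>
#include <stdlib.h>
#include <gcrypt.h>

int
main (void)
{
  /* The version check must be the first call into Libgcrypt.  */
  if (!gcry_check_version (GCRYPT_VERSION))
    {
      fputs ("libgcrypt version mismatch\n", stderr);
      exit (EXIT_FAILURE);
    }

  /* Suspend warnings about unavailable secure memory while it is
     being set up.  */
  gcry_control (GCRYCTL_SUSPEND_SECMEM_WARN);

  /* Allocate a pool of 16k secure memory and, per the patch, check
     the return value instead of assuming success.  */
  if (gcry_control (GCRYCTL_INIT_SECMEM, 16384, 0))
    {
      fputs ("initializing secure memory failed\n", stderr);
      exit (EXIT_FAILURE);
    }

  gcry_control (GCRYCTL_RESUME_SECMEM_WARN);

  /* Signal that initialization is complete.  */
  gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);

  /* ... use Libgcrypt here ... */
  return 0;
}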