From cvs at cvs.gnupg.org Tue Aug 1 20:36:46 2017 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Tue, 01 Aug 2017 20:36:46 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.0-7-gcf1528e Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via cf1528e7f2761774d06ace0de48f39c96b52dc4f (commit) via 4a7aa30ae9f3ce798dd886c2f2d4164c43027748 (commit) from b7cd44335d9cde43be6f693dca6399ed0762649c (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit cf1528e7f2761774d06ace0de48f39c96b52dc4f Author: Jussi Kivilinna Date: Sat Jul 29 14:34:23 2017 +0300 Fix return value type for _gcry_md_extract * src/gcrypt-int.h (_gcry_md_extract): Use gpg_err_code_t instead of gpg_error_t for internal function return type. -- GnuPG-bug-id: 3314 Signed-off-by: Jussi Kivilinna diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h index ddcafa5..ad719be 100644 --- a/src/gcrypt-int.h +++ b/src/gcrypt-int.h @@ -39,7 +39,7 @@ typedef struct mpi_ec_ctx_s *mpi_ec_t; /* Underscore prefixed internal versions of the public functions. - They return gpg_err_code and not gpg_error_t. Some macros also + They return gpg_err_code_t and not gpg_error_t. Some macros also need an underscore prefixed internal version. Note that the memory allocation functions and macros (xmalloc etc.) 
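The distinction behind the return-type fix above: libgpg-error packs an error source into the high bits of a gpg_error_t, while gpg_err_code_t carries only the bare error code; the underscore-prefixed internal functions return the bare code and the public wrappers attach the source. A minimal stand-alone sketch of that split (the typedefs, shift, and mask below are illustrative stand-ins, not the real libgpg-error headers):

```c
#include <assert.h>

/* Illustrative mirror of the libgpg-error value layout: a bare error
 * code in the low bits, an error source in the high bits.  The names
 * and constants are hypothetical, chosen only to show the idea. */
typedef unsigned int err_code_t;   /* stands in for gpg_err_code_t */
typedef unsigned int err_t;        /* stands in for gpg_error_t */

#define ERR_SOURCE_SHIFT 24
#define ERR_CODE_MASK    0x0000ffffu

/* An internal function returns only the bare code ... */
static err_code_t internal_md_extract (void) { return 7; }

/* ... and the public wrapper combines it with an error source, the
 * way gpg_err_make()/gpg_error() do. */
static err_t public_md_extract (unsigned int source)
{
  return ((err_t) source << ERR_SOURCE_SHIFT) | internal_md_extract ();
}

/* Recover the bare code from a combined value. */
static err_code_t err_code (err_t err) { return err & ERR_CODE_MASK; }
```

Declaring `_gcry_md_extract` with the combined type was therefore a mismatch with what the internal implementation actually returns, which is what the one-line change corrects.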
@@ -120,8 +120,8 @@ gpg_err_code_t _gcry_md_ctl (gcry_md_hd_t hd, int cmd, void *buffer, size_t buflen); void _gcry_md_write (gcry_md_hd_t hd, const void *buffer, size_t length); unsigned char *_gcry_md_read (gcry_md_hd_t hd, int algo); -gpg_error_t _gcry_md_extract (gcry_md_hd_t hd, int algo, void *buffer, - size_t length); +gpg_err_code_t _gcry_md_extract (gcry_md_hd_t hd, int algo, void *buffer, + size_t length); void _gcry_md_hash_buffer (int algo, void *digest, const void *buffer, size_t length); gpg_err_code_t _gcry_md_hash_buffers (int algo, unsigned int flags, commit 4a7aa30ae9f3ce798dd886c2f2d4164c43027748 Author: Jussi Kivilinna Date: Sat Jul 29 14:34:23 2017 +0300 Fix building AArch32 CE implementations when target is ARMv6 arch * cipher/cipher-gcm-armv8-aarch32-ce.S: Select ARMv8 architecture. * cipher/rijndael-armv8-aarch32-ce.S: Ditto. * cipher/sha1-armv8-aarch32-ce.S: Ditto. * cipher/sha256-armv8-aarch32-ce.S: Ditto. * configure.ac (gcry_cv_gcc_inline_asm_aarch32_crypto): Ditto. -- The Raspbian distribution defaults to the ARMv6 architecture, thus the 'rbit' instruction is not available with default compiler flags. The patch adds explicit architecture selection for ARMv8 to enable 'rbit' usage with the ARMv8/AArch32-CE assembly implementations of SHA, GHASH and AES.
Reported-by: Chris Horry Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-gcm-armv8-aarch32-ce.S b/cipher/cipher-gcm-armv8-aarch32-ce.S index b61a787..1de66a1 100644 --- a/cipher/cipher-gcm-armv8-aarch32-ce.S +++ b/cipher/cipher-gcm-armv8-aarch32-ce.S @@ -24,6 +24,7 @@ defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) .syntax unified +.arch armv8-a .fpu crypto-neon-fp-armv8 .arm diff --git a/cipher/rijndael-armv8-aarch32-ce.S b/cipher/rijndael-armv8-aarch32-ce.S index f375f67..5c8fa3c 100644 --- a/cipher/rijndael-armv8-aarch32-ce.S +++ b/cipher/rijndael-armv8-aarch32-ce.S @@ -24,6 +24,7 @@ defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) .syntax unified +.arch armv8-a .fpu crypto-neon-fp-armv8 .arm diff --git a/cipher/sha1-armv8-aarch32-ce.S b/cipher/sha1-armv8-aarch32-ce.S index b0bc5ff..bf2b233 100644 --- a/cipher/sha1-armv8-aarch32-ce.S +++ b/cipher/sha1-armv8-aarch32-ce.S @@ -24,6 +24,7 @@ defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA1) .syntax unified +.arch armv8-a .fpu crypto-neon-fp-armv8 .arm diff --git a/cipher/sha256-armv8-aarch32-ce.S b/cipher/sha256-armv8-aarch32-ce.S index 2041a23..2b17ab1 100644 --- a/cipher/sha256-armv8-aarch32-ce.S +++ b/cipher/sha256-armv8-aarch32-ce.S @@ -24,6 +24,7 @@ defined(HAVE_GCC_INLINE_ASM_AARCH32_CRYPTO) && defined(USE_SHA256) .syntax unified +.arch armv8-a .fpu crypto-neon-fp-armv8 .arm diff --git a/configure.ac b/configure.ac index 27faa7f..66e7cd6 100644 --- a/configure.ac +++ b/configure.ac @@ -1619,6 +1619,7 @@ AC_CACHE_CHECK([whether GCC inline assembler supports AArch32 Crypto Extension i AC_COMPILE_IFELSE([AC_LANG_SOURCE( [[__asm__( ".syntax unified\n\t" + ".arch armv8-a\n\t" ".arm\n\t" ".fpu crypto-neon-fp-armv8\n\t" ----------------------------------------------------------------------- Summary of changes: cipher/cipher-gcm-armv8-aarch32-ce.S | 1 + cipher/rijndael-armv8-aarch32-ce.S | 1 + cipher/sha1-armv8-aarch32-ce.S | 1 + cipher/sha256-armv8-aarch32-ce.S | 1 + configure.ac | 1 + 
src/gcrypt-int.h | 6 +++--- 6 files changed, 8 insertions(+), 3 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Tue Aug 1 21:09:12 2017 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Tue, 01 Aug 2017 21:09:12 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.0-8-g94a92a3 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 94a92a3db909aef0ebcc009c2d7f5a2663e99004 (commit) from cf1528e7f2761774d06ace0de48f39c96b52dc4f (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 94a92a3db909aef0ebcc009c2d7f5a2663e99004 Author: Jussi Kivilinna Date: Tue Aug 1 21:05:31 2017 +0300 Add script to run basic tests with all supported HWF combinations * tests/basic_all_hwfeature_combinations.sh: New. * tests/Makefile.am: Add basic_all_hwfeature_combinations.sh. 
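The idea of the new script is to enumerate every subset of the detected hardware features as a bitmask and run the basic tests once per subset, disabling feature i whenever bit i of the mask is set. A stand-alone sketch of that enumeration (the helper names, option syntax, and feature names here are made up for illustration; the real script shells out to the test binaries):

```c
#include <assert.h>
#include <string.h>

/* Build the disable-options for one feature subset CBITS: for each
 * set bit, append a "--disable-hwf <name>" option. */
static void build_opts (char *out, size_t outsize,
                        const char *const *feats, int n, unsigned int cbits)
{
  int mask;

  out[0] = 0;
  for (mask = 0; mask < n; mask++)
    if (cbits & (1u << mask))
      {
        strncat (out, "--disable-hwf ", outsize - strlen (out) - 1);
        strncat (out, feats[mask], outsize - strlen (out) - 1);
        strncat (out, " ", outsize - strlen (out) - 1);
      }
}

/* N features give 2^N subsets, i.e. one test run per bitmask. */
static int count_combinations (int n)
{
  int runs = 0;
  unsigned int cbits;

  for (cbits = 0; cbits < (1u << n); cbits++)
    runs++;
  return runs;
}
```

Three detected features already mean eight full test runs, which is why the script also takes an NJOBS-style parallelism knob.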
-- Signed-off-by: Jussi Kivilinna diff --git a/tests/Makefile.am b/tests/Makefile.am index 1744ea7..eee24fa 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -60,7 +60,7 @@ EXTRA_DIST = README rsa-16k.key cavs_tests.sh cavs_driver.pl \ t-ed25519.inp stopwatch.h hashtest-256g.in \ sha3-224.h sha3-256.h sha3-384.h sha3-512.h \ blake2b.h blake2s.h \ - basic-disable-all-hwf.in + basic-disable-all-hwf.in basic_all_hwfeature_combinations.sh LDADD = $(standard_ldadd) $(GPG_ERROR_LIBS) t_lock_LDADD = $(standard_ldadd) $(GPG_ERROR_MT_LIBS) diff --git a/tests/basic_all_hwfeature_combinations.sh b/tests/basic_all_hwfeature_combinations.sh new file mode 100755 index 0000000..8ec97bf --- /dev/null +++ b/tests/basic_all_hwfeature_combinations.sh @@ -0,0 +1,111 @@ +#!/bin/bash +# Run basic tests with all HW feature combinations +# Copyright 2017 Jussi Kivilinna +# +# This file is free software; as a special exception the author gives +# unlimited permission to copy and/or distribute it, with or without +# modifications, as long as this notice is preserved. +# +# This file is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY, to the extent permitted by law; without even the +# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+# + +# Use BINEXT to set executable extension +# For example for Windows executables: BINEXT=.exe +if [ "x$BINEXT" != "x" ] && [ -e "tests/version$BINEXT" ]; then + binext="$BINEXT" +else + binext="" +fi + +# Use BINPRE to set executable prefix +# For example to run Windows executable with WINE: BINPRE="wine " +if [ "x$BINPRE" != "x" ]; then + binpre="$BINPRE" +else + binpre="" +fi + +# Use NJOBS to define number of parallel tasks +if [ "x$NJOBS" != "x" ]; then + njobs="$NJOBS" +else + # default to cpu count + ncpus=$(nproc --all) + if [ "x$ncpus" != "x" ]; then + njobs=$ncpus + else + # could not get cpu count, use 4 parallel tasks instead + njobs=4 + fi +fi + +get_supported_hwfeatures() { + $binpre "tests/version$binext" 2>&1 | \ + grep "hwflist" | \ + sed -e 's/hwflist://' -e 's/:/ /g' -e 's/\x0d/\x0a/g' +} + +hwfs=($(get_supported_hwfeatures)) +retcodes=() +optslist=() +echo "Total HW-feature combinations: $((1<<${#hwfs[@]}))" +for ((cbits=0; cbits < (1<<${#hwfs[@]}); cbits++)); do + for ((mask=0; mask < ${#hwfs[@]}; mask++)); do + match=$(((1< This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via df1e221b3012e96bbffbc7d5fd70836a9ae1cc19 (commit) via 21d0f068a721c022f955084c28304934fd198c5e (commit) via eea36574f37830a6a80b4fad884825e815b2912f (commit) from 94a92a3db909aef0ebcc009c2d7f5a2663e99004 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit df1e221b3012e96bbffbc7d5fd70836a9ae1cc19 Author: Werner Koch Date: Wed Aug 2 18:45:51 2017 +0200 tests: Fix a printf glitch for a Windows test. * tests/t-convert.c (check_formats): Fix print format glitch on Windows. * tests/t-ed25519.c: Typo fix.
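The root cause of the printf glitch fixed here: on 64-bit Windows (LLP64), size_t is 64 bits while unsigned long is 32 bits, so passing a size_t argument for a "%lu" specifier is undefined behaviour. The patch adds an explicit cast; C99's "%zu" would be the other option where available. A small sketch of the corrected pattern (the helper name is hypothetical, standing in for the fail() call sites):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a size_t portably for "%lu" by casting it to the type the
 * specifier actually expects.  On LLP64 systems, passing BUFLEN
 * directly would read the wrong number of bytes off the va_list. */
static void format_len (char *out, size_t outsize, size_t buflen)
{
  snprintf (out, outsize, "wrong result (%lu)", (unsigned long) buflen);
}
```

The cast truncates values above ULONG_MAX on LLP64, which is harmless for a diagnostic message but would matter elsewhere.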
Signed-off-by: Werner Koch diff --git a/tests/t-convert.c b/tests/t-convert.c index ec56677..121039c 100644 --- a/tests/t-convert.c +++ b/tests/t-convert.c @@ -435,7 +435,8 @@ check_formats (void) if (gcry_mpi_cmp (a, b) || data[idx].a.stdlen != buflen) { fail ("error scanning value %d from %s: %s (%lu)\n", - data[idx].value, "STD", "wrong result", buflen); + data[idx].value, "STD", "wrong result", + (long unsigned int)buflen); showmpi ("expected:", a); showmpi (" got:", b); } @@ -452,7 +453,8 @@ check_formats (void) if (gcry_mpi_cmp (a, b) || data[idx].a.sshlen != buflen) { fail ("error scanning value %d from %s: %s (%lu)\n", - data[idx].value, "SSH", "wrong result", buflen); + data[idx].value, "SSH", "wrong result", + (long unsigned int)buflen); showmpi ("expected:", a); showmpi (" got:", b); } @@ -471,7 +473,8 @@ check_formats (void) if (gcry_mpi_cmp (a, b) || data[idx].a.usglen != buflen) { fail ("error scanning value %d from %s: %s (%lu)\n", - data[idx].value, "USG", "wrong result", buflen); + data[idx].value, "USG", "wrong result", + (long unsigned int)buflen); showmpi ("expected:", a); showmpi (" got:", b); } @@ -492,7 +495,8 @@ check_formats (void) if (gcry_mpi_cmp (a, b) || data[idx].a.pgplen != buflen) { fail ("error scanning value %d from %s: %s (%lu)\n", - data[idx].value, "PGP", "wrong result", buflen); + data[idx].value, "PGP", "wrong result", + (long unsigned int)buflen); showmpi ("expected:", a); showmpi (" got:", b); } diff --git a/tests/t-ed25519.c b/tests/t-ed25519.c index 2f59a89..73628a8 100644 --- a/tests/t-ed25519.c +++ b/tests/t-ed25519.c @@ -74,7 +74,7 @@ show_sexp (const char *prefix, gcry_sexp_t a) /* Prepend FNAME with the srcdir environment variable's value and - retrun an allocated filename. */ + * return an allocated filename. */ char * prepend_srcdir (const char *fname) { commit 21d0f068a721c022f955084c28304934fd198c5e Author: Werner Koch Date: Wed Aug 2 18:44:14 2017 +0200 tests: Add benchmarking option to tests/random. 
* tests/random.c: Always include unistd.h. (prepend_srcdir): New. (run_benchmark): New. (main): Add options --benchmark and --with-seed-file. Print whether JENT has been used. * tests/t-common.h (split_fields_colon): New. Taken from GnuPG. License of that code changed to LGPLv2.1. -- Running these tests on a KVM-hosted Windows Vista using a statically compiled tests/random and modifying the extra random added in read_seed_file gave these results:

| Seed | Jent | Bytes | Bits | Time (ms)  |
|------+------+-------+------+------------|
| yes  | yes  |    32 |  256 |   46 .. 62 |
| yes  | yes  |    64 |  512 |   62 .. 78 |
| yes  | yes  |   128 | 1024 |   78 .. 93 |
| yes  | yes  |   256 | 2048 |  124 .. 156 |
| yes  | yes  |   384 | 3072 |  171 .. 202 |
| yes  | yes  |   512 | 4096 |  234 .. 249 |
| yes  | no   |    32 |  256 |   15 .. 31 |
| yes  | no   |    64 |  512 |   15 .. 31 |
| yes  | no   |   128 | 1024 |   15       |
| no   | yes  |     - |    - |   78 .. 93 |
| no   | no   |     - |    - |   15       |

Seed: Whether a seed file is used. Jent: Whether JENT was working. Bytes: The number of bytes mixed into the pool after reading the seed file. Bits: 8 * Bytes. Time: Measured time including the time to read the seed file. Minimum and maximum values are given. Granularity of the used timer is quite large. Signed-off-by: Werner Koch diff --git a/tests/random.c b/tests/random.c index 8a85429..2f48323 100644 --- a/tests/random.c +++ b/tests/random.c @@ -24,18 +24,41 @@ #include #include #include +#include #ifndef HAVE_W32_SYSTEM # include -# include # include #endif +#include "stopwatch.h" + + #define PGM "random" +#define NEED_EXTRA_TEST_SUPPORT 1 #include "t-common.h" static int with_progress; +/* Prepend FNAME with the srcdir environment variable's value and + * return an allocated filename.
*/ +static char * +prepend_srcdir (const char *fname) +{ + static const char *srcdir; + char *result; + + if (!srcdir && !(srcdir = getenv ("srcdir"))) + srcdir = "."; + + result = xmalloc (strlen (srcdir) + 1 + strlen (fname) + 1); + strcpy (result, srcdir); + strcat (result, "/"); + strcat (result, fname); + return result; +} + + static void print_hex (const char *text, const void *buf, size_t n) { @@ -537,12 +560,43 @@ run_all_rng_tests (const char *program) free (cmdline); } + +static void +run_benchmark (void) +{ + char rndbuf[32]; + int i, j; + + if (verbose) + info ("benchmarking GCRY_STRONG_RANDOM (/dev/urandom)\n"); + + start_timer (); + gcry_randomize (rndbuf, sizeof rndbuf, GCRY_STRONG_RANDOM); + stop_timer (); + + info ("getting first 256 bits: %s", elapsed_time (1)); + + for (j=0; j < 5; j++) + { + start_timer (); + for (i=0; i < 100; i++) + gcry_randomize (rndbuf, sizeof rndbuf, GCRY_STRONG_RANDOM); + stop_timer (); + + info ("100 calls of 256 bits each: %s", elapsed_time (100)); + } + +} + + int main (int argc, char **argv) { int last_argc = -1; int early_rng = 0; int in_recursion = 0; + int benchmark = 0; + int with_seed_file = 0; const char *program = NULL; if (argc) @@ -586,16 +640,27 @@ main (int argc, char **argv) in_recursion = 1; argc--; argv++; } + else if (!strcmp (*argv, "--benchmark")) + { + benchmark = 1; + argc--; argv++; + } else if (!strcmp (*argv, "--early-rng-check")) { early_rng = 1; argc--; argv++; } + else if (!strcmp (*argv, "--with-seed-file")) + { + with_seed_file = 1; + argc--; argv++; + } else if (!strcmp (*argv, "--prefer-standard-rng")) { /* This is anyway the default, but we may want to use it for debugging. 
*/ - xgcry_control (GCRYCTL_SET_PREFERRED_RNG_TYPE, GCRY_RNG_TYPE_STANDARD); + xgcry_control (GCRYCTL_SET_PREFERRED_RNG_TYPE, + GCRY_RNG_TYPE_STANDARD); argc--; argv++; } else if (!strcmp (*argv, "--prefer-fips-rng")) @@ -608,12 +673,27 @@ main (int argc, char **argv) xgcry_control (GCRYCTL_SET_PREFERRED_RNG_TYPE, GCRY_RNG_TYPE_SYSTEM); argc--; argv++; } + else if (!strcmp (*argv, "--disable-hwf")) + { + argc--; + argv++; + if (argc) + { + if (gcry_control (GCRYCTL_DISABLE_HWF, *argv, NULL)) + die ("unknown hardware feature `%s'\n", *argv); + argc--; + argv++; + } + } } #ifndef HAVE_W32_SYSTEM signal (SIGPIPE, SIG_IGN); #endif + if (benchmark && !verbose) + verbose = 1; + if (early_rng) { /* Don't switch RNG in fips mode. */ @@ -628,11 +708,25 @@ main (int argc, char **argv) if (with_progress) gcry_set_progress_handler (progress_cb, NULL); + if (with_seed_file) + { + char *fname = prepend_srcdir ("random.seed"); + + if (access (fname, F_OK)) + info ("random seed file '%s' not found\n", fname); + gcry_control (GCRYCTL_SET_RANDOM_SEED_FILE, fname); + xfree (fname); + } + xgcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0); if (debug) xgcry_control (GCRYCTL_SET_DEBUG_FLAGS, 1u, 0); - if (!in_recursion) + if (benchmark) + { + run_benchmark (); + } + else if (!in_recursion) { check_forking (); check_nonce_forking (); @@ -640,16 +734,31 @@ main (int argc, char **argv) } /* For now we do not run the drgb_reinit check from "make check" due to its high requirement for entropy. */ - if (!getenv ("GCRYPT_IN_REGRESSION_TEST")) + if (!benchmark && !getenv ("GCRYPT_IN_REGRESSION_TEST")) check_drbg_reinit (); /* Don't switch RNG in fips mode. */ - if (!gcry_fips_mode_active()) + if (!benchmark && !gcry_fips_mode_active()) check_rng_type_switching (); - if (!in_recursion) + if (!in_recursion && !benchmark) run_all_rng_tests (program); + /* Print this info last so that it does not influence the + * initialization and thus the benchmarking. 
*/ + if (!in_recursion && verbose) + { + char *buf; + char *fields[5]; + + buf = gcry_get_config (0, "rng-type"); + if (buf + && split_fields_colon (buf, fields, DIM (fields)) >= 5 + && atoi (fields[4]) > 0) + info ("The JENT RNG was active\n"); + gcry_free (buf); + } + if (debug) xgcry_control (GCRYCTL_DUMP_RANDOM_STATS); diff --git a/tests/t-common.h b/tests/t-common.h index 8466ac1..2040f09 100644 --- a/tests/t-common.h +++ b/tests/t-common.h @@ -158,3 +158,41 @@ info (const char *format, ...) die ("line %d: gcry_control (%s) failed: %s", \ __LINE__, #cmd, gcry_strerror (err__)); \ } while (0) + + +/* Split a string into colon delimited fields. A pointer to each field + * is stored in ARRAY. Stop splitting at ARRAYSIZE fields. The + * function modifies STRING. The number of parsed fields is returned. + * Note that leading and trailing spaces are not removed from the fields. + * Example: + * + * char *fields[2]; + * if (split_fields (string, fields, DIM (fields)) < 2) + * return // Not enough args. + * foo (fields[0]); + * foo (fields[1]); + */ +#ifdef NEED_EXTRA_TEST_SUPPORT +static int +split_fields_colon (char *string, char **array, int arraysize) +{ + int n = 0; + char *p, *pend; + + p = string; + do + { + if (n == arraysize) + break; + array[n++] = p; + pend = strchr (p, ':'); + if (!pend) + break; + *pend++ = 0; + p = pend; + } + while (*p); + + return n; +} +#endif /*NEED_EXTRA_TEST_SUPPORT*/ commit eea36574f37830a6a80b4fad884825e815b2912f Author: Werner Koch Date: Fri Jul 28 15:31:03 2017 +0200 random: Add more bytes to the pool in addition to the seed file. * random/random-csprng.c (read_seed_file): Read 128 or 32 bytes depending on whether we have the Jitter RNG. -- These are actually 3 changes: - We use GCRY_STRONG_RANDOM instead of GCRY_WEAK_RANDOM, which we used for historical reasons. However the entropy gather modules handle both identically; that is reading from /dev/urandom. Only GCRY_VERY_STRONG_RANDOM would use a blocking read from /dev/random.
- We increase the number of extra bits from 128 to 256. - If the Jitter RNG is available we assume that a fast entropy source is available and thus we read 4 times more entropy (1024 bits). Note that on Windows, GnuPG in DE-VS mode tests that the Jitter RNG is available and properly working. Thus we will add 1024 bits in addition to the state read from the seed file. Signed-off-by: Werner Koch diff --git a/random/random-csprng.c b/random/random-csprng.c index 5a771c2..650c438 100644 --- a/random/random-csprng.c +++ b/random/random-csprng.c @@ -717,12 +717,12 @@ lock_seed_file (int fd, const char *fname, int for_write) out the same pool and then race for updating it (the last update overwrites earlier updates). They will differentiate only by the weak entropy that is added in read_seed_file based on the PID and - clock, and up to 16 bytes of weak random non-blockingly. The + clock, and up to 32 bytes from a non-blocking entropy source. The consequence is that the output of these different instances is correlated to some extent. In the perfect scenario, the attacker can control (or at least guess) the PID and clock of the application, and drain the system's entropy pool to reduce the "up - to 16 bytes" above to 0. Then the dependencies of the initial + to 32 bytes" above to 0. Then the dependencies of the initial states of the pools are completely known. */ static int read_seed_file (void) @@ -814,12 +814,16 @@ read_seed_file (void) add_randomness( &x, sizeof(x), RANDOM_ORIGIN_INIT ); } - /* And read a few bytes from our entropy source. By using a level - * of 0 this will not block and might not return anything with some - * entropy drivers, however the rndlinux driver will use - * /dev/urandom and return some stuff - Do not read too much as we - * want to be friendly to the scare system entropy resource. */ - read_random_source ( RANDOM_ORIGIN_INIT, 16, GCRY_WEAK_RANDOM ); + /* And read a few bytes from our entropy source.
If we have the + * Jitter RNG we can quickly get a lot of entropy. Thus we read 1024 + * bits from that source. + * + * Without the Jitter RNG we keep the old method of reading only a + * few bytes usually from /dev/urandom which won't block. */ + if (_gcry_rndjent_get_version (NULL)) + read_random_source (RANDOM_ORIGIN_INIT, 128, GCRY_STRONG_RANDOM); + else + read_random_source (RANDOM_ORIGIN_INIT, 32, GCRY_STRONG_RANDOM); allow_seed_file_update = 1; return 1; ----------------------------------------------------------------------- Summary of changes: random/random-csprng.c | 20 ++++---- tests/random.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++--- tests/t-common.h | 38 ++++++++++++++++ tests/t-convert.c | 12 +++-- tests/t-ed25519.c | 2 +- 5 files changed, 174 insertions(+), 19 deletions(-) From jussi.kivilinna at iki.fi Thu Aug 3 19:46:48 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 03 Aug 2017 20:46:48 +0300 Subject: [PATCH] AES-NI improvements for AMD64 Message-ID: <150178240838.17701.10166156518356203521.stgit@localhost.localdomain> * cipher/rijndael-aesni.c [__x86_64__] (aesni_prepare_7_15_variable) (aesni_prepare_7_15, aesni_cleanup_7_15, do_aesni_enc_vec8) (do_aesni_dec_vec8, do_aesni_ctr_8): New. (_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec) (_gcry_aes_aesni_cbc_dec, aesni_ocb_enc, aesni_ocb_dec) (_gcry_aes_aesni_ocb_auth) [__x86_64__]: Add 8 parallel blocks processing.
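The bulk handlers listed above follow a widest-batch-first dispatch: consume eight blocks per iteration while enough input remains, then fall through to the existing four-block and single-block paths. A toy model of that control flow, with a one-byte stand-in "cipher" instead of the 16-byte AES-NI routines (only the dispatch structure mirrors the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a single-block cipher operation; a "block" is one
 * byte here so the sketch stays self-contained. */
static void enc1 (unsigned char *b) { *b ^= 0x55; }

/* Stand-in for an n-block batch (the vec8/vec4 routines). */
static void encn (unsigned char *b, size_t n)
{
  while (n--)
    enc1 (b++);
}

/* Widest-batch-first dispatch over NBLOCKS blocks. */
static void bulk_enc (unsigned char *buf, size_t nblocks)
{
  while (nblocks >= 8) { encn (buf, 8); buf += 8; nblocks -= 8; } /* 8-way path */
  while (nblocks >= 4) { encn (buf, 4); buf += 4; nblocks -= 4; } /* 4-way path */
  while (nblocks)      { enc1 (buf);    buf += 1; nblocks -= 1; } /* leftovers  */
}
```

The payoff of the wide path is not fewer operations but deeper pipelining: eight independent aesenc chains keep the AES units busy across instruction latencies, which is where the benchmark gains below come from.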
-- Benchmarks on Intel Core i7-4790K, 4.0 GHz (no turbo, no HT):

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec        |     0.175 ns/B     5448.7 MiB/s     0.700 c/B
 CFB dec        |     0.174 ns/B     5466.2 MiB/s     0.698 c/B
 CTR enc        |     0.182 ns/B     5226.0 MiB/s     0.730 c/B
 OCB enc        |     0.194 ns/B     4913.9 MiB/s     0.776 c/B
 OCB dec        |     0.200 ns/B     4769.2 MiB/s     0.800 c/B
 OCB auth       |     0.172 ns/B     5545.0 MiB/s     0.688 c/B

After (1.08x to 1.14x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec        |     0.157 ns/B     6075.0 MiB/s     0.628 c/B
 CFB dec        |     0.158 ns/B     6045.8 MiB/s     0.631 c/B
 CTR enc        |     0.160 ns/B     5977.1 MiB/s     0.638 c/B
 OCB enc        |     0.175 ns/B     5446.7 MiB/s     0.700 c/B
 OCB dec        |     0.185 ns/B     5152.5 MiB/s     0.740 c/B
 OCB auth       |     0.156 ns/B     6095.5 MiB/s     0.626 c/B

Signed-off-by: Jussi Kivilinna --- cipher/rijndael-aesni.c | 1224 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1218 insertions(+), 6 deletions(-) diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 735e5cdd..b8aad9d1 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -55,6 +55,7 @@ typedef struct u128_s #ifdef __WIN64__ /* XMM6-XMM15 are callee-saved registers on WIN64.
*/ # define aesni_prepare_2_6_variable char win64tmp[16] +# define aesni_prepare_7_15_variable char win64tmp7_15[16 * 9] # define aesni_prepare() do { } while (0) # define aesni_prepare_2_6() \ do { asm volatile ("movdqu %%xmm6, %0\n\t" \ @@ -62,6 +63,20 @@ typedef struct u128_s : \ : "memory"); \ } while (0) +# define aesni_prepare_7_15() \ + do { asm volatile ("movdqu %%xmm7, 0*16(%0)\n\t" \ + "movdqu %%xmm8, 1*16(%0)\n\t" \ + "movdqu %%xmm9, 2*16(%0)\n\t" \ + "movdqu %%xmm10, 3*16(%0)\n\t" \ + "movdqu %%xmm11, 4*16(%0)\n\t" \ + "movdqu %%xmm12, 5*16(%0)\n\t" \ + "movdqu %%xmm13, 6*16(%0)\n\t" \ + "movdqu %%xmm14, 7*16(%0)\n\t" \ + "movdqu %%xmm15, 8*16(%0)\n\t" \ + : \ + : "r" (win64tmp7_15) \ + : "memory"); \ + } while (0) # define aesni_cleanup() \ do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ "pxor %%xmm1, %%xmm1\n" :: ); \ @@ -76,6 +91,20 @@ typedef struct u128_s : "m" (*win64tmp) \ : "memory"); \ } while (0) +# define aesni_cleanup_7_15() \ + do { asm volatile ("movdqu 0*16(%0), %%xmm7\n\t" \ + "movdqu 1*16(%0), %%xmm8\n\t" \ + "movdqu 2*16(%0), %%xmm9\n\t" \ + "movdqu 3*16(%0), %%xmm10\n\t" \ + "movdqu 4*16(%0), %%xmm11\n\t" \ + "movdqu 5*16(%0), %%xmm12\n\t" \ + "movdqu 6*16(%0), %%xmm13\n\t" \ + "movdqu 7*16(%0), %%xmm14\n\t" \ + "movdqu 8*16(%0), %%xmm15\n\t" \ + : \ + : "r" (win64tmp7_15) \ + : "memory"); \ + } while (0) #else # define aesni_prepare_2_6_variable # define aesni_prepare() do { } while (0) @@ -91,6 +120,21 @@ typedef struct u128_s "pxor %%xmm5, %%xmm5\n" \ "pxor %%xmm6, %%xmm6\n":: ); \ } while (0) +# ifdef __x86_64__ +# define aesni_prepare_7_15_variable +# define aesni_prepare_7_15() do { } while (0) +# define aesni_cleanup_7_15() \ + do { asm volatile ("pxor %%xmm7, %%xmm7\n\t" \ + "pxor %%xmm8, %%xmm8\n" \ + "pxor %%xmm9, %%xmm9\n" \ + "pxor %%xmm10, %%xmm10\n" \ + "pxor %%xmm11, %%xmm11\n" \ + "pxor %%xmm12, %%xmm12\n" \ + "pxor %%xmm13, %%xmm13\n" \ + "pxor %%xmm14, %%xmm14\n" \ + "pxor %%xmm15, %%xmm15\n":: ); \ + } while (0) +# 
endif #endif void @@ -704,6 +748,314 @@ do_aesni_dec_vec4 (const RIJNDAEL_context *ctx) } +#ifdef __x86_64__ + +/* Encrypt eigth blocks using the Intel AES-NI instructions. Blocks are input + * and output through SSE registers xmm1 to xmm4 and xmm8 to xmm11. */ +static inline void +do_aesni_enc_vec8 (const RIJNDAEL_context *ctx) +{ + asm volatile ("movdqa (%[key]), %%xmm0\n\t" + "pxor %%xmm0, %%xmm1\n\t" /* xmm1 ^= key[0] */ + "pxor %%xmm0, %%xmm2\n\t" /* xmm2 ^= key[0] */ + "pxor %%xmm0, %%xmm3\n\t" /* xmm3 ^= key[0] */ + "pxor %%xmm0, %%xmm4\n\t" /* xmm4 ^= key[0] */ + "pxor %%xmm0, %%xmm8\n\t" /* xmm8 ^= key[0] */ + "pxor %%xmm0, %%xmm9\n\t" /* xmm9 ^= key[0] */ + "pxor %%xmm0, %%xmm10\n\t" /* xmm10 ^= key[0] */ + "pxor %%xmm0, %%xmm11\n\t" /* xmm11 ^= key[0] */ + "movdqa 0x10(%[key]), %%xmm0\n\t" + "cmpl $12, %[rounds]\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x20(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x30(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x40(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x50(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, 
%%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x60(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x70(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x80(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0x90(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0xa0(%[key]), %%xmm0\n\t" + "jb .Ldeclast%=\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0xb0(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0xc0(%[key]), %%xmm0\n\t" + "je .Ldeclast%=\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, 
%%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0xd0(%[key]), %%xmm0\n\t" + "aesenc %%xmm0, %%xmm1\n\t" + "aesenc %%xmm0, %%xmm2\n\t" + "aesenc %%xmm0, %%xmm3\n\t" + "aesenc %%xmm0, %%xmm4\n\t" + "aesenc %%xmm0, %%xmm8\n\t" + "aesenc %%xmm0, %%xmm9\n\t" + "aesenc %%xmm0, %%xmm10\n\t" + "aesenc %%xmm0, %%xmm11\n\t" + "movdqa 0xe0(%[key]), %%xmm0\n" + + ".Ldeclast%=:\n\t" + "aesenclast %%xmm0, %%xmm1\n\t" + "aesenclast %%xmm0, %%xmm2\n\t" + "aesenclast %%xmm0, %%xmm3\n\t" + "aesenclast %%xmm0, %%xmm4\n\t" + "aesenclast %%xmm0, %%xmm8\n\t" + "aesenclast %%xmm0, %%xmm9\n\t" + "aesenclast %%xmm0, %%xmm10\n\t" + "aesenclast %%xmm0, %%xmm11\n\t" + : /* no output */ + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); +} + + +/* Decrypt eigth blocks using the Intel AES-NI instructions. Blocks are input + * and output through SSE registers xmm1 to xmm4 and xmm8 to xmm11. 
*/ +static inline void +do_aesni_dec_vec8 (const RIJNDAEL_context *ctx) +{ + asm volatile ("movdqa (%[key]), %%xmm0\n\t" + "pxor %%xmm0, %%xmm1\n\t" /* xmm1 ^= key[0] */ + "pxor %%xmm0, %%xmm2\n\t" /* xmm2 ^= key[0] */ + "pxor %%xmm0, %%xmm3\n\t" /* xmm3 ^= key[0] */ + "pxor %%xmm0, %%xmm4\n\t" /* xmm4 ^= key[0] */ + "pxor %%xmm0, %%xmm8\n\t" /* xmm8 ^= key[0] */ + "pxor %%xmm0, %%xmm9\n\t" /* xmm9 ^= key[0] */ + "pxor %%xmm0, %%xmm10\n\t" /* xmm10 ^= key[0] */ + "pxor %%xmm0, %%xmm11\n\t" /* xmm11 ^= key[0] */ + "movdqa 0x10(%[key]), %%xmm0\n\t" + "cmpl $12, %[rounds]\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x20(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x30(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x40(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x50(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x60(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, 
%%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x70(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x80(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0x90(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0xa0(%[key]), %%xmm0\n\t" + "jb .Ldeclast%=\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0xb0(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0xc0(%[key]), %%xmm0\n\t" + "je .Ldeclast%=\n\t" + "aesdec %%xmm0, %%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0xd0(%[key]), %%xmm0\n\t" + "aesdec %%xmm0, 
%%xmm1\n\t" + "aesdec %%xmm0, %%xmm2\n\t" + "aesdec %%xmm0, %%xmm3\n\t" + "aesdec %%xmm0, %%xmm4\n\t" + "aesdec %%xmm0, %%xmm8\n\t" + "aesdec %%xmm0, %%xmm9\n\t" + "aesdec %%xmm0, %%xmm10\n\t" + "aesdec %%xmm0, %%xmm11\n\t" + "movdqa 0xe0(%[key]), %%xmm0\n" + + ".Ldeclast%=:\n\t" + "aesdeclast %%xmm0, %%xmm1\n\t" + "aesdeclast %%xmm0, %%xmm2\n\t" + "aesdeclast %%xmm0, %%xmm3\n\t" + "aesdeclast %%xmm0, %%xmm4\n\t" + "aesdeclast %%xmm0, %%xmm8\n\t" + "aesdeclast %%xmm0, %%xmm9\n\t" + "aesdeclast %%xmm0, %%xmm10\n\t" + "aesdeclast %%xmm0, %%xmm11\n\t" + : /* no output */ + : [key] "r" (ctx->keyschdec), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); +} + +#endif /* __x86_64__ */ + + /* Perform a CTR encryption round using the counter CTR and the input block A. Write the result to the output block B and update CTR. CTR needs to be a 16 byte aligned little-endian value. */ @@ -808,7 +1160,7 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, #define aesenclast_xmm1_xmm4 ".byte 0x66, 0x0f, 0x38, 0xdd, 0xe1\n\t" /* Register usage: - esi keyschedule + [key] keyschedule xmm0 CTR-0 xmm1 temp / round key xmm2 CTR-1 @@ -1003,6 +1355,327 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx, } + +#ifdef __x86_64__ + +/* Eight blocks at a time variant of do_aesni_ctr. 
*/ +static void +do_aesni_ctr_8 (const RIJNDAEL_context *ctx, + unsigned char *ctr, unsigned char *b, const unsigned char *a) +{ + static const byte bige_addb_const[8][16] __attribute__ ((aligned (16))) = + { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8 } + }; + const void *bige_addb = bige_addb_const; + + /* Register usage: + [key] keyschedule + xmm0 CTR-0 + xmm1 temp / round key + xmm2 CTR-1 + xmm3 CTR-2 + xmm4 CTR-3 + xmm5 copy of *ctr + xmm6 endian swapping mask + xmm8 CTR-4 + xmm9 CTR-5 + xmm10 CTR-6 + xmm11 CTR-7 + xmm12 temp + xmm13 temp + xmm14 temp + xmm15 temp + */ + + asm volatile (/* detect if 8-bit carry handling is needed */ + "cmpb $0xf7, 15(%[ctr])\n\t" + "ja .Ladd32bit%=\n\t" + + "movdqa %%xmm5, %%xmm0\n\t" /* xmm0 := CTR (xmm5) */ + "movdqa 0*16(%[addb]), %%xmm2\n\t" /* xmm2 := be(1) */ + "movdqa 1*16(%[addb]), %%xmm3\n\t" /* xmm3 := be(2) */ + "movdqa 2*16(%[addb]), %%xmm4\n\t" /* xmm4 := be(3) */ + "movdqa 3*16(%[addb]), %%xmm8\n\t" /* xmm8 := be(4) */ + "movdqa 4*16(%[addb]), %%xmm9\n\t" /* xmm9 := be(5) */ + "movdqa 5*16(%[addb]), %%xmm10\n\t" /* xmm10 := be(6) */ + "movdqa 6*16(%[addb]), %%xmm11\n\t" /* xmm11 := be(7) */ + "movdqa 7*16(%[addb]), %%xmm5\n\t" /* xmm5 := be(8) */ + "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ + "paddb %%xmm0, %%xmm2\n\t" /* xmm2 := be(1) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm3\n\t" /* xmm3 := be(2) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm4\n\t" /* xmm4 := be(3) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm8\n\t" /* xmm8 := be(4) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm9\n\t" /* xmm9 := be(5) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm10\n\t" /* 
xmm10 := be(6) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm11\n\t" /* xmm11 := be(7) + CTR (xmm0) */ + "paddb %%xmm0, %%xmm5\n\t" /* xmm5 := be(8) + CTR (xmm0) */ + "jmp .Lstore_ctr%=\n\t" + + ".Ladd32bit%=:\n\t" + "movdqa %%xmm5, %%xmm0\n\t" /* xmm0, xmm2 := CTR (xmm5) */ + "movdqa %%xmm0, %%xmm2\n\t" + "pcmpeqd %%xmm1, %%xmm1\n\t" + "psrldq $8, %%xmm1\n\t" /* xmm1 = -1 */ + + "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := le(xmm2) */ + "psubq %%xmm1, %%xmm2\n\t" /* xmm2++ */ + "movdqa %%xmm2, %%xmm3\n\t" /* xmm3 := xmm2 */ + "psubq %%xmm1, %%xmm3\n\t" /* xmm3++ */ + "movdqa %%xmm3, %%xmm4\n\t" /* xmm4 := xmm3 */ + "psubq %%xmm1, %%xmm4\n\t" /* xmm4++ */ + "movdqa %%xmm4, %%xmm8\n\t" /* xmm8 := xmm4 */ + "psubq %%xmm1, %%xmm8\n\t" /* xmm8++ */ + "movdqa %%xmm8, %%xmm9\n\t" /* xmm9 := xmm8 */ + "psubq %%xmm1, %%xmm9\n\t" /* xmm9++ */ + "movdqa %%xmm9, %%xmm10\n\t" /* xmm10 := xmm9 */ + "psubq %%xmm1, %%xmm10\n\t" /* xmm10++ */ + "movdqa %%xmm10, %%xmm11\n\t" /* xmm11 := xmm10 */ + "psubq %%xmm1, %%xmm11\n\t" /* xmm11++ */ + "movdqa %%xmm11, %%xmm5\n\t" /* xmm5 := xmm11 */ + "psubq %%xmm1, %%xmm5\n\t" /* xmm5++ */ + + /* detect if 64-bit carry handling is needed */ + "cmpl $0xffffffff, 8(%[ctr])\n\t" + "jne .Lno_carry%=\n\t" + "movl 12(%[ctr]), %%esi\n\t" + "bswapl %%esi\n\t" + "cmpl $0xfffffff8, %%esi\n\t" + "jb .Lno_carry%=\n\t" /* no carry */ + + "pslldq $8, %%xmm1\n\t" /* move lower 64-bit to high */ + "je .Lcarry_xmm5%=\n\t" /* esi == 0xfffffff8 */ + "cmpl $0xfffffffa, %%esi\n\t" + "jb .Lcarry_xmm11%=\n\t" /* esi == 0xfffffff9 */ + "je .Lcarry_xmm10%=\n\t" /* esi == 0xfffffffa */ + "cmpl $0xfffffffc, %%esi\n\t" + "jb .Lcarry_xmm9%=\n\t" /* esi == 0xfffffffb */ + "je .Lcarry_xmm8%=\n\t" /* esi == 0xfffffffc */ + "cmpl $0xfffffffe, %%esi\n\t" + "jb .Lcarry_xmm4%=\n\t" /* esi == 0xfffffffd */ + "je .Lcarry_xmm3%=\n\t" /* esi == 0xfffffffe */ + /* esi == 0xffffffff */ + + "psubq %%xmm1, %%xmm2\n\t" + ".Lcarry_xmm3%=:\n\t" + "psubq %%xmm1, %%xmm3\n\t" + ".Lcarry_xmm4%=:\n\t" + 
"psubq %%xmm1, %%xmm4\n\t" + ".Lcarry_xmm8%=:\n\t" + "psubq %%xmm1, %%xmm8\n\t" + ".Lcarry_xmm9%=:\n\t" + "psubq %%xmm1, %%xmm9\n\t" + ".Lcarry_xmm10%=:\n\t" + "psubq %%xmm1, %%xmm10\n\t" + ".Lcarry_xmm11%=:\n\t" + "psubq %%xmm1, %%xmm11\n\t" + ".Lcarry_xmm5%=:\n\t" + "psubq %%xmm1, %%xmm5\n\t" + + ".Lno_carry%=:\n\t" + "movdqa (%[key]), %%xmm1\n\t" /* xmm1 := key[0] */ + + "pshufb %%xmm6, %%xmm2\n\t" /* xmm2 := be(xmm2) */ + "pshufb %%xmm6, %%xmm3\n\t" /* xmm3 := be(xmm3) */ + "pshufb %%xmm6, %%xmm4\n\t" /* xmm4 := be(xmm4) */ + "pshufb %%xmm6, %%xmm5\n\t" /* xmm5 := be(xmm5) */ + "pshufb %%xmm6, %%xmm8\n\t" /* xmm8 := be(xmm8) */ + "pshufb %%xmm6, %%xmm9\n\t" /* xmm9 := be(xmm9) */ + "pshufb %%xmm6, %%xmm10\n\t" /* xmm10 := be(xmm10) */ + "pshufb %%xmm6, %%xmm11\n\t" /* xmm11 := be(xmm11) */ + + ".Lstore_ctr%=:\n\t" + "movdqa %%xmm5, (%[ctr])\n\t" /* Update CTR (mem). */ + : + : [ctr] "r" (ctr), + [key] "r" (ctx->keyschenc), + [addb] "r" (bige_addb) + : "%esi", "cc", "memory"); + + asm volatile ("pxor %%xmm1, %%xmm0\n\t" /* xmm0 ^= key[0] */ + "pxor %%xmm1, %%xmm2\n\t" /* xmm2 ^= key[0] */ + "pxor %%xmm1, %%xmm3\n\t" /* xmm3 ^= key[0] */ + "pxor %%xmm1, %%xmm4\n\t" /* xmm4 ^= key[0] */ + "pxor %%xmm1, %%xmm8\n\t" /* xmm8 ^= key[0] */ + "pxor %%xmm1, %%xmm9\n\t" /* xmm9 ^= key[0] */ + "pxor %%xmm1, %%xmm10\n\t" /* xmm10 ^= key[0] */ + "pxor %%xmm1, %%xmm11\n\t" /* xmm11 ^= key[0] */ + "movdqa 0x10(%[key]), %%xmm1\n\t" + "cmpl $12, %[rounds]\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x20(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + 
"movdqa 0x30(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x40(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x50(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x60(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x70(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x80(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0x90(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0xa0(%[key]), %%xmm1\n\t" + "jb 
.Lenclast%=\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0xb0(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0xc0(%[key]), %%xmm1\n\t" + "je .Lenclast%=\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0xd0(%[key]), %%xmm1\n\t" + "aesenc %%xmm1, %%xmm0\n\t" + "aesenc %%xmm1, %%xmm2\n\t" + "aesenc %%xmm1, %%xmm3\n\t" + "aesenc %%xmm1, %%xmm4\n\t" + "aesenc %%xmm1, %%xmm8\n\t" + "aesenc %%xmm1, %%xmm9\n\t" + "aesenc %%xmm1, %%xmm10\n\t" + "aesenc %%xmm1, %%xmm11\n\t" + "movdqa 0xe0(%[key]), %%xmm1\n" + + ".Lenclast%=:\n\t" + "aesenclast %%xmm1, %%xmm0\n\t" + "aesenclast %%xmm1, %%xmm2\n\t" + "aesenclast %%xmm1, %%xmm3\n\t" + "aesenclast %%xmm1, %%xmm4\n\t" + "aesenclast %%xmm1, %%xmm8\n\t" + "aesenclast %%xmm1, %%xmm9\n\t" + "aesenclast %%xmm1, %%xmm10\n\t" + "aesenclast %%xmm1, %%xmm11\n\t" + : + : [key] "r" (ctx->keyschenc), + [rounds] "r" (ctx->rounds) + : "cc", "memory"); + + asm volatile ("movdqu 0*16(%[src]), %%xmm12\n\t" /* Get block 1. */ + "movdqu 1*16(%[src]), %%xmm13\n\t" /* Get block 2. */ + "movdqu 2*16(%[src]), %%xmm14\n\t" /* Get block 3. */ + "movdqu 3*16(%[src]), %%xmm15\n\t" /* Get block 4. */ + "movdqu 4*16(%[src]), %%xmm1\n\t" /* Get block 5. */ + "pxor %%xmm12, %%xmm0\n\t" /* EncCTR-1 ^= input */ + "movdqu 5*16(%[src]), %%xmm12\n\t" /* Get block 6. 
*/ + "pxor %%xmm13, %%xmm2\n\t" /* EncCTR-2 ^= input */ + "movdqu 6*16(%[src]), %%xmm13\n\t" /* Get block 7. */ + "pxor %%xmm14, %%xmm3\n\t" /* EncCTR-3 ^= input */ + "movdqu 7*16(%[src]), %%xmm14\n\t" /* Get block 8. */ + "pxor %%xmm15, %%xmm4\n\t" /* EncCTR-4 ^= input */ + "movdqu %%xmm0, 0*16(%[dst])\n\t" /* Store block 1 */ + "pxor %%xmm1, %%xmm8\n\t" /* EncCTR-5 ^= input */ + "movdqu %%xmm0, 0*16(%[dst])\n\t" /* Store block 1 */ + "pxor %%xmm12, %%xmm9\n\t" /* EncCTR-6 ^= input */ + "movdqu %%xmm2, 1*16(%[dst])\n\t" /* Store block 2. */ + "pxor %%xmm13, %%xmm10\n\t" /* EncCTR-7 ^= input */ + "movdqu %%xmm3, 2*16(%[dst])\n\t" /* Store block 3. */ + "pxor %%xmm14, %%xmm11\n\t" /* EncCTR-8 ^= input */ + "movdqu %%xmm4, 3*16(%[dst])\n\t" /* Store block 4. */ + "movdqu %%xmm8, 4*16(%[dst])\n\t" /* Store block 5. */ + "movdqu %%xmm9, 5*16(%[dst])\n\t" /* Store block 6. */ + "movdqu %%xmm10, 6*16(%[dst])\n\t" /* Store block 7. */ + "movdqu %%xmm11, 7*16(%[dst])\n\t" /* Store block 8. */ + : + : [src] "r" (a), + [dst] "r" (b) + : "memory"); +} + +#endif /* __x86_64__ */ + + unsigned int _gcry_aes_aesni_encrypt (const RIJNDAEL_context *ctx, unsigned char *dst, const unsigned char *src) @@ -1123,7 +1796,25 @@ _gcry_aes_aesni_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, [ctr] "m" (*ctr) : "memory"); - for ( ;nblocks > 3 ; nblocks -= 4 ) +#ifdef __x86_64__ + if (nblocks >= 8) + { + aesni_prepare_7_15_variable; + + aesni_prepare_7_15(); + + for ( ;nblocks >= 8 ; nblocks -= 8 ) + { + do_aesni_ctr_8 (ctx, ctr, outbuf, inbuf); + outbuf += 8*BLOCKSIZE; + inbuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + + for ( ;nblocks >= 4 ; nblocks -= 4 ) { do_aesni_ctr_4 (ctx, ctr, outbuf, inbuf); outbuf += 4*BLOCKSIZE; @@ -1175,6 +1866,76 @@ _gcry_aes_aesni_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, : "memory" ); /* CFB decryption can be parallelized */ + +#ifdef __x86_64__ + if (nblocks >= 8) + { + aesni_prepare_7_15_variable; + + 
aesni_prepare_7_15(); + + for ( ;nblocks >= 8; nblocks -= 8) + { + asm volatile + ("movdqu %%xmm6, %%xmm1\n\t" /* load input blocks */ + "movdqu 0*16(%[inbuf]), %%xmm2\n\t" + "movdqu 1*16(%[inbuf]), %%xmm3\n\t" + "movdqu 2*16(%[inbuf]), %%xmm4\n\t" + "movdqu 3*16(%[inbuf]), %%xmm8\n\t" + "movdqu 4*16(%[inbuf]), %%xmm9\n\t" + "movdqu 5*16(%[inbuf]), %%xmm10\n\t" + "movdqu 6*16(%[inbuf]), %%xmm11\n\t" + + "movdqu 7*16(%[inbuf]), %%xmm6\n\t" /* update IV */ + + "movdqa %%xmm2, %%xmm12\n\t" + "movdqa %%xmm3, %%xmm13\n\t" + "movdqa %%xmm4, %%xmm14\n\t" + "movdqa %%xmm8, %%xmm15\n\t" + : /* No output */ + : [inbuf] "r" (inbuf) + : "memory"); + + do_aesni_enc_vec8 (ctx); + + asm volatile + ( + "pxor %%xmm12, %%xmm1\n\t" + "movdqu 4*16(%[inbuf]), %%xmm12\n\t" + "pxor %%xmm13, %%xmm2\n\t" + "movdqu 5*16(%[inbuf]), %%xmm13\n\t" + "pxor %%xmm14, %%xmm3\n\t" + "movdqu 6*16(%[inbuf]), %%xmm14\n\t" + "pxor %%xmm15, %%xmm4\n\t" + "movdqu 7*16(%[inbuf]), %%xmm15\n\t" + + "pxor %%xmm12, %%xmm8\n\t" + "movdqu %%xmm1, 0*16(%[outbuf])\n\t" + "pxor %%xmm13, %%xmm9\n\t" + "movdqu %%xmm2, 1*16(%[outbuf])\n\t" + "pxor %%xmm14, %%xmm10\n\t" + "movdqu %%xmm3, 2*16(%[outbuf])\n\t" + "pxor %%xmm15, %%xmm11\n\t" + "movdqu %%xmm4, 3*16(%[outbuf])\n\t" + + "movdqu %%xmm8, 4*16(%[outbuf])\n\t" + "movdqu %%xmm9, 5*16(%[outbuf])\n\t" + "movdqu %%xmm10, 6*16(%[outbuf])\n\t" + "movdqu %%xmm11, 7*16(%[outbuf])\n\t" + + : /* No output */ + : [inbuf] "r" (inbuf), + [outbuf] "r" (outbuf) + : "memory"); + + outbuf += 8*BLOCKSIZE; + inbuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + for ( ;nblocks >= 4; nblocks -= 4) { asm volatile @@ -1260,7 +2021,76 @@ _gcry_aes_aesni_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, : [iv] "m" (*iv) : "memory"); - for ( ;nblocks > 3 ; nblocks -= 4 ) +#ifdef __x86_64__ + if (nblocks >= 8) + { + aesni_prepare_7_15_variable; + + aesni_prepare_7_15(); + + for ( ;nblocks >= 8 ; nblocks -= 8 ) + { + asm volatile + ("movdqu 0*16(%[inbuf]), %%xmm1\n\t" /* 
load input blocks */ + "movdqu 1*16(%[inbuf]), %%xmm2\n\t" + "movdqu 2*16(%[inbuf]), %%xmm3\n\t" + "movdqu 3*16(%[inbuf]), %%xmm4\n\t" + "movdqu 4*16(%[inbuf]), %%xmm8\n\t" + "movdqu 5*16(%[inbuf]), %%xmm9\n\t" + "movdqu 6*16(%[inbuf]), %%xmm10\n\t" + "movdqu 7*16(%[inbuf]), %%xmm11\n\t" + + "movdqa %%xmm1, %%xmm12\n\t" + "movdqa %%xmm2, %%xmm13\n\t" + "movdqa %%xmm3, %%xmm14\n\t" + "movdqa %%xmm4, %%xmm15\n\t" + + : /* No output */ + : [inbuf] "r" (inbuf) + : "memory"); + + do_aesni_dec_vec8 (ctx); + + asm volatile + ("pxor %%xmm5, %%xmm1\n\t" /* xor IV with output */ + + "pxor %%xmm12, %%xmm2\n\t" /* xor IV with output */ + "movdqu 4*16(%[inbuf]), %%xmm12\n\t" + + "pxor %%xmm13, %%xmm3\n\t" /* xor IV with output */ + "movdqu 5*16(%[inbuf]), %%xmm13\n\t" + + "pxor %%xmm14, %%xmm4\n\t" /* xor IV with output */ + "movdqu 6*16(%[inbuf]), %%xmm14\n\t" + + "pxor %%xmm15, %%xmm8\n\t" /* xor IV with output */ + "movdqu 7*16(%[inbuf]), %%xmm5\n\t" + "pxor %%xmm12, %%xmm9\n\t" /* xor IV with output */ + "movdqu %%xmm1, 0*16(%[outbuf])\n\t" + "pxor %%xmm13, %%xmm10\n\t" /* xor IV with output */ + "movdqu %%xmm2, 1*16(%[outbuf])\n\t" + "pxor %%xmm14, %%xmm11\n\t" /* xor IV with output */ + "movdqu %%xmm3, 2*16(%[outbuf])\n\t" + "movdqu %%xmm4, 3*16(%[outbuf])\n\t" + "movdqu %%xmm8, 4*16(%[outbuf])\n\t" + "movdqu %%xmm9, 5*16(%[outbuf])\n\t" + "movdqu %%xmm10, 6*16(%[outbuf])\n\t" + "movdqu %%xmm11, 7*16(%[outbuf])\n\t" + + : /* No output */ + : [inbuf] "r" (inbuf), + [outbuf] "r" (outbuf) + : "memory"); + + outbuf += 8*BLOCKSIZE; + inbuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + + for ( ;nblocks >= 4 ; nblocks -= 4 ) { asm volatile ("movdqu 0*16(%[inbuf]), %%xmm1\n\t" /* load input blocks */ @@ -1386,7 +2216,146 @@ aesni_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, outbuf += BLOCKSIZE; } - for ( ;nblocks > 3 ; nblocks -= 4 ) +#ifdef __x86_64__ + if (nblocks >= 8) + { + aesni_prepare_7_15_variable; + + aesni_prepare_7_15(); + + for ( ;nblocks >= 8 ; 
nblocks -= 8 ) + { + n += 4; + l = ocb_get_l(c, n); + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* Checksum_i = Checksum_{i-1} xor P_i */ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ + + asm volatile ("movdqu %[l0], %%xmm0\n\t" + "movdqu %[inbuf0], %%xmm1\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm1, %%xmm6\n\t" + "pxor %%xmm5, %%xmm1\n\t" + "movdqu %%xmm5, %%xmm12\n\t" + : + : [l0] "m" (*c->u_mode.ocb.L[0]), + [inbuf0] "m" (*(inbuf + 0 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l1], %%xmm0\n\t" + "movdqu %[inbuf1], %%xmm2\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm2, %%xmm6\n\t" + "pxor %%xmm5, %%xmm2\n\t" + "movdqu %%xmm5, %%xmm13\n\t" + : + : [l1] "m" (*c->u_mode.ocb.L[1]), + [inbuf1] "m" (*(inbuf + 1 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l2], %%xmm0\n\t" + "movdqu %[inbuf2], %%xmm3\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm3, %%xmm6\n\t" + "pxor %%xmm5, %%xmm3\n\t" + "movdqu %%xmm5, %%xmm14\n\t" + : + : [l2] "m" (*c->u_mode.ocb.L[0]), + [inbuf2] "m" (*(inbuf + 2 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l3], %%xmm0\n\t" + "movdqu %[inbuf3], %%xmm4\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm4, %%xmm6\n\t" + "pxor %%xmm5, %%xmm4\n\t" + "movdqu %%xmm5, %%xmm15\n\t" + : + : [l3] "m" (*l), + [inbuf3] "m" (*(inbuf + 3 * BLOCKSIZE)) + : "memory" ); + + n += 4; + l = ocb_get_l(c, n); + + asm volatile ("movdqu %[l4], %%xmm0\n\t" + "movdqu %[inbuf4], %%xmm8\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm8, %%xmm6\n\t" + "pxor %%xmm5, %%xmm8\n\t" + "movdqu %%xmm5, %%xmm7\n\t" + : + : [l4] "m" (*c->u_mode.ocb.L[0]), + [inbuf4] "m" (*(inbuf + 4 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l5], %%xmm0\n\t" + "movdqu %[inbuf5], %%xmm9\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm9, %%xmm6\n\t" + "pxor %%xmm5, %%xmm9\n\t" + "movdqu %%xmm5, %[outbuf5]\n\t" + : [outbuf5] "=m" (*(outbuf + 5 * BLOCKSIZE)) + : [l5] "m" (*c->u_mode.ocb.L[1]), + [inbuf5] "m" (*(inbuf + 5 * BLOCKSIZE)) + : "memory" 
); + asm volatile ("movdqu %[l6], %%xmm0\n\t" + "movdqu %[inbuf6], %%xmm10\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm10, %%xmm6\n\t" + "pxor %%xmm5, %%xmm10\n\t" + "movdqu %%xmm5, %[outbuf6]\n\t" + : [outbuf6] "=m" (*(outbuf + 6 * BLOCKSIZE)) + : [l6] "m" (*c->u_mode.ocb.L[0]), + [inbuf6] "m" (*(inbuf + 6 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l7], %%xmm0\n\t" + "movdqu %[inbuf7], %%xmm11\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm11, %%xmm6\n\t" + "pxor %%xmm5, %%xmm11\n\t" + : + : [l7] "m" (*l), + [inbuf7] "m" (*(inbuf + 7 * BLOCKSIZE)) + : "memory" ); + + do_aesni_enc_vec8 (ctx); + + asm volatile ("pxor %%xmm12, %%xmm1\n\t" + "pxor %%xmm13, %%xmm2\n\t" + "pxor %%xmm14, %%xmm3\n\t" + "pxor %%xmm15, %%xmm4\n\t" + "pxor %%xmm7, %%xmm8\n\t" + "movdqu %[outbuf5],%%xmm0\n\t" + "pxor %%xmm0, %%xmm9\n\t" + "movdqu %[outbuf6],%%xmm0\n\t" + "pxor %%xmm0, %%xmm10\n\t" + "pxor %%xmm5, %%xmm11\n\t" + "movdqu %%xmm1, %[outbuf0]\n\t" + "movdqu %%xmm2, %[outbuf1]\n\t" + "movdqu %%xmm3, %[outbuf2]\n\t" + "movdqu %%xmm4, %[outbuf3]\n\t" + "movdqu %%xmm8, %[outbuf4]\n\t" + "movdqu %%xmm9, %[outbuf5]\n\t" + "movdqu %%xmm10, %[outbuf6]\n\t" + "movdqu %%xmm11, %[outbuf7]\n\t" + : [outbuf0] "=m" (*(outbuf + 0 * BLOCKSIZE)), + [outbuf1] "=m" (*(outbuf + 1 * BLOCKSIZE)), + [outbuf2] "=m" (*(outbuf + 2 * BLOCKSIZE)), + [outbuf3] "=m" (*(outbuf + 3 * BLOCKSIZE)), + [outbuf4] "=m" (*(outbuf + 4 * BLOCKSIZE)), + [outbuf5] "+m" (*(outbuf + 5 * BLOCKSIZE)), + [outbuf6] "+m" (*(outbuf + 6 * BLOCKSIZE)), + [outbuf7] "=m" (*(outbuf + 7 * BLOCKSIZE)) + : + : "memory" ); + + outbuf += 8*BLOCKSIZE; + inbuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + + for ( ;nblocks >= 4 ; nblocks -= 4 ) { n += 4; l = ocb_get_l(c, n); @@ -1551,7 +2520,146 @@ aesni_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, outbuf += BLOCKSIZE; } - for ( ;nblocks > 3 ; nblocks -= 4 ) +#ifdef __x86_64__ + if (nblocks >= 8) + { + aesni_prepare_7_15_variable; + + aesni_prepare_7_15(); + + for 
( ;nblocks >= 8 ; nblocks -= 8 ) + { + n += 4; + l = ocb_get_l(c, n); + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* P_i = Offset_i xor DECIPHER(K, C_i xor Offset_i) */ + /* Checksum_i = Checksum_{i-1} xor P_i */ + + asm volatile ("movdqu %[l0], %%xmm0\n\t" + "movdqu %[inbuf0], %%xmm1\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm1\n\t" + "movdqa %%xmm5, %%xmm12\n\t" + : + : [l0] "m" (*c->u_mode.ocb.L[0]), + [inbuf0] "m" (*(inbuf + 0 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l1], %%xmm0\n\t" + "movdqu %[inbuf1], %%xmm2\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm2\n\t" + "movdqa %%xmm5, %%xmm13\n\t" + : + : [l1] "m" (*c->u_mode.ocb.L[1]), + [inbuf1] "m" (*(inbuf + 1 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l2], %%xmm0\n\t" + "movdqu %[inbuf2], %%xmm3\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm3\n\t" + "movdqa %%xmm5, %%xmm14\n\t" + : + : [l2] "m" (*c->u_mode.ocb.L[0]), + [inbuf2] "m" (*(inbuf + 2 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l3], %%xmm0\n\t" + "movdqu %[inbuf3], %%xmm4\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm4\n\t" + "movdqa %%xmm5, %%xmm15\n\t" + : + : [l3] "m" (*l), + [inbuf3] "m" (*(inbuf + 3 * BLOCKSIZE)) + : "memory" ); + + n += 4; + l = ocb_get_l(c, n); + + asm volatile ("movdqu %[l4], %%xmm0\n\t" + "movdqu %[inbuf4], %%xmm8\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm8\n\t" + "movdqa %%xmm5, %%xmm7\n\t" + : + : [l4] "m" (*c->u_mode.ocb.L[0]), + [inbuf4] "m" (*(inbuf + 4 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l5], %%xmm0\n\t" + "movdqu %[inbuf5], %%xmm9\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm9\n\t" + "movdqu %%xmm5, %[outbuf5]\n\t" + : [outbuf5] "=m" (*(outbuf + 5 * BLOCKSIZE)) + : [l5] "m" (*c->u_mode.ocb.L[1]), + [inbuf5] "m" (*(inbuf + 5 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l6], %%xmm0\n\t" + "movdqu %[inbuf6], %%xmm10\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm10\n\t" + "movdqu 
%%xmm5, %[outbuf6]\n\t" + : [outbuf6] "=m" (*(outbuf + 6 * BLOCKSIZE)) + : [l6] "m" (*c->u_mode.ocb.L[0]), + [inbuf6] "m" (*(inbuf + 6 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l7], %%xmm0\n\t" + "movdqu %[inbuf7], %%xmm11\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm11\n\t" + : + : [l7] "m" (*l), + [inbuf7] "m" (*(inbuf + 7 * BLOCKSIZE)) + : "memory" ); + + do_aesni_dec_vec8 (ctx); + + asm volatile ("movdqu %[outbuf5],%%xmm0\n\t" + "pxor %%xmm12, %%xmm1\n\t" + "movdqu %[outbuf6],%%xmm12\n\t" + "pxor %%xmm13, %%xmm2\n\t" + "pxor %%xmm14, %%xmm3\n\t" + "pxor %%xmm15, %%xmm4\n\t" + "pxor %%xmm7, %%xmm8\n\t" + "pxor %%xmm0, %%xmm9\n\t" + "pxor %%xmm12, %%xmm10\n\t" + "pxor %%xmm5, %%xmm11\n\t" + "movdqu %%xmm1, %[outbuf0]\n\t" + "movdqu %%xmm2, %[outbuf1]\n\t" + "movdqu %%xmm3, %[outbuf2]\n\t" + "movdqu %%xmm4, %[outbuf3]\n\t" + "movdqu %%xmm8, %[outbuf4]\n\t" + "movdqu %%xmm9, %[outbuf5]\n\t" + "movdqu %%xmm10, %[outbuf6]\n\t" + "movdqu %%xmm11, %[outbuf7]\n\t" + "pxor %%xmm3, %%xmm1\n\t" + "pxor %%xmm8, %%xmm1\n\t" + "pxor %%xmm10, %%xmm1\n\t" + "pxor %%xmm2, %%xmm1\n\t" + "pxor %%xmm4, %%xmm6\n\t" + "pxor %%xmm9, %%xmm6\n\t" + "pxor %%xmm11, %%xmm6\n\t" + "pxor %%xmm1, %%xmm6\n\t" + : [outbuf0] "=m" (*(outbuf + 0 * BLOCKSIZE)), + [outbuf1] "=m" (*(outbuf + 1 * BLOCKSIZE)), + [outbuf2] "=m" (*(outbuf + 2 * BLOCKSIZE)), + [outbuf3] "=m" (*(outbuf + 3 * BLOCKSIZE)), + [outbuf4] "=m" (*(outbuf + 4 * BLOCKSIZE)), + [outbuf5] "+m" (*(outbuf + 5 * BLOCKSIZE)), + [outbuf6] "+m" (*(outbuf + 6 * BLOCKSIZE)), + [outbuf7] "=m" (*(outbuf + 7 * BLOCKSIZE)) + : + : "memory" ); + + outbuf += 8*BLOCKSIZE; + inbuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + + for ( ;nblocks >= 4 ; nblocks -= 4 ) { n += 4; l = ocb_get_l(c, n); @@ -1722,7 +2830,111 @@ _gcry_aes_aesni_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, abuf += BLOCKSIZE; } - for ( ;nblocks > 3 ; nblocks -= 4 ) +#ifdef __x86_64__ + if (nblocks >= 8) + { + 
aesni_prepare_7_15_variable; + + aesni_prepare_7_15(); + + for ( ;nblocks >= 8 ; nblocks -= 8 ) + { + n += 4; + l = ocb_get_l(c, n); + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ + asm volatile ("movdqu %[l0], %%xmm0\n\t" + "movdqu %[abuf0], %%xmm1\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm1\n\t" + : + : [l0] "m" (*c->u_mode.ocb.L[0]), + [abuf0] "m" (*(abuf + 0 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l1], %%xmm0\n\t" + "movdqu %[abuf1], %%xmm2\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm2\n\t" + : + : [l1] "m" (*c->u_mode.ocb.L[1]), + [abuf1] "m" (*(abuf + 1 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l2], %%xmm0\n\t" + "movdqu %[abuf2], %%xmm3\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm3\n\t" + : + : [l2] "m" (*c->u_mode.ocb.L[0]), + [abuf2] "m" (*(abuf + 2 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l3], %%xmm0\n\t" + "movdqu %[abuf3], %%xmm4\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm4\n\t" + : + : [l3] "m" (*l), + [abuf3] "m" (*(abuf + 3 * BLOCKSIZE)) + : "memory" ); + + n += 4; + l = ocb_get_l(c, n); + + asm volatile ("movdqu %[l4], %%xmm0\n\t" + "movdqu %[abuf4], %%xmm8\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm8\n\t" + : + : [l4] "m" (*c->u_mode.ocb.L[0]), + [abuf4] "m" (*(abuf + 4 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l5], %%xmm0\n\t" + "movdqu %[abuf5], %%xmm9\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm9\n\t" + : + : [l5] "m" (*c->u_mode.ocb.L[1]), + [abuf5] "m" (*(abuf + 5 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l6], %%xmm0\n\t" + "movdqu %[abuf6], %%xmm10\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm10\n\t" + : + : [l6] "m" (*c->u_mode.ocb.L[0]), + [abuf6] "m" (*(abuf + 6 * BLOCKSIZE)) + : "memory" ); + asm volatile ("movdqu %[l7], %%xmm0\n\t" + "movdqu %[abuf7], %%xmm11\n\t" + "pxor %%xmm0, %%xmm5\n\t" + "pxor %%xmm5, %%xmm11\n\t" + : + : [l7] 
"m" (*l), + [abuf7] "m" (*(abuf + 7 * BLOCKSIZE)) + : "memory" ); + + do_aesni_enc_vec8 (ctx); + + asm volatile ("pxor %%xmm2, %%xmm1\n\t" + "pxor %%xmm3, %%xmm1\n\t" + "pxor %%xmm4, %%xmm1\n\t" + "pxor %%xmm8, %%xmm1\n\t" + "pxor %%xmm9, %%xmm6\n\t" + "pxor %%xmm10, %%xmm6\n\t" + "pxor %%xmm11, %%xmm6\n\t" + "pxor %%xmm1, %%xmm6\n\t" + : + : + : "memory" ); + + abuf += 8*BLOCKSIZE; + } + + aesni_cleanup_7_15(); + } +#endif + + for ( ;nblocks >= 4 ; nblocks -= 4 ) { n += 4; l = ocb_get_l(c, n); From jussi.kivilinna at iki.fi Sun Aug 6 14:09:43 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 06 Aug 2017 15:09:43 +0300 Subject: [PATCH] Add AES-NI acceleration for AES-XTS Message-ID: <150202138361.15447.3621592672076674318.stgit@localhost.localdomain> * cipher/cipher-internal.h (gcry_cipher_handle): Change bulk XTS function to take cipher context. * cipher/cipher-xts.c (_gcry_cipher_xts_crypt): Ditto. * cipher/cipher.c (_gcry_cipher_open_internal): Setup AES-NI XTS bulk function. * cipher/rijndael-aesni.c (xts_gfmul_const, _gcry_aes_aesni_xts_enc) (_gcry_aes_aesni_xts_enc, _gcry_aes_aesni_xts_crypt): New. * cipher/rijndael.c (_gcry_aes_aesni_xts_crypt) (_gcry_aes_xts_crypt): New. * src/cipher.h (_gcry_aes_xts_crypt): New. 
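[Editor's note, not part of the patch] The new xts_gfmul_const table and the tweak-update
sequence in _gcry_aes_aesni_xts_enc/_dec implement multiplication of the 128-bit XTS
tweak by x in GF(2^128), folding the shifted-out top bit back in via the 0x87 reduction
constant. A minimal portable C sketch of that update follows; the helper name
xts_mult_x is hypothetical and only illustrates what the SSE sequence computes:

```c
#include <assert.h>

/* Multiply the 128-bit little-endian tweak by x in GF(2^128) with the
 * XTS reduction polynomial x^128 + x^7 + x^2 + x + 1: shift left by one
 * bit and, on carry out of bit 127, XOR 0x87 into the low byte (the same
 * constant the AES-NI code keeps in xts_gfmul_const). */
static void
xts_mult_x (unsigned char tweak[16])
{
  int carry = tweak[15] >> 7;   /* bit shifted out of the top */
  int i;

  for (i = 15; i > 0; i--)
    tweak[i] = ((tweak[i] << 1) | (tweak[i - 1] >> 7)) & 0xff;
  tweak[0] = (tweak[0] << 1) & 0xff;
  if (carry)
    tweak[0] ^= 0x87;
}
```

The SSE version in the patch computes the same thing branch-free: pshufd/psrad build a
sign-bit mask, pand selects the {0x87, 0x01} qwords from xts_gfmul_const, and paddq
performs the per-qword left shift before the pxor folds in carry and reduction.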
-- Benchmarks on Intel Core i7-4790K, 4.0Ghz (no turbo): Before: XTS enc | 1.66 ns/B 575.7 MiB/s 6.63 c/B XTS dec | 1.66 ns/B 575.5 MiB/s 6.63 c/B After (~6x faster): XTS enc | 0.270 ns/B 3528.5 MiB/s 1.08 c/B XTS dec | 0.272 ns/B 3511.5 MiB/s 1.09 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index b7481255..8c897d7b 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -146,7 +146,7 @@ struct gcry_cipher_handle const void *inbuf_arg, size_t nblocks, int encrypt); size_t (*ocb_auth)(gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks); - void (*xts_crypt)(gcry_cipher_hd_t c, unsigned char *tweak, + void (*xts_crypt)(void *context, unsigned char *tweak, void *outbuf_arg, const void *inbuf_arg, size_t nblocks, int encrypt); } bulk; diff --git a/cipher/cipher-xts.c b/cipher/cipher-xts.c index 4da89e55..06cefbe0 100644 --- a/cipher/cipher-xts.c +++ b/cipher/cipher-xts.c @@ -93,7 +93,8 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, /* Use a bulk method if available. 
*/ if (nblocks && c->bulk.xts_crypt) { - c->bulk.xts_crypt (c, c->u_ctr.ctr, outbuf, inbuf, nblocks, encrypt); + c->bulk.xts_crypt (&c->context.c, c->u_ctr.ctr, outbuf, inbuf, nblocks, + encrypt); inbuf += nblocks * GCRY_XTS_BLOCK_LEN; outbuf += nblocks * GCRY_XTS_BLOCK_LEN; inbuflen -= nblocks * GCRY_XTS_BLOCK_LEN; diff --git a/cipher/cipher.c b/cipher/cipher.c index 98127386..063c13da 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -532,6 +532,7 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, h->bulk.ctr_enc = _gcry_aes_ctr_enc; h->bulk.ocb_crypt = _gcry_aes_ocb_crypt; h->bulk.ocb_auth = _gcry_aes_ocb_auth; + h->bulk.xts_crypt = _gcry_aes_xts_crypt; break; #endif /*USE_AES*/ #ifdef USE_BLOWFISH diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 3d323cf0..50a0745b 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -3007,4 +3007,295 @@ _gcry_aes_aesni_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, } +static const u64 xts_gfmul_const[16] __attribute__ ((aligned (16))) = + { 0x87, 0x01 }; + + +static void +_gcry_aes_aesni_xts_enc (RIJNDAEL_context *ctx, unsigned char *tweak, + unsigned char *outbuf, const unsigned char *inbuf, + size_t nblocks) +{ + aesni_prepare_2_6_variable; + + aesni_prepare (); + aesni_prepare_2_6 (); + + /* Preload Tweak */ + asm volatile ("movdqu %[tweak], %%xmm5\n\t" + "movdqa %[gfmul], %%xmm6\n\t" + : + : [tweak] "m" (*tweak), + [gfmul] "m" (*xts_gfmul_const) + : "memory" ); + + for ( ;nblocks >= 4; nblocks -= 4 ) + { + asm volatile ("pshufd $0x13, %%xmm5, %%xmm4\n\t" + "movdqu %[inbuf0], %%xmm1\n\t" + "pxor %%xmm5, %%xmm1\n\t" + "movdqu %%xmm5, %[outbuf0]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf0] "=m" (*(outbuf + 0 * 16)) + : [inbuf0] "m" (*(inbuf + 0 * 16)) + : "memory" ); + + asm volatile ("movdqu %[inbuf1], %%xmm2\n\t" + "pxor %%xmm5, 
%%xmm2\n\t" + "movdqu %%xmm5, %[outbuf1]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf1] "=m" (*(outbuf + 1 * 16)) + : [inbuf1] "m" (*(inbuf + 1 * 16)) + : "memory" ); + + asm volatile ("movdqu %[inbuf2], %%xmm3\n\t" + "pxor %%xmm5, %%xmm3\n\t" + "movdqu %%xmm5, %[outbuf2]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf2] "=m" (*(outbuf + 2 * 16)) + : [inbuf2] "m" (*(inbuf + 2 * 16)) + : "memory" ); + + asm volatile ("movdqa %%xmm4, %%xmm0\n\t" + "movdqu %[inbuf3], %%xmm4\n\t" + "pxor %%xmm5, %%xmm4\n\t" + "movdqu %%xmm5, %[outbuf3]\n\t" + + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf3] "=m" (*(outbuf + 3 * 16)) + : [inbuf3] "m" (*(inbuf + 3 * 16)) + : "memory" ); + + do_aesni_enc_vec4 (ctx); + + asm volatile ("movdqu %[outbuf0], %%xmm0\n\t" + "pxor %%xmm0, %%xmm1\n\t" + "movdqu %[outbuf1], %%xmm0\n\t" + "movdqu %%xmm1, %[outbuf0]\n\t" + "movdqu %[outbuf2], %%xmm1\n\t" + "pxor %%xmm0, %%xmm2\n\t" + "movdqu %[outbuf3], %%xmm0\n\t" + "pxor %%xmm1, %%xmm3\n\t" + "pxor %%xmm0, %%xmm4\n\t" + "movdqu %%xmm2, %[outbuf1]\n\t" + "movdqu %%xmm3, %[outbuf2]\n\t" + "movdqu %%xmm4, %[outbuf3]\n\t" + : [outbuf0] "+m" (*(outbuf + 0 * 16)), + [outbuf1] "+m" (*(outbuf + 1 * 16)), + [outbuf2] "+m" (*(outbuf + 2 * 16)), + [outbuf3] "+m" (*(outbuf + 3 * 16)) + : + : "memory" ); + + outbuf += BLOCKSIZE * 4; + inbuf += BLOCKSIZE * 4; + } + + for ( ;nblocks; nblocks-- ) + { + asm volatile ("movdqu %[inbuf], %%xmm0\n\t" + "pxor %%xmm5, %%xmm0\n\t" + "movdqa %%xmm5, %%xmm4\n\t" + + "pshufd $0x13, %%xmm5, %%xmm1\n\t" + "psrad $31, %%xmm1\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm1\n\t" + "pxor %%xmm1, %%xmm5\n\t" + : + : 
[inbuf] "m" (*inbuf) + : "memory" ); + + do_aesni_enc (ctx); + + asm volatile ("pxor %%xmm4, %%xmm0\n\t" + "movdqu %%xmm0, %[outbuf]\n\t" + : [outbuf] "=m" (*outbuf) + : + : "memory" ); + + outbuf += BLOCKSIZE; + inbuf += BLOCKSIZE; + } + + asm volatile ("movdqu %%xmm5, %[tweak]\n\t" + : [tweak] "=m" (*tweak) + : + : "memory" ); + + aesni_cleanup (); + aesni_cleanup_2_6 (); +} + + +static void +_gcry_aes_aesni_xts_dec (RIJNDAEL_context *ctx, unsigned char *tweak, + unsigned char *outbuf, const unsigned char *inbuf, + size_t nblocks) +{ + aesni_prepare_2_6_variable; + + aesni_prepare (); + aesni_prepare_2_6 (); + + /* Preload Tweak */ + asm volatile ("movdqu %[tweak], %%xmm5\n\t" + "movdqa %[gfmul], %%xmm6\n\t" + : + : [tweak] "m" (*tweak), + [gfmul] "m" (*xts_gfmul_const) + : "memory" ); + + for ( ;nblocks >= 4; nblocks -= 4 ) + { + asm volatile ("pshufd $0x13, %%xmm5, %%xmm4\n\t" + "movdqu %[inbuf0], %%xmm1\n\t" + "pxor %%xmm5, %%xmm1\n\t" + "movdqu %%xmm5, %[outbuf0]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf0] "=m" (*(outbuf + 0 * 16)) + : [inbuf0] "m" (*(inbuf + 0 * 16)) + : "memory" ); + + asm volatile ("movdqu %[inbuf1], %%xmm2\n\t" + "pxor %%xmm5, %%xmm2\n\t" + "movdqu %%xmm5, %[outbuf1]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf1] "=m" (*(outbuf + 1 * 16)) + : [inbuf1] "m" (*(inbuf + 1 * 16)) + : "memory" ); + + asm volatile ("movdqu %[inbuf2], %%xmm3\n\t" + "pxor %%xmm5, %%xmm3\n\t" + "movdqu %%xmm5, %[outbuf2]\n\t" + + "movdqa %%xmm4, %%xmm0\n\t" + "paddd %%xmm4, %%xmm4\n\t" + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf2] "=m" (*(outbuf + 2 * 16)) + : [inbuf2] "m" (*(inbuf + 2 * 16)) + : 
"memory" ); + + asm volatile ("movdqa %%xmm4, %%xmm0\n\t" + "movdqu %[inbuf3], %%xmm4\n\t" + "pxor %%xmm5, %%xmm4\n\t" + "movdqu %%xmm5, %[outbuf3]\n\t" + + "psrad $31, %%xmm0\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm0\n\t" + "pxor %%xmm0, %%xmm5\n\t" + : [outbuf3] "=m" (*(outbuf + 3 * 16)) + : [inbuf3] "m" (*(inbuf + 3 * 16)) + : "memory" ); + + do_aesni_dec_vec4 (ctx); + + asm volatile ("movdqu %[outbuf0], %%xmm0\n\t" + "pxor %%xmm0, %%xmm1\n\t" + "movdqu %[outbuf1], %%xmm0\n\t" + "movdqu %%xmm1, %[outbuf0]\n\t" + "movdqu %[outbuf2], %%xmm1\n\t" + "pxor %%xmm0, %%xmm2\n\t" + "movdqu %[outbuf3], %%xmm0\n\t" + "pxor %%xmm1, %%xmm3\n\t" + "pxor %%xmm0, %%xmm4\n\t" + "movdqu %%xmm2, %[outbuf1]\n\t" + "movdqu %%xmm3, %[outbuf2]\n\t" + "movdqu %%xmm4, %[outbuf3]\n\t" + : [outbuf0] "+m" (*(outbuf + 0 * 16)), + [outbuf1] "+m" (*(outbuf + 1 * 16)), + [outbuf2] "+m" (*(outbuf + 2 * 16)), + [outbuf3] "+m" (*(outbuf + 3 * 16)) + : + : "memory" ); + + outbuf += BLOCKSIZE * 4; + inbuf += BLOCKSIZE * 4; + } + + for ( ;nblocks; nblocks-- ) + { + asm volatile ("movdqu %[inbuf], %%xmm0\n\t" + "pxor %%xmm5, %%xmm0\n\t" + "movdqa %%xmm5, %%xmm4\n\t" + + "pshufd $0x13, %%xmm5, %%xmm1\n\t" + "psrad $31, %%xmm1\n\t" + "paddq %%xmm5, %%xmm5\n\t" + "pand %%xmm6, %%xmm1\n\t" + "pxor %%xmm1, %%xmm5\n\t" + : + : [inbuf] "m" (*inbuf) + : "memory" ); + + do_aesni_dec (ctx); + + asm volatile ("pxor %%xmm4, %%xmm0\n\t" + "movdqu %%xmm0, %[outbuf]\n\t" + : [outbuf] "=m" (*outbuf) + : + : "memory" ); + + outbuf += BLOCKSIZE; + inbuf += BLOCKSIZE; + } + + asm volatile ("movdqu %%xmm5, %[tweak]\n\t" + : [tweak] "=m" (*tweak) + : + : "memory" ); + + aesni_cleanup (); + aesni_cleanup_2_6 (); +} + + +void +_gcry_aes_aesni_xts_crypt (RIJNDAEL_context *ctx, unsigned char *tweak, + unsigned char *outbuf, const unsigned char *inbuf, + size_t nblocks, int encrypt) +{ + if (encrypt) + _gcry_aes_aesni_xts_enc(ctx, tweak, outbuf, inbuf, nblocks); + else + _gcry_aes_aesni_xts_dec(ctx, tweak, 
outbuf, inbuf, nblocks); +} + #endif /* USE_AESNI */ diff --git a/cipher/rijndael.c b/cipher/rijndael.c index 8637195a..548bfa09 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -103,6 +103,11 @@ extern void _gcry_aes_aesni_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, int encrypt); extern void _gcry_aes_aesni_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks); +extern void _gcry_aes_aesni_xts_crypt (RIJNDAEL_context *ctx, + unsigned char *tweak, + unsigned char *outbuf, + const unsigned char *inbuf, + size_t nblocks, int encrypt); #endif #ifdef USE_SSSE3 @@ -1467,6 +1472,85 @@ _gcry_aes_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks) } +/* Bulk encryption/decryption of complete blocks in XTS mode. */ +void +_gcry_aes_xts_crypt (void *context, unsigned char *tweak, + void *outbuf_arg, const void *inbuf_arg, + size_t nblocks, int encrypt) +{ + RIJNDAEL_context *ctx = context; + unsigned char *outbuf = outbuf_arg; + const unsigned char *inbuf = inbuf_arg; + unsigned int burn_depth = 0; + rijndael_cryptfn_t crypt_fn; + u64 tweak_lo, tweak_hi, tweak_next_lo, tweak_next_hi, tmp_lo, tmp_hi, carry; + + if (encrypt) + { + if (ctx->prefetch_enc_fn) + ctx->prefetch_enc_fn(); + + crypt_fn = ctx->encrypt_fn; + } + else + { + check_decryption_preparation (ctx); + + if (ctx->prefetch_dec_fn) + ctx->prefetch_dec_fn(); + + crypt_fn = ctx->decrypt_fn; + } + + if (0) + ; +#ifdef USE_AESNI + else if (ctx->use_aesni) + { + _gcry_aes_aesni_xts_crypt (ctx, tweak, outbuf, inbuf, nblocks, encrypt); + burn_depth = 0; + } +#endif /*USE_AESNI*/ + else + { + tweak_next_lo = buf_get_le64 (tweak + 0); + tweak_next_hi = buf_get_le64 (tweak + 8); + + while (nblocks) + { + tweak_lo = tweak_next_lo; + tweak_hi = tweak_next_hi; + + /* Xor-Encrypt/Decrypt-Xor block. 
*/ + tmp_lo = buf_get_le64 (inbuf + 0) ^ tweak_lo; + tmp_hi = buf_get_le64 (inbuf + 8) ^ tweak_hi; + + buf_put_le64 (outbuf + 0, tmp_lo); + buf_put_le64 (outbuf + 8, tmp_hi); + + /* Generate next tweak. */ + carry = -(tweak_next_hi >> 63) & 0x87; + tweak_next_hi = (tweak_next_hi << 1) + (tweak_next_lo >> 63); + tweak_next_lo = (tweak_next_lo << 1) ^ carry; + + burn_depth = crypt_fn (ctx, outbuf, outbuf); + + buf_put_le64 (outbuf + 0, buf_get_le64 (outbuf + 0) ^ tweak_lo); + buf_put_le64 (outbuf + 8, buf_get_le64 (outbuf + 8) ^ tweak_hi); + + outbuf += GCRY_XTS_BLOCK_LEN; + inbuf += GCRY_XTS_BLOCK_LEN; + nblocks--; + } + + buf_put_le64 (tweak + 0, tweak_next_lo); + buf_put_le64 (tweak + 8, tweak_next_hi); + } + + if (burn_depth) + _gcry_burn_stack (burn_depth + 5 * sizeof(void *)); +} + /* Run the self-tests for AES 128. Returns NULL on success. */ static const char* diff --git a/src/cipher.h b/src/cipher.h index f2acb556..d9e0ac6a 100644 --- a/src/cipher.h +++ b/src/cipher.h @@ -158,6 +158,9 @@ size_t _gcry_aes_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, const void *inbuf_arg, size_t nblocks, int encrypt); size_t _gcry_aes_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks); +void _gcry_aes_xts_crypt (void *context, unsigned char *tweak, + void *outbuf_arg, const void *inbuf_arg, + size_t nblocks, int encrypt); /*-- blowfish.c --*/ void _gcry_blowfish_cfb_dec (void *context, unsigned char *iv, From jussi.kivilinna at iki.fi Sun Aug 6 14:09:49 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 06 Aug 2017 15:09:49 +0300 Subject: [PATCH] Add ARMv8/AArch64 implementation of chacha20 Message-ID: <150202138983.15504.16870943748027047649.stgit@localhost.localdomain> * cipher/Makefile.am: Add 'chacha20-aarch64.S'. * cipher/chacha20-aarch64.S: New. * cipher/chacha20.c (USE_AARCH64_SIMD): New. (_gcry_chacha20_aarch_blocks): New. (chacha20_do_setkey): Add HWF selection for Aarch64 implementation. * configure.ac: Add 'chacha20-aarch64.lo'. 
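For readers following the generic (non-AESNI) XTS path in rijndael.c quoted above: the "Generate next tweak" step is multiplication by x in GF(2^128) modulo the XTS polynomial x^128 + x^7 + x^2 + x + 1, which is where the 0x87 constant comes from. A minimal Python sketch of that limb arithmetic (illustrative only, not part of the patch):

```python
# Illustrative sketch (not from the patch): the per-block tweak update of XTS,
# i.e. multiplication by x in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1,
# mirroring the little-endian 64-bit-limb arithmetic of the generic C path.
MASK64 = (1 << 64) - 1

def xts_next_tweak(lo, hi):
    """Advance the 128-bit tweak (low/high 64-bit limbs) by one block."""
    carry = -(hi >> 63) & 0x87          # 0x87 if the top tweak bit is set, else 0
    hi = ((hi << 1) + (lo >> 63)) & MASK64
    lo = ((lo << 1) ^ carry) & MASK64
    return lo, hi

assert xts_next_tweak(1, 0) == (2, 0)          # plain doubling
assert xts_next_tweak(1 << 63, 0) == (0, 1)    # carry into the high limb
assert xts_next_tweak(0, 1 << 63) == (0x87, 0) # reduction constant folded in
```

The AES-NI version in rijndael-aesni.c computes the same update with pshufd/psrad/pand/pxor, using the xts_gfmul_const vector in place of the scalar 0x87.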
-- Patch adds ARMv8/AArch64 SIMD implementation based on public domain ARMv7/NEON implementation by Andrew Moon at: https://github.com/floodyberry/chacha-opt Benchmark on ARM Cortex-A53 (1536 Mhz): Before: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 5.70 ns/B 167.2 MiB/s 8.76 c/B STREAM dec | 5.71 ns/B 166.9 MiB/s 8.78 c/B After (~1.7x faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 3.32 ns/B 287.7 MiB/s 5.09 c/B STREAM dec | 3.31 ns/B 287.9 MiB/s 5.09 c/B Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 95c45108..26d25e1a 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -65,7 +65,7 @@ arcfour.c arcfour-amd64.S \ blowfish.c blowfish-amd64.S blowfish-arm.S \ cast5.c cast5-amd64.S cast5-arm.S \ chacha20.c chacha20-sse2-amd64.S chacha20-ssse3-amd64.S chacha20-avx2-amd64.S \ - chacha20-armv7-neon.S \ + chacha20-armv7-neon.S chacha20-aarch64.S \ crc.c \ crc-intel-pclmul.c \ des.c des-amd64.S \ diff --git a/cipher/chacha20-aarch64.S b/cipher/chacha20-aarch64.S new file mode 100644 index 00000000..d07511ff --- /dev/null +++ b/cipher/chacha20-aarch64.S @@ -0,0 +1,772 @@ +/* chacha20-aarch64.S - ARMv8/AArch64 accelerated chacha20 blocks function + * + * Copyright (C) 2014,2017 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +/* + * Based on public domain ARMv7/NEON implementation by Andrew Moon at + * https://github.com/floodyberry/chacha-opt + */ + +#include + +#if defined(__AARCH64EL__) && \ + defined(HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS) && \ + defined(HAVE_GCC_INLINE_ASM_AARCH64_NEON) && \ + defined(USE_CHACHA20) + +.cpu generic+simd + +.text + +#define STMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + add x17, ptr, #8; \ + stp l0, l1, [ptr], #16; \ + stp l2, l3, [x17], #16; \ + stp l4, l5, [ptr], #16; \ + stp l6, l7, [x17]; + +#define LDMIA16(ptr, l0, l1, l2, l3, l4, l5, l6, l7, \ + l8, l9, l10, l11, l12, l13, l14, l15) \ + add x17, ptr, #8; \ + ldp l0, l1, [ptr], #16; \ + ldp l2, l3, [x17], #16; \ + ldp l4, l5, [ptr], #16; \ + ldp l6, l7, [x17], #16; \ + ldp l8, l9, [ptr], #16; \ + ldp l10, l11, [x17], #16; \ + ldp l12, l13, [ptr], #16; \ + ldp l14, l15, [x17]; \ + +#define LDMIA8(ptr, l0, l1, l2, l3, l4, l5, l6, l7) \ + add x17, ptr, #8; \ + ldp l0, l1, [ptr], #16; \ + ldp l2, l3, [x17], #16; \ + ldp l4, l5, [ptr], #16; \ + ldp l6, l7, [x17]; + +#define LDMIA4(ptr, l0, l1, l2, l3) \ + ldp l0, l1, [ptr], #8; \ + ldp l2, l3, [ptr], #8; + +#define EXT32(a,b,c,n) \ + ext a,b,c,#(n*4); + +.text + +#define STACK_STATE 48 +#define STACK_SRC 56 +#define STACK_SP 192 +#define STACK_DST 200 +#define STACK_BYTES 208 +#define STACK_DST_TMP 216 + +.globl _gcry_chacha20_aarch64_blocks +.type _gcry_chacha20_aarch64_blocks,%function; +_gcry_chacha20_aarch64_blocks: +.Lchacha_blocks_neon_local: + tst x3, x3 + beq .Lchacha_blocks_neon_nobytes + mov x16, sp + mov x8, sp + sub x16, x16, #(216+8) + mov v16.16b, v8.16b + mov v17.16b, v9.16b + and x16, x16, #(-32) + mov v18.16b, v10.16b + mov v19.16b, v11.16b + mov v20.16b, v12.16b + mov sp, x16 + add x16, x16, #64 + mov v21.16b, v13.16b + mov v22.16b, v14.16b + mov v23.16b, v15.16b + mov w4, #20 + ld1 {v24.4s-v27.4s}, 
[x0] + str x0, [sp, # STACK_STATE] + str x1, [sp, # STACK_SRC] + str x2, [sp, # STACK_DST] + str x3, [sp, # STACK_BYTES] + str x8, [sp, # STACK_SP] + st1 {v24.4s-v27.4s}, [x16] + str w4, [sp, #44] + cmp x3, #256 + blo .Lchacha_blocks_neon_mainloop2 +.Lchacha_blocks_neon_mainloop1: + ldr w0, [sp, #44] + add x16, sp, #64 + str w0, [sp, #0] + mov x2, #1 + eor v12.16b, v12.16b, v12.16b + mov v0.16b, v24.16b + mov v1.16b, v25.16b + mov v2.16b, v26.16b + mov v3.16b, v27.16b + mov v12.2d[0], x2 + add v3.2d, v3.2d, v12.2d + mov v4.16b, v0.16b + mov v5.16b, v1.16b + mov v6.16b, v2.16b + add v7.2d, v3.2d, v12.2d + LDMIA16(x16, w0, w1, w2, w3, w4, w5, w6, w7, + w8, w9, w10, w11, w12, w13, w14, w15) + mov v8.16b, v0.16b + mov v9.16b, v1.16b + mov v10.16b, v2.16b + add v11.2d, v7.2d, v12.2d + str w6, [sp, #8] + str w11, [sp, #12] + mov w11, w13 + str w15, [sp, #28] +.Lchacha_blocks_neon_rounds1: + ldr w6, [sp, #0] + add v0.4s, v0.4s, v1.4s + add w0, w0, w4 + add v4.4s, v4.4s, v5.4s + add w1, w1, w5 + add v8.4s, v8.4s, v9.4s + eor w12, w12, w0 + eor v12.16b, v3.16b, v0.16b + eor w11, w11, w1 + eor v13.16b, v7.16b, v4.16b + ror w12, w12, #16 + eor v14.16b, v11.16b, v8.16b + ror w11, w11, #16 + rev32 v3.8h, v12.8h + subs w6, w6, #2 + rev32 v7.8h, v13.8h + add w8, w8, w12 + rev32 v11.8h, v14.8h + add w9, w9, w11 + add v2.4s, v2.4s, v3.4s + eor w4, w4, w8 + add v6.4s, v6.4s, v7.4s + eor w5, w5, w9 + add v10.4s, v10.4s, v11.4s + str w6, [sp, #0] + eor v12.16b, v1.16b, v2.16b + ror w4, w4, #20 + eor v13.16b, v5.16b, v6.16b + ror w5, w5, #20 + eor v14.16b, v9.16b, v10.16b + add w0, w0, w4 + shl v1.4s, v12.4s, #12 + add w1, w1, w5 + shl v5.4s, v13.4s, #12 + ldr w6, [sp, #8] + shl v9.4s, v14.4s, #12 + eor w12, w12, w0 + sri v1.4s, v12.4s, #20 + eor w11, w11, w1 + sri v5.4s, v13.4s, #20 + ror w12, w12, #24 + sri v9.4s, v14.4s, #20 + ror w11, w11, #24 + add v0.4s, v0.4s, v1.4s + add w8, w8, w12 + add v4.4s, v4.4s, v5.4s + add w9, w9, w11 + add v8.4s, v8.4s, v9.4s + eor w4, w4, w8 + eor 
v12.16b, v3.16b, v0.16b + eor w5, w5, w9 + eor v13.16b, v7.16b, v4.16b + str w11, [sp, #20] + eor v14.16b, v11.16b, v8.16b + ror w4, w4, #25 + shl v3.4s, v12.4s, #8 + ror w5, w5, #25 + shl v7.4s, v13.4s, #8 + str w4, [sp, #4] + shl v11.4s, v14.4s, #8 + ldr w4, [sp, #28] + sri v3.4s, v12.4s, #24 + add w2, w2, w6 + sri v7.4s, v13.4s, #24 + add w3, w3, w7 + sri v11.4s, v14.4s, #24 + ldr w11, [sp, #12] + add v2.4s, v2.4s, v3.4s + eor w14, w14, w2 + add v6.4s, v6.4s, v7.4s + eor w4, w4, w3 + add v10.4s, v10.4s, v11.4s + ror w14, w14, #16 + eor v12.16b, v1.16b, v2.16b + ror w4, w4, #16 + eor v13.16b, v5.16b, v6.16b + add w10, w10, w14 + eor v14.16b, v9.16b, v10.16b + add w11, w11, w4 + shl v1.4s, v12.4s, #7 + eor w6, w6, w10 + shl v5.4s, v13.4s, #7 + eor w7, w7, w11 + shl v9.4s, v14.4s, #7 + ror w6, w6, #20 + sri v1.4s, v12.4s, #25 + ror w7, w7, #20 + sri v5.4s, v13.4s, #25 + add w2, w2, w6 + sri v9.4s, v14.4s, #25 + add w3, w3, w7 + EXT32(v3.16b, v3.16b, v3.16b, 3) + eor w14, w14, w2 + EXT32(v7.16b, v7.16b, v7.16b, 3) + eor w4, w4, w3 + EXT32(v11.16b, v11.16b, v11.16b, 3) + ror w14, w14, #24 + EXT32(v1.16b, v1.16b, v1.16b, 1) + ror w4, w4, #24 + EXT32(v5.16b, v5.16b, v5.16b, 1) + add w10, w10, w14 + EXT32(v9.16b, v9.16b, v9.16b, 1) + add w11, w11, w4 + EXT32(v2.16b, v2.16b, v2.16b, 2) + eor w6, w6, w10 + EXT32(v6.16b, v6.16b, v6.16b, 2) + eor w7, w7, w11 + EXT32(v10.16b, v10.16b, v10.16b, 2) + ror w6, w6, #25 + add v0.4s, v0.4s, v1.4s + ror w7, w7, #25 + add v4.4s, v4.4s, v5.4s + add w0, w0, w5 + add v8.4s, v8.4s, v9.4s + add w1, w1, w6 + eor v12.16b, v3.16b, v0.16b + eor w4, w4, w0 + eor v13.16b, v7.16b, v4.16b + eor w12, w12, w1 + eor v14.16b, v11.16b, v8.16b + ror w4, w4, #16 + rev32 v3.8h, v12.8h + ror w12, w12, #16 + rev32 v7.8h, v13.8h + add w10, w10, w4 + rev32 v11.8h, v14.8h + add w11, w11, w12 + add v2.4s, v2.4s, v3.4s + eor w5, w5, w10 + add v6.4s, v6.4s, v7.4s + eor w6, w6, w11 + add v10.4s, v10.4s, v11.4s + ror w5, w5, #20 + eor v12.16b, v1.16b, v2.16b + ror 
w6, w6, #20 + eor v13.16b, v5.16b, v6.16b + add w0, w0, w5 + eor v14.16b, v9.16b, v10.16b + add w1, w1, w6 + shl v1.4s, v12.4s, #12 + eor w4, w4, w0 + shl v5.4s, v13.4s, #12 + eor w12, w12, w1 + shl v9.4s, v14.4s, #12 + ror w4, w4, #24 + sri v1.4s, v12.4s, #20 + ror w12, w12, #24 + sri v5.4s, v13.4s, #20 + add w10, w10, w4 + sri v9.4s, v14.4s, #20 + add w11, w11, w12 + add v0.4s, v0.4s, v1.4s + eor w5, w5, w10 + add v4.4s, v4.4s, v5.4s + eor w6, w6, w11 + add v8.4s, v8.4s, v9.4s + str w11, [sp, #12] + eor v12.16b, v3.16b, v0.16b + ror w5, w5, #25 + eor v13.16b, v7.16b, v4.16b + ror w6, w6, #25 + eor v14.16b, v11.16b, v8.16b + str w4, [sp, #28] + shl v3.4s, v12.4s, #8 + ldr w4, [sp, #4] + shl v7.4s, v13.4s, #8 + add w2, w2, w7 + shl v11.4s, v14.4s, #8 + add w3, w3, w4 + sri v3.4s, v12.4s, #24 + ldr w11, [sp, #20] + sri v7.4s, v13.4s, #24 + eor w11, w11, w2 + sri v11.4s, v14.4s, #24 + eor w14, w14, w3 + add v2.4s, v2.4s, v3.4s + ror w11, w11, #16 + add v6.4s, v6.4s, v7.4s + ror w14, w14, #16 + add v10.4s, v10.4s, v11.4s + add w8, w8, w11 + eor v12.16b, v1.16b, v2.16b + add w9, w9, w14 + eor v13.16b, v5.16b, v6.16b + eor w7, w7, w8 + eor v14.16b, v9.16b, v10.16b + eor w4, w4, w9 + shl v1.4s, v12.4s, #7 + ror w7, w7, #20 + shl v5.4s, v13.4s, #7 + ror w4, w4, #20 + shl v9.4s, v14.4s, #7 + str w6, [sp, #8] + sri v1.4s, v12.4s, #25 + add w2, w2, w7 + sri v5.4s, v13.4s, #25 + add w3, w3, w4 + sri v9.4s, v14.4s, #25 + eor w11, w11, w2 + EXT32(v3.16b, v3.16b, v3.16b, 1) + eor w14, w14, w3 + EXT32(v7.16b, v7.16b, v7.16b, 1) + ror w11, w11, #24 + EXT32(v11.16b, v11.16b, v11.16b, 1) + ror w14, w14, #24 + EXT32(v1.16b, v1.16b, v1.16b, 3) + add w8, w8, w11 + EXT32(v5.16b, v5.16b, v5.16b, 3) + add w9, w9, w14 + EXT32(v9.16b, v9.16b, v9.16b, 3) + eor w7, w7, w8 + EXT32(v2.16b, v2.16b, v2.16b, 2) + eor w4, w4, w9 + EXT32(v6.16b, v6.16b, v6.16b, 2) + ror w7, w7, #25 + EXT32(v10.16b, v10.16b, v10.16b, 2) + ror w4, w4, #25 + bne .Lchacha_blocks_neon_rounds1 + str w8, [sp, #0] + str w9, 
[sp, #4] + mov v12.16b, v24.16b + str w10, [sp, #8] + str w12, [sp, #16] + mov v13.16b, v25.16b + str w11, [sp, #20] + str w14, [sp, #24] + mov v14.16b, v26.16b + mov v15.16b, v27.16b + ldr x12, [sp, # STACK_SRC] + ldr x14, [sp, # STACK_DST] + add v0.4s, v0.4s, v12.4s + ldr w8, [sp, #(64 +0)] + add v4.4s, v4.4s, v12.4s + ldr w9, [sp, #(64 +4)] + add v8.4s, v8.4s, v12.4s + ldr w10, [sp, #(64 +8)] + add v1.4s, v1.4s, v13.4s + ldr w11, [sp, #(64 +12)] + add v5.4s, v5.4s, v13.4s + add w0, w0, w8 + add v9.4s, v9.4s, v13.4s + add w1, w1, w9 + add v2.4s, v2.4s, v14.4s + add w2, w2, w10 + add v6.4s, v6.4s, v14.4s + ldr w8, [sp, #(64 +16)] + add v10.4s, v10.4s, v14.4s + add w3, w3, w11 + eor v14.16b, v14.16b, v14.16b + ldr w9, [sp, #(64 +20)] + mov x11, #1 + add w4, w4, w8 + mov v14.2d[0], x11 + ldr w10, [sp, #(64 +24)] + add v12.2d, v14.2d, v15.2d + add w5, w5, w9 + add v13.2d, v14.2d, v12.2d + ldr w11, [sp, #(64 +28)] + add v14.2d, v14.2d, v13.2d + add w6, w6, w10 + add v3.4s, v3.4s, v12.4s + tst x12, x12 + add v7.4s, v7.4s, v13.4s + add w7, w7, w11 + add v11.4s, v11.4s, v14.4s + beq .Lchacha_blocks_neon_nomessage11 + LDMIA4(x12, w8, w9, w10, w11) + tst x12, x12 + eor w0, w0, w8 + eor w1, w1, w9 + eor w2, w2, w10 + ldr w8, [x12, #0] + eor w3, w3, w11 + ldr w9, [x12, #4] + eor w4, w4, w8 + ldr w10, [x12, #8] + eor w5, w5, w9 + ldr w11, [x12, #12] + eor w6, w6, w10 + add x12, x12, #16 + eor w7, w7, w11 +.Lchacha_blocks_neon_nomessage11: + mov x16, sp + STMIA8(x14, w0, w1, w2, w3, w4, w5, w6, w7) + tst x12, x12 + LDMIA8(x16, w0, w1, w2, w3, w4, w5, w6, w7) + ldr w8, [sp, #(64 +32)] + ldr w9, [sp, #(64 +36)] + ldr w10, [sp, #(64 +40)] + ldr w11, [sp, #(64 +44)] + add w0, w0, w8 + add w1, w1, w9 + add w2, w2, w10 + ldr w8, [sp, #(64 +48)] + add w3, w3, w11 + ldr w9, [sp, #(64 +52)] + add w4, w4, w8 + ldr w10, [sp, #(64 +56)] + add w5, w5, w9 + ldr w11, [sp, #(64 +60)] + add w6, w6, w10 + adds w8, w8, #4 + add w7, w7, w11 + adc w9, w9, wzr + str w8, [sp, #(64 +48)] + mov 
v27.4s[0], w8 + tst x12, x12 + str w9, [sp, #(64 +52)] + mov v27.4s[1], w9 + beq .Lchacha_blocks_neon_nomessage12 + LDMIA4(x12, w8, w9, w10, w11) + tst x12, x12 + eor w0, w0, w8 + eor w1, w1, w9 + eor w2, w2, w10 + ldr w8, [x12, #0] + eor w3, w3, w11 + ldr w9, [x12, #4] + eor w4, w4, w8 + ldr w10, [x12, #8] + eor w5, w5, w9 + ldr w11, [x12, #12] + eor w6, w6, w10 + add x12, x12, #16 + eor w7, w7, w11 +.Lchacha_blocks_neon_nomessage12: + STMIA8(x14, w0, w1, w2, w3, w4, w5, w6, w7) + tst x12, x12 + beq .Lchacha_blocks_neon_nomessage13 + ld1 {v12.4s-v15.4s}, [x12], #64 + eor v0.16b, v0.16b, v12.16b + eor v1.16b, v1.16b, v13.16b + eor v2.16b, v2.16b, v14.16b + eor v3.16b, v3.16b, v15.16b +.Lchacha_blocks_neon_nomessage13: + st1 {v0.4s-v3.4s}, [x14], #64 + beq .Lchacha_blocks_neon_nomessage14 + ld1 {v12.4s-v15.4s}, [x12], #64 + eor v4.16b, v4.16b, v12.16b + eor v5.16b, v5.16b, v13.16b + eor v6.16b, v6.16b, v14.16b + eor v7.16b, v7.16b, v15.16b +.Lchacha_blocks_neon_nomessage14: + st1 {v4.4s-v7.4s}, [x14], #64 + beq .Lchacha_blocks_neon_nomessage15 + ld1 {v12.4s-v15.4s}, [x12], #64 + eor v8.16b, v8.16b, v12.16b + eor v9.16b, v9.16b, v13.16b + eor v10.16b, v10.16b, v14.16b + eor v11.16b, v11.16b, v15.16b +.Lchacha_blocks_neon_nomessage15: + st1 {v8.4s-v11.4s}, [x14], #64 + str x12, [sp, # STACK_SRC] + str x14, [sp, # STACK_DST] + ldr x3, [sp, # STACK_BYTES] + sub x3, x3, #256 + cmp x3, #256 + str x3, [sp, # STACK_BYTES] + bhs .Lchacha_blocks_neon_mainloop1 + tst x3, x3 + beq .Lchacha_blocks_neon_done +.Lchacha_blocks_neon_mainloop2: + ldr x3, [sp, # STACK_BYTES] + ldr x1, [sp, # STACK_SRC] + cmp x3, #64 + bhs .Lchacha_blocks_neon_noswap1 + add x4, sp, #128 + mov x5, x4 + tst x1, x1 + beq .Lchacha_blocks_neon_nocopy1 +.Lchacha_blocks_neon_copyinput1: + subs x3, x3, #1 + ldrb w0, [x1], #1 + strb w0, [x4], #1 + bne .Lchacha_blocks_neon_copyinput1 + str x5, [sp, # STACK_SRC] +.Lchacha_blocks_neon_nocopy1: + ldr x4, [sp, # STACK_DST] + str x5, [sp, # STACK_DST] + str x4, [sp, 
# STACK_DST_TMP] +.Lchacha_blocks_neon_noswap1: + add x16, sp, #64 + ldr w0, [sp, #44] + str w0, [sp, #0] + LDMIA16(x16, w0, w1, w2, w3, w4, w5, w6, w7, + w8, w9, w10, w11, w12, w13, w14, w15) + str w6, [sp, #8] + str w11, [sp, #12] + mov w11, w13 + str w15, [sp, #28] +.Lchacha_blocks_neon_rounds2: + ldr w6, [sp, #0] + add w0, w0, w4 + add w1, w1, w5 + eor w12, w12, w0 + eor w11, w11, w1 + ror w12, w12, #16 + ror w11, w11, #16 + subs w6, w6, #2 + add w8, w8, w12 + add w9, w9, w11 + eor w4, w4, w8 + eor w5, w5, w9 + str w6, [sp, #0] + ror w4, w4, #20 + ror w5, w5, #20 + add w0, w0, w4 + add w1, w1, w5 + ldr w6, [sp, #8] + eor w12, w12, w0 + eor w11, w11, w1 + ror w12, w12, #24 + ror w11, w11, #24 + add w8, w8, w12 + add w9, w9, w11 + eor w4, w4, w8 + eor w5, w5, w9 + str w11, [sp, #20] + ror w4, w4, #25 + ror w5, w5, #25 + str w4, [sp, #4] + ldr w4, [sp, #28] + add w2, w2, w6 + add w3, w3, w7 + ldr w11, [sp, #12] + eor w14, w14, w2 + eor w4, w4, w3 + ror w14, w14, #16 + ror w4, w4, #16 + add w10, w10, w14 + add w11, w11, w4 + eor w6, w6, w10 + eor w7, w7, w11 + ror w6, w6, #20 + ror w7, w7, #20 + add w2, w2, w6 + add w3, w3, w7 + eor w14, w14, w2 + eor w4, w4, w3 + ror w14, w14, #24 + ror w4, w4, #24 + add w10, w10, w14 + add w11, w11, w4 + eor w6, w6, w10 + eor w7, w7, w11 + ror w6, w6, #25 + ror w7, w7, #25 + add w0, w0, w5 + add w1, w1, w6 + eor w4, w4, w0 + eor w12, w12, w1 + ror w4, w4, #16 + ror w12, w12, #16 + add w10, w10, w4 + add w11, w11, w12 + eor w5, w5, w10 + eor w6, w6, w11 + ror w5, w5, #20 + ror w6, w6, #20 + add w0, w0, w5 + add w1, w1, w6 + eor w4, w4, w0 + eor w12, w12, w1 + ror w4, w4, #24 + ror w12, w12, #24 + add w10, w10, w4 + add w11, w11, w12 + eor w5, w5, w10 + eor w6, w6, w11 + str w11, [sp, #12] + ror w5, w5, #25 + ror w6, w6, #25 + str w4, [sp, #28] + ldr w4, [sp, #4] + add w2, w2, w7 + add w3, w3, w4 + ldr w11, [sp, #20] + eor w11, w11, w2 + eor w14, w14, w3 + ror w11, w11, #16 + ror w14, w14, #16 + add w8, w8, w11 + add w9, w9, w14 + 
eor w7, w7, w8 + eor w4, w4, w9 + ror w7, w7, #20 + ror w4, w4, #20 + str w6, [sp, #8] + add w2, w2, w7 + add w3, w3, w4 + eor w11, w11, w2 + eor w14, w14, w3 + ror w11, w11, #24 + ror w14, w14, #24 + add w8, w8, w11 + add w9, w9, w14 + eor w7, w7, w8 + eor w4, w4, w9 + ror w7, w7, #25 + ror w4, w4, #25 + bne .Lchacha_blocks_neon_rounds2 + str w8, [sp, #0] + str w9, [sp, #4] + str w10, [sp, #8] + str w12, [sp, #16] + str w11, [sp, #20] + str w14, [sp, #24] + ldr x12, [sp, # STACK_SRC] + ldr x14, [sp, # STACK_DST] + ldr w8, [sp, #(64 +0)] + ldr w9, [sp, #(64 +4)] + ldr w10, [sp, #(64 +8)] + ldr w11, [sp, #(64 +12)] + add w0, w0, w8 + add w1, w1, w9 + add w2, w2, w10 + ldr w8, [sp, #(64 +16)] + add w3, w3, w11 + ldr w9, [sp, #(64 +20)] + add w4, w4, w8 + ldr w10, [sp, #(64 +24)] + add w5, w5, w9 + ldr w11, [sp, #(64 +28)] + add w6, w6, w10 + tst x12, x12 + add w7, w7, w11 + beq .Lchacha_blocks_neon_nomessage21 + LDMIA4(x12, w8, w9, w10, w11) + tst x12, x12 + eor w0, w0, w8 + eor w1, w1, w9 + eor w2, w2, w10 + ldr w8, [x12, #0] + eor w3, w3, w11 + ldr w9, [x12, #4] + eor w4, w4, w8 + ldr w10, [x12, #8] + eor w5, w5, w9 + ldr w11, [x12, #12] + eor w6, w6, w10 + add x12, x12, #16 + eor w7, w7, w11 +.Lchacha_blocks_neon_nomessage21: + mov x16, sp + STMIA8(x14, w0, w1, w2, w3, w4, w5, w6, w7) + LDMIA8(x16, w0, w1, w2, w3, w4, w5, w6, w7) + ldr w8, [sp, #(64 +32)] + ldr w9, [sp, #(64 +36)] + ldr w10, [sp, #(64 +40)] + ldr w11, [sp, #(64 +44)] + add w0, w0, w8 + add w1, w1, w9 + add w2, w2, w10 + ldr w8, [sp, #(64 +48)] + add w3, w3, w11 + ldr w9, [sp, #(64 +52)] + add w4, w4, w8 + ldr w10, [sp, #(64 +56)] + add w5, w5, w9 + ldr w11, [sp, #(64 +60)] + add w6, w6, w10 + adds w8, w8, #1 + add w7, w7, w11 + adc w9, w9, wzr + str w8, [sp, #(64 +48)] + tst x12, x12 + str w9, [sp, #(64 +52)] + beq .Lchacha_blocks_neon_nomessage22 + LDMIA4(x12, w8, w9, w10, w11) + tst x12, x12 + eor w0, w0, w8 + eor w1, w1, w9 + eor w2, w2, w10 + ldr w8, [x12, #0] + eor w3, w3, w11 + ldr w9, [x12, 
#4] + eor w4, w4, w8 + ldr w10, [x12, #8] + eor w5, w5, w9 + ldr w11, [x12, #12] + eor w6, w6, w10 + add x12, x12, #16 + eor w7, w7, w11 +.Lchacha_blocks_neon_nomessage22: + STMIA8(x14, w0, w1, w2, w3, w4, w5, w6, w7) + str x12, [sp, # STACK_SRC] + str x14, [sp, # STACK_DST] + ldr x3, [sp, # STACK_BYTES] + cmp x3, #64 + sub x4, x3, #64 + str x4, [sp, # STACK_BYTES] + bhi .Lchacha_blocks_neon_mainloop2 + cmp x3, #64 + beq .Lchacha_blocks_neon_nocopy2 + ldr x1, [sp, # STACK_DST_TMP] + sub x14, x14, #64 +.Lchacha_blocks_neon_copyinput2: + subs x3, x3, #1 + ldrb w0, [x14], #1 + strb w0, [x1], #1 + bne .Lchacha_blocks_neon_copyinput2 +.Lchacha_blocks_neon_nocopy2: +.Lchacha_blocks_neon_done: + ldr x16, [sp, # STACK_SP] + ldr x7, [sp, # STACK_STATE] + ldr w8, [sp, #(64 +48)] + ldr w9, [sp, #(64 +52)] + str w8, [x7, #(48 + 0)] + str w9, [x7, #(48 + 4)] + sub x0, sp, #8 + mov v8.16b, v16.16b + mov v9.16b, v17.16b + mov v10.16b, v18.16b + mov v11.16b, v19.16b + mov sp, x16 + mov v12.16b, v20.16b + mov v13.16b, v21.16b + mov v14.16b, v22.16b + mov v15.16b, v23.16b + sub x0, sp, x0 + eor v0.16b, v0.16b, v0.16b + eor v1.16b, v1.16b, v1.16b + eor v2.16b, v2.16b, v2.16b + eor v3.16b, v3.16b, v3.16b + eor v4.16b, v4.16b, v4.16b + eor v5.16b, v5.16b, v5.16b + eor v6.16b, v6.16b, v6.16b + eor v7.16b, v7.16b, v7.16b + ret +.Lchacha_blocks_neon_nobytes: + mov x0, xzr; + ret +.ltorg +.size _gcry_chacha20_aarch64_blocks,.-_gcry_chacha20_aarch64_blocks; + +#endif diff --git a/cipher/chacha20.c b/cipher/chacha20.c index 613fa82a..a11986c1 100644 --- a/cipher/chacha20.c +++ b/cipher/chacha20.c @@ -81,6 +81,16 @@ # endif #endif /*ENABLE_NEON_SUPPORT*/ +/* USE_AARCH64_SIMD indicates whether to enable ARMv8 SIMD assembly + * code. 
*/ +#undef USE_AARCH64_SIMD +#ifdef ENABLE_NEON_SUPPORT +# if defined(__AARCH64EL__) \ + && defined(HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS) \ + && defined(HAVE_GCC_INLINE_ASM_AARCH64_NEON) +# define USE_AARCH64_SIMD 1 +# endif +#endif struct CHACHA20_context_s; @@ -144,6 +154,14 @@ unsigned int _gcry_chacha20_armv7_neon_blocks(u32 *state, const byte *in, #endif /* USE_NEON */ +#ifdef USE_AARCH64_SIMD + +unsigned int _gcry_chacha20_aarch64_blocks(u32 *state, const byte *in, + byte *out, + size_t bytes) ASM_FUNC_ABI; + +#endif /* USE_AARCH64_SIMD */ + static void chacha20_setiv (void *context, const byte * iv, size_t ivlen); static const char *selftest (void); @@ -406,6 +424,10 @@ chacha20_do_setkey (CHACHA20_context_t * ctx, if (features & HWF_ARM_NEON) ctx->blocks = _gcry_chacha20_armv7_neon_blocks; #endif +#ifdef USE_AARCH64_SIMD + if (features & HWF_ARM_NEON) + ctx->blocks = _gcry_chacha20_aarch64_blocks; +#endif (void)features; diff --git a/configure.ac b/configure.ac index 66e7cd67..1e6ac9d7 100644 --- a/configure.ac +++ b/configure.ac @@ -2243,6 +2243,10 @@ if test "$found" = "1" ; then GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-ssse3-amd64.lo" GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-avx2-amd64.lo" ;; + aarch64-*-*) + # Build with the assembly implementation + GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-aarch64.lo" + ;; esac if test x"$neonsupport" = xyes ; then From wk at gnupg.org Mon Aug 7 16:52:39 2017 From: wk at gnupg.org (Werner Koch) Date: Mon, 07 Aug 2017 16:52:39 +0200 Subject: [PATCH] Add ARMv8/AArch64 implementation of chacha20 In-Reply-To: <150202138983.15504.16870943748027047649.stgit@localhost.localdomain> (Jussi Kivilinna's message of "Sun, 06 Aug 2017 15:09:49 +0300") References: <150202138983.15504.16870943748027047649.stgit@localhost.localdomain> Message-ID: <8760dzpgzs.fsf@wheatstone.g10code.de> On Sun, 6 Aug 2017 14:09, jussi.kivilinna at iki.fi said: > Patch adds ARMv8/AArch64 SIMD implementation based on public domain > ARMv7/NEON 
implementation by Andrew Moon at: > https://github.com/floodyberry/chacha-opt Can you please contact the author and ask to clarify the license? I only found this in the README: Public Domain. or MIT This is not sufficient. We need to know who has put this into the PD. There are several MIT licenses. We need to know which one. And also the copyright holder. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 227 bytes Desc: not available URL: From cvs at cvs.gnupg.org Mon Aug 7 19:27:31 2017 From: cvs at cvs.gnupg.org (by Marcus Brinkmann) Date: Mon, 07 Aug 2017 19:27:31 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.0-12-ga7bd2cb Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via a7bd2cbd3eabda88fb3cac5cbc13c21c97a7b315 (commit) from df1e221b3012e96bbffbc7d5fd70836a9ae1cc19 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit a7bd2cbd3eabda88fb3cac5cbc13c21c97a7b315 Author: Marcus Brinkmann Date: Mon Aug 7 19:26:26 2017 +0200 cipher: Add OID for SHA384WithECDSA. * cipher/sha512.c (oid_spec_sha384): Add SHA384WithECDSA. Signed-off-by: Marcus Brinkmann Suggested-by: Sven Fischer GnuPG-bug-id: 3336 diff --git a/cipher/sha512.c b/cipher/sha512.c index 2ddc485..06e8a2b 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -943,6 +943,9 @@ static gcry_md_oid_spec_t oid_spec_sha384[] = /* PKCS#1 sha384WithRSAEncryption */ { "1.2.840.113549.1.1.12" }, + /* SHA384WithECDSA: RFC 7427 (A.3.3.) 
*/ + { "1.2.840.10045.4.3.3" }, + { NULL }, }; ----------------------------------------------------------------------- Summary of changes: cipher/sha512.c | 3 +++ 1 file changed, 3 insertions(+) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From peterglen99 at gmail.com Fri Aug 11 03:51:14 2017 From: peterglen99 at gmail.com (Peter Glen) Date: Thu, 10 Aug 2017 21:51:14 -0400 Subject: Example code glibcrypt encrypt / decrypt Message-ID: Ready to use keygen / asym encryption / asym decryption samples. https://github.com/pglen/glibcrypt_samples -------------- next part -------------- An HTML attachment was scrubbed... URL: From jussi.kivilinna at iki.fi Sat Aug 12 09:11:07 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 12 Aug 2017 10:11:07 +0300 Subject: [PATCH] Add ARMv8/AArch64 implementation of chacha20 In-Reply-To: <8760dzpgzs.fsf@wheatstone.g10code.de> References: <150202138983.15504.16870943748027047649.stgit@localhost.localdomain> <8760dzpgzs.fsf@wheatstone.g10code.de> Message-ID: <2a8e1f8d-a04a-4770-24f1-665a2b9e21a2@iki.fi> On 07.08.2017 17:52, Werner Koch wrote: > On Sun, 6 Aug 2017 14:09, jussi.kivilinna at iki.fi said: > >> Patch adds ARMv8/AArch64 SIMD implementation based on public domain >> ARMv7/NEON implementation by Andrew Moon at: >> https://github.com/floodyberry/chacha-opt > > Can you please contact the author and ask to clarify the license? I > only found this in the README: > > Public Domain. or MIT > > This is not sufficient. We need to know who has put this into the PD. > There are several MIT licenses. We need to know which one. And also > the copyright holder. > I've sent author e-mail on this issue, and now waiting for reply. 
-Jussi From dkg at fifthhorseman.net Tue Aug 15 02:15:22 2017 From: dkg at fifthhorseman.net (Daniel Kahn Gillmor) Date: Mon, 14 Aug 2017 20:15:22 -0400 Subject: Example code glibcrypt encrypt / decrypt In-Reply-To: References: Message-ID: <8760dpk7ol.fsf@fifthhorseman.net> On Thu 2017-08-10 21:51:14 -0400, Peter Glen wrote: > Ready to use keygen / asym encryption / asym decryption samples. > > https://github.com/pglen/glibcrypt_samples is "glibcrypt" intended to refer to "libgcrypt"? if so, that's pretty confusing, and doesn't give the casual user (or a search engine) a lot of confidence about the relevance or quality of the code samples. If the code samples (or the discussions around them) are useful and informative, maybe you'd like to submit comparable changes to the gcrypt project directly? It'd be great to have improvements to the standard documentation. You can fetch the gcrypt sources with: git clone https://dev.gnupg.org/source/libgcrypt.git and then read/edit the source for the standard manual in libgcrypt/doc/gcrypt.texi. If you make improvements, please record them as git commits (see the other git commits in that repo for the preferred commit style) and either send them as patches to this mailing list (e.g. with "git send-email") or post them to the bugtracker at https://dev.gnupg.org/ All the best, --dkg -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ahmad at a3f.at Fri Aug 18 03:10:44 2017 From: ahmad at a3f.at (Ahmad Fatoum) Date: Fri, 18 Aug 2017 03:10:44 +0200 Subject: gcry_sexp_nth_data and gcry_pk_decrypt Message-ID: <19527F2F-2F8A-4FB6-A57B-D501B0E48721@a3f.at> Hello everyone, I am wondering whether the following snippet is correct: rc = gcry_sexp_build(&s_data, NULL, "(enc-val(rsa(a%b)))", (int)len, (char*)data); assert(rc == 0); rc = gcry_pk_decrypt(&s_plain, s_data, pk); assert(rc == 0); const char *decr = gcry_sexp_nth_data(s_plain, 0, &decr_len); /* do something with the plaintext in decr */ It seems to work, but there has been doubt whether accessing the plaintext with gcry_sexp_nth_data and index 0 is always guaranteed to work (no endianness issues, for example): https://code.wireshark.org/review/#/c/23052/ Help would be appreciated. Thanks. Cheers, Ahmad -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP URL: From ametzler at bebt.de Sat Aug 19 15:48:34 2017 From: ametzler at bebt.de (Andreas Metzler) Date: Sat, 19 Aug 2017 15:48:34 +0200 Subject: 1.8.0 testsuite error on PowerPC* In-Reply-To: <20170721172854.bhpy3onzhsrkybsc@argenau.bebt.de> References: <20170720161220.zsj4qvxhkijxj2r2@argenau.bebt.de> <87efta1gjy.fsf@iwagami.gniibe.org> <20170721172854.bhpy3onzhsrkybsc@argenau.bebt.de> Message-ID: <20170819134834.b4u2uflj5qsaoaz6@argenau.bebt.de> On 2017-07-21 Andreas Metzler wrote: > On 2017-07-21 NIIBE Yutaka wrote: >> Andreas Metzler wrote: >>> on many (all?) PowerPC variants gcrypt 1.8.0 FTBFS with >>> t-secmem: allocation did not fail as expected >> I think that this is due to the page size of PowerPC. >> Is it larger than 16K? > [...]
> Indeed it is: > ametzler at partch:~$ getconf PAGESIZE > 65536 Hello, Fedora is using the attached patch which works for me on linux/powerpc. Adding a check for POSIX avoids breaking compilation on mingw. cu Andreas -- `What a good friend you are to him, Dr. Maturin. His other friends are so grateful to you.' `I sew his ears on from time to time, sure' -------------- next part -------------- A non-text attachment was scrubbed... Name: fedora.patch Type: text/x-diff Size: 993 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fedora+checkforPOSIX.diff Type: text/x-diff Size: 885 bytes Desc: not available URL: From cvs at cvs.gnupg.org Sun Aug 27 09:40:13 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Sun, 27 Aug 2017 09:40:13 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.0-16-geb8f352 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via eb8f35243916132e10125e9e9edb066e8f1edd08 (commit) via 80fd8615048c3897b91a315cca22ab139b056ccd (commit) via bf76acbf0da6b0f245e491bec12c0f0a1b5be7c9 (commit) via 5417a29336426d310c3e012b148bcb20ef9ca85c (commit) from a7bd2cbd3eabda88fb3cac5cbc13c21c97a7b315 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. 
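[The page-size check behind the t-secmem report above can be reproduced without `getconf`; here is an illustrative Python equivalent. It is not part of the attached Fedora patches, which were scrubbed from the archive.]

```python
import os

# Query the virtual-memory page size, as `getconf PAGESIZE` does.
page_size = os.sysconf('SC_PAGE_SIZE')
print(page_size)

# NIIBE's hypothesis above: page sizes larger than 16 KiB (65536 bytes
# on this PowerPC kernel) break the size assumptions in tests/t-secmem.
# A valid page size is always a positive power of two.
assert page_size > 0 and page_size & (page_size - 1) == 0
```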
- Log ----------------------------------------------------------------- commit eb8f35243916132e10125e9e9edb066e8f1edd08 Author: Werner Koch Date: Sun Aug 27 09:36:37 2017 +0200 Post release updates -- diff --git a/NEWS b/NEWS index 39f70a3..8ae0d12 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,7 @@ +Noteworthy changes in version 1.8.2 (unreleased) [C22/A2/R2] +------------------------------------------------ + + Noteworthy changes in version 1.8.1 (2017-08-27) [C22/A2/R1] ------------------------------------------------ diff --git a/configure.ac b/configure.ac index 7a78e30..e24e710 100644 --- a/configure.ac +++ b/configure.ac @@ -30,7 +30,7 @@ min_automake_version="1.14" # for the LT versions. m4_define(mym4_version_major, [1]) m4_define(mym4_version_minor, [8]) -m4_define(mym4_version_micro, [1]) +m4_define(mym4_version_micro, [2]) # Below is m4 magic to extract and compute the revision number, the # decimalized short revision number, a beta version string, and a flag commit 80fd8615048c3897b91a315cca22ab139b056ccd Author: Werner Koch Date: Sun Aug 27 09:22:09 2017 +0200 Release 1.8.1 * configure.ac: Set LT version to C22/A2/R1. Signed-off-by: Werner Koch diff --git a/NEWS b/NEWS index 4ca8bc2..39f70a3 100644 --- a/NEWS +++ b/NEWS @@ -1,6 +1,19 @@ -Noteworthy changes in version 1.8.1 (unreleased) [C22/A2/R_] +Noteworthy changes in version 1.8.1 (2017-08-27) [C22/A2/R1] ------------------------------------------------ + * Bug fixes: + + - Mitigate a local side-channel attack on Curve25519 dubbed "May + the Fourth be With You". [CVE-2017-0379] [also in 1.7.9] + + - Add more extra bytes to the pool after reading a seed file. + + - Add the OID SHA384WithECDSA from RFC-7427 to SHA-384. + + - Fix build problems with the Jitter RNG + + - Fix assembler code build problems on Rasbian (ARMv8/AArch32-CE). 
+ Noteworthy changes in version 1.8.0 (2017-07-18) [C22/A2/R0] ------------------------------------------------ diff --git a/configure.ac b/configure.ac index 66e7cd6..7a78e30 100644 --- a/configure.ac +++ b/configure.ac @@ -56,7 +56,7 @@ AC_INIT([libgcrypt],[mym4_full_version],[http://bugs.gnupg.org]) # (No interfaces changed: REVISION++) LIBGCRYPT_LT_CURRENT=22 LIBGCRYPT_LT_AGE=2 -LIBGCRYPT_LT_REVISION=0 +LIBGCRYPT_LT_REVISION=1 # If the API is changed in an incompatible way: increment the next counter. commit bf76acbf0da6b0f245e491bec12c0f0a1b5be7c9 Author: NIIBE Yutaka Date: Fri Aug 25 18:13:28 2017 +0900 ecc: Add input validation for X25519. * cipher/ecc.c (ecc_decrypt_raw): Add input validation. * mpi/ec.c (ec_p_init): Use scratch buffer for bad points. (_gcry_mpi_ec_bad_point): New. -- Following is the paper describing the attack: May the Fourth Be With You: A Microarchitectural Side Channel Attack on Real-World Applications of Curve25519 by Daniel Genkin, Luke Valenta, and Yuval Yarom In the current implementation, we do output checking and return an error for those bad points. However, when attacked, the computation is still done with a leak of the private key, even though it results in an error. To mitigate the leak, we added input validation. Note that we only list bad points with MSB=0; with X25519, the MSB is always cleared. In the future, we should implement constant-time field computation. Then this input validation could be removed, if performance is important and we are sure there is no leak. CVE-id: CVE-2017-0379 Signed-off-by: NIIBE Yutaka diff --git a/cipher/ecc.c b/cipher/ecc.c index e25bf09..4e3e5b1 100644 --- a/cipher/ecc.c +++ b/cipher/ecc.c @@ -1628,9 +1628,22 @@ ecc_decrypt_raw (gcry_sexp_t *r_plain, gcry_sexp_t s_data, gcry_sexp_t keyparms) if (DBG_CIPHER) log_printpnt ("ecc_decrypt kG", &kG, NULL); - if (!(flags & PUBKEY_FLAG_DJB_TWEAK) + if ((flags & PUBKEY_FLAG_DJB_TWEAK)) + { /* For X25519, by its definition, validation should not be done.
*/ - && !_gcry_mpi_ec_curve_point (&kG, ec)) + /* (Instead, we do output check.) + * + * However, to mitigate secret key leak from our implementation, + * we also do input validation here. For constant-time + * implementation, we can remove this input validation. + */ + if (_gcry_mpi_ec_bad_point (&kG, ec)) + { + rc = GPG_ERR_INV_DATA; + goto leave; + } + } + else if (!_gcry_mpi_ec_curve_point (&kG, ec)) { rc = GPG_ERR_INV_DATA; goto leave; diff --git a/mpi/ec.c b/mpi/ec.c index a0f7357..4c16603 100644 --- a/mpi/ec.c +++ b/mpi/ec.c @@ -396,6 +396,29 @@ ec_get_two_inv_p (mpi_ec_t ec) } +static const char *curve25519_bad_points[] = { + "0x0000000000000000000000000000000000000000000000000000000000000000", + "0x0000000000000000000000000000000000000000000000000000000000000001", + "0x00b8495f16056286fdb1329ceb8d09da6ac49ff1fae35616aeb8413b7c7aebe0", + "0x57119fd0dd4e22d8868e1c58c45c44045bef839c55b1d0b1248c50a3bc959c5f", + "0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffec", + "0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffed", + "0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffee", + NULL +}; + +static gcry_mpi_t +scanval (const char *string) +{ + gpg_err_code_t rc; + gcry_mpi_t val; + + rc = _gcry_mpi_scan (&val, GCRYMPI_FMT_HEX, string, 0, NULL); + if (rc) + log_fatal ("scanning ECC parameter failed: %s\n", gpg_strerror (rc)); + return val; +} + /* This function initialized a context for elliptic curve based on the field GF(p). P is the prime specifying this field, A is the first @@ -434,9 +457,17 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model, _gcry_mpi_ec_get_reset (ctx); - /* Allocate scratch variables. */ - for (i=0; i< DIM(ctx->t.scratch); i++) - ctx->t.scratch[i] = mpi_alloc_like (ctx->p); + if (model == MPI_EC_MONTGOMERY) + { + for (i=0; i< DIM(ctx->t.scratch) && curve25519_bad_points[i]; i++) + ctx->t.scratch[i] = scanval (curve25519_bad_points[i]); + } + else + { + /* Allocate scratch variables. 
*/ + for (i=0; i< DIM(ctx->t.scratch); i++) + ctx->t.scratch[i] = mpi_alloc_like (ctx->p); + } /* Prepare for fast reduction. */ /* FIXME: need a test for NIST values. However it does not gain us @@ -1572,3 +1603,17 @@ _gcry_mpi_ec_curve_point (gcry_mpi_point_t point, mpi_ec_t ctx) return res; } + + +int +_gcry_mpi_ec_bad_point (gcry_mpi_point_t point, mpi_ec_t ctx) +{ + int i; + gcry_mpi_t x_bad; + + for (i = 0; (x_bad = ctx->t.scratch[i]); i++) + if (!mpi_cmp (point->x, x_bad)) + return 1; + + return 0; +} diff --git a/src/mpi.h b/src/mpi.h index b5385b5..aeba7f8 100644 --- a/src/mpi.h +++ b/src/mpi.h @@ -296,6 +296,7 @@ void _gcry_mpi_ec_mul_point (mpi_point_t result, gcry_mpi_t scalar, mpi_point_t point, mpi_ec_t ctx); int _gcry_mpi_ec_curve_point (gcry_mpi_point_t point, mpi_ec_t ctx); +int _gcry_mpi_ec_bad_point (gcry_mpi_point_t point, mpi_ec_t ctx); gcry_mpi_t _gcry_mpi_ec_ec2os (gcry_mpi_point_t point, mpi_ec_t ectx); commit 5417a29336426d310c3e012b148bcb20ef9ca85c Author: Werner Koch Date: Thu Aug 24 11:43:05 2017 +0200 indent: Typo fix. -- diff --git a/random/random-csprng.c b/random/random-csprng.c index 650c438..8cb35e7 100644 --- a/random/random-csprng.c +++ b/random/random-csprng.c @@ -115,7 +115,7 @@ static size_t pool_writepos; static size_t pool_readpos; /* This flag is set to true as soon as the pool has been completely - filled the first time. This may happen either by rereading a seed + filled the first time. This may happen either by reading a seed file or by adding enough entropy. 
*/ static int pool_filled; ----------------------------------------------------------------------- Summary of changes: NEWS | 19 ++++++++++++++++++- cipher/ecc.c | 17 +++++++++++++++-- configure.ac | 4 ++-- mpi/ec.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++--- random/random-csprng.c | 2 +- src/mpi.h | 1 + 6 files changed, 85 insertions(+), 9 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Sun Aug 27 10:12:30 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Sun, 27 Aug 2017 10:12:30 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-2-g566c8ef Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 566c8efd585ce6941449c76da13eae597dbabddb (commit) from eb8f35243916132e10125e9e9edb066e8f1edd08 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 566c8efd585ce6941449c76da13eae597dbabddb Author: Werner Koch Date: Sun Aug 27 10:08:58 2017 +0200 Prepare for the 1.9 branch -- We need to bump the LT Age even if there won't be compatible interface change. This is so that we can keep on updating the Revision in the 1.8 branch. 
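[The X25519 input validation shipped in 1.8.1 (commit bf76acb above) amounts to comparing the incoming u-coordinate against a short table of low-order points. A rough Python sketch of that check, with the constants copied from curve25519_bad_points in mpi/ec.c; this is illustrative only — the C code parses the same strings with scanval and keeps them in the context's scratch area.]

```python
# Field prime for Curve25519: p = 2^255 - 19.
P = 2**255 - 19

# Low-order u-coordinates with MSB=0, from curve25519_bad_points in
# mpi/ec.c. X25519 always clears the MSB of the input, so listing the
# MSB=0 representatives is sufficient.
BAD_POINTS = [
    0x0000000000000000000000000000000000000000000000000000000000000000,
    0x0000000000000000000000000000000000000000000000000000000000000001,
    0x00b8495f16056286fdb1329ceb8d09da6ac49ff1fae35616aeb8413b7c7aebe0,
    0x57119fd0dd4e22d8868e1c58c45c44045bef839c55b1d0b1248c50a3bc959c5f,
    0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffec,
    0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffed,
    0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffee,
]

def is_bad_point(u):
    """Return True if u matches a known low-order point (reject input)."""
    return u in BAD_POINTS
```

As in _gcry_mpi_ec_bad_point, a match makes ecc_decrypt_raw fail early with GPG_ERR_INV_DATA instead of running the scalar multiplication on attacker-chosen low-order input.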
Signed-off-by: Werner Koch diff --git a/NEWS b/NEWS index 8ae0d12..3e07a94 100644 --- a/NEWS +++ b/NEWS @@ -1,4 +1,4 @@ -Noteworthy changes in version 1.8.2 (unreleased) [C22/A2/R2] +Noteworthy changes in version 1.9.0 (unreleased) [C22/A3/R0] ------------------------------------------------ diff --git a/configure.ac b/configure.ac index e24e710..52e0f5e 100644 --- a/configure.ac +++ b/configure.ac @@ -29,8 +29,8 @@ min_automake_version="1.14" # commit and push so that the git magic is able to work. See below # for the LT versions. m4_define(mym4_version_major, [1]) -m4_define(mym4_version_minor, [8]) -m4_define(mym4_version_micro, [2]) +m4_define(mym4_version_minor, [9]) +m4_define(mym4_version_micro, [0]) # Below is m4 magic to extract and compute the revision number, the # decimalized short revision number, a beta version string, and a flag @@ -55,8 +55,8 @@ AC_INIT([libgcrypt],[mym4_full_version],[http://bugs.gnupg.org]) # (Interfaces added: CURRENT++, AGE++, REVISION=0) # (No interfaces changed: REVISION++) LIBGCRYPT_LT_CURRENT=22 -LIBGCRYPT_LT_AGE=2 -LIBGCRYPT_LT_REVISION=1 +LIBGCRYPT_LT_AGE=3 +LIBGCRYPT_LT_REVISION=0 # If the API is changed in an incompatible way: increment the next counter. ----------------------------------------------------------------------- Summary of changes: NEWS | 2 +- configure.ac | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Sun Aug 27 10:17:24 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Sun, 27 Aug 2017 10:17:24 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-3-g52af575 Message-ID: This is an automated email from the git hooks/post-receive script. 
It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 52af575ae4d6961edf459d5ba7f7a8057ed4cb80 (commit) from 566c8efd585ce6941449c76da13eae597dbabddb (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 52af575ae4d6961edf459d5ba7f7a8057ed4cb80 Author: Werner Koch Date: Sun Aug 27 10:13:53 2017 +0200 Also bump the LT Current value. -- diff --git a/configure.ac b/configure.ac index 52e0f5e..a2ac9ce 100644 --- a/configure.ac +++ b/configure.ac @@ -54,7 +54,7 @@ AC_INIT([libgcrypt],[mym4_full_version],[http://bugs.gnupg.org]) # (Interfaces removed: CURRENT++, AGE=0, REVISION=0) # (Interfaces added: CURRENT++, AGE++, REVISION=0) # (No interfaces changed: REVISION++) -LIBGCRYPT_LT_CURRENT=22 +LIBGCRYPT_LT_CURRENT=23 LIBGCRYPT_LT_AGE=3 LIBGCRYPT_LT_REVISION=0 ----------------------------------------------------------------------- Summary of changes: configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Tue Aug 29 03:14:54 2017 From: cvs at cvs.gnupg.org (by NIIBE Yutaka) Date: Tue, 29 Aug 2017 03:14:54 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-10-g1d5f726 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". 
The branch, master has been updated via 1d5f726668b9cc32d6bb601f2329987058146c6c (commit) via fab712d654b2ccd24696ed90bc239860a128ad5b (commit) via 1ac3d3637dd80013b78e03b9b9f582091710d908 (commit) via e9be23c4ad9f42c9d3198c706f912b7e27f574bc (commit) via 449459a2770d3aecb1f36502bf1903e0cbd2873e (commit) via 9ed0fb37bd637d1a2e9498c24097cfeadec682ec (commit) via d4cd381defe5b37dda19bbda0986bdd38065bd31 (commit) from 52af575ae4d6961edf459d5ba7f7a8057ed4cb80 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 1d5f726668b9cc32d6bb601f2329987058146c6c Author: NIIBE Yutaka Date: Wed Aug 23 13:03:07 2017 +0900 ecc: Fix ec_mulm_25519. * mpi/ec.c (ec_mulm_25519): Improve reduction to 25519. Signed-off-by: NIIBE Yutaka diff --git a/mpi/ec.c b/mpi/ec.c index ffdf3d1..88e2fab 100644 --- a/mpi/ec.c +++ b/mpi/ec.c @@ -455,13 +455,10 @@ ec_mulm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx) m[LIMB_SIZE_25519] += cy; memset (m, 0, wsize * BYTES_PER_MPI_LIMB); - m[0] = m[LIMB_SIZE_25519] * 2 * 19; - cy = _gcry_mpih_add_n (wp, wp, m, wsize); - msb = (wp[LIMB_SIZE_25519-1] >> (255 % BITS_PER_MPI_LIMB)); - m[0] = (cy * 2 + msb) * 19; - _gcry_mpih_add_n (wp, wp, m, wsize); + m[0] = (m[LIMB_SIZE_25519] * 2 + msb) * 19; wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 % BITS_PER_MPI_LIMB)); + _gcry_mpih_add_n (wp, wp, m, wsize); m[0] = 0; cy = _gcry_mpih_sub_n (wp, wp, ctx->p->d, wsize); commit fab712d654b2ccd24696ed90bc239860a128ad5b Author: NIIBE Yutaka Date: Wed Aug 23 12:46:20 2017 +0900 ecc: Use 25519 method also for ed25519. * cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Don't use mpi_add since it resizes to have more limbs. * mpi/ec.c (point_resize): Fix for Edwards curve. (ec_p_init): Support Edwards curve. (_gcry_mpi_ec_get_affine): Use the methods. 
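[The reduction that the "ecc: Fix ec_mulm_25519" commit above corrects relies on the identity 2^255 ≡ 19 (mod p). A hedged big-integer model of the corrected routine — Python integers stand in for the limb vectors, and the final constant-time conditional subtraction (done in C with _gcry_mpih_sub_n plus a masked add-back) is written as a plain `if` for clarity:]

```python
P = 2**255 - 19  # Curve25519 field prime

def mulm_25519(u, v):
    """Multiply u*v and reduce mod 2^255 - 19 by folding: the part of
    the product above bit 255 is added back multiplied by 19, twice,
    then at most one subtraction of p is needed."""
    t = u * v                                  # up to ~510 bits
    t = (t & (2**255 - 1)) + 19 * (t >> 255)   # first fold, < 2^260
    t = (t & (2**255 - 1)) + 19 * (t >> 255)   # second fold, < 2^255 + 608
    if t >= P:                                 # one conditional subtract
        t -= P
    return t
```

After the second fold the value is below 2p, so a single conditional subtraction suffices — the off-by-one handling of the carry and top bit is exactly what the commit fixes.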
(dup_point_edwards, add_points_edwards, sub_points_edwards): Ditto. (_gcry_mpi_ec_mul_point): Resize MPIs of point to fixed size. (_gcry_mpi_ec_curve_point): Use the methods. Signed-off-by: NIIBE Yutaka diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 95c4510..ee99262 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -21,7 +21,7 @@ # Need to include ../src in addition to top_srcdir because gcrypt.h is # a built header. -AM_CPPFLAGS = -I../src -I$(top_srcdir)/src +AM_CPPFLAGS = -I../src -I$(top_srcdir)/src -I../mpi -I$(top_srcdir)/mpi AM_CFLAGS = $(GPG_ERROR_CFLAGS) AM_CCASFLAGS = $(NOEXECSTACK_FLAGS) diff --git a/cipher/ecc-curves.c b/cipher/ecc-curves.c index 3488ed3..86d0b4e 100644 --- a/cipher/ecc-curves.c +++ b/cipher/ecc-curves.c @@ -26,6 +26,7 @@ #include "g10lib.h" #include "mpi.h" +#include "mpi-internal.h" #include "cipher.h" #include "context.h" #include "ec-context.h" @@ -563,13 +564,25 @@ _gcry_ecc_fill_in_curve (unsigned int nbits, const char *name, { curve->a = scanval (domain_parms[idx].a); if (curve->a->sign) - mpi_add (curve->a, curve->p, curve->a); + { + mpi_resize (curve->a, curve->p->nlimbs); + _gcry_mpih_sub_n (curve->a->d, curve->p->d, + curve->a->d, curve->p->nlimbs); + curve->a->nlimbs = curve->p->nlimbs; + curve->a->sign = 0; + } } if (!curve->b) { curve->b = scanval (domain_parms[idx].b); if (curve->b->sign) - mpi_add (curve->b, curve->p, curve->b); + { + mpi_resize (curve->b, curve->p->nlimbs); + _gcry_mpih_sub_n (curve->b->d, curve->p->d, + curve->b->d, curve->p->nlimbs); + curve->b->nlimbs = curve->p->nlimbs; + curve->b->sign = 0; + } } if (!curve->n) curve->n = scanval (domain_parms[idx].n); diff --git a/mpi/ec.c b/mpi/ec.c index a47e223..ffdf3d1 100644 --- a/mpi/ec.c +++ b/mpi/ec.c @@ -156,28 +156,17 @@ _gcry_mpi_point_copy (gcry_mpi_point_t point) static void point_resize (mpi_point_t p, mpi_ec_t ctx) { - size_t nlimbs; + size_t nlimbs = ctx->p->nlimbs; - if (ctx->model == MPI_EC_MONTGOMERY) - { - nlimbs = 
ctx->p->nlimbs; + mpi_resize (p->x, nlimbs); + p->x->nlimbs = nlimbs; + mpi_resize (p->z, nlimbs); + p->z->nlimbs = nlimbs; - mpi_resize (p->x, nlimbs); - mpi_resize (p->z, nlimbs); - p->x->nlimbs = nlimbs; - p->z->nlimbs = nlimbs; - } - else + if (ctx->model != MPI_EC_MONTGOMERY) { - /* - * For now, we allocate enough limbs for our EC computation of ec_*. - * Once we will improve ec_* to be constant size (and constant - * time), NLIMBS can be ctx->p->nlimbs. - */ - nlimbs = 2*ctx->p->nlimbs+1; - mpi_resize (p->x, nlimbs); mpi_resize (p->y, nlimbs); - mpi_resize (p->z, nlimbs); + p->y->nlimbs = nlimbs; } } @@ -657,6 +646,13 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model, mpi_resize (ctx->a, ctx->p->nlimbs); ctx->a->nlimbs = ctx->p->nlimbs; + + mpi_resize (ctx->b, ctx->p->nlimbs); + ctx->b->nlimbs = ctx->p->nlimbs; + + for (i=0; i< DIM(ctx->t.scratch); i++) + ctx->t.scratch[i]->nlimbs = ctx->p->nlimbs; + break; } @@ -909,10 +905,21 @@ _gcry_mpi_ec_get_affine (gcry_mpi_t x, gcry_mpi_t y, mpi_point_t point, z = mpi_new (0); ec_invm (z, point->z, ctx); + mpi_resize (z, ctx->p->nlimbs); + z->nlimbs = ctx->p->nlimbs; + if (x) - ec_mulm (x, point->x, z, ctx); + { + mpi_resize (x, ctx->p->nlimbs); + x->nlimbs = ctx->p->nlimbs; + ctx->mulm (x, point->x, z, ctx); + } if (y) - ec_mulm (y, point->y, z, ctx); + { + mpi_resize (y, ctx->p->nlimbs); + y->nlimbs = ctx->p->nlimbs; + ctx->mulm (y, point->y, z, ctx); + } _gcry_mpi_release (z); } @@ -1041,41 +1048,41 @@ dup_point_edwards (mpi_point_t result, mpi_point_t point, mpi_ec_t ctx) /* Compute: (X_3 : Y_3 : Z_3) = 2( X_1 : Y_1 : Z_1 ) */ /* B = (X_1 + Y_1)^2 */ - ec_addm (B, X1, Y1, ctx); - ec_pow2 (B, B, ctx); + ctx->addm (B, X1, Y1, ctx); + ctx->pow2 (B, B, ctx); /* C = X_1^2 */ /* D = Y_1^2 */ - ec_pow2 (C, X1, ctx); - ec_pow2 (D, Y1, ctx); + ctx->pow2 (C, X1, ctx); + ctx->pow2 (D, Y1, ctx); /* E = aC */ if (ctx->dialect == ECC_DIALECT_ED25519) - mpi_sub (E, ctx->p, C); + ctx->subm (E, ctx->p, C, ctx); else - 
ec_mulm (E, ctx->a, C, ctx); + ctx->mulm (E, ctx->a, C, ctx); /* F = E + D */ - ec_addm (F, E, D, ctx); + ctx->addm (F, E, D, ctx); /* H = Z_1^2 */ - ec_pow2 (H, Z1, ctx); + ctx->pow2 (H, Z1, ctx); /* J = F - 2H */ - ec_mul2 (J, H, ctx); - ec_subm (J, F, J, ctx); + ctx->mul2 (J, H, ctx); + ctx->subm (J, F, J, ctx); /* X_3 = (B - C - D) · J */ - ec_subm (X3, B, C, ctx); - ec_subm (X3, X3, D, ctx); - ec_mulm (X3, X3, J, ctx); + ctx->subm (X3, B, C, ctx); + ctx->subm (X3, X3, D, ctx); + ctx->mulm (X3, X3, J, ctx); /* Y_3 = F · (E - D) */ - ec_subm (Y3, E, D, ctx); - ec_mulm (Y3, Y3, F, ctx); + ctx->subm (Y3, E, D, ctx); + ctx->mulm (Y3, Y3, F, ctx); /* Z_3 = F · J */ - ec_mulm (Z3, F, J, ctx); + ctx->mulm (Z3, F, J, ctx); #undef X1 #undef Y1 @@ -1293,54 +1300,56 @@ add_points_edwards (mpi_point_t result, #define G (ctx->t.scratch[6]) #define tmp (ctx->t.scratch[7]) + point_resize (result, ctx); + /* Compute: (X_3 : Y_3 : Z_3) = (X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_3) */ /* A = Z1 · Z2 */ - ec_mulm (A, Z1, Z2, ctx); + ctx->mulm (A, Z1, Z2, ctx); /* B = A^2 */ - ec_pow2 (B, A, ctx); + ctx->pow2 (B, A, ctx); /* C = X1 · X2 */ - ec_mulm (C, X1, X2, ctx); + ctx->mulm (C, X1, X2, ctx); /* D = Y1 · Y2 */ - ec_mulm (D, Y1, Y2, ctx); + ctx->mulm (D, Y1, Y2, ctx); /* E = d · C · D */ - ec_mulm (E, ctx->b, C, ctx); - ec_mulm (E, E, D, ctx); + ctx->mulm (E, ctx->b, C, ctx); + ctx->mulm (E, E, D, ctx); /* F = B - E */ - ec_subm (F, B, E, ctx); + ctx->subm (F, B, E, ctx); /* G = B + E */ - ec_addm (G, B, E, ctx); + ctx->addm (G, B, E, ctx); /* X_3 = A · F · ((X_1 + Y_1) ·
(X_2 + Y_2) - C - D) */ - ec_addm (tmp, X1, Y1, ctx); - ec_addm (X3, X2, Y2, ctx); - ec_mulm (X3, X3, tmp, ctx); - ec_subm (X3, X3, C, ctx); - ec_subm (X3, X3, D, ctx); - ec_mulm (X3, X3, F, ctx); - ec_mulm (X3, X3, A, ctx); + ctx->addm (tmp, X1, Y1, ctx); + ctx->addm (X3, X2, Y2, ctx); + ctx->mulm (X3, X3, tmp, ctx); + ctx->subm (X3, X3, C, ctx); + ctx->subm (X3, X3, D, ctx); + ctx->mulm (X3, X3, F, ctx); + ctx->mulm (X3, X3, A, ctx); /* Y_3 = A · G · (D - aC) */ if (ctx->dialect == ECC_DIALECT_ED25519) { - ec_addm (Y3, D, C, ctx); + ctx->addm (Y3, D, C, ctx); } else { - ec_mulm (Y3, ctx->a, C, ctx); - ec_subm (Y3, D, Y3, ctx); + ctx->mulm (Y3, ctx->a, C, ctx); + ctx->subm (Y3, D, Y3, ctx); } - ec_mulm (Y3, Y3, G, ctx); - ec_mulm (Y3, Y3, A, ctx); + ctx->mulm (Y3, Y3, G, ctx); + ctx->mulm (Y3, Y3, A, ctx); /* Z_3 = F · G */ - ec_mulm (Z3, F, G, ctx); + ctx->mulm (Z3, F, G, ctx); #undef X1 @@ -1451,7 +1460,7 @@ sub_points_edwards (mpi_point_t result, { mpi_point_t p2i = _gcry_mpi_point_new (0); point_set (p2i, p2); - mpi_sub (p2i->x, ctx->p, p2i->x); + ctx->subm (p2i->x, ctx->p, p2i->x, ctx); add_points_edwards (result, p1, p2i, ctx); _gcry_mpi_point_release (p2i); } @@ -1515,6 +1524,7 @@ _gcry_mpi_ec_mul_point (mpi_point_t result, mpi_set_ui (result->x, 0); mpi_set_ui (result->y, 1); mpi_set_ui (result->z, 1); + point_resize (point, ctx); } if (mpi_is_secure (scalar)) @@ -1536,6 +1546,12 @@ _gcry_mpi_ec_mul_point (mpi_point_t result, } else { + if (ctx->model == MPI_EC_EDWARDS) + { + point_resize (result, ctx); + point_resize (point, ctx); + } + for (j=nbits-1; j >= 0; j--) { _gcry_mpi_ec_dup_point (result, result, ctx); @@ -1778,19 +1794,21 @@ _gcry_mpi_ec_curve_point (gcry_mpi_point_t point, mpi_ec_t ctx) if (_gcry_mpi_ec_get_affine (x, y, point, ctx)) goto leave; + mpi_resize (w, ctx->p->nlimbs); + w->nlimbs = ctx->p->nlimbs; + /* a · x^2 + y^2 - 1 - b · x^2 ·
y^2 == 0 */ - ec_pow2 (x, x, ctx); - ec_pow2 (y, y, ctx); + ctx->pow2 (x, x, ctx); + ctx->pow2 (y, y, ctx); if (ctx->dialect == ECC_DIALECT_ED25519) - mpi_sub (w, ctx->p, x); + ctx->subm (w, ctx->p, x, ctx); else - ec_mulm (w, ctx->a, x, ctx); - ec_addm (w, w, y, ctx); - ec_subm (w, w, mpi_const (MPI_C_ONE), ctx); - ec_mulm (x, x, y, ctx); - ec_mulm (x, x, ctx->b, ctx); - ec_subm (w, w, x, ctx); - if (!mpi_cmp_ui (w, 0)) + ctx->mulm (w, ctx->a, x, ctx); + ctx->addm (w, w, y, ctx); + ctx->mulm (x, x, y, ctx); + ctx->mulm (x, x, ctx->b, ctx); + ctx->subm (w, w, x, ctx); + if (!mpi_cmp_ui (w, 1)) res = 1; } break; commit 1ac3d3637dd80013b78e03b9b9f582091710d908 Author: NIIBE Yutaka Date: Wed Aug 23 12:43:38 2017 +0900 ecc: Clean up curve specific method support. * src/ec-context.h (struct mpi_ec_ctx_s): Remove MOD method. * mpi/ec.c (ec_mod_25519): Remove. (ec_p_init): Follow the removal of the MOD method. Signed-off-by: NIIBE Yutaka diff --git a/mpi/ec.c b/mpi/ec.c index 06536be..a47e223 100644 --- a/mpi/ec.c +++ b/mpi/ec.c @@ -380,12 +380,6 @@ mpih_set_cond (mpi_ptr_t wp, mpi_ptr_t up, mpi_size_t usize, unsigned long set) /* Routines for 2^255 - 19. */ -static void -ec_mod_25519 (gcry_mpi_t w, mpi_ec_t ec) -{ - _gcry_mpi_mod (w, w, ec->p); -} - #define LIMB_SIZE_25519 ((256+BITS_PER_MPI_LIMB-1)/BITS_PER_MPI_LIMB) static void @@ -502,7 +496,6 @@ struct field_table { const char *p; /* computation routines for the field. 
*/ - void (* mod) (gcry_mpi_t w, mpi_ec_t ctx); void (* addm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx); void (* subm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx); void (* mulm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx); @@ -513,14 +506,13 @@ struct field_table { static const struct field_table field_table[] = { { "0x7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFED", - ec_mod_25519, ec_addm_25519, ec_subm_25519, ec_mulm_25519, ec_mul2_25519, ec_pow2_25519 }, - { NULL, NULL, NULL, NULL, NULL, NULL, NULL }, + { NULL, NULL, NULL, NULL, NULL, NULL }, }; /* Force recomputation of all helper variables. */ @@ -639,7 +631,6 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model, ctx->t.scratch[i] = mpi_alloc_like (ctx->p); } - ctx->mod = ec_mod; ctx->addm = ec_addm; ctx->subm = ec_subm; ctx->mulm = ec_mulm; @@ -657,7 +648,6 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model, if (!mpi_cmp (p, f_p)) { - ctx->mod = field_table[i].mod; ctx->addm = field_table[i].addm; ctx->subm = field_table[i].subm; ctx->mulm = field_table[i].mulm; diff --git a/src/ec-context.h b/src/ec-context.h index 18b26a5..e48ef6f 100644 --- a/src/ec-context.h +++ b/src/ec-context.h @@ -68,7 +68,6 @@ struct mpi_ec_ctx_s } t; /* Curve specific computation routines for the field. */ - void (* mod) (gcry_mpi_t w, mpi_ec_t ec); void (* addm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx); void (* subm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ec); void (* mulm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx); commit e9be23c4ad9f42c9d3198c706f912b7e27f574bc Author: NIIBE Yutaka Date: Wed Aug 23 11:11:17 2017 +0900 ecc: Relax condition for 25519 computations. * mpi/ec.c (ec_addm_25519, ec_subm_25519, ec_mulm_25519): Check number of limbs, allocated more is OK. 
    Signed-off-by: NIIBE Yutaka 

diff --git a/mpi/ec.c b/mpi/ec.c
index b0eed97..06536be 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -396,7 +396,7 @@ ec_addm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
   mpi_limb_t n[LIMB_SIZE_25519];
   mpi_limb_t borrow;
 
-  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+  if (w->nlimbs != wsize || u->nlimbs != wsize || v->nlimbs != wsize)
     log_bug ("addm_25519: different sizes\n");
 
   memset (n, 0, sizeof n);
@@ -419,7 +419,7 @@ ec_subm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
   mpi_limb_t n[LIMB_SIZE_25519];
   mpi_limb_t borrow;
 
-  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+  if (w->nlimbs != wsize || u->nlimbs != wsize || v->nlimbs != wsize)
     log_bug ("subm_25519: different sizes\n");
 
   memset (n, 0, sizeof n);
@@ -444,7 +444,7 @@ ec_mulm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
   int msb;
 
   (void)ctx;
-  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+  if (w->nlimbs != wsize || u->nlimbs != wsize || v->nlimbs != wsize)
     log_bug ("mulm_25519: different sizes\n");
 
   up = u->d;

commit 449459a2770d3aecb1f36502bf1903e0cbd2873e
Author: NIIBE Yutaka 
Date:   Wed Aug 23 10:22:21 2017 +0900

    ecc: Fix ec_mulm_25519.
    
    * mpi/ec.c (ec_mulm_25519): Fix the cases of 0 to 18.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/mpi/ec.c b/mpi/ec.c
index d51be20..b0eed97 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -479,6 +479,11 @@ ec_mulm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
   m[0] = (cy * 2 + msb) * 19;
   _gcry_mpih_add_n (wp, wp, m, wsize);
   wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 % BITS_PER_MPI_LIMB));
+
+  m[0] = 0;
+  cy = _gcry_mpih_sub_n (wp, wp, ctx->p->d, wsize);
+  mpih_set_cond (m, ctx->p->d, wsize, (cy != 0UL));
+  _gcry_mpih_add_n (wp, wp, m, wsize);
 }
 
 static void

commit 9ed0fb37bd637d1a2e9498c24097cfeadec682ec
Author: NIIBE Yutaka 
Date:   Wed Aug 23 08:48:53 2017 +0900

    ecc: field specific routines for 25519.
    * mpi/ec.c (point_resize): Improve for X25519.
    (mpih_set_cond): New.
    (ec_mod_25519, ec_addm_25519, ec_subm_25519, ec_mulm_25519)
    (ec_mul2_25519, ec_pow2_25519): New.
    (ec_p_init): Fill by FIELD_TABLE.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/mpi/ec.c b/mpi/ec.c
index 74ee11d..d51be20 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -156,17 +156,29 @@ _gcry_mpi_point_copy (gcry_mpi_point_t point)
 static void
 point_resize (mpi_point_t p, mpi_ec_t ctx)
 {
-  /*
-   * For now, we allocate enough limbs for our EC computation of ec_*.
-   * Once we will improve ec_* to be constant size (and constant
-   * time), NLIMBS can be ctx->p->nlimbs.
-   */
-  size_t nlimbs = 2*ctx->p->nlimbs+1;
-
-  mpi_resize (p->x, nlimbs);
-  if (ctx->model != MPI_EC_MONTGOMERY)
-    mpi_resize (p->y, nlimbs);
-  mpi_resize (p->z, nlimbs);
+  size_t nlimbs;
+
+  if (ctx->model == MPI_EC_MONTGOMERY)
+    {
+      nlimbs = ctx->p->nlimbs;
+
+      mpi_resize (p->x, nlimbs);
+      mpi_resize (p->z, nlimbs);
+      p->x->nlimbs = nlimbs;
+      p->z->nlimbs = nlimbs;
+    }
+  else
+    {
+      /*
+       * For now, we allocate enough limbs for our EC computation of ec_*.
+       * Once we will improve ec_* to be constant size (and constant
+       * time), NLIMBS can be ctx->p->nlimbs.
+       */
+      nlimbs = 2*ctx->p->nlimbs+1;
+      mpi_resize (p->x, nlimbs);
+      mpi_resize (p->y, nlimbs);
+      mpi_resize (p->z, nlimbs);
+    }
 }
 
@@ -351,8 +363,161 @@ ec_invm (gcry_mpi_t x, gcry_mpi_t a, mpi_ec_t ctx)
       log_mpidump ("  p", ctx->p);
     }
 }
+
+static void
+mpih_set_cond (mpi_ptr_t wp, mpi_ptr_t up, mpi_size_t usize, unsigned long set)
+{
+  mpi_size_t i;
+  mpi_limb_t mask = ((mpi_limb_t)0) - set;
+  mpi_limb_t x;
+
+  for (i = 0; i < usize; i++)
+    {
+      x = mask & (wp[i] ^ up[i]);
+      wp[i] = wp[i] ^ x;
+    }
+}
+
+/* Routines for 2^255 - 19.
 */
+
+static void
+ec_mod_25519 (gcry_mpi_t w, mpi_ec_t ec)
+{
+  _gcry_mpi_mod (w, w, ec->p);
+}
+
+#define LIMB_SIZE_25519 ((256+BITS_PER_MPI_LIMB-1)/BITS_PER_MPI_LIMB)
+
+static void
+ec_addm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
+{
+  mpi_ptr_t wp, up, vp;
+  mpi_size_t wsize = LIMB_SIZE_25519;
+  mpi_limb_t n[LIMB_SIZE_25519];
+  mpi_limb_t borrow;
+
+  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+    log_bug ("addm_25519: different sizes\n");
+
+  memset (n, 0, sizeof n);
+  up = u->d;
+  vp = v->d;
+  wp = w->d;
+
+  _gcry_mpih_add_n (wp, up, vp, wsize);
+  borrow = _gcry_mpih_sub_n (wp, wp, ctx->p->d, wsize);
+  mpih_set_cond (n, ctx->p->d, wsize, (borrow != 0UL));
+  _gcry_mpih_add_n (wp, wp, n, wsize);
+  wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 % BITS_PER_MPI_LIMB));
+}
+
+static void
+ec_subm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
+{
+  mpi_ptr_t wp, up, vp;
+  mpi_size_t wsize = LIMB_SIZE_25519;
+  mpi_limb_t n[LIMB_SIZE_25519];
+  mpi_limb_t borrow;
+
+  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+    log_bug ("subm_25519: different sizes\n");
+
+  memset (n, 0, sizeof n);
+  up = u->d;
+  vp = v->d;
+  wp = w->d;
+
+  borrow = _gcry_mpih_sub_n (wp, up, vp, wsize);
+  mpih_set_cond (n, ctx->p->d, wsize, (borrow != 0UL));
+  _gcry_mpih_add_n (wp, wp, n, wsize);
+  wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 % BITS_PER_MPI_LIMB));
+}
+
+static void
+ec_mulm_25519 (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx)
+{
+  mpi_ptr_t wp, up, vp;
+  mpi_size_t wsize = LIMB_SIZE_25519;
+  mpi_limb_t n[LIMB_SIZE_25519*2];
+  mpi_limb_t m[LIMB_SIZE_25519+1];
+  mpi_limb_t cy;
+  int msb;
+
+  (void)ctx;
+  if (w->alloced != wsize || u->alloced != wsize || v->alloced != wsize)
+    log_bug ("mulm_25519: different sizes\n");
+
+  up = u->d;
+  vp = v->d;
+  wp = w->d;
+
+  _gcry_mpih_mul_n (n, up, vp, wsize);
+  memcpy (wp, n, wsize * BYTES_PER_MPI_LIMB);
+  wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 %
BITS_PER_MPI_LIMB));
+
+  memcpy (m, n+LIMB_SIZE_25519-1, (wsize+1) * BYTES_PER_MPI_LIMB);
+  _gcry_mpih_rshift (m, m, LIMB_SIZE_25519+1, (255 % BITS_PER_MPI_LIMB));
+
+  memcpy (n, m, wsize * BYTES_PER_MPI_LIMB);
+  cy = _gcry_mpih_lshift (m, m, LIMB_SIZE_25519, 4);
+  m[LIMB_SIZE_25519] = cy;
+  cy = _gcry_mpih_add_n (m, m, n, wsize);
+  m[LIMB_SIZE_25519] += cy;
+  cy = _gcry_mpih_add_n (m, m, n, wsize);
+  m[LIMB_SIZE_25519] += cy;
+  cy = _gcry_mpih_add_n (m, m, n, wsize);
+  m[LIMB_SIZE_25519] += cy;
+
+  cy = _gcry_mpih_add_n (wp, wp, m, wsize);
+  m[LIMB_SIZE_25519] += cy;
+
+  memset (m, 0, wsize * BYTES_PER_MPI_LIMB);
+  m[0] = m[LIMB_SIZE_25519] * 2 * 19;
+  cy = _gcry_mpih_add_n (wp, wp, m, wsize);
+
+  msb = (wp[LIMB_SIZE_25519-1] >> (255 % BITS_PER_MPI_LIMB));
+  m[0] = (cy * 2 + msb) * 19;
+  _gcry_mpih_add_n (wp, wp, m, wsize);
+  wp[LIMB_SIZE_25519-1] &= ~(1UL << (255 % BITS_PER_MPI_LIMB));
+}
+
+static void
+ec_mul2_25519 (gcry_mpi_t w, gcry_mpi_t u, mpi_ec_t ctx)
+{
+  ec_addm_25519 (w, u, u, ctx);
+}
+
+static void
+ec_pow2_25519 (gcry_mpi_t w, const gcry_mpi_t b, mpi_ec_t ctx)
+{
+  ec_mulm_25519 (w, b, b, ctx);
+}
+
+struct field_table {
+  const char *p;
+
+  /* computation routines for the field.  */
+  void (* mod) (gcry_mpi_t w, mpi_ec_t ctx);
+  void (* addm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx);
+  void (* subm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx);
+  void (* mulm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx);
+  void (* mul2) (gcry_mpi_t w, gcry_mpi_t u, mpi_ec_t ctx);
+  void (* pow2) (gcry_mpi_t w, const gcry_mpi_t b, mpi_ec_t ctx);
+};
+
+static const struct field_table field_table[] = {
+  {
+    "0x7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFED",
+    ec_mod_25519,
+    ec_addm_25519,
+    ec_subm_25519,
+    ec_mulm_25519,
+    ec_mul2_25519,
+    ec_pow2_25519
+  },
+  { NULL, NULL, NULL, NULL, NULL, NULL, NULL },
+};
+
 /* Force recomputation of all helper variables.
  */
 void
 _gcry_mpi_ec_get_reset (mpi_ec_t ec)
@@ -473,8 +638,35 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model,
   ctx->addm = ec_addm;
   ctx->subm = ec_subm;
   ctx->mulm = ec_mulm;
-  ctx->pow2 = ec_pow2;
   ctx->mul2 = ec_mul2;
+  ctx->pow2 = ec_pow2;
+
+  for (i=0; field_table[i].p; i++)
+    {
+      gcry_mpi_t f_p;
+      gpg_err_code_t rc;
+
+      rc = _gcry_mpi_scan (&f_p, GCRYMPI_FMT_HEX, field_table[i].p, 0, NULL);
+      if (rc)
+        log_fatal ("scanning ECC parameter failed: %s\n", gpg_strerror (rc));
+
+      if (!mpi_cmp (p, f_p))
+        {
+          ctx->mod = field_table[i].mod;
+          ctx->addm = field_table[i].addm;
+          ctx->subm = field_table[i].subm;
+          ctx->mulm = field_table[i].mulm;
+          ctx->mul2 = field_table[i].mul2;
+          ctx->pow2 = field_table[i].pow2;
+          _gcry_mpi_release (f_p);
+
+          mpi_resize (ctx->a, ctx->p->nlimbs);
+          ctx->a->nlimbs = ctx->p->nlimbs;
+          break;
+        }
+
+      _gcry_mpi_release (f_p);
+    }
 
   /* Prepare for fast reduction.  */
   /* FIXME: need a test for NIST values.  However it does not gain us
@@ -1365,6 +1557,7 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
   mpi_point_struct p1_, p2_;
   mpi_point_t q1, q2, prd, sum;
   unsigned long sw;
+  mpi_size_t rsize;
 
   /* Compute scalar point multiplication with Montgomery Ladder.
      Note that we don't use Y-coordinate in the points at all.
@@ -1385,6 +1578,9 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
   point_resize (&p1_, ctx);
   point_resize (&p2_, ctx);
 
+  mpi_resize (point->x, ctx->p->nlimbs);
+  point->x->nlimbs = ctx->p->nlimbs;
+
   q1 = &p1;
   q2 = &p2;
   prd = &p1_;
@@ -1406,7 +1602,9 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
       sw = (nbits & 1);
       point_swap_cond (&p1, &p1_, sw, ctx);
 
-      if (p1.z->nlimbs == 0)
+      rsize = p1.z->nlimbs;
+      MPN_NORMALIZE (p1.z->d, rsize);
+      if (rsize == 0)
         {
           mpi_set_ui (result->x, 1);
           mpi_set_ui (result->z, 0);

commit d4cd381defe5b37dda19bbda0986bdd38065bd31
Author: NIIBE Yutaka 
Date:   Mon Aug 21 14:32:08 2017 +0900

    ecc: Add field specific computation methods.
    
    * src/ec-context.h (struct mpi_ec_ctx_s): Add methods.
    * mpi/ec.c (ec_p_init): Initialize the default methods.
    (montgomery_ladder): Use the methods.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/mpi/ec.c b/mpi/ec.c
index 4c16603..74ee11d 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -469,6 +469,13 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model,
       ctx->t.scratch[i] = mpi_alloc_like (ctx->p);
     }
 
+  ctx->mod = ec_mod;
+  ctx->addm = ec_addm;
+  ctx->subm = ec_subm;
+  ctx->mulm = ec_mulm;
+  ctx->pow2 = ec_pow2;
+  ctx->mul2 = ec_mul2;
+
   /* Prepare for fast reduction.  */
   /* FIXME: need a test for NIST values.  However it does not gain us
      any real advantage, for 384 bits it is actually slower than using
@@ -1177,24 +1184,24 @@ montgomery_ladder (mpi_point_t prd, mpi_point_t sum,
                    mpi_point_t p1, mpi_point_t p2, gcry_mpi_t dif_x,
                    mpi_ec_t ctx)
 {
-  ec_addm (sum->x, p2->x, p2->z, ctx);
-  ec_subm (p2->z, p2->x, p2->z, ctx);
-  ec_addm (prd->x, p1->x, p1->z, ctx);
-  ec_subm (p1->z, p1->x, p1->z, ctx);
-  ec_mulm (p2->x, p1->z, sum->x, ctx);
-  ec_mulm (p2->z, prd->x, p2->z, ctx);
-  ec_pow2 (p1->x, prd->x, ctx);
-  ec_pow2 (p1->z, p1->z, ctx);
-  ec_addm (sum->x, p2->x, p2->z, ctx);
-  ec_subm (p2->z, p2->x, p2->z, ctx);
-  ec_mulm (prd->x, p1->x, p1->z, ctx);
-  ec_subm (p1->z, p1->x, p1->z, ctx);
-  ec_pow2 (sum->x, sum->x, ctx);
-  ec_pow2 (sum->z, p2->z, ctx);
-  ec_mulm (prd->z, p1->z, ctx->a, ctx); /* CTX->A: (a-2)/4 */
-  ec_mulm (sum->z, sum->z, dif_x, ctx);
-  ec_addm (prd->z, p1->x, prd->z, ctx);
-  ec_mulm (prd->z, prd->z, p1->z, ctx);
+  ctx->addm (sum->x, p2->x, p2->z, ctx);
+  ctx->subm (p2->z, p2->x, p2->z, ctx);
+  ctx->addm (prd->x, p1->x, p1->z, ctx);
+  ctx->subm (p1->z, p1->x, p1->z, ctx);
+  ctx->mulm (p2->x, p1->z, sum->x, ctx);
+  ctx->mulm (p2->z, prd->x, p2->z, ctx);
+  ctx->pow2 (p1->x, prd->x, ctx);
+  ctx->pow2 (p1->z, p1->z, ctx);
+  ctx->addm (sum->x, p2->x, p2->z, ctx);
+  ctx->subm (p2->z, p2->x, p2->z, ctx);
+  ctx->mulm (prd->x, p1->x, p1->z, ctx);
+  ctx->subm (p1->z, p1->x, p1->z, ctx);
+  ctx->pow2 (sum->x, sum->x, ctx);
+  ctx->pow2 (sum->z, p2->z, ctx);
+  ctx->mulm (prd->z, p1->z, ctx->a, ctx); /* CTX->A: (a-2)/4 */
+  ctx->mulm (sum->z, sum->z, dif_x, ctx);
+  ctx->addm (prd->z, p1->x, prd->z, ctx);
+  ctx->mulm (prd->z, prd->z, p1->z, ctx);
 }

diff --git a/src/ec-context.h b/src/ec-context.h
index d74fb69..18b26a5 100644
--- a/src/ec-context.h
+++ b/src/ec-context.h
@@ -66,6 +66,14 @@ struct mpi_ec_ctx_s
     /* gcry_mpi_t s[10]; */
     /* gcry_mpi_t c;     */
   } t;
+
+  /* Curve specific computation routines for the field.  */
+  void (* mod) (gcry_mpi_t w, mpi_ec_t ec);
+  void (* addm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx);
+  void (* subm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ec);
+  void (* mulm) (gcry_mpi_t w, gcry_mpi_t u, gcry_mpi_t v, mpi_ec_t ctx);
+  void (* pow2) (gcry_mpi_t w, const gcry_mpi_t b, mpi_ec_t ctx);
+  void (* mul2) (gcry_mpi_t w, gcry_mpi_t u, mpi_ec_t ctx);
 };

-----------------------------------------------------------------------

Summary of changes:
 cipher/Makefile.am  |   2 +-
 cipher/ecc-curves.c |  17 ++-
 mpi/ec.c            | 369 +++++++++++++++++++++++++++++++++++++++++-----------
 src/ec-context.h    |   7 +
 4 files changed, 315 insertions(+), 80 deletions(-)

hooks/post-receive
-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits at gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits

From cvs at cvs.gnupg.org  Tue Aug 29 03:36:15 2017
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Tue, 29 Aug 2017 03:36:15 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-11-gdb3a8d6
Message-ID: 

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".
The branch, master has been updated
       via  db3a8d6890fb4a6436e082b49378c0bd891563ca (commit)
      from  1d5f726668b9cc32d6bb601f2329987058146c6c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit db3a8d6890fb4a6436e082b49378c0bd891563ca
Author: NIIBE Yutaka 
Date:   Tue Aug 29 10:33:08 2017 +0900

    ecc: Fix scratch MPI.
    
    * mpi/ec.c (ec_p_init): Check if scratch MPI is allocated.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/mpi/ec.c b/mpi/ec.c
index 88e2fab..ca293ca 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -647,7 +647,7 @@ ec_p_init (mpi_ec_t ctx, enum gcry_mpi_ec_models model,
       mpi_resize (ctx->b, ctx->p->nlimbs);
       ctx->b->nlimbs = ctx->p->nlimbs;
 
-      for (i=0; i< DIM(ctx->t.scratch); i++)
+      for (i=0; i< DIM(ctx->t.scratch) && ctx->t.scratch[i]; i++)
         ctx->t.scratch[i]->nlimbs = ctx->p->nlimbs;
 
       break;

-----------------------------------------------------------------------

Summary of changes:
 mpi/ec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From cvs at cvs.gnupg.org  Tue Aug 29 09:13:48 2017
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Tue, 29 Aug 2017 09:13:48 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-14-ge4dc458
Message-ID: 

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".
The branch, master has been updated
       via  e4dc458b0b7dc9b8417a2177ef17822d9b9064ec (commit)
       via  8126a6717c80d4fc1766d7f975e872bee2f9f203 (commit)
       via  a848ef44470a524c05624afb54b92cf25595acd2 (commit)
      from  db3a8d6890fb4a6436e082b49378c0bd891563ca (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit e4dc458b0b7dc9b8417a2177ef17822d9b9064ec
Author: NIIBE Yutaka 
Date:   Tue Aug 29 16:11:42 2017 +0900

    Tweak GCC version check.
    
    * src/global.c (_gcry_vcontrol): It's GCC 4.2 which started to
    support diagnostic pragma.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/src/global.c b/src/global.c
index 4e2e274..ad9ab1d 100644
--- a/src/global.c
+++ b/src/global.c
@@ -705,7 +705,7 @@ _gcry_vcontrol (enum gcry_ctl_cmds cmd, va_list arg_ptr)
       rc = _gcry_fips_run_selftests (1);
       break;
 
-#if _GCRY_GCC_VERSION >= 40600
+#if _GCRY_GCC_VERSION >= 40200
 # pragma GCC diagnostic push
 # pragma GCC diagnostic ignored "-Wswitch"
 #endif
@@ -733,7 +733,7 @@ _gcry_vcontrol (enum gcry_ctl_cmds cmd, va_list arg_ptr)
     case PRIV_CTL_DUMP_SECMEM_STATS:
       _gcry_secmem_dump_stats (1);
       break;
-#if _GCRY_GCC_VERSION >= 40600
+#if _GCRY_GCC_VERSION >= 40200
 # pragma GCC diagnostic pop
 #endif

commit 8126a6717c80d4fc1766d7f975e872bee2f9f203
Author: NIIBE Yutaka 
Date:   Tue Aug 29 16:10:54 2017 +0900

    random: Fix warnings on Windows.
    
    * random/random-csprng.c (lock_seed_file): Vars with no use.
    Signed-off-by: NIIBE Yutaka 

diff --git a/random/random-csprng.c b/random/random-csprng.c
index 8cb35e7..b06810a 100644
--- a/random/random-csprng.c
+++ b/random/random-csprng.c
@@ -704,6 +704,10 @@ lock_seed_file (int fd, const char *fname, int for_write)
       if (backoff < 10)
         backoff++ ;
     }
+#else
+  (void)fd;
+  (void)fname;
+  (void)for_write;
 #endif /*!LOCK_SEED_FILE*/
   return 0;
 }

commit a848ef44470a524c05624afb54b92cf25595acd2
Author: NIIBE Yutaka 
Date:   Tue Aug 29 16:09:39 2017 +0900

    tests: Fix warnings on Windows.
    
    * tests/fipsdrv.c (print_dsa_domain_parameters, print_ecdsa_dq): Fix.
    
    Signed-off-by: NIIBE Yutaka 

diff --git a/tests/fipsdrv.c b/tests/fipsdrv.c
index f9d9c45..71554e2 100644
--- a/tests/fipsdrv.c
+++ b/tests/fipsdrv.c
@@ -1835,7 +1835,7 @@ print_dsa_domain_parameters (gcry_sexp_t key)
   /* Extract the parameters from the S-expression and print them to
      stdout.  */
   for (idx=0; "pqg"[idx]; idx++)
     {
-      l2 = gcry_sexp_find_token (l1, "pqg"+idx, 1);
+      l2 = gcry_sexp_find_token (l1, &"pqg"[idx], 1);
       if (!l2)
         die ("no %c parameter in returned public key\n", "pqg"[idx]);
       mpi = gcry_sexp_nth_mpi (l2, 1, GCRYMPI_FMT_USG);
@@ -1923,7 +1923,7 @@ print_ecdsa_dq (gcry_sexp_t key)
   /* Extract the parameters from the S-expression and print them to
      stdout.
      */
   for (idx=0; "dq"[idx]; idx++)
     {
-      l2 = gcry_sexp_find_token (l1, "dq"+idx, 1);
+      l2 = gcry_sexp_find_token (l1, &"dq"[idx], 1);
       if (!l2)
         die ("no %c parameter in returned public key\n", "dq"[idx]);
       mpi = gcry_sexp_nth_mpi (l2, 1, GCRYMPI_FMT_USG);

-----------------------------------------------------------------------

Summary of changes:
 random/random-csprng.c | 4 ++++
 src/global.c           | 4 ++--
 tests/fipsdrv.c        | 4 ++--
 3 files changed, 8 insertions(+), 4 deletions(-)

From wk at gnupg.org  Tue Aug 29 10:41:16 2017
From: wk at gnupg.org (Werner Koch)
Date: Tue, 29 Aug 2017 10:41:16 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.8.1-14-ge4dc458
In-Reply-To:  (by NIIBE Yutaka's message of "Tue, 29 Aug 2017 09:13:48 +0200")
References: 
Message-ID: <87shgaix5f.fsf@wheatstone.g10code.de>

On Tue, 29 Aug 2017 09:13, cvs at cvs.gnupg.org said:

> -      l2 = gcry_sexp_find_token (l1, "pqg"+idx, 1);
> +      l2 = gcry_sexp_find_token (l1, &"pqg"[idx], 1);

Oh dear, some compiler warnings are annoying and make the code harder
to read.  At least for those of us who learned C from K&R.

SCNR,
Werner

-- 
Die Gedanken sind frei.  Ausnahmen regelt ein Bundesgesetz.