From cvs at cvs.gnupg.org Thu Oct 2 13:22:50 2014 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 02 Oct 2014 13:22:50 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-115-g1e8b864 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 1e8b86494cf8fa045696bd447b16267ffd1797f0 (commit) from 51dae8c8c4b63bb5e1685cbd8722e35342524737 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 1e8b86494cf8fa045696bd447b16267ffd1797f0 Author: Werner Koch Date: Thu Oct 2 12:51:49 2014 +0200 build: Support SYSROOT based config script finding. * src/libgcrypt.m4: Add support for SYSROOT and set gpg_config_script_warn. Use AC_PATH_PROG instead of AC_PATH_TOOL because the config script is not expected to be installed with a prefix for its name * configure.ac: Print a library mismatch warning. * m4/gpg-error.m4: Update from git master. -- Also fixed the false copyright notice in libgcrypt.m4. diff --git a/configure.ac b/configure.ac index c5952c7..baed3ec 100644 --- a/configure.ac +++ b/configure.ac @@ -2123,9 +2123,17 @@ GCRY_MSG_SHOW([Try using Intel AVX2: ],[$avx2support]) GCRY_MSG_SHOW([Try using ARM NEON: ],[$neonsupport]) GCRY_MSG_SHOW([],[]) -if test "$print_egd_notice" = "yes"; then +if test "x${gpg_config_script_warn}" != x; then cat <= $min_gpg_error_version) ok=no @@ -83,8 +101,9 @@ AC_DEFUN([AM_PATH_GPG_ERROR], *** built for $gpg_error_config_host and thus may not match the *** used host $host. *** You may want to use the configure option --with-gpg-error-prefix -*** to specify a matching config script. +*** to specify a matching config script or use \$SYSROOT. 
***]]) + gpg_config_script_warn="$gpg_config_script_warn libgpg-error" fi fi else diff --git a/src/libgcrypt.m4 b/src/libgcrypt.m4 index 6cf482f..c67cfec 100644 --- a/src/libgcrypt.m4 +++ b/src/libgcrypt.m4 @@ -1,13 +1,15 @@ -dnl Autoconf macros for libgcrypt -dnl Copyright (C) 2002, 2004, 2011 Free Software Foundation, Inc. -dnl -dnl This file is free software; as a special exception the author gives -dnl unlimited permission to copy and/or distribute it, with or without -dnl modifications, as long as this notice is preserved. -dnl -dnl This file is distributed in the hope that it will be useful, but -dnl WITHOUT ANY WARRANTY, to the extent permitted by law; without even the -dnl implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +# libgcrypt.m4 - Autoconf macros to detect libgcrypt +# Copyright (C) 2002, 2003, 2004, 2011, 2014 g10 Code GmbH +# +# This file is free software; as a special exception the author gives +# unlimited permission to copy and/or distribute it, with or without +# modifications, as long as this notice is preserved. +# +# This file is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY, to the extent permitted by law; without even the +# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +# +# Last-changed: 2014-10-02 dnl AM_PATH_LIBGCRYPT([MINIMUM-VERSION, @@ -20,19 +22,37 @@ dnl version of libgcrypt is at least 1.2.5 *and* the API number is 1. Using dnl this features allows to prevent build against newer versions of libgcrypt dnl with a changed API. dnl +dnl If a prefix option is not used, the config script is first +dnl searched in $SYSROOT/bin and then along $PATH. If the used +dnl config script does not match the host specification the script +dnl is added to the gpg_config_script_warn variable. 
+dnl AC_DEFUN([AM_PATH_LIBGCRYPT], [ AC_REQUIRE([AC_CANONICAL_HOST]) AC_ARG_WITH(libgcrypt-prefix, AC_HELP_STRING([--with-libgcrypt-prefix=PFX], [prefix where LIBGCRYPT is installed (optional)]), libgcrypt_config_prefix="$withval", libgcrypt_config_prefix="") - if test x$libgcrypt_config_prefix != x ; then - if test x${LIBGCRYPT_CONFIG+set} != xset ; then - LIBGCRYPT_CONFIG=$libgcrypt_config_prefix/bin/libgcrypt-config + if test x"${LIBGCRYPT_CONFIG}" = x ; then + if test x"${libgcrypt_config_prefix}" != x ; then + LIBGCRYPT_CONFIG="${libgcrypt_config_prefix}/bin/libgcrypt-config" + else + case "${SYSROOT}" in + /*) + if test -x "${SYSROOT}/bin/libgcrypt-config" ; then + LIBGCRYPT_CONFIG="${SYSROOT}/bin/libgcrypt-config" + fi + ;; + '') + ;; + *) + AC_MSG_WARN([Ignoring \$SYSROOT as it is not an absolute path.]) + ;; + esac fi fi - AC_PATH_TOOL(LIBGCRYPT_CONFIG, libgcrypt-config, no) + AC_PATH_PROG(LIBGCRYPT_CONFIG, libgcrypt-config, no) tmp=ifelse([$1], ,1:1.2.0,$1) if echo "$tmp" | grep ':' >/dev/null 2>/dev/null ; then req_libgcrypt_api=`echo "$tmp" | sed 's/\(.*\):\(.*\)/\1/'` @@ -108,8 +128,9 @@ AC_DEFUN([AM_PATH_LIBGCRYPT], *** built for $libgcrypt_config_host and thus may not match the *** used host $host. *** You may want to use the configure option --with-libgcrypt-prefix -*** to specify a matching config script. +*** to specify a matching config script or use \$SYSROOT. 
***]]) + gpg_config_script_warn="$gpg_config_script_warn libgcrypt" fi fi else ----------------------------------------------------------------------- Summary of changes: configure.ac | 12 ++++++++++-- doc/gcrypt.texi | 2 +- m4/gpg-error.m4 | 33 ++++++++++++++++++++++++++------- src/libgcrypt.m4 | 51 ++++++++++++++++++++++++++++++++++++--------------- 4 files changed, 73 insertions(+), 25 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Thu Oct 2 15:03:54 2014 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 02 Oct 2014 15:03:54 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-116-g0ecd136 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 0ecd136a6ca02252f63ad229fa5240897bfe6544 (commit) from 1e8b86494cf8fa045696bd447b16267ffd1797f0 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 0ecd136a6ca02252f63ad229fa5240897bfe6544 Author: Werner Koch Date: Thu Oct 2 14:49:31 2014 +0200 build: Document SYSROOT. * configure.ac: Mark SYSROOT as arg var. 
diff --git a/configure.ac b/configure.ac index baed3ec..18db662 100644 --- a/configure.ac +++ b/configure.ac @@ -83,6 +83,8 @@ AC_CANONICAL_HOST AM_MAINTAINER_MODE AM_SILENT_RULES +AC_ARG_VAR(SYSROOT,[locate config scripts also below that directory]) + AH_TOP([ #ifndef _GCRYPT_CONFIG_H_INCLUDED #define _GCRYPT_CONFIG_H_INCLUDED diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index ecd4d7f..58671df 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -267,9 +267,9 @@ example shows how it can be used at the command line: gcc -c foo.c `libgcrypt-config --cflags` @end example -Adding the output of @samp{libgcrypt-config --cflags} to the compilers -command line will ensure that the compiler can find the Libgcrypt header -file. +Adding the output of @samp{libgcrypt-config --cflags} to the +compiler's command line will ensure that the compiler can find the +Libgcrypt header file. A similar problem occurs when linking the program with the library. Again, the compiler has to find the library files. For this to work, @@ -314,7 +314,20 @@ found, execute @var{action-if-found}, otherwise do Additionally, the function defines @code{LIBGCRYPT_CFLAGS} to the flags needed for compilation of the program to find the @file{gcrypt.h} header file, and @code{LIBGCRYPT_LIBS} to the linker -flags needed to link the program to the Libgcrypt library. +flags needed to link the program to the Libgcrypt library. If the +used helper script does not match the target type you are building for +a warning is printed and the string @code{libgcrypt} is appended to the +variable @code{gpg_config_script_warn}. + +This macro searches for @command{libgcrypt-config} along the PATH. If +you are cross-compiling, it is useful to set the environment variable +@code{SYSROOT} to the top directory of your target. The macro will +then first look for the helper program in the @file{bin} directory +below that top directory. An absolute directory name must be used for +@code{SYSROOT}. 
Finally, if the configure command line option +@code{--libgcrypt-prefix} is used, only its value is used for the top +directory below which the helper script is expected. + @end defmac You can use the defined Autoconf variables like this in your ----------------------------------------------------------------------- Summary of changes: configure.ac | 2 ++ doc/gcrypt.texi | 21 +++++++++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sat Oct 4 14:25:18 2014 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 04 Oct 2014 15:25:18 +0300 Subject: [PATCH] Add Whirlpool AMD64/SSE2 assembly implementation Message-ID: <20141004122518.9382.84438.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'. 
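[Editor's sketch, not part of the patch.] The changelog entry above says the round constants and the eight lookup tables are combined into a single structure so that one pointer can be handed to the assembly routine. A minimal illustration of why that works, assuming a hypothetical stand-in type `tables_sketch` (the patch's own type is `whirlpool_tables_s`) and hypothetical helper names `OFF_RC`, `OFF_C` and `tables_layout_ok`: with only 64-bit members there is no padding, so fixed byte offsets of the form "RC at 0, C0 at 8 * 10, each further table 8 * 256 bytes later" line up with the C layout.

```c
#include <stddef.h>
#include <stdint.h>

#define ROUNDS 10               /* Whirlpool uses 10 rounds (R in the patch) */

/* Hypothetical stand-in for the combined table structure. */
struct tables_sketch
{
  uint64_t RC[ROUNDS];          /* round constants */
  uint64_t C[8][256];           /* main lookup boxes C0..C7 */
};

/* Byte offsets an assembly routine could hard-code (assumed names). */
#define OFF_RC    0
#define OFF_C(n)  (8 * ROUNDS + (n) * 8 * 256)

/* Returns 1 if the C layout agrees with the hard-coded offsets. */
int
tables_layout_ok (void)
{
  int n;

  if (offsetof (struct tables_sketch, RC) != OFF_RC)
    return 0;
  for (n = 0; n < 8; n++)
    {
      size_t off = offsetof (struct tables_sketch, C)
                   + (size_t) n * sizeof (uint64_t[256]);
      if (off != (size_t) OFF_C (n))
        return 0;
    }
  /* No trailing padding either: 80 + 8 * 2048 bytes in total. */
  return sizeof (struct tables_sketch) == 8 * ROUNDS + 8 * 8 * 256;
}
```

Keeping the old names as macros over the structure members, as the changelog describes, means the existing C transform can keep indexing `C0[...]` and `rc[...]` unchanged.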
--- Benchmark results: On Intel Core i5-4570 (3.2 Ghz): After: WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B Before: WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B On Intel Core i5-2450M (2.5 Ghz): After: WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B Before: WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B On Intel Core2 T8100 (2.1 Ghz): After: WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B Before: WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B Summary, old vs new ratio: Intel Core i5-4570: 1.88x Intel Core i5-2450M: 1.59x Intel Core2 T8100: 1.94x Signed-off-by: Jussi Kivilinna --- cipher/Makefile.am | 2 cipher/whirlpool-sse2-amd64.S | 335 +++++++++++++++++++++++++++++++++++++++++ cipher/whirlpool.c | 91 +++++++---- configure.ac | 7 + 4 files changed, 398 insertions(+), 37 deletions(-) create mode 100644 cipher/whirlpool-sse2-amd64.S diff --git a/cipher/Makefile.am b/cipher/Makefile.am index c165356..7f45cbb 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -87,7 +87,7 @@ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S \ stribog.c \ tiger.c \ -whirlpool.c \ +whirlpool.c whirlpool-sse2-amd64.S \ twofish.c twofish-amd64.S twofish-arm.S \ rfc2268.c \ camellia.c camellia.h camellia-glue.c camellia-aesni-avx-amd64.S \ diff --git a/cipher/whirlpool-sse2-amd64.S b/cipher/whirlpool-sse2-amd64.S new file mode 100644 index 0000000..d0bcf2d --- /dev/null +++ b/cipher/whirlpool-sse2-amd64.S @@ -0,0 +1,335 @@ +/* whirlpool-sse2-amd64.S - AMD64 assembly implementation of Whirlpool + * + * Copyright (C) 2014 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. 
+ * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +#ifdef __x86_64 +#include +#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_WHIRLPOOL) + +#ifdef __PIC__ +# define RIP %rip +#else +# define RIP +#endif + +.text + +/* look-up table offsets on RTAB */ +#define RC (0) +#define C0 (RC + (8 * 10)) +#define C1 (C0 + (8 * 256)) +#define C2 (C1 + (8 * 256)) +#define C3 (C2 + (8 * 256)) +#define C4 (C3 + (8 * 256)) +#define C5 (C4 + (8 * 256)) +#define C6 (C5 + (8 * 256)) +#define C7 (C6 + (8 * 256)) + +/* stack variables */ +#define STACK_DATAP (0) +#define STACK_STATEP (STACK_DATAP + 8) +#define STACK_ROUNDS (STACK_STATEP + 8) +#define STACK_NBLKS (STACK_ROUNDS + 8) +#define STACK_RBP (STACK_NBLKS + 8) +#define STACK_RBX (STACK_RBP + 8) +#define STACK_R12 (STACK_RBX + 8) +#define STACK_R13 (STACK_R12 + 8) +#define STACK_R14 (STACK_R13 + 8) +#define STACK_R15 (STACK_R14 + 8) +#define STACK_MAX (STACK_R15 + 8) + +/* register macros */ +#define RTAB %rbp + +#define RI1 %rax +#define RI2 %rbx +#define RI3 %rcx +#define RI4 %rdx + +#define RI1d %eax +#define RI2d %ebx +#define RI3d %ecx +#define RI4d %edx + +#define RI1bl %al +#define RI2bl %bl +#define RI3bl %cl +#define RI4bl %dl + +#define RI1bh %ah +#define RI2bh %bh +#define RI3bh %ch +#define RI4bh %dh + +#define RB0 %r8 +#define RB1 %r9 +#define RB2 %r10 +#define RB3 %r11 +#define RB4 %r12 +#define RB5 %r13 +#define RB6 %r14 +#define RB7 %r15 + +#define RT0 %rsi +#define RT1 %rdi + +#define RT0d %esi +#define RT1d %edi + +#define XKEY0 %xmm0 +#define XKEY1 %xmm1 +#define XKEY2 %xmm2 +#define XKEY3 %xmm3 +#define XKEY4 %xmm4 +#define XKEY5 %xmm5 +#define XKEY6 %xmm6 
+#define XKEY7 %xmm7 + +#define XSTATE0 %xmm8 +#define XSTATE1 %xmm9 +#define XSTATE2 %xmm10 +#define XSTATE3 %xmm11 +#define XSTATE4 %xmm12 +#define XSTATE5 %xmm13 +#define XSTATE6 %xmm14 +#define XSTATE7 %xmm15 + +/*********************************************************************** + * AMD64 assembly implementation of Whirlpool. + * - Using table-lookups + * - Store state in XMM registers + ***********************************************************************/ +#define __do_whirl(op, ri, \ + b0, b1, b2, b3, b4, b5, b6, b7, \ + load_ri, load_arg) \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrq $16, ri; \ + op ## q C7(RTAB,RT0,8), b7; \ + op ## q C6(RTAB,RT1,8), b6; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrq $16, ri; \ + op ## q C5(RTAB,RT0,8), b5; \ + op ## q C4(RTAB,RT1,8), b4; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrl $16, ri ## d; \ + op ## q C3(RTAB,RT0,8), b3; \ + op ## q C2(RTAB,RT1,8), b2; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + load_ri( load_arg, ri); \ + op ## q C1(RTAB,RT0,8), b1; \ + op ## q C0(RTAB,RT1,8), b0; + +#define do_whirl(op, ri, rb_add, load_ri, load_arg) \ + __do_whirl(op, ##ri, rb_add, load_ri, load_arg) + +#define dummy(...) 
/*_*/ + +#define do_movq(src, dst) movq src, dst; + +#define RB_ADD0 RB0, RB1, RB2, RB3, RB4, RB5, RB6, RB7 +#define RB_ADD1 RB1, RB2, RB3, RB4, RB5, RB6, RB7, RB0 +#define RB_ADD2 RB2, RB3, RB4, RB5, RB6, RB7, RB0, RB1 +#define RB_ADD3 RB3, RB4, RB5, RB6, RB7, RB0, RB1, RB2 +#define RB_ADD4 RB4, RB5, RB6, RB7, RB0, RB1, RB2, RB3 +#define RB_ADD5 RB5, RB6, RB7, RB0, RB1, RB2, RB3, RB4 +#define RB_ADD6 RB6, RB7, RB0, RB1, RB2, RB3, RB4, RB5 +#define RB_ADD7 RB7, RB0, RB1, RB2, RB3, RB4, RB5, RB6 + +.align 8 +.globl _gcry_whirlpool_transform_amd64 +.type _gcry_whirlpool_transform_amd64,@function; + +_gcry_whirlpool_transform_amd64: + /* input: + * %rdi: state + * %rsi: inblk + * %rdx: nblks + * %rcx: look-up tables + */ + cmp $0, %rdx; + je .Lskip; + + subq $STACK_MAX, %rsp; + movq %rbp, STACK_RBP(%rsp); + movq %rbx, STACK_RBX(%rsp); + movq %r12, STACK_R12(%rsp); + movq %r13, STACK_R13(%rsp); + movq %r14, STACK_R14(%rsp); + movq %r15, STACK_R15(%rsp); + + movq %rdx, STACK_NBLKS(%rsp); + movq %rdi, STACK_STATEP(%rsp); + movq %rsi, STACK_DATAP(%rsp); + + movq %rcx, RTAB; + + jmp .Lfirst_block; + +.align 8 +.Lblock_loop: + movq STACK_DATAP(%rsp), %rsi; + movq RI1, %rdi; + +.Lfirst_block: + /* load data_block */ + movq 0*8(%rsi), RB0; + movq 1*8(%rsi), RB1; + bswapq RB0; + movq 2*8(%rsi), RB2; + bswapq RB1; + movq 3*8(%rsi), RB3; + bswapq RB2; + movq 4*8(%rsi), RB4; + bswapq RB3; + movq 5*8(%rsi), RB5; + bswapq RB4; + movq RB0, XSTATE0; + movq 6*8(%rsi), RB6; + bswapq RB5; + movq RB1, XSTATE1; + movq 7*8(%rsi), RB7; + bswapq RB6; + movq RB2, XSTATE2; + bswapq RB7; + movq RB3, XSTATE3; + movq RB4, XSTATE4; + movq RB5, XSTATE5; + movq RB6, XSTATE6; + movq RB7, XSTATE7; + + /* load key */ + movq 0*8(%rdi), XKEY0; + movq 1*8(%rdi), XKEY1; + movq 2*8(%rdi), XKEY2; + movq 3*8(%rdi), XKEY3; + movq 4*8(%rdi), XKEY4; + movq 5*8(%rdi), XKEY5; + movq 6*8(%rdi), XKEY6; + movq 7*8(%rdi), XKEY7; + + movq XKEY0, RI1; + movq XKEY1, RI2; + movq XKEY2, RI3; + movq XKEY3, RI4; + + /* 
prepare and store state */ + pxor XKEY0, XSTATE0; + pxor XKEY1, XSTATE1; + pxor XKEY2, XSTATE2; + pxor XKEY3, XSTATE3; + pxor XKEY4, XSTATE4; + pxor XKEY5, XSTATE5; + pxor XKEY6, XSTATE6; + pxor XKEY7, XSTATE7; + + movq XSTATE0, 0*8(%rdi); + movq XSTATE1, 1*8(%rdi); + movq XSTATE2, 2*8(%rdi); + movq XSTATE3, 3*8(%rdi); + movq XSTATE4, 4*8(%rdi); + movq XSTATE5, 5*8(%rdi); + movq XSTATE6, 6*8(%rdi); + movq XSTATE7, 7*8(%rdi); + + addq $64, STACK_DATAP(%rsp); + movl $(0), STACK_ROUNDS(%rsp); +.align 8 +.Lround_loop: + do_whirl(mov, RI1 /*XKEY0*/, RB_ADD0, do_movq, XKEY4); + do_whirl(xor, RI2 /*XKEY1*/, RB_ADD1, do_movq, XKEY5); + do_whirl(xor, RI3 /*XKEY2*/, RB_ADD2, do_movq, XKEY6); + do_whirl(xor, RI4 /*XKEY3*/, RB_ADD3, do_movq, XKEY7); + do_whirl(xor, RI1 /*XKEY0*/, RB_ADD4, do_movq, XSTATE0); + do_whirl(xor, RI2 /*XKEY1*/, RB_ADD5, do_movq, XSTATE1); + do_whirl(xor, RI3 /*XKEY2*/, RB_ADD6, do_movq, XSTATE2); + do_whirl(xor, RI4 /*XKEY3*/, RB_ADD7, do_movq, XSTATE3); + + movl STACK_ROUNDS(%rsp), RT0d; + movq RB1, XKEY1; + addl $1, STACK_ROUNDS(%rsp); + movq RB2, XKEY2; + movq RB3, XKEY3; + xorq RC(RTAB,RT0,8), RB0; /* Add round constant */ + movq RB4, XKEY4; + movq RB5, XKEY5; + movq RB0, XKEY0; + movq RB6, XKEY6; + movq RB7, XKEY7; + + do_whirl(xor, RI1 /*XSTATE0*/, RB_ADD0, do_movq, XSTATE4); + do_whirl(xor, RI2 /*XSTATE1*/, RB_ADD1, do_movq, XSTATE5); + do_whirl(xor, RI3 /*XSTATE2*/, RB_ADD2, do_movq, XSTATE6); + do_whirl(xor, RI4 /*XSTATE3*/, RB_ADD3, do_movq, XSTATE7); + + cmpl $10, STACK_ROUNDS(%rsp); + je .Lis_last_round; + + do_whirl(xor, RI1 /*XSTATE4*/, RB_ADD4, do_movq, XKEY0); + do_whirl(xor, RI2 /*XSTATE5*/, RB_ADD5, do_movq, XKEY1); + do_whirl(xor, RI3 /*XSTATE6*/, RB_ADD6, do_movq, XKEY2); + do_whirl(xor, RI4 /*XSTATE7*/, RB_ADD7, do_movq, XKEY3); + movq RB0, XSTATE0; + movq RB1, XSTATE1; + movq RB2, XSTATE2; + movq RB3, XSTATE3; + movq RB4, XSTATE4; + movq RB5, XSTATE5; + movq RB6, XSTATE6; + movq RB7, XSTATE7; + + jmp .Lround_loop; +.align 8 
+.Lis_last_round: + do_whirl(xor, RI1 /*XSTATE4*/, RB_ADD4, dummy, _); + movq STACK_STATEP(%rsp), RI1; + do_whirl(xor, RI2 /*XSTATE5*/, RB_ADD5, dummy, _); + do_whirl(xor, RI3 /*XSTATE6*/, RB_ADD6, dummy, _); + do_whirl(xor, RI4 /*XSTATE7*/, RB_ADD7, dummy, _); + + /* store state */ + xorq RB0, 0*8(RI1); + xorq RB1, 1*8(RI1); + xorq RB2, 2*8(RI1); + xorq RB3, 3*8(RI1); + xorq RB4, 4*8(RI1); + xorq RB5, 5*8(RI1); + xorq RB6, 6*8(RI1); + xorq RB7, 7*8(RI1); + + subq $1, STACK_NBLKS(%rsp); + jnz .Lblock_loop; + + movq STACK_RBP(%rsp), %rbp; + movq STACK_RBX(%rsp), %rbx; + movq STACK_R12(%rsp), %r12; + movq STACK_R13(%rsp), %r13; + movq STACK_R14(%rsp), %r14; + movq STACK_R15(%rsp), %r15; + addq $STACK_MAX, %rsp; +.Lskip: + movl $(STACK_MAX + 8), %eax; + ret; +.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64; + +#endif +#endif diff --git a/cipher/whirlpool.c b/cipher/whirlpool.c index ffc6662..2732f63 100644 --- a/cipher/whirlpool.c +++ b/cipher/whirlpool.c @@ -40,6 +40,14 @@ #include "bufhelp.h" #include "hash-common.h" +/* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ +#undef USE_AMD64_ASM +#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# define USE_AMD64_ASM 1 +#endif + + + /* Size of a whirlpool block (in bytes). */ #define BLOCK_SIZE 64 @@ -89,8 +97,15 @@ typedef struct { + +struct whirlpool_tables_s { + u64 RC[R]; + u64 C[8][256]; +}; + +static const struct whirlpool_tables_s tab = +{ /* Round constants. */ -static const u64 rc[R] = { U64_C (0x1823c6e887b8014f), U64_C (0x36a6d2f5796f9152), @@ -102,13 +117,9 @@ static const u64 rc[R] = U64_C (0xe427418ba77d95d8), U64_C (0xfbee7c66dd17479e), U64_C (0xca2dbf07ad5a8333), - }; - - - + }, /* Main lookup boxes. 
*/ -static const u64 C0[256] = - { + { { U64_C (0x18186018c07830d8), U64_C (0x23238c2305af4626), U64_C (0xc6c63fc67ef991b8), U64_C (0xe8e887e8136fcdfb), U64_C (0x878726874ca113cb), U64_C (0xb8b8dab8a9626d11), @@ -237,10 +248,7 @@ static const u64 C0[256] = U64_C (0x98985a98b4c22d2c), U64_C (0xa4a4aaa4490e55ed), U64_C (0x2828a0285d885075), U64_C (0x5c5c6d5cda31b886), U64_C (0xf8f8c7f8933fed6b), U64_C (0x8686228644a411c2), - }; - -static const u64 C1[256] = - { + }, { U64_C (0xd818186018c07830), U64_C (0x2623238c2305af46), U64_C (0xb8c6c63fc67ef991), U64_C (0xfbe8e887e8136fcd), U64_C (0xcb878726874ca113), U64_C (0x11b8b8dab8a9626d), @@ -369,10 +377,7 @@ static const u64 C1[256] = U64_C (0x2c98985a98b4c22d), U64_C (0xeda4a4aaa4490e55), U64_C (0x752828a0285d8850), U64_C (0x865c5c6d5cda31b8), U64_C (0x6bf8f8c7f8933fed), U64_C (0xc28686228644a411), - }; - -static const u64 C2[256] = - { + }, { U64_C (0x30d818186018c078), U64_C (0x462623238c2305af), U64_C (0x91b8c6c63fc67ef9), U64_C (0xcdfbe8e887e8136f), U64_C (0x13cb878726874ca1), U64_C (0x6d11b8b8dab8a962), @@ -501,10 +506,7 @@ static const u64 C2[256] = U64_C (0x2d2c98985a98b4c2), U64_C (0x55eda4a4aaa4490e), U64_C (0x50752828a0285d88), U64_C (0xb8865c5c6d5cda31), U64_C (0xed6bf8f8c7f8933f), U64_C (0x11c28686228644a4), - }; - -static const u64 C3[256] = - { + }, { U64_C (0x7830d818186018c0), U64_C (0xaf462623238c2305), U64_C (0xf991b8c6c63fc67e), U64_C (0x6fcdfbe8e887e813), U64_C (0xa113cb878726874c), U64_C (0x626d11b8b8dab8a9), @@ -633,10 +635,7 @@ static const u64 C3[256] = U64_C (0xc22d2c98985a98b4), U64_C (0x0e55eda4a4aaa449), U64_C (0x8850752828a0285d), U64_C (0x31b8865c5c6d5cda), U64_C (0x3fed6bf8f8c7f893), U64_C (0xa411c28686228644), - }; - -static const u64 C4[256] = - { + }, { U64_C (0xc07830d818186018), U64_C (0x05af462623238c23), U64_C (0x7ef991b8c6c63fc6), U64_C (0x136fcdfbe8e887e8), U64_C (0x4ca113cb87872687), U64_C (0xa9626d11b8b8dab8), @@ -765,10 +764,7 @@ static const u64 C4[256] = U64_C 
(0xb4c22d2c98985a98), U64_C (0x490e55eda4a4aaa4), U64_C (0x5d8850752828a028), U64_C (0xda31b8865c5c6d5c), U64_C (0x933fed6bf8f8c7f8), U64_C (0x44a411c286862286), - }; - -static const u64 C5[256] = - { + }, { U64_C (0x18c07830d8181860), U64_C (0x2305af462623238c), U64_C (0xc67ef991b8c6c63f), U64_C (0xe8136fcdfbe8e887), U64_C (0x874ca113cb878726), U64_C (0xb8a9626d11b8b8da), @@ -897,10 +893,7 @@ static const u64 C5[256] = U64_C (0x98b4c22d2c98985a), U64_C (0xa4490e55eda4a4aa), U64_C (0x285d8850752828a0), U64_C (0x5cda31b8865c5c6d), U64_C (0xf8933fed6bf8f8c7), U64_C (0x8644a411c2868622), - }; - -static const u64 C6[256] = - { + }, { U64_C (0x6018c07830d81818), U64_C (0x8c2305af46262323), U64_C (0x3fc67ef991b8c6c6), U64_C (0x87e8136fcdfbe8e8), U64_C (0x26874ca113cb8787), U64_C (0xdab8a9626d11b8b8), @@ -1029,10 +1022,7 @@ static const u64 C6[256] = U64_C (0x5a98b4c22d2c9898), U64_C (0xaaa4490e55eda4a4), U64_C (0xa0285d8850752828), U64_C (0x6d5cda31b8865c5c), U64_C (0xc7f8933fed6bf8f8), U64_C (0x228644a411c28686), - }; - -static const u64 C7[256] = - { + }, { U64_C (0x186018c07830d818), U64_C (0x238c2305af462623), U64_C (0xc63fc67ef991b8c6), U64_C (0xe887e8136fcdfbe8), U64_C (0x8726874ca113cb87), U64_C (0xb8dab8a9626d11b8), @@ -1161,7 +1151,18 @@ static const u64 C7[256] = U64_C (0x985a98b4c22d2c98), U64_C (0xa4aaa4490e55eda4), U64_C (0x28a0285d88507528), U64_C (0x5c6d5cda31b8865c), U64_C (0xf8c7f8933fed6bf8), U64_C (0x86228644a411c286), - }; + } } +}; +#define C tab.C +#define C0 C[0] +#define C1 C[1] +#define C2 C[2] +#define C3 C[3] +#define C4 C[4] +#define C5 C[5] +#define C6 C[6] +#define C7 C[7] +#define rc tab.RC @@ -1189,6 +1190,22 @@ whirlpool_init (void *ctx, unsigned int flags) } +#ifdef USE_AMD64_ASM + +extern unsigned int +_gcry_whirlpool_transform_amd64(u64 *state, const unsigned char *data, + size_t nblks, const struct whirlpool_tables_s *tables); + +static unsigned int +whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) +{ + 
whirlpool_context_t *context = ctx; + + return _gcry_whirlpool_transform_amd64( + context->hash_state, data, nblks, &tab); +} + +#else /* USE_AMD64_ASM */ /* * Transform block. @@ -1308,6 +1325,8 @@ whirlpool_transform ( void *c, const unsigned char *data, size_t nblks ) return burn; } +#endif /* !USE_AMD64_ASM */ + /* Bug compatibility Whirlpool version. */ static void diff --git a/configure.ac b/configure.ac index 18db662..d14b7f6 100644 --- a/configure.ac +++ b/configure.ac @@ -1943,6 +1943,13 @@ LIST_MEMBER(whirlpool, $enabled_digests) if test "$found" = "1" ; then GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool.lo" AC_DEFINE(USE_WHIRLPOOL, 1, [Defined if this module should be included]) + + case "${host}" in + x86_64-*-*) + # Build with the assembly implementation + GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool-sse2-amd64.lo" + ;; + esac fi # rmd160 and sha1 should be included always. From cvs at cvs.gnupg.org Sat Oct 4 14:37:34 2014 From: cvs at cvs.gnupg.org (by Andrei Scherer) Date: Sat, 04 Oct 2014 14:37:34 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-117-g30bd759 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 30bd759f398f45b04d0a783b875f59ce9bd1e51d (commit) from 0ecd136a6ca02252f63ad229fa5240897bfe6544 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 30bd759f398f45b04d0a783b875f59ce9bd1e51d Author: Andrei Scherer Date: Thu Aug 28 09:45:35 2014 -0800 Improved ripemd160 performance * cipher/rmd160.c (transform): Interleave the left and right lane rounds to introduce more instruction level parallelism. 
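[Editor's sketch, not part of the commit.] The log entry above describes interleaving the left and right lane rounds of RIPEMD-160. The two lanes work on disjoint variables, so running them back to back or alternating round by round gives the same values; alternating just places independent instructions next to each other, which is what exposes the extra instruction-level parallelism. A reduced illustration with three rounds per lane, assuming hypothetical function names `lanes_sequential` and `lanes_interleaved` and a macro `RND` modelled on the new `R` macro in the diff:

```c
#include <stdint.h>

static uint32_t
rol (uint32_t v, unsigned int n)
{
  return (v << n) | (v >> (32 - n));
}

#define F0(x,y,z)  ((x) ^ (y) ^ (z))
#define F4(x,y,z)  ((x) ^ ((y) | ~(z)))
#define K0   0x00000000u   /* left lane, first 16 rounds */
#define KK0  0x50a28be6u   /* right lane, first 16 rounds */

/* One round in the style of the reworked macro: no temporary needed. */
#define RND(a,b,c,d,e,f,k,w,s) do { a += f(b,c,d) + (k) + (w); \
                                    a = rol(a,s) + e;          \
                                    c = rol(c,10);             \
                                  } while (0)

/* Fold all ten lane variables into one word so results can be compared. */
#define MIX() (al ^ rol(bl,1) ^ rol(cl,2) ^ rol(dl,3) ^ rol(el,4) \
               ^ rol(ar,5) ^ rol(br,6) ^ rol(cr,7) ^ rol(dr,8) ^ rol(er,9))

/* Old order: whole left lane first, then the right lane. */
uint32_t
lanes_sequential (const uint32_t h[5], const uint32_t x[16])
{
  uint32_t al = h[0], bl = h[1], cl = h[2], dl = h[3], el = h[4];
  uint32_t ar = h[0], br = h[1], cr = h[2], dr = h[3], er = h[4];

  RND (al, bl, cl, dl, el, F0, K0, x[0], 11);
  RND (el, al, bl, cl, dl, F0, K0, x[1], 14);
  RND (dl, el, al, bl, cl, F0, K0, x[2], 15);
  RND (ar, br, cr, dr, er, F4, KK0, x[5], 8);
  RND (er, ar, br, cr, dr, F4, KK0, x[14], 9);
  RND (dr, er, ar, br, cr, F4, KK0, x[7], 9);
  return MIX ();
}

/* New order: left and right rounds alternate. */
uint32_t
lanes_interleaved (const uint32_t h[5], const uint32_t x[16])
{
  uint32_t al = h[0], bl = h[1], cl = h[2], dl = h[3], el = h[4];
  uint32_t ar = h[0], br = h[1], cr = h[2], dr = h[3], er = h[4];

  RND (al, bl, cl, dl, el, F0, K0, x[0], 11);
  RND (ar, br, cr, dr, er, F4, KK0, x[5], 8);
  RND (el, al, bl, cl, dl, F0, K0, x[1], 14);
  RND (er, ar, br, cr, dr, F4, KK0, x[14], 9);
  RND (dl, el, al, bl, cl, F0, K0, x[2], 15);
  RND (dr, er, ar, br, cr, F4, KK0, x[7], 9);
  return MIX ();
}
```

Whether the reordering pays off depends on the compiler and the CPU; the benchmark figures quoted in the commit message are the author's measurements, and this sketch only shows that the reordering leaves the computed values unchanged.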
-- The benchmarks on different systems: Intel(R) Atom(TM) CPU N570 @ 1.66GHz before: Hash: | nanosecs/byte mebibytes/sec cycles/byte RIPEMD160 | 13.07 ns/B 72.97 MiB/s - c/B after: Hash: | nanosecs/byte mebibytes/sec cycles/byte RIPEMD160 | 11.37 ns/B 83.84 MiB/s - c/B Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz before: Hash: | nanosecs/byte mebibytes/sec cycles/byte RIPEMD160 | 3.31 ns/B 288.0 MiB/s - c/B after: Hash: | nanosecs/byte mebibytes/sec cycles/byte RIPEMD160 | 2.08 ns/B 458.5 MiB/s - c/B Signed-off-by: Andrei Scherer diff --git a/cipher/rmd160.c b/cipher/rmd160.c index 2aba0fe..e6d02f5 100644 --- a/cipher/rmd160.c +++ b/cipher/rmd160.c @@ -178,8 +178,7 @@ static unsigned int transform_blk ( void *ctx, const unsigned char *data ) { RMD160_CONTEXT *hd = ctx; - register u32 a,b,c,d,e; - u32 aa,bb,cc,dd,ee,t; + register u32 al, ar, bl, br, cl, cr, dl, dr, el, er; u32 x[16]; int i; @@ -201,196 +200,186 @@ transform_blk ( void *ctx, const unsigned char *data ) #define F2(x,y,z) ( ((x) | ~(y)) ^ (z) ) #define F3(x,y,z) ( ((x) & (z)) | ((y) & ~(z)) ) #define F4(x,y,z) ( (x) ^ ((y) | ~(z)) ) -#define R(a,b,c,d,e,f,k,r,s) do { t = a + f(b,c,d) + k + x[r]; \ - a = rol(t,s) + e; \ +#define R(a,b,c,d,e,f,k,r,s) do { a += f(b,c,d) + k + x[r]; \ + a = rol(a,s) + e; \ c = rol(c,10); \ } while(0) - /* left lane */ - a = hd->h0; - b = hd->h1; - c = hd->h2; - d = hd->h3; - e = hd->h4; - R( a, b, c, d, e, F0, K0, 0, 11 ); - R( e, a, b, c, d, F0, K0, 1, 14 ); - R( d, e, a, b, c, F0, K0, 2, 15 ); - R( c, d, e, a, b, F0, K0, 3, 12 ); - R( b, c, d, e, a, F0, K0, 4, 5 ); - R( a, b, c, d, e, F0, K0, 5, 8 ); - R( e, a, b, c, d, F0, K0, 6, 7 ); - R( d, e, a, b, c, F0, K0, 7, 9 ); - R( c, d, e, a, b, F0, K0, 8, 11 ); - R( b, c, d, e, a, F0, K0, 9, 13 ); - R( a, b, c, d, e, F0, K0, 10, 14 ); - R( e, a, b, c, d, F0, K0, 11, 15 ); - R( d, e, a, b, c, F0, K0, 12, 6 ); - R( c, d, e, a, b, F0, K0, 13, 7 ); - R( b, c, d, e, a, F0, K0, 14, 9 ); - R( a, b, c, d, e, F0, K0, 15, 8 ); - R( e, a, 
b, c, d, F1, K1, 7, 7 ); - R( d, e, a, b, c, F1, K1, 4, 6 ); - R( c, d, e, a, b, F1, K1, 13, 8 ); - R( b, c, d, e, a, F1, K1, 1, 13 ); - R( a, b, c, d, e, F1, K1, 10, 11 ); - R( e, a, b, c, d, F1, K1, 6, 9 ); - R( d, e, a, b, c, F1, K1, 15, 7 ); - R( c, d, e, a, b, F1, K1, 3, 15 ); - R( b, c, d, e, a, F1, K1, 12, 7 ); - R( a, b, c, d, e, F1, K1, 0, 12 ); - R( e, a, b, c, d, F1, K1, 9, 15 ); - R( d, e, a, b, c, F1, K1, 5, 9 ); - R( c, d, e, a, b, F1, K1, 2, 11 ); - R( b, c, d, e, a, F1, K1, 14, 7 ); - R( a, b, c, d, e, F1, K1, 11, 13 ); - R( e, a, b, c, d, F1, K1, 8, 12 ); - R( d, e, a, b, c, F2, K2, 3, 11 ); - R( c, d, e, a, b, F2, K2, 10, 13 ); - R( b, c, d, e, a, F2, K2, 14, 6 ); - R( a, b, c, d, e, F2, K2, 4, 7 ); - R( e, a, b, c, d, F2, K2, 9, 14 ); - R( d, e, a, b, c, F2, K2, 15, 9 ); - R( c, d, e, a, b, F2, K2, 8, 13 ); - R( b, c, d, e, a, F2, K2, 1, 15 ); - R( a, b, c, d, e, F2, K2, 2, 14 ); - R( e, a, b, c, d, F2, K2, 7, 8 ); - R( d, e, a, b, c, F2, K2, 0, 13 ); - R( c, d, e, a, b, F2, K2, 6, 6 ); - R( b, c, d, e, a, F2, K2, 13, 5 ); - R( a, b, c, d, e, F2, K2, 11, 12 ); - R( e, a, b, c, d, F2, K2, 5, 7 ); - R( d, e, a, b, c, F2, K2, 12, 5 ); - R( c, d, e, a, b, F3, K3, 1, 11 ); - R( b, c, d, e, a, F3, K3, 9, 12 ); - R( a, b, c, d, e, F3, K3, 11, 14 ); - R( e, a, b, c, d, F3, K3, 10, 15 ); - R( d, e, a, b, c, F3, K3, 0, 14 ); - R( c, d, e, a, b, F3, K3, 8, 15 ); - R( b, c, d, e, a, F3, K3, 12, 9 ); - R( a, b, c, d, e, F3, K3, 4, 8 ); - R( e, a, b, c, d, F3, K3, 13, 9 ); - R( d, e, a, b, c, F3, K3, 3, 14 ); - R( c, d, e, a, b, F3, K3, 7, 5 ); - R( b, c, d, e, a, F3, K3, 15, 6 ); - R( a, b, c, d, e, F3, K3, 14, 8 ); - R( e, a, b, c, d, F3, K3, 5, 6 ); - R( d, e, a, b, c, F3, K3, 6, 5 ); - R( c, d, e, a, b, F3, K3, 2, 12 ); - R( b, c, d, e, a, F4, K4, 4, 9 ); - R( a, b, c, d, e, F4, K4, 0, 15 ); - R( e, a, b, c, d, F4, K4, 5, 5 ); - R( d, e, a, b, c, F4, K4, 9, 11 ); - R( c, d, e, a, b, F4, K4, 7, 6 ); - R( b, c, d, e, a, F4, K4, 12, 8 ); - R( a, b, c, d, e, 
F4, K4, 2, 13 ); - R( e, a, b, c, d, F4, K4, 10, 12 ); - R( d, e, a, b, c, F4, K4, 14, 5 ); - R( c, d, e, a, b, F4, K4, 1, 12 ); - R( b, c, d, e, a, F4, K4, 3, 13 ); - R( a, b, c, d, e, F4, K4, 8, 14 ); - R( e, a, b, c, d, F4, K4, 11, 11 ); - R( d, e, a, b, c, F4, K4, 6, 8 ); - R( c, d, e, a, b, F4, K4, 15, 5 ); - R( b, c, d, e, a, F4, K4, 13, 6 ); - - aa = a; bb = b; cc = c; dd = d; ee = e; - - /* right lane */ - a = hd->h0; - b = hd->h1; - c = hd->h2; - d = hd->h3; - e = hd->h4; - R( a, b, c, d, e, F4, KK0, 5, 8); - R( e, a, b, c, d, F4, KK0, 14, 9); - R( d, e, a, b, c, F4, KK0, 7, 9); - R( c, d, e, a, b, F4, KK0, 0, 11); - R( b, c, d, e, a, F4, KK0, 9, 13); - R( a, b, c, d, e, F4, KK0, 2, 15); - R( e, a, b, c, d, F4, KK0, 11, 15); - R( d, e, a, b, c, F4, KK0, 4, 5); - R( c, d, e, a, b, F4, KK0, 13, 7); - R( b, c, d, e, a, F4, KK0, 6, 7); - R( a, b, c, d, e, F4, KK0, 15, 8); - R( e, a, b, c, d, F4, KK0, 8, 11); - R( d, e, a, b, c, F4, KK0, 1, 14); - R( c, d, e, a, b, F4, KK0, 10, 14); - R( b, c, d, e, a, F4, KK0, 3, 12); - R( a, b, c, d, e, F4, KK0, 12, 6); - R( e, a, b, c, d, F3, KK1, 6, 9); - R( d, e, a, b, c, F3, KK1, 11, 13); - R( c, d, e, a, b, F3, KK1, 3, 15); - R( b, c, d, e, a, F3, KK1, 7, 7); - R( a, b, c, d, e, F3, KK1, 0, 12); - R( e, a, b, c, d, F3, KK1, 13, 8); - R( d, e, a, b, c, F3, KK1, 5, 9); - R( c, d, e, a, b, F3, KK1, 10, 11); - R( b, c, d, e, a, F3, KK1, 14, 7); - R( a, b, c, d, e, F3, KK1, 15, 7); - R( e, a, b, c, d, F3, KK1, 8, 12); - R( d, e, a, b, c, F3, KK1, 12, 7); - R( c, d, e, a, b, F3, KK1, 4, 6); - R( b, c, d, e, a, F3, KK1, 9, 15); - R( a, b, c, d, e, F3, KK1, 1, 13); - R( e, a, b, c, d, F3, KK1, 2, 11); - R( d, e, a, b, c, F2, KK2, 15, 9); - R( c, d, e, a, b, F2, KK2, 5, 7); - R( b, c, d, e, a, F2, KK2, 1, 15); - R( a, b, c, d, e, F2, KK2, 3, 11); - R( e, a, b, c, d, F2, KK2, 7, 8); - R( d, e, a, b, c, F2, KK2, 14, 6); - R( c, d, e, a, b, F2, KK2, 6, 6); - R( b, c, d, e, a, F2, KK2, 9, 14); - R( a, b, c, d, e, F2, KK2, 11, 12); - 
R( e, a, b, c, d, F2, KK2, 8, 13); - R( d, e, a, b, c, F2, KK2, 12, 5); - R( c, d, e, a, b, F2, KK2, 2, 14); - R( b, c, d, e, a, F2, KK2, 10, 13); - R( a, b, c, d, e, F2, KK2, 0, 13); - R( e, a, b, c, d, F2, KK2, 4, 7); - R( d, e, a, b, c, F2, KK2, 13, 5); - R( c, d, e, a, b, F1, KK3, 8, 15); - R( b, c, d, e, a, F1, KK3, 6, 5); - R( a, b, c, d, e, F1, KK3, 4, 8); - R( e, a, b, c, d, F1, KK3, 1, 11); - R( d, e, a, b, c, F1, KK3, 3, 14); - R( c, d, e, a, b, F1, KK3, 11, 14); - R( b, c, d, e, a, F1, KK3, 15, 6); - R( a, b, c, d, e, F1, KK3, 0, 14); - R( e, a, b, c, d, F1, KK3, 5, 6); - R( d, e, a, b, c, F1, KK3, 12, 9); - R( c, d, e, a, b, F1, KK3, 2, 12); - R( b, c, d, e, a, F1, KK3, 13, 9); - R( a, b, c, d, e, F1, KK3, 9, 12); - R( e, a, b, c, d, F1, KK3, 7, 5); - R( d, e, a, b, c, F1, KK3, 10, 15); - R( c, d, e, a, b, F1, KK3, 14, 8); - R( b, c, d, e, a, F0, KK4, 12, 8); - R( a, b, c, d, e, F0, KK4, 15, 5); - R( e, a, b, c, d, F0, KK4, 10, 12); - R( d, e, a, b, c, F0, KK4, 4, 9); - R( c, d, e, a, b, F0, KK4, 1, 12); - R( b, c, d, e, a, F0, KK4, 5, 5); - R( a, b, c, d, e, F0, KK4, 8, 14); - R( e, a, b, c, d, F0, KK4, 7, 6); - R( d, e, a, b, c, F0, KK4, 6, 8); - R( c, d, e, a, b, F0, KK4, 2, 13); - R( b, c, d, e, a, F0, KK4, 13, 6); - R( a, b, c, d, e, F0, KK4, 14, 5); - R( e, a, b, c, d, F0, KK4, 0, 15); - R( d, e, a, b, c, F0, KK4, 3, 13); - R( c, d, e, a, b, F0, KK4, 9, 11); - R( b, c, d, e, a, F0, KK4, 11, 11); - - - t = hd->h1 + d + cc; - hd->h1 = hd->h2 + e + dd; - hd->h2 = hd->h3 + a + ee; - hd->h3 = hd->h4 + b + aa; - hd->h4 = hd->h0 + c + bb; - hd->h0 = t; - - return /*burn_stack*/ 108+5*sizeof(void*); + /* left lane and right lanes interleaved */ + al = ar = hd->h0; + bl = br = hd->h1; + cl = cr = hd->h2; + dl = dr = hd->h3; + el = er = hd->h4; + R( al, bl, cl, dl, el, F0, K0, 0, 11 ); + R( ar, br, cr, dr, er, F4, KK0, 5, 8); + R( el, al, bl, cl, dl, F0, K0, 1, 14 ); + R( er, ar, br, cr, dr, F4, KK0, 14, 9); + R( dl, el, al, bl, cl, F0, K0, 2, 15 ); + R( 
dr, er, ar, br, cr, F4, KK0, 7, 9); + R( cl, dl, el, al, bl, F0, K0, 3, 12 ); + R( cr, dr, er, ar, br, F4, KK0, 0, 11); + R( bl, cl, dl, el, al, F0, K0, 4, 5 ); + R( br, cr, dr, er, ar, F4, KK0, 9, 13); + R( al, bl, cl, dl, el, F0, K0, 5, 8 ); + R( ar, br, cr, dr, er, F4, KK0, 2, 15); + R( el, al, bl, cl, dl, F0, K0, 6, 7 ); + R( er, ar, br, cr, dr, F4, KK0, 11, 15); + R( dl, el, al, bl, cl, F0, K0, 7, 9 ); + R( dr, er, ar, br, cr, F4, KK0, 4, 5); + R( cl, dl, el, al, bl, F0, K0, 8, 11 ); + R( cr, dr, er, ar, br, F4, KK0, 13, 7); + R( bl, cl, dl, el, al, F0, K0, 9, 13 ); + R( br, cr, dr, er, ar, F4, KK0, 6, 7); + R( al, bl, cl, dl, el, F0, K0, 10, 14 ); + R( ar, br, cr, dr, er, F4, KK0, 15, 8); + R( el, al, bl, cl, dl, F0, K0, 11, 15 ); + R( er, ar, br, cr, dr, F4, KK0, 8, 11); + R( dl, el, al, bl, cl, F0, K0, 12, 6 ); + R( dr, er, ar, br, cr, F4, KK0, 1, 14); + R( cl, dl, el, al, bl, F0, K0, 13, 7 ); + R( cr, dr, er, ar, br, F4, KK0, 10, 14); + R( bl, cl, dl, el, al, F0, K0, 14, 9 ); + R( br, cr, dr, er, ar, F4, KK0, 3, 12); + R( al, bl, cl, dl, el, F0, K0, 15, 8 ); + R( ar, br, cr, dr, er, F4, KK0, 12, 6); + R( el, al, bl, cl, dl, F1, K1, 7, 7 ); + R( er, ar, br, cr, dr, F3, KK1, 6, 9); + R( dl, el, al, bl, cl, F1, K1, 4, 6 ); + R( dr, er, ar, br, cr, F3, KK1, 11, 13); + R( cl, dl, el, al, bl, F1, K1, 13, 8 ); + R( cr, dr, er, ar, br, F3, KK1, 3, 15); + R( bl, cl, dl, el, al, F1, K1, 1, 13 ); + R( br, cr, dr, er, ar, F3, KK1, 7, 7); + R( al, bl, cl, dl, el, F1, K1, 10, 11 ); + R( ar, br, cr, dr, er, F3, KK1, 0, 12); + R( el, al, bl, cl, dl, F1, K1, 6, 9 ); + R( er, ar, br, cr, dr, F3, KK1, 13, 8); + R( dl, el, al, bl, cl, F1, K1, 15, 7 ); + R( dr, er, ar, br, cr, F3, KK1, 5, 9); + R( cl, dl, el, al, bl, F1, K1, 3, 15 ); + R( cr, dr, er, ar, br, F3, KK1, 10, 11); + R( bl, cl, dl, el, al, F1, K1, 12, 7 ); + R( br, cr, dr, er, ar, F3, KK1, 14, 7); + R( al, bl, cl, dl, el, F1, K1, 0, 12 ); + R( ar, br, cr, dr, er, F3, KK1, 15, 7); + R( el, al, bl, cl, dl, F1, K1, 9, 
15 ); + R( er, ar, br, cr, dr, F3, KK1, 8, 12); + R( dl, el, al, bl, cl, F1, K1, 5, 9 ); + R( dr, er, ar, br, cr, F3, KK1, 12, 7); + R( cl, dl, el, al, bl, F1, K1, 2, 11 ); + R( cr, dr, er, ar, br, F3, KK1, 4, 6); + R( bl, cl, dl, el, al, F1, K1, 14, 7 ); + R( br, cr, dr, er, ar, F3, KK1, 9, 15); + R( al, bl, cl, dl, el, F1, K1, 11, 13 ); + R( ar, br, cr, dr, er, F3, KK1, 1, 13); + R( el, al, bl, cl, dl, F1, K1, 8, 12 ); + R( er, ar, br, cr, dr, F3, KK1, 2, 11); + R( dl, el, al, bl, cl, F2, K2, 3, 11 ); + R( dr, er, ar, br, cr, F2, KK2, 15, 9); + R( cl, dl, el, al, bl, F2, K2, 10, 13 ); + R( cr, dr, er, ar, br, F2, KK2, 5, 7); + R( bl, cl, dl, el, al, F2, K2, 14, 6 ); + R( br, cr, dr, er, ar, F2, KK2, 1, 15); + R( al, bl, cl, dl, el, F2, K2, 4, 7 ); + R( ar, br, cr, dr, er, F2, KK2, 3, 11); + R( el, al, bl, cl, dl, F2, K2, 9, 14 ); + R( er, ar, br, cr, dr, F2, KK2, 7, 8); + R( dl, el, al, bl, cl, F2, K2, 15, 9 ); + R( dr, er, ar, br, cr, F2, KK2, 14, 6); + R( cl, dl, el, al, bl, F2, K2, 8, 13 ); + R( cr, dr, er, ar, br, F2, KK2, 6, 6); + R( bl, cl, dl, el, al, F2, K2, 1, 15 ); + R( br, cr, dr, er, ar, F2, KK2, 9, 14); + R( al, bl, cl, dl, el, F2, K2, 2, 14 ); + R( ar, br, cr, dr, er, F2, KK2, 11, 12); + R( el, al, bl, cl, dl, F2, K2, 7, 8 ); + R( er, ar, br, cr, dr, F2, KK2, 8, 13); + R( dl, el, al, bl, cl, F2, K2, 0, 13 ); + R( dr, er, ar, br, cr, F2, KK2, 12, 5); + R( cl, dl, el, al, bl, F2, K2, 6, 6 ); + R( cr, dr, er, ar, br, F2, KK2, 2, 14); + R( bl, cl, dl, el, al, F2, K2, 13, 5 ); + R( br, cr, dr, er, ar, F2, KK2, 10, 13); + R( al, bl, cl, dl, el, F2, K2, 11, 12 ); + R( ar, br, cr, dr, er, F2, KK2, 0, 13); + R( el, al, bl, cl, dl, F2, K2, 5, 7 ); + R( er, ar, br, cr, dr, F2, KK2, 4, 7); + R( dl, el, al, bl, cl, F2, K2, 12, 5 ); + R( dr, er, ar, br, cr, F2, KK2, 13, 5); + R( cl, dl, el, al, bl, F3, K3, 1, 11 ); + R( cr, dr, er, ar, br, F1, KK3, 8, 15); + R( bl, cl, dl, el, al, F3, K3, 9, 12 ); + R( br, cr, dr, er, ar, F1, KK3, 6, 5); + R( al, bl, cl, dl, el, 
F3, K3, 11, 14 ); + R( ar, br, cr, dr, er, F1, KK3, 4, 8); + R( el, al, bl, cl, dl, F3, K3, 10, 15 ); + R( er, ar, br, cr, dr, F1, KK3, 1, 11); + R( dl, el, al, bl, cl, F3, K3, 0, 14 ); + R( dr, er, ar, br, cr, F1, KK3, 3, 14); + R( cl, dl, el, al, bl, F3, K3, 8, 15 ); + R( cr, dr, er, ar, br, F1, KK3, 11, 14); + R( bl, cl, dl, el, al, F3, K3, 12, 9 ); + R( br, cr, dr, er, ar, F1, KK3, 15, 6); + R( al, bl, cl, dl, el, F3, K3, 4, 8 ); + R( ar, br, cr, dr, er, F1, KK3, 0, 14); + R( el, al, bl, cl, dl, F3, K3, 13, 9 ); + R( er, ar, br, cr, dr, F1, KK3, 5, 6); + R( dl, el, al, bl, cl, F3, K3, 3, 14 ); + R( dr, er, ar, br, cr, F1, KK3, 12, 9); + R( cl, dl, el, al, bl, F3, K3, 7, 5 ); + R( cr, dr, er, ar, br, F1, KK3, 2, 12); + R( bl, cl, dl, el, al, F3, K3, 15, 6 ); + R( br, cr, dr, er, ar, F1, KK3, 13, 9); + R( al, bl, cl, dl, el, F3, K3, 14, 8 ); + R( ar, br, cr, dr, er, F1, KK3, 9, 12); + R( el, al, bl, cl, dl, F3, K3, 5, 6 ); + R( er, ar, br, cr, dr, F1, KK3, 7, 5); + R( dl, el, al, bl, cl, F3, K3, 6, 5 ); + R( dr, er, ar, br, cr, F1, KK3, 10, 15); + R( cl, dl, el, al, bl, F3, K3, 2, 12 ); + R( cr, dr, er, ar, br, F1, KK3, 14, 8); + R( bl, cl, dl, el, al, F4, K4, 4, 9 ); + R( br, cr, dr, er, ar, F0, KK4, 12, 8); + R( al, bl, cl, dl, el, F4, K4, 0, 15 ); + R( ar, br, cr, dr, er, F0, KK4, 15, 5); + R( el, al, bl, cl, dl, F4, K4, 5, 5 ); + R( er, ar, br, cr, dr, F0, KK4, 10, 12); + R( dl, el, al, bl, cl, F4, K4, 9, 11 ); + R( dr, er, ar, br, cr, F0, KK4, 4, 9); + R( cl, dl, el, al, bl, F4, K4, 7, 6 ); + R( cr, dr, er, ar, br, F0, KK4, 1, 12); + R( bl, cl, dl, el, al, F4, K4, 12, 8 ); + R( br, cr, dr, er, ar, F0, KK4, 5, 5); + R( al, bl, cl, dl, el, F4, K4, 2, 13 ); + R( ar, br, cr, dr, er, F0, KK4, 8, 14); + R( el, al, bl, cl, dl, F4, K4, 10, 12 ); + R( er, ar, br, cr, dr, F0, KK4, 7, 6); + R( dl, el, al, bl, cl, F4, K4, 14, 5 ); + R( dr, er, ar, br, cr, F0, KK4, 6, 8); + R( cl, dl, el, al, bl, F4, K4, 1, 12 ); + R( cr, dr, er, ar, br, F0, KK4, 2, 13); + R( bl, cl, dl, 
el, al, F4, K4, 3, 13 ); + R( br, cr, dr, er, ar, F0, KK4, 13, 6); + R( al, bl, cl, dl, el, F4, K4, 8, 14 ); + R( ar, br, cr, dr, er, F0, KK4, 14, 5); + R( el, al, bl, cl, dl, F4, K4, 11, 11 ); + R( er, ar, br, cr, dr, F0, KK4, 0, 15); + R( dl, el, al, bl, cl, F4, K4, 6, 8 ); + R( dr, er, ar, br, cr, F0, KK4, 3, 13); + R( cl, dl, el, al, bl, F4, K4, 15, 5 ); + R( cr, dr, er, ar, br, F0, KK4, 9, 11); + R( bl, cl, dl, el, al, F4, K4, 13, 6 ); + R( br, cr, dr, er, ar, F0, KK4, 11, 11); + + dr += cl + hd->h1; + hd->h1 = hd->h2 + dl + er; + hd->h2 = hd->h3 + el + ar; + hd->h3 = hd->h4 + al + br; + hd->h4 = hd->h0 + bl + cr; + hd->h0 = dr; + + return /*burn_stack*/ 104+5*sizeof(void*); } ----------------------------------------------------------------------- Summary of changes: cipher/rmd160.c | 367 +++++++++++++++++++++++++++---------------------------- 1 file changed, 178 insertions(+), 189 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Sat Oct 4 14:48:14 2014 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sat, 04 Oct 2014 14:48:14 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-118-gde0ccd4 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via de0ccd4dce7ec185a678d78878d4538dd609ca0f (commit) from 30bd759f398f45b04d0a783b875f59ce9bd1e51d (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. 
- Log ----------------------------------------------------------------- commit de0ccd4dce7ec185a678d78878d4538dd609ca0f Author: Jussi Kivilinna Date: Sun Aug 31 13:17:24 2014 +0300 Add Whirlpool AMD64/SSE2 assembly implementation * cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'. -- Benchmark results: On Intel Core i5-4570 (3.2 Ghz): After: WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B Before: WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B On Intel Core i5-2450M (2.5 Ghz): After: WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B Before: WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B On Intel Core2 T8100 (2.1 Ghz): After: WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B Before: WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B Summary, old vs new ratio: Intel Core i5-4570: 1.88x Intel Core i5-2450M: 1.59x Intel Core2 T8100: 1.94x Signed-off-by: Jussi Kivilinna diff --git a/cipher/Makefile.am b/cipher/Makefile.am index c165356..7f45cbb 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -87,7 +87,7 @@ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S \ stribog.c \ tiger.c \ -whirlpool.c \ +whirlpool.c whirlpool-sse2-amd64.S \ twofish.c twofish-amd64.S twofish-arm.S \ rfc2268.c \ camellia.c camellia.h camellia-glue.c camellia-aesni-avx-amd64.S \ diff --git a/cipher/whirlpool-sse2-amd64.S b/cipher/whirlpool-sse2-amd64.S new file mode 100644 index 0000000..d0bcf2d --- /dev/null +++ b/cipher/whirlpool-sse2-amd64.S @@ -0,0 +1,335 @@ +/* whirlpool-sse2-amd64.S - AMD64 assembly implementation of Whirlpool + * + * Copyright (C) 
2014 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +#ifdef __x86_64 +#include +#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_WHIRLPOOL) + +#ifdef __PIC__ +# define RIP %rip +#else +# define RIP +#endif + +.text + +/* look-up table offsets on RTAB */ +#define RC (0) +#define C0 (RC + (8 * 10)) +#define C1 (C0 + (8 * 256)) +#define C2 (C1 + (8 * 256)) +#define C3 (C2 + (8 * 256)) +#define C4 (C3 + (8 * 256)) +#define C5 (C4 + (8 * 256)) +#define C6 (C5 + (8 * 256)) +#define C7 (C6 + (8 * 256)) + +/* stack variables */ +#define STACK_DATAP (0) +#define STACK_STATEP (STACK_DATAP + 8) +#define STACK_ROUNDS (STACK_STATEP + 8) +#define STACK_NBLKS (STACK_ROUNDS + 8) +#define STACK_RBP (STACK_NBLKS + 8) +#define STACK_RBX (STACK_RBP + 8) +#define STACK_R12 (STACK_RBX + 8) +#define STACK_R13 (STACK_R12 + 8) +#define STACK_R14 (STACK_R13 + 8) +#define STACK_R15 (STACK_R14 + 8) +#define STACK_MAX (STACK_R15 + 8) + +/* register macros */ +#define RTAB %rbp + +#define RI1 %rax +#define RI2 %rbx +#define RI3 %rcx +#define RI4 %rdx + +#define RI1d %eax +#define RI2d %ebx +#define RI3d %ecx +#define RI4d %edx + +#define RI1bl %al +#define RI2bl %bl +#define RI3bl %cl +#define RI4bl %dl + +#define RI1bh %ah +#define RI2bh %bh +#define RI3bh %ch +#define RI4bh %dh + +#define RB0 %r8 +#define RB1 %r9 +#define RB2 
%r10 +#define RB3 %r11 +#define RB4 %r12 +#define RB5 %r13 +#define RB6 %r14 +#define RB7 %r15 + +#define RT0 %rsi +#define RT1 %rdi + +#define RT0d %esi +#define RT1d %edi + +#define XKEY0 %xmm0 +#define XKEY1 %xmm1 +#define XKEY2 %xmm2 +#define XKEY3 %xmm3 +#define XKEY4 %xmm4 +#define XKEY5 %xmm5 +#define XKEY6 %xmm6 +#define XKEY7 %xmm7 + +#define XSTATE0 %xmm8 +#define XSTATE1 %xmm9 +#define XSTATE2 %xmm10 +#define XSTATE3 %xmm11 +#define XSTATE4 %xmm12 +#define XSTATE5 %xmm13 +#define XSTATE6 %xmm14 +#define XSTATE7 %xmm15 + +/*********************************************************************** + * AMD64 assembly implementation of Whirlpool. + * - Using table-lookups + * - Store state in XMM registers + ***********************************************************************/ +#define __do_whirl(op, ri, \ + b0, b1, b2, b3, b4, b5, b6, b7, \ + load_ri, load_arg) \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrq $16, ri; \ + op ## q C7(RTAB,RT0,8), b7; \ + op ## q C6(RTAB,RT1,8), b6; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrq $16, ri; \ + op ## q C5(RTAB,RT0,8), b5; \ + op ## q C4(RTAB,RT1,8), b4; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + shrl $16, ri ## d; \ + op ## q C3(RTAB,RT0,8), b3; \ + op ## q C2(RTAB,RT1,8), b2; \ + movzbl ri ## bl, RT0d; \ + movzbl ri ## bh, RT1d; \ + load_ri( load_arg, ri); \ + op ## q C1(RTAB,RT0,8), b1; \ + op ## q C0(RTAB,RT1,8), b0; + +#define do_whirl(op, ri, rb_add, load_ri, load_arg) \ + __do_whirl(op, ##ri, rb_add, load_ri, load_arg) + +#define dummy(...) 
/*_*/ + +#define do_movq(src, dst) movq src, dst; + +#define RB_ADD0 RB0, RB1, RB2, RB3, RB4, RB5, RB6, RB7 +#define RB_ADD1 RB1, RB2, RB3, RB4, RB5, RB6, RB7, RB0 +#define RB_ADD2 RB2, RB3, RB4, RB5, RB6, RB7, RB0, RB1 +#define RB_ADD3 RB3, RB4, RB5, RB6, RB7, RB0, RB1, RB2 +#define RB_ADD4 RB4, RB5, RB6, RB7, RB0, RB1, RB2, RB3 +#define RB_ADD5 RB5, RB6, RB7, RB0, RB1, RB2, RB3, RB4 +#define RB_ADD6 RB6, RB7, RB0, RB1, RB2, RB3, RB4, RB5 +#define RB_ADD7 RB7, RB0, RB1, RB2, RB3, RB4, RB5, RB6 + +.align 8 +.globl _gcry_whirlpool_transform_amd64 +.type _gcry_whirlpool_transform_amd64, at function; + +_gcry_whirlpool_transform_amd64: + /* input: + * %rdi: state + * %rsi: inblk + * %rdx: nblks + * %rcx: look-up tables + */ + cmp $0, %rdx; + je .Lskip; + + subq $STACK_MAX, %rsp; + movq %rbp, STACK_RBP(%rsp); + movq %rbx, STACK_RBX(%rsp); + movq %r12, STACK_R12(%rsp); + movq %r13, STACK_R13(%rsp); + movq %r14, STACK_R14(%rsp); + movq %r15, STACK_R15(%rsp); + + movq %rdx, STACK_NBLKS(%rsp); + movq %rdi, STACK_STATEP(%rsp); + movq %rsi, STACK_DATAP(%rsp); + + movq %rcx, RTAB; + + jmp .Lfirst_block; + +.align 8 +.Lblock_loop: + movq STACK_DATAP(%rsp), %rsi; + movq RI1, %rdi; + +.Lfirst_block: + /* load data_block */ + movq 0*8(%rsi), RB0; + movq 1*8(%rsi), RB1; + bswapq RB0; + movq 2*8(%rsi), RB2; + bswapq RB1; + movq 3*8(%rsi), RB3; + bswapq RB2; + movq 4*8(%rsi), RB4; + bswapq RB3; + movq 5*8(%rsi), RB5; + bswapq RB4; + movq RB0, XSTATE0; + movq 6*8(%rsi), RB6; + bswapq RB5; + movq RB1, XSTATE1; + movq 7*8(%rsi), RB7; + bswapq RB6; + movq RB2, XSTATE2; + bswapq RB7; + movq RB3, XSTATE3; + movq RB4, XSTATE4; + movq RB5, XSTATE5; + movq RB6, XSTATE6; + movq RB7, XSTATE7; + + /* load key */ + movq 0*8(%rdi), XKEY0; + movq 1*8(%rdi), XKEY1; + movq 2*8(%rdi), XKEY2; + movq 3*8(%rdi), XKEY3; + movq 4*8(%rdi), XKEY4; + movq 5*8(%rdi), XKEY5; + movq 6*8(%rdi), XKEY6; + movq 7*8(%rdi), XKEY7; + + movq XKEY0, RI1; + movq XKEY1, RI2; + movq XKEY2, RI3; + movq XKEY3, RI4; + + /* 
prepare and store state */ + pxor XKEY0, XSTATE0; + pxor XKEY1, XSTATE1; + pxor XKEY2, XSTATE2; + pxor XKEY3, XSTATE3; + pxor XKEY4, XSTATE4; + pxor XKEY5, XSTATE5; + pxor XKEY6, XSTATE6; + pxor XKEY7, XSTATE7; + + movq XSTATE0, 0*8(%rdi); + movq XSTATE1, 1*8(%rdi); + movq XSTATE2, 2*8(%rdi); + movq XSTATE3, 3*8(%rdi); + movq XSTATE4, 4*8(%rdi); + movq XSTATE5, 5*8(%rdi); + movq XSTATE6, 6*8(%rdi); + movq XSTATE7, 7*8(%rdi); + + addq $64, STACK_DATAP(%rsp); + movl $(0), STACK_ROUNDS(%rsp); +.align 8 +.Lround_loop: + do_whirl(mov, RI1 /*XKEY0*/, RB_ADD0, do_movq, XKEY4); + do_whirl(xor, RI2 /*XKEY1*/, RB_ADD1, do_movq, XKEY5); + do_whirl(xor, RI3 /*XKEY2*/, RB_ADD2, do_movq, XKEY6); + do_whirl(xor, RI4 /*XKEY3*/, RB_ADD3, do_movq, XKEY7); + do_whirl(xor, RI1 /*XKEY0*/, RB_ADD4, do_movq, XSTATE0); + do_whirl(xor, RI2 /*XKEY1*/, RB_ADD5, do_movq, XSTATE1); + do_whirl(xor, RI3 /*XKEY2*/, RB_ADD6, do_movq, XSTATE2); + do_whirl(xor, RI4 /*XKEY3*/, RB_ADD7, do_movq, XSTATE3); + + movl STACK_ROUNDS(%rsp), RT0d; + movq RB1, XKEY1; + addl $1, STACK_ROUNDS(%rsp); + movq RB2, XKEY2; + movq RB3, XKEY3; + xorq RC(RTAB,RT0,8), RB0; /* Add round constant */ + movq RB4, XKEY4; + movq RB5, XKEY5; + movq RB0, XKEY0; + movq RB6, XKEY6; + movq RB7, XKEY7; + + do_whirl(xor, RI1 /*XSTATE0*/, RB_ADD0, do_movq, XSTATE4); + do_whirl(xor, RI2 /*XSTATE1*/, RB_ADD1, do_movq, XSTATE5); + do_whirl(xor, RI3 /*XSTATE2*/, RB_ADD2, do_movq, XSTATE6); + do_whirl(xor, RI4 /*XSTATE3*/, RB_ADD3, do_movq, XSTATE7); + + cmpl $10, STACK_ROUNDS(%rsp); + je .Lis_last_round; + + do_whirl(xor, RI1 /*XSTATE4*/, RB_ADD4, do_movq, XKEY0); + do_whirl(xor, RI2 /*XSTATE5*/, RB_ADD5, do_movq, XKEY1); + do_whirl(xor, RI3 /*XSTATE6*/, RB_ADD6, do_movq, XKEY2); + do_whirl(xor, RI4 /*XSTATE7*/, RB_ADD7, do_movq, XKEY3); + movq RB0, XSTATE0; + movq RB1, XSTATE1; + movq RB2, XSTATE2; + movq RB3, XSTATE3; + movq RB4, XSTATE4; + movq RB5, XSTATE5; + movq RB6, XSTATE6; + movq RB7, XSTATE7; + + jmp .Lround_loop; +.align 8 
+.Lis_last_round: + do_whirl(xor, RI1 /*XSTATE4*/, RB_ADD4, dummy, _); + movq STACK_STATEP(%rsp), RI1; + do_whirl(xor, RI2 /*XSTATE5*/, RB_ADD5, dummy, _); + do_whirl(xor, RI3 /*XSTATE6*/, RB_ADD6, dummy, _); + do_whirl(xor, RI4 /*XSTATE7*/, RB_ADD7, dummy, _); + + /* store state */ + xorq RB0, 0*8(RI1); + xorq RB1, 1*8(RI1); + xorq RB2, 2*8(RI1); + xorq RB3, 3*8(RI1); + xorq RB4, 4*8(RI1); + xorq RB5, 5*8(RI1); + xorq RB6, 6*8(RI1); + xorq RB7, 7*8(RI1); + + subq $1, STACK_NBLKS(%rsp); + jnz .Lblock_loop; + + movq STACK_RBP(%rsp), %rbp; + movq STACK_RBX(%rsp), %rbx; + movq STACK_R12(%rsp), %r12; + movq STACK_R13(%rsp), %r13; + movq STACK_R14(%rsp), %r14; + movq STACK_R15(%rsp), %r15; + addq $STACK_MAX, %rsp; +.Lskip: + movl $(STACK_MAX + 8), %eax; + ret; +.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64; + +#endif +#endif diff --git a/cipher/whirlpool.c b/cipher/whirlpool.c index ffc6662..2732f63 100644 --- a/cipher/whirlpool.c +++ b/cipher/whirlpool.c @@ -40,6 +40,14 @@ #include "bufhelp.h" #include "hash-common.h" +/* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ +#undef USE_AMD64_ASM +#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# define USE_AMD64_ASM 1 +#endif + + + /* Size of a whirlpool block (in bytes). */ #define BLOCK_SIZE 64 @@ -89,8 +97,15 @@ typedef struct { + +struct whirlpool_tables_s { + u64 RC[R]; + u64 C[8][256]; +}; + +static const struct whirlpool_tables_s tab = +{ /* Round constants. */ -static const u64 rc[R] = { U64_C (0x1823c6e887b8014f), U64_C (0x36a6d2f5796f9152), @@ -102,13 +117,9 @@ static const u64 rc[R] = U64_C (0xe427418ba77d95d8), U64_C (0xfbee7c66dd17479e), U64_C (0xca2dbf07ad5a8333), - }; - - - + }, /* Main lookup boxes. 
*/ -static const u64 C0[256] = - { + { { U64_C (0x18186018c07830d8), U64_C (0x23238c2305af4626), U64_C (0xc6c63fc67ef991b8), U64_C (0xe8e887e8136fcdfb), U64_C (0x878726874ca113cb), U64_C (0xb8b8dab8a9626d11), @@ -237,10 +248,7 @@ static const u64 C0[256] = U64_C (0x98985a98b4c22d2c), U64_C (0xa4a4aaa4490e55ed), U64_C (0x2828a0285d885075), U64_C (0x5c5c6d5cda31b886), U64_C (0xf8f8c7f8933fed6b), U64_C (0x8686228644a411c2), - }; - -static const u64 C1[256] = - { + }, { U64_C (0xd818186018c07830), U64_C (0x2623238c2305af46), U64_C (0xb8c6c63fc67ef991), U64_C (0xfbe8e887e8136fcd), U64_C (0xcb878726874ca113), U64_C (0x11b8b8dab8a9626d), @@ -369,10 +377,7 @@ static const u64 C1[256] = U64_C (0x2c98985a98b4c22d), U64_C (0xeda4a4aaa4490e55), U64_C (0x752828a0285d8850), U64_C (0x865c5c6d5cda31b8), U64_C (0x6bf8f8c7f8933fed), U64_C (0xc28686228644a411), - }; - -static const u64 C2[256] = - { + }, { U64_C (0x30d818186018c078), U64_C (0x462623238c2305af), U64_C (0x91b8c6c63fc67ef9), U64_C (0xcdfbe8e887e8136f), U64_C (0x13cb878726874ca1), U64_C (0x6d11b8b8dab8a962), @@ -501,10 +506,7 @@ static const u64 C2[256] = U64_C (0x2d2c98985a98b4c2), U64_C (0x55eda4a4aaa4490e), U64_C (0x50752828a0285d88), U64_C (0xb8865c5c6d5cda31), U64_C (0xed6bf8f8c7f8933f), U64_C (0x11c28686228644a4), - }; - -static const u64 C3[256] = - { + }, { U64_C (0x7830d818186018c0), U64_C (0xaf462623238c2305), U64_C (0xf991b8c6c63fc67e), U64_C (0x6fcdfbe8e887e813), U64_C (0xa113cb878726874c), U64_C (0x626d11b8b8dab8a9), @@ -633,10 +635,7 @@ static const u64 C3[256] = U64_C (0xc22d2c98985a98b4), U64_C (0x0e55eda4a4aaa449), U64_C (0x8850752828a0285d), U64_C (0x31b8865c5c6d5cda), U64_C (0x3fed6bf8f8c7f893), U64_C (0xa411c28686228644), - }; - -static const u64 C4[256] = - { + }, { U64_C (0xc07830d818186018), U64_C (0x05af462623238c23), U64_C (0x7ef991b8c6c63fc6), U64_C (0x136fcdfbe8e887e8), U64_C (0x4ca113cb87872687), U64_C (0xa9626d11b8b8dab8), @@ -765,10 +764,7 @@ static const u64 C4[256] = U64_C 
(0xb4c22d2c98985a98), U64_C (0x490e55eda4a4aaa4), U64_C (0x5d8850752828a028), U64_C (0xda31b8865c5c6d5c), U64_C (0x933fed6bf8f8c7f8), U64_C (0x44a411c286862286), - }; - -static const u64 C5[256] = - { + }, { U64_C (0x18c07830d8181860), U64_C (0x2305af462623238c), U64_C (0xc67ef991b8c6c63f), U64_C (0xe8136fcdfbe8e887), U64_C (0x874ca113cb878726), U64_C (0xb8a9626d11b8b8da), @@ -897,10 +893,7 @@ static const u64 C5[256] = U64_C (0x98b4c22d2c98985a), U64_C (0xa4490e55eda4a4aa), U64_C (0x285d8850752828a0), U64_C (0x5cda31b8865c5c6d), U64_C (0xf8933fed6bf8f8c7), U64_C (0x8644a411c2868622), - }; - -static const u64 C6[256] = - { + }, { U64_C (0x6018c07830d81818), U64_C (0x8c2305af46262323), U64_C (0x3fc67ef991b8c6c6), U64_C (0x87e8136fcdfbe8e8), U64_C (0x26874ca113cb8787), U64_C (0xdab8a9626d11b8b8), @@ -1029,10 +1022,7 @@ static const u64 C6[256] = U64_C (0x5a98b4c22d2c9898), U64_C (0xaaa4490e55eda4a4), U64_C (0xa0285d8850752828), U64_C (0x6d5cda31b8865c5c), U64_C (0xc7f8933fed6bf8f8), U64_C (0x228644a411c28686), - }; - -static const u64 C7[256] = - { + }, { U64_C (0x186018c07830d818), U64_C (0x238c2305af462623), U64_C (0xc63fc67ef991b8c6), U64_C (0xe887e8136fcdfbe8), U64_C (0x8726874ca113cb87), U64_C (0xb8dab8a9626d11b8), @@ -1161,7 +1151,18 @@ static const u64 C7[256] = U64_C (0x985a98b4c22d2c98), U64_C (0xa4aaa4490e55eda4), U64_C (0x28a0285d88507528), U64_C (0x5c6d5cda31b8865c), U64_C (0xf8c7f8933fed6bf8), U64_C (0x86228644a411c286), - }; + } } +}; +#define C tab.C +#define C0 C[0] +#define C1 C[1] +#define C2 C[2] +#define C3 C[3] +#define C4 C[4] +#define C5 C[5] +#define C6 C[6] +#define C7 C[7] +#define rc tab.RC @@ -1189,6 +1190,22 @@ whirlpool_init (void *ctx, unsigned int flags) } +#ifdef USE_AMD64_ASM + +extern unsigned int +_gcry_whirlpool_transform_amd64(u64 *state, const unsigned char *data, + size_t nblks, const struct whirlpool_tables_s *tables); + +static unsigned int +whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) +{ + 
whirlpool_context_t *context = ctx; + + return _gcry_whirlpool_transform_amd64( + context->hash_state, data, nblks, &tab); +} + +#else /* USE_AMD64_ASM */ /* * Transform block. @@ -1308,6 +1325,8 @@ whirlpool_transform ( void *c, const unsigned char *data, size_t nblks ) return burn; } +#endif /* !USE_AMD64_ASM */ + /* Bug compatibility Whirlpool version. */ static void diff --git a/configure.ac b/configure.ac index 18db662..d14b7f6 100644 --- a/configure.ac +++ b/configure.ac @@ -1943,6 +1943,13 @@ LIST_MEMBER(whirlpool, $enabled_digests) if test "$found" = "1" ; then GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool.lo" AC_DEFINE(USE_WHIRLPOOL, 1, [Defined if this module should be included]) + + case "${host}" in + x86_64-*-*) + # Build with the assembly implementation + GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool-sse2-amd64.lo" + ;; + esac fi # rmd160 and sha1 should be included always. ----------------------------------------------------------------------- Summary of changes: cipher/Makefile.am | 2 +- cipher/whirlpool-sse2-amd64.S | 335 +++++++++++++++++++++++++++++++++++++++++ cipher/whirlpool.c | 91 ++++++----- configure.ac | 7 + 4 files changed, 398 insertions(+), 37 deletions(-) create mode 100644 cipher/whirlpool-sse2-amd64.S hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From teichm at in.tum.de Tue Oct 7 18:41:29 2014 From: teichm at in.tum.de (Markus Teich) Date: Tue, 7 Oct 2014 18:41:29 +0200 Subject: [PATCH revised] Add gcry_mpi_ec_sub. 
In-Reply-To: <20140808142214.GF32507@trolle> References: <20140722122411.GG4246@yoink.cs.uwaterloo.ca> <1406738375-14267-1-git-send-email-teichm@in.tum.de> <20140808093726.GD32507@trolle> <877g2juo5c.fsf@vigenere.g10code.de> <20140808142214.GF32507@trolle> Message-ID: <20141007164129.GU2670@trolle> Markus Teich wrote: > thanks, I missed the docu on patch submission format. And now revised with the "signed off" line. Sorry for the delay, but contributing to libgcrypt seems to be very time consuming... :( --Markus From teichm at in.tum.de Tue Oct 7 18:43:04 2014 From: teichm at in.tum.de (Markus Teich) Date: Tue, 7 Oct 2014 18:43:04 +0200 Subject: [PATCH revised] Add gcry_mpi_ec_sub. In-Reply-To: <20140808142214.GF32507@trolle> References: <20140722122411.GG4246@yoink.cs.uwaterloo.ca> <1406738375-14267-1-git-send-email-teichm@in.tum.de> <20140808093726.GD32507@trolle> <877g2juo5c.fsf@vigenere.g10code.de> <20140808142214.GF32507@trolle> Message-ID: <20141007164304.GV2670@trolle> Markus Teich wrote: > Is the attached patch ok? Adding the actual patch would be nice, right? :/ --Markus -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-mpi-Add-gcry_mpi_ec_sub.patch Type: text/x-diff Size: 8693 bytes Desc: not available URL: From cvs at cvs.gnupg.org Wed Oct 8 14:51:37 2014 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Wed, 08 Oct 2014 14:51:37 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-120-ga078436 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". 
The branch, master has been updated via a078436be5b656e4a2acfaeb5f054b9991f617e5 (commit) via 5c906e2cdb14e93fb4915fdc69c7353a5fa35709 (commit) from de0ccd4dce7ec185a678d78878d4538dd609ca0f (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit a078436be5b656e4a2acfaeb5f054b9991f617e5 Author: Werner Koch Date: Wed Oct 8 14:42:36 2014 +0200 doc: Fix a configure option name. -- diff --git a/AUTHORS b/AUTHORS index 860dea2..f72a421 100644 --- a/AUTHORS +++ b/AUTHORS @@ -137,7 +137,7 @@ Authors with a DCO ================== Andrei Scherer -2014-0822:BF7CEF794F9.000003F0andsch at inbox.com: +2014-08-22:BF7CEF794F9.000003F0andsch at inbox.com: Christian Aistleitner 2013-02-26:20130226110144.GA12678 at quelltextlich.at: diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 58671df..63edf06 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -325,7 +325,7 @@ you are cross-compiling, it is useful to set the environment variable then first look for the helper program in the @file{bin} directory below that top directory. An absolute directory name must be used for @code{SYSROOT}. Finally, if the configure command line option -@code{--libgcrypt-prefix} is used, only its value is used for the top +@code{--with-libgcrypt-prefix} is used, only its value is used for the top directory below which the helper script is expected. @end defmac commit 5c906e2cdb14e93fb4915fdc69c7353a5fa35709 Author: Werner Koch Date: Wed Oct 8 14:41:21 2014 +0200 Fix prime test for 2 and lower and add check command to mpicalc. * cipher/primegen.c (check_prime): Return true for the small primes. (_gcry_prime_check): Return correct values for 2 and lower numbers. * src/mpicalc.c (do_primecheck): New. (main): Add command 'P'. (main): Allow for larger input data. 
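[Editorial illustration, not part of the commit.] The two primegen.c changes named in the log entry above can be sketched with plain machine integers. This is only a rough model under stated assumptions: small_primes, trial_division and is_prime are hypothetical names, the table is a shortened stand-in for libgcrypt's small-prime table, and simple trial division stands in for the Fermat and Miller-Rabin rounds that the real check_prime runs on MPI values.

```c
/* Hypothetical, shortened stand-in for the library's table of small
   primes (zero-terminated). */
static const unsigned long small_primes[] = { 3, 5, 7, 11, 13, 17, 19, 23, 0 };

/* Trial division step.  Returns 1 if N is itself one of the small
   primes, 0 if N is a proper multiple of one of them, and -1 if this
   step cannot decide.  Before the fix, the analogous code rejected
   every N divisible by a small prime, including the small primes
   themselves. */
static int
trial_division (unsigned long n)
{
  int i;
  unsigned long x;

  for (i = 0; (x = small_primes[i]); i++)
    if (n % x == 0)
      return n == x;
  return -1;
}

/* Toy primality check.  The values 2 and lower are handled up front,
   mirroring the special cases the commit adds for 2 and lower
   numbers.  Undecided values fall back to plain trial division
   (assumption: the real code uses probabilistic tests instead). */
static int
is_prime (unsigned long n)
{
  unsigned long d;
  int r;

  if (n == 2)
    return 1;               /* 2 is a prime. */
  if (n < 2)
    return 0;               /* Only numbers > 1 are primes. */

  r = trial_division (n);
  if (r >= 0)
    return r;

  for (d = 2; d <= n / d; d++)  /* d*d <= n, written to avoid overflow. */
    if (n % d == 0)
      return 0;
  return 1;
}
```

In this sketch is_prime (3) is accepted by the trial division step alone, while is_prime (9) is rejected there because 9 is divisible by 3 without being equal to it.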
diff --git a/cipher/primegen.c b/cipher/primegen.c index 14a5ccf..ce6db8d 100644 --- a/cipher/primegen.c +++ b/cipher/primegen.c @@ -868,7 +868,7 @@ check_prime( gcry_mpi_t prime, gcry_mpi_t val_2, int rm_rounds, for (i=0; (x = small_prime_numbers[i]); i++ ) { if ( mpi_divisible_ui( prime, x ) ) - return 0; + return !mpi_cmp_ui (prime, x); } /* A quick Fermat test. */ @@ -1169,19 +1169,20 @@ _gcry_prime_generate (gcry_mpi_t *prime, unsigned int prime_bits, gcry_err_code_t _gcry_prime_check (gcry_mpi_t x, unsigned int flags) { - gcry_err_code_t rc = 0; - gcry_mpi_t val_2 = mpi_alloc_set_ui (2); /* Used by the Fermat test. */ - (void)flags; + switch (mpi_cmp_ui (x, 2)) + { + case 0: return 0; /* 2 is a prime */ + case -1: return GPG_ERR_NO_PRIME; /* Only numbers > 1 are primes. */ + } + /* We use 64 rounds because the prime we are going to test is not guaranteed to be a random one. */ - if (! check_prime (x, val_2, 64, NULL, NULL)) - rc = GPG_ERR_NO_PRIME; - - mpi_free (val_2); + if (check_prime (x, mpi_const (MPI_C_TWO), 64, NULL, NULL)) + return 0; - return rc; + return GPG_ERR_NO_PRIME; } /* Find a generator for PRIME where the factorization of (prime-1) is diff --git a/src/mpicalc.c b/src/mpicalc.c index b2b4335..f1fbbef 100644 --- a/src/mpicalc.c +++ b/src/mpicalc.c @@ -254,6 +254,23 @@ do_nbits (void) } +static void +do_primecheck (void) +{ + gpg_error_t err; + + if (stackidx < 1) + { + fputs ("stack underflow\n", stderr); + return; + } + err = gcry_prime_check (stack[stackidx - 1], 0); + mpi_set_ui (stack[stackidx - 1], !err); + if (err && gpg_err_code (err) != GPG_ERR_NO_PRIME) + fprintf (stderr, "checking prime failed: %s\n", gpg_strerror (err)); +} + + static int my_getc (void) { @@ -295,6 +312,7 @@ print_help (void) "d dup item [-1] := [0] {+1}\n" "r reverse [0] := [1], [1] := [0] {0}\n" "b # of bits [0] := nbits([0]) {0}\n" + "P prime check [0] := is_prime([0])?1:0 {0}\n" "c clear stack\n" "p print top item\n" "f print the stack\n" @@ -313,7 +331,7 @@ 
main (int argc, char **argv) int print_config = 0; int i, c; int state = 0; - char strbuf[1000]; + char strbuf[4096]; int stridx = 0; if (argc) @@ -508,6 +526,9 @@ main (int argc, char **argv) case 'b': do_nbits (); break; + case 'P': + do_primecheck (); + break; case 'c': for (i = 0; i < stackidx; i++) { ----------------------------------------------------------------------- Summary of changes: AUTHORS | 2 +- cipher/primegen.c | 19 ++++++++++--------- doc/gcrypt.texi | 2 +- src/mpicalc.c | 23 ++++++++++++++++++++++- 4 files changed, 34 insertions(+), 12 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Wed Oct 8 15:01:15 2014 From: cvs at cvs.gnupg.org (by Markus Teich) Date: Wed, 08 Oct 2014 15:01:15 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-121-g23ecadf Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 23ecadf309f8056c35cc092e58df801ac0eab862 (commit) from a078436be5b656e4a2acfaeb5f054b9991f617e5 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 23ecadf309f8056c35cc092e58df801ac0eab862 Author: Markus Teich Date: Tue Oct 7 18:24:27 2014 +0200 mpi: Add gcry_mpi_ec_sub. * NEWS (gcry_mpi_ec_sub): New. * doc/gcrypt.texi (gcry_mpi_ec_sub): New. * mpi/ec.c (_gcry_mpi_ec_sub, sub_points_edwards): New. (sub_points_montgomery, sub_points_weierstrass): New stubs. * src/gcrypt-int.h (_gcry_mpi_ec_sub): New. * src/gcrypt.h.in (gcry_mpi_ec_sub): New. 
* src/libgcrypt.def (gcry_mpi_ec_sub): New. * src/libgcrypt.vers (gcry_mpi_ec_sub): New. * src/mpi.h (_gcry_mpi_ec_sub_points): New. * src/visibility.c (gcry_mpi_ec_sub): New. * src/visibility.h (gcry_mpi_ec_sub): New. -- This function subtracts two points on the curve. Only Twisted Edwards curves are supported with this change. Signed-off-by: Markus Teich diff --git a/NEWS b/NEWS index 214c676..0150fdd 100644 --- a/NEWS +++ b/NEWS @@ -29,6 +29,7 @@ Noteworthy changes in version 1.7.0 (unreleased) GCRYCTL_SET_SBOX NEW. gcry_cipher_set_sbox NEW macro. GCRY_MD_GOSTR3411_CP NEW. + gcry_mpi_ec_sub NEW. Noteworthy changes in version 1.6.0 (2013-12-16) diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 63edf06..108d53a 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -4806,6 +4806,15 @@ Add the points @var{u} and @var{v} of the elliptic curve described by @var{ctx} and store the result into @var{w}. @end deftypefun + at deftypefun void gcry_mpi_ec_sub ( @ + @w{gcry_mpi_point_t @var{w}}, @w{gcry_mpi_point_t @var{u}}, @ + @w{gcry_mpi_point_t @var{v}}, @w{gcry_ctx_t @var{ctx}}) + +Subtracts the point @var{v} from the point @var{u} of the elliptic +curve described by @var{ctx} and store the result into @var{w}. Only +Twisted Edwards curves are supported for now. 
+ at end deftypefun + @deftypefun void gcry_mpi_ec_mul ( @ @w{gcry_mpi_point_t @var{w}}, @w{gcry_mpi_t @var{n}}, @ @w{gcry_mpi_point_t @var{u}}, @w{gcry_ctx_t @var{ctx}}) diff --git a/mpi/ec.c b/mpi/ec.c index a55291a..80f3b22 100644 --- a/mpi/ec.c +++ b/mpi/ec.c @@ -1131,6 +1131,71 @@ _gcry_mpi_ec_add_points (mpi_point_t result, } +/* RESULT = P1 - P2 (Weierstrass version).*/ +static void +sub_points_weierstrass (mpi_point_t result, + mpi_point_t p1, mpi_point_t p2, + mpi_ec_t ctx) +{ + (void)result; + (void)p1; + (void)p2; + (void)ctx; + log_fatal ("%s: %s not yet supported\n", + "_gcry_mpi_ec_sub_points", "Weierstrass"); +} + + +/* RESULT = P1 - P2 (Montgomery version).*/ +static void +sub_points_montgomery (mpi_point_t result, + mpi_point_t p1, mpi_point_t p2, + mpi_ec_t ctx) +{ + (void)result; + (void)p1; + (void)p2; + (void)ctx; + log_fatal ("%s: %s not yet supported\n", + "_gcry_mpi_ec_sub_points", "Montgomery"); +} + + +/* RESULT = P1 - P2 (Twisted Edwards version).*/ +static void +sub_points_edwards (mpi_point_t result, + mpi_point_t p1, mpi_point_t p2, + mpi_ec_t ctx) +{ + mpi_point_t p2i = _gcry_mpi_point_new (0); + point_set (p2i, p2); + _gcry_mpi_neg (p2i->x, p2i->x); + add_points_edwards (result, p1, p2i, ctx); + _gcry_mpi_point_release (p2i); +} + + +/* RESULT = P1 - P2 */ +void +_gcry_mpi_ec_sub_points (mpi_point_t result, + mpi_point_t p1, mpi_point_t p2, + mpi_ec_t ctx) +{ + switch (ctx->model) + { + case MPI_EC_WEIERSTRASS: + sub_points_weierstrass (result, p1, p2, ctx); + break; + case MPI_EC_MONTGOMERY: + sub_points_montgomery (result, p1, p2, ctx); + break; + case MPI_EC_EDWARDS: + sub_points_edwards (result, p1, p2, ctx); + break; + } +} + + /* Scalar point multiplication - the main function for ECC. If takes an integer SCALAR and a POINT as well as the usual context CTX. RESULT will be set to the resulting point. 
*/ diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h index 8a6df84..918937b 100644 --- a/src/gcrypt-int.h +++ b/src/gcrypt-int.h @@ -430,6 +430,8 @@ int _gcry_mpi_ec_get_affine (gcry_mpi_t x, gcry_mpi_t y, gcry_mpi_point_t point, void _gcry_mpi_ec_dup (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_ctx_t ctx); void _gcry_mpi_ec_add (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_mpi_point_t v, mpi_ec_t ctx); +void _gcry_mpi_ec_sub (gcry_mpi_point_t w, + gcry_mpi_point_t u, gcry_mpi_point_t v, mpi_ec_t ctx); void _gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u, mpi_ec_t ctx); int _gcry_mpi_ec_curve_point (gcry_mpi_point_t w, mpi_ec_t ctx); diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index 65d9ef6..f3207c9 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -704,6 +704,10 @@ void gcry_mpi_ec_dup (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_ctx_t ctx); void gcry_mpi_ec_add (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx); +/* W = U - V. */ +void gcry_mpi_ec_sub (gcry_mpi_point_t w, + gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx); + /* W = N * U. */ void gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u, gcry_ctx_t ctx); diff --git a/src/libgcrypt.def b/src/libgcrypt.def index 57ed490..924f17f 100644 --- a/src/libgcrypt.def +++ b/src/libgcrypt.def @@ -276,5 +276,7 @@ EXPORTS gcry_mac_ctl @242 gcry_mac_get_algo @243 + gcry_mpi_ec_sub @244 + ;; end of file with public symbols for Windows. 
diff --git a/src/libgcrypt.vers b/src/libgcrypt.vers index 7ee0541..7e8df3f 100644 --- a/src/libgcrypt.vers +++ b/src/libgcrypt.vers @@ -105,7 +105,7 @@ GCRYPT_1.6 { gcry_mpi_ec_get_mpi; gcry_mpi_ec_get_point; gcry_mpi_ec_set_mpi; gcry_mpi_ec_set_point; gcry_mpi_ec_get_affine; - gcry_mpi_ec_dup; gcry_mpi_ec_add; gcry_mpi_ec_mul; + gcry_mpi_ec_dup; gcry_mpi_ec_add; gcry_mpi_ec_sub; gcry_mpi_ec_mul; gcry_mpi_ec_curve_point; gcry_log_debug; diff --git a/src/mpi.h b/src/mpi.h index 7407b7f..13b5117 100644 --- a/src/mpi.h +++ b/src/mpi.h @@ -286,6 +286,9 @@ void _gcry_mpi_ec_dup_point (mpi_point_t result, void _gcry_mpi_ec_add_points (mpi_point_t result, mpi_point_t p1, mpi_point_t p2, mpi_ec_t ctx); +void _gcry_mpi_ec_sub_points (mpi_point_t result, + mpi_point_t p1, mpi_point_t p2, + mpi_ec_t ctx); void _gcry_mpi_ec_mul_point (mpi_point_t result, gcry_mpi_t scalar, mpi_point_t point, mpi_ec_t ctx); diff --git a/src/visibility.c b/src/visibility.c index 6ed57ca..fa23e53 100644 --- a/src/visibility.c +++ b/src/visibility.c @@ -567,6 +567,14 @@ gcry_mpi_ec_add (gcry_mpi_point_t w, } void +gcry_mpi_ec_sub (gcry_mpi_point_t w, + gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx) +{ + _gcry_mpi_ec_sub_points (w, u, v, + _gcry_ctx_get_pointer (ctx, CONTEXT_TYPE_EC)); +} + +void gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u, gcry_ctx_t ctx) { diff --git a/src/visibility.h b/src/visibility.h index 96b5235..fa3c763 100644 --- a/src/visibility.h +++ b/src/visibility.h @@ -218,6 +218,7 @@ MARK_VISIBLEX (gcry_mpi_copy) MARK_VISIBLEX (gcry_mpi_div) MARK_VISIBLEX (gcry_mpi_dump) MARK_VISIBLEX (gcry_mpi_ec_add) +MARK_VISIBLEX (gcry_mpi_ec_sub) MARK_VISIBLEX (gcry_mpi_ec_curve_point) MARK_VISIBLEX (gcry_mpi_ec_dup) MARK_VISIBLEX (gcry_mpi_ec_get_affine) @@ -486,6 +487,7 @@ MARK_VISIBLEX (_gcry_mpi_get_const) #define gcry_mpi_abs _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_mpi_ec_add _gcry_USE_THE_UNDERSCORED_FUNCTION +#define gcry_mpi_ec_sub 
_gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_mpi_ec_curve_point _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_mpi_ec_dup _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_mpi_ec_get_affine _gcry_USE_THE_UNDERSCORED_FUNCTION ----------------------------------------------------------------------- Summary of changes: NEWS | 1 + doc/gcrypt.texi | 9 ++++++++ mpi/ec.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++ src/gcrypt-int.h | 2 ++ src/gcrypt.h.in | 4 ++++ src/libgcrypt.def | 2 ++ src/libgcrypt.vers | 2 +- src/mpi.h | 3 +++ src/visibility.c | 8 +++++++ src/visibility.h | 2 ++ 10 files changed, 97 insertions(+), 1 deletion(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org From wk at gnupg.org Wed Oct 8 15:01:05 2014 From: wk at gnupg.org (Werner Koch) Date: Wed, 08 Oct 2014 15:01:05 +0200 Subject: [PATCH revised] Add gcry_mpi_ec_sub. In-Reply-To: <20141007164129.GU2670@trolle> (Markus Teich's message of "Tue, 7 Oct 2014 18:41:29 +0200") References: <20140722122411.GG4246@yoink.cs.uwaterloo.ca> <1406738375-14267-1-git-send-email-teichm@in.tum.de> <20140808093726.GD32507@trolle> <877g2juo5c.fsf@vigenere.g10code.de> <20140808142214.GF32507@trolle> <20141007164129.GU2670@trolle> Message-ID: <87r3yi90mm.fsf@vigenere.g10code.de> On Tue, 7 Oct 2014 18:41, teichm at in.tum.de said: > And now revised with the "signed off" line. Sorry for the delay, but contributing > to libgcrypt seems to be very time consuming... :( As is the maintaining ... Pushed. Thanks. Please send a DCO to this list (see doc/HACKING). Shalom-Salam, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz.
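Editorial sketch for the gcry_mpi_ec_sub commit above: sub_points_edwards computes P1 - P2 by negating the x-coordinate of P2 and reusing the addition routine. The following toy version illustrates that idea in affine coordinates over a tiny field. Everything in it (TOY_P, TOY_D, point_t, ed_add, ed_sub) is hypothetical and chosen only for illustration; it assumes the plain Edwards form x^2 + y^2 = 1 + d*x^2*y^2 and is not how libgcrypt represents points or does its arithmetic.

```c
#include <assert.h>

/* Toy parameters, assumptions for illustration only (not a real curve):
   x^2 + y^2 = 1 + TOY_D * x^2 * y^2 over GF(TOY_P).  TOY_D is a
   non-square mod TOY_P, so the addition law has no exceptional cases. */
#define TOY_P 13
#define TOY_D 2

typedef struct { long x, y; } point_t;

/* Reduce V into the range 0 .. TOY_P-1, also for negative V. */
static long
mod_p (long v)
{
  v %= TOY_P;
  return v < 0 ? v + TOY_P : v;
}

/* Modular inverse via Fermat's little theorem: v^(TOY_P-2) mod TOY_P. */
static long
inv_p (long v)
{
  long r = 1;
  int i;

  v = mod_p (v);
  for (i = 0; i < TOY_P - 2; i++)
    r = mod_p (r * v);
  return r;
}

/* R = P1 + P2 using the affine Edwards addition law. */
static point_t
ed_add (point_t p1, point_t p2)
{
  long t = mod_p (TOY_D * mod_p (p1.x * p2.x) * mod_p (p1.y * p2.y));
  point_t r;

  r.x = mod_p ((p1.x * p2.y + p1.y * p2.x) * inv_p (1 + t));
  r.y = mod_p ((p1.y * p2.y - p1.x * p2.x) * inv_p (1 - t));
  return r;
}

/* R = P1 - P2: on an Edwards curve -(x, y) = (-x, y), so negate the
   x-coordinate of a copy of P2 and add. */
static point_t
ed_sub (point_t p1, point_t p2)
{
  p2.x = mod_p (-p2.x);
  return ed_add (p1, p2);
}
```

With these toy parameters, (4, 4) and (1, 0) lie on the curve; subtracting a point from itself yields the neutral element (0, 1), and adding back a subtracted point restores the original.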
From vcizek at suse.cz Wed Oct 8 14:40:31 2014 From: vcizek at suse.cz (Vitezslav Cizek) Date: Wed, 8 Oct 2014 14:40:31 +0200 Subject: FIPS 186-4 compliance patches for rsa/dsa/ecdsa Message-ID: <20141008124029.GA3566@ursa.suse.cz> Hi, The libgcrypt code isn't compliant with the latest FIPS 186-4. There are some changes necessary, especially in the key generation code. I've created issue 1736. (https://bugs.g10code.com/gnupg/issue1736) Patches are attached there. Can someone please review them? -- Vita Cizek From teichm at in.tum.de Wed Oct 8 20:05:10 2014 From: teichm at in.tum.de (Markus Teich) Date: Wed, 8 Oct 2014 20:05:10 +0200 Subject: [PATCH revised] Add gcry_mpi_ec_sub. In-Reply-To: <87r3yi90mm.fsf@vigenere.g10code.de> References: <20140722122411.GG4246@yoink.cs.uwaterloo.ca> <1406738375-14267-1-git-send-email-teichm@in.tum.de> <20140808093726.GD32507@trolle> <877g2juo5c.fsf@vigenere.g10code.de> <20140808142214.GF32507@trolle> <20141007164129.GU2670@trolle> <87r3yi90mm.fsf@vigenere.g10code.de> Message-ID: <20141008180509.GA2770@trolle> Werner Koch wrote: > Please send a DCO to this list (see doc/HACKING). I did, but the mail from my non-subscribed gpg-address was silently dropped. :/ Here it is again: Libgcrypt Developer's Certificate of Origin.
Version 1.0 ========================================================= By making a contribution to the Libgcrypt project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the free software license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate free software license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same free software license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the free software license(s) involved. Signed-off-by: Markus Teich And I have a few questions about that: Say A certifies (a), (b) and (c), but not (d) and B certifies all four parts. If A creates a patch, sends it to B and B publishes it, would that be a valid contribution? What if A only meant to show the patch to B for review but not for public sharing? Is it ok to sign an address X with the gpg key, which only contains address Y as I am doing right now? At least doc/HACKING does not forbid it. --Markus From cvs at cvs.gnupg.org Thu Oct 9 08:31:40 2014 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 09 Oct 2014 08:31:40 +0200 Subject: [git] GCRYPT - branch, master, updated.
libgcrypt-1.6.0-122-g669a83b Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 669a83ba86c38b271d85ed4bf1cabc7cc8160583 (commit) from 23ecadf309f8056c35cc092e58df801ac0eab862 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 669a83ba86c38b271d85ed4bf1cabc7cc8160583 Author: Werner Koch Date: Thu Oct 9 08:31:35 2014 +0200 Register DCO for Markus Teich -- diff --git a/AUTHORS b/AUTHORS index f72a421..e186a48 100644 --- a/AUTHORS +++ b/AUTHORS @@ -157,6 +157,9 @@ Jussi Kivilinna Jussi Kivilinna 2013-05-06:5186720A.4090101 at iki.fi: +Markus Teich +2014-10-08:20141008180509.GA2770 at trolle: + Milan Broz 2014-01-13:52D44CC6.4050707 at gmail.com: ----------------------------------------------------------------------- Summary of changes: AUTHORS | 3 +++ 1 file changed, 3 insertions(+) hooks/post-receive -- The GNU crypto library http://git.gnupg.org From wk at gnupg.org Thu Oct 9 08:38:06 2014 From: wk at gnupg.org (Werner Koch) Date: Thu, 09 Oct 2014 08:38:06 +0200 Subject: [PATCH revised] Add gcry_mpi_ec_sub.
In-Reply-To: <20141008180509.GA2770@trolle> (Markus Teich's message of "Wed, 8 Oct 2014 20:05:10 +0200") References: <20140722122411.GG4246@yoink.cs.uwaterloo.ca> <1406738375-14267-1-git-send-email-teichm@in.tum.de> <20140808093726.GD32507@trolle> <877g2juo5c.fsf@vigenere.g10code.de> <20140808142214.GF32507@trolle> <20141007164129.GU2670@trolle> <87r3yi90mm.fsf@vigenere.g10code.de> <20141008180509.GA2770@trolle> Message-ID: <871tqh694h.fsf@vigenere.g10code.de> On Wed, 8 Oct 2014 20:05, teichm at in.tum.de said: > I did, but the mail from my non-subscribed gpg-adress was silently dropped. :/ > Here it is again: It is probably in the moderator queue. > And I have a few questions about that: > Say A certifies (a), (b) and (c), but not (d) and B certifies all four parts. If > A creates a patch, sends it to B and B publishes it, would that be a valid > contribution? What if A only meant to show the patch to B for review but not for > public sharing? At the end of the chain it always goes back to (a) or (b) which requires "and I have the right to submit it under the free software license" or "and I have the right under that license to submit that work with modifications" Thus you need to decide whether you have the right to submit (i.e. publish) it. > Is it ok to sign an address X with the gpg key, which only contains address Y as > I am doing right now? At least doc/HACKING does not forbid it. If it's you, I am fine with it ;-) Shalom-Salam, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz.
From wk at gnupg.org Thu Oct 9 08:51:25 2014 From: wk at gnupg.org (Werner Koch) Date: Thu, 09 Oct 2014 08:51:25 +0200 Subject: FIPS 186-4 compliance patches for rsa/dsa/ecdsa In-Reply-To: <20141008124029.GA3566@ursa.suse.cz> (Vitezslav Cizek's message of "Wed, 8 Oct 2014 14:40:31 +0200") References: <20141008124029.GA3566@ursa.suse.cz> Message-ID: <87wq894txu.fsf@vigenere.g10code.de> Hi, I am not very inclined to add patches just for the sake of sell-it-to-the-gov specs. In particular not if a quick sample shows - /* We ignore step 1 from pksc5v2.1 which demands a check that dklen - is not larger that 0xffffffff * hlen. */ + /* Step 1 */ + /* If dkLen > (2^32 - 1) * hLen, output "derived key too long" and stop. */ + if (dklen > 4294967295 * hlen) + return GPG_ERR_INV_VALUE; Which is wrong. 0xffffffff * hlen overflows on many architectures and the condition does not work as expected. It also does not help that you changed DKLEN from int to long under the assumption that sizeof(long) > sizeof (int) - which is for example wrong for the majority of desktop systems. If you want to have these patches considered, please format them and the commit logs according to doc/HACKING and send a DCO. A description of why these changes are beneficial would also be appreciated. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From andsch at inbox.com Mon Oct 13 16:47:30 2014 From: andsch at inbox.com (And Sch) Date: Mon, 13 Oct 2014 06:47:30 -0800 Subject: comparison between signed and unsigned integer Message-ID: <4C8B1AEE2C4.0000006Candsch@inbox.com> I recently added '-Wextra' to my compile flags and I get many of the following warnings when compiling libgcrypt. warning: comparison between signed and unsigned integer expressions [-Wsign-compare] I have looked through them all and most of them are comparing a signed counter with size_t or unsigned int, which should be benign.
However, researching the warning shows that certain nasty bugs appear if the signed int is ever negative: http://www.jwwalker.com/pages/safe-compare.html https://www.securecoding.cert.org/confluence/display/cplusplus/INT02-CPP.+Understand+integer+conversion+rules Now, Werner Koch said in the bug tracker that fixing this may introduce bugs, and I would agree. It probably wouldn't be worthwhile because there are no obvious bugs ATM. However, here is my second proposal: why not add a call to assert() before the comparison to make sure the signed int is not negative? This shouldn't introduce any bugs AFAIK, and can be turned off globally.
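Editorial sketch of the assert() proposal above, under stated assumptions: index_below and naive_below are hypothetical helpers written for this illustration and do not exist in libgcrypt. The first guards a signed/unsigned comparison with assert(); the second spells out the conversion the compiler otherwise performs silently, which is what -Wsign-compare warns about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libgcrypt): return 1 if the signed
   counter I is below the unsigned limit N, else 0. */
static int
index_below (int i, size_t n)
{
  /* A negative I would be converted to a huge unsigned value and
     compare as "not below".  The check is compiled out when NDEBUG
     is defined, which is the global off switch. */
  assert (i >= 0);
  return (size_t)i < n;
}

/* The unguarded comparison.  The cast makes explicit what the usual
   arithmetic conversions do implicitly for "i < n". */
static int
naive_below (int i, unsigned int n)
{
  return (unsigned int)i < n;
}
```

For example, naive_below (-1, 10u) returns 0 because -1 converts to UINT_MAX, whereas index_below (-1, 10) aborts in a debug build instead of silently giving the surprising answer.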