From wk at gnupg.org  Fri Oct  2 15:08:36 2015
From: wk at gnupg.org (Werner Koch)
Date: Fri, 02 Oct 2015 15:08:36 +0200
Subject: Typo in documentation
In-Reply-To: <560A913D.7020104@gmail.com> (Ben Wiederhake's message of "Tue, 29 Sep 2015 15:25:17 +0200")
References: <560A913D.7020104@gmail.com>
Message-ID: <87pp0x8lkb.fsf@vigenere.g10code.de>

On Tue, 29 Sep 2015 15:25, ben.wiederhake at gmail.com said:

> The very first code example contains the misspelled word "intialized",
> instead of "in**i**tialized".

Thanks for reporting.  Fixed in my local repo; it will eventually be pushed.


Shalom-Salam,

   Werner

-- 
Thoughts are free. Exceptions are regulated by a federal law.

From wk at gnupg.org  Fri Oct  2 15:06:07 2015
From: wk at gnupg.org (Werner Koch)
Date: Fri, 02 Oct 2015 15:06:07 +0200
Subject: Determine interest: AES with IGE mode?
In-Reply-To: <560AAEEE.3060607@gmail.com> (Ben Wiederhake's message of "Tue, 29 Sep 2015 17:31:58 +0200")
References: <56095952.7000504@gmail.com> <5609B182.7010708@gmail.com> <20150929133828.GA23454@pi.ip.fi> <560AAEEE.3060607@gmail.com>
Message-ID: <87wpv58log.fsf@vigenere.g10code.de>

On Tue, 29 Sep 2015 17:31, ben.wiederhake at gmail.com said:

> If there are any concrete concerns about security, it may be worth
> putting it into libgcrypt as deprecated.  Then:
> - People who desperately need AES_IGE (like us) have access to it.
> - People who don't really require it can see that it is deprecated.

Interesting NEWS line then:

  * Support for the new but deprecated IGE mode.

Given that our cipher mode implementation is pretty modular, I am not
against adding it as long as there is only a generic mode and no bulk
mode optimization.


Salam-Shalom,

   Werner

-- 
Thoughts are free. Exceptions are regulated by a federal law.

From wk at gnupg.org  Fri Oct  2 15:11:49 2015
From: wk at gnupg.org (Werner Koch)
Date: Fri, 02 Oct 2015 15:11:49 +0200
Subject: [PATCH] Add missing entry in gitignore.
In-Reply-To: <56055AB2.8020806@gmail.com> (Ben Wiederhake's message of "Fri, 25 Sep 2015 16:31:14 +0200")
References: <56055AB2.8020806@gmail.com>
Message-ID: <87lhbl8ley.fsf@vigenere.g10code.de>

On Fri, 25 Sep 2015 16:31, ben.wiederhake at gmail.com said:

> and ./configure, git reports the directory as "not clean" due to
> the "new" file tests/hashtest-256g generated by configure.

Thanks, merged into your other typo-fix commit.  But you should really
do VPATH builds ;-)


Salam-Shalom,

   Werner

-- 
Thoughts are free. Exceptions are regulated by a federal law.

From ben.wiederhake at gmail.com  Fri Oct  2 15:18:06 2015
From: ben.wiederhake at gmail.com (Ben Wiederhake)
Date: Fri, 02 Oct 2015 15:18:06 +0200
Subject: Fwd: Re: Determine interest: AES with IGE mode?
In-Reply-To: <560E83C0.9070707@gmail.com>
References: <560E83C0.9070707@gmail.com>
Message-ID: <560E840E.4040304@gmail.com>

Whoops, forgot to CC the mailing list.

-------- Forwarded Message --------
Subject: Re: Determine interest: AES with IGE mode?
Date: Fri, 02 Oct 2015 15:16:48 +0200
From: Ben Wiederhake
To: Werner Koch

Hello,

>> If there are any concrete concerns about security, it may be worth
>> putting it into libgcrypt as deprecated.  Then:
>> - People who desperately need AES_IGE (like us) have access to it.
>> - People who don't really require it can see that it is deprecated.
>
> Interesting NEWS line then:
>
>   * Support for the new but deprecated IGE mode.

I know, sorry, but there definitely are people who are going to need it.

> Given that our cipher mode implementation is pretty modular, I am not
> against adding it as long as there is only a generic mode and no bulk
> mode optimization.

I absolutely agree. In some not-really-representative tests ("encode a
2 GiB file on a quiet system"), the encryption process was limited only
by my hard drive, potentially exceeding 60 MiB/s.
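Werner's condition (a generic mode only, no bulk optimization) suggests how small such an addition could be. The following is a minimal sketch of a generic IGE (Infinite Garble Extension) layer over an arbitrary 16-byte block cipher, assuming the common formulation c_i = E(p_i XOR c_{i-1}) XOR p_{i-1}, with a 32-byte IV whose first half is taken as the initial "previous ciphertext" block and whose second half as the initial "previous plaintext" block (IV layout conventions differ between implementations). The names `block_fn`, `ige_encrypt`, `ige_decrypt`, `toy_enc`, and `toy_dec` are hypothetical, not libgcrypt API, and the toy cipher is an insecure stand-in for AES so the code runs on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* One-block cipher primitive; KEY is opaque to the mode. */
typedef void (*block_fn) (const void *key, const uint8_t *in, uint8_t *out);

/* TOY stand-in for AES: rotate bytes left by one, then XOR the key.
   Invertible but NOT secure; for demonstration only. */
static void
toy_enc (const void *key, const uint8_t *in, uint8_t *out)
{
  const uint8_t *k = key;
  for (size_t i = 0; i < BLK; i++)
    out[i] = in[(i + 1) % BLK] ^ k[i];
}

static void
toy_dec (const void *key, const uint8_t *in, uint8_t *out)
{
  const uint8_t *k = key;
  for (size_t i = 0; i < BLK; i++)
    out[(i + 1) % BLK] = in[i] ^ k[i];
}

/* IGE encryption: c_i = E(p_i ^ c_{i-1}) ^ p_{i-1}.
   Assumed IV layout: iv[0..15] = c_0, iv[16..31] = p_0.
   LEN must be a multiple of BLK; IN and OUT may be the same buffer. */
static void
ige_encrypt (block_fn enc, const void *key, const uint8_t iv[2 * BLK],
             const uint8_t *in, uint8_t *out, size_t len)
{
  uint8_t prev_c[BLK], prev_p[BLK], p[BLK], tmp[BLK], e[BLK];

  assert (len % BLK == 0);
  memcpy (prev_c, iv, BLK);
  memcpy (prev_p, iv + BLK, BLK);
  for (size_t off = 0; off < len; off += BLK)
    {
      memcpy (p, in + off, BLK);          /* save p_i before OUT is written */
      for (size_t i = 0; i < BLK; i++)
        tmp[i] = p[i] ^ prev_c[i];
      enc (key, tmp, e);
      for (size_t i = 0; i < BLK; i++)
        out[off + i] = e[i] ^ prev_p[i];
      memcpy (prev_c, out + off, BLK);
      memcpy (prev_p, p, BLK);
    }
}

/* IGE decryption: p_i = D(c_i ^ p_{i-1}) ^ c_{i-1}. */
static void
ige_decrypt (block_fn dec, const void *key, const uint8_t iv[2 * BLK],
             const uint8_t *in, uint8_t *out, size_t len)
{
  uint8_t prev_c[BLK], prev_p[BLK], c[BLK], tmp[BLK], d[BLK];

  assert (len % BLK == 0);
  memcpy (prev_c, iv, BLK);
  memcpy (prev_p, iv + BLK, BLK);
  for (size_t off = 0; off < len; off += BLK)
    {
      memcpy (c, in + off, BLK);          /* save c_i before OUT is written */
      for (size_t i = 0; i < BLK; i++)
        tmp[i] = c[i] ^ prev_p[i];
      dec (key, tmp, d);
      for (size_t i = 0; i < BLK; i++)
        out[off + i] = d[i] ^ prev_c[i];
      memcpy (prev_p, out + off, BLK);
      memcpy (prev_c, c, BLK);
    }
}
```

Because every block depends on both the previous plaintext and the previous ciphertext block, neither direction parallelizes, so a plain per-block loop like this is already close to what a generic mode can offer.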
While that's slow in comparison to highly optimized AES implementations,
it's still pretty good, given that it's not even using the optimized
buf_xor function (or whatever it was called). So there is (hopefully) no
need for such a highly optimized version.

With regards
Ben Wiederhake

From cvs at cvs.gnupg.org  Tue Oct 13 05:34:50 2015
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Tue, 13 Oct 2015 05:34:50 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-263-g73374fd

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  73374fdd27c7ba28b19f9672c68a6f5b72252fe5 (commit)
      from  3a3d5410cc83f7069c7cb1ab384905f382292d32 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 73374fdd27c7ba28b19f9672c68a6f5b72252fe5
Author: NIIBE Yutaka
Date:   Tue Oct 13 12:28:00 2015 +0900

    Fix declaration of return type.

    * src/gcrypt-int.h (_gcry_sexp_extract_param): Return gpg_error_t.
    * cipher/dsa.c (dsa_generate): Fix call to _gcry_sexp_extract_param.
    * src/g10lib.h (_gcry_vcontrol): Return gcry_err_code_t.
    * src/visibility.c (gcry_mpi_snatch): Fix call to _gcry_mpi_snatch.
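This commit, and its partial revert in 813565a further down, hinge on libgpg-error's two related types: gpg_err_code_t is a bare error code, while gpg_error_t additionally carries the error source in its upper bits. The standalone sketch below mirrors that layout (code in the low 16 bits, source in bits 24 to 30, zero meaning success) using hypothetical `sk_*` names so it compiles without libgpg-error; gpg-error.h remains the authoritative definition. It shows why comparing a composed value against a bare code fails, and why letting internal helpers return bare codes and attaching the source once at the public boundary, which is what 813565a appears to settle on, keeps the types straight.

```c
#include <stdint.h>

/* Hypothetical mirror of libgpg-error's value layout:
   bits 0..15 hold the error code, bits 24..30 the error source. */
typedef uint32_t sk_err_code_t;   /* like gpg_err_code_t: code only  */
typedef uint32_t sk_error_t;      /* like gpg_error_t: source + code */

#define SK_ERR_SOURCE_SHIFT 24u
#define SK_ERR_SOURCE_MASK  127u
#define SK_ERR_CODE_MASK    65535u

#define SK_SOURCE_EXAMPLE   1u    /* example source id  */
#define SK_ERR_EXAMPLE_ARG  45u   /* example code value */

/* Compose a full error value; success (code 0) stays 0. */
sk_error_t
sk_err_make (uint32_t source, sk_err_code_t code)
{
  if (code == 0)
    return 0;
  return ((source & SK_ERR_SOURCE_MASK) << SK_ERR_SOURCE_SHIFT)
         | (code & SK_ERR_CODE_MASK);
}

/* Strip the source again, in the spirit of gpg_err_code(). */
sk_err_code_t
sk_err_code (sk_error_t err)
{
  return err & SK_ERR_CODE_MASK;
}

uint32_t
sk_err_source (sk_error_t err)
{
  return (err >> SK_ERR_SOURCE_SHIFT) & SK_ERR_SOURCE_MASK;
}

/* Internal helper: returns a bare code, never a composed value. */
sk_err_code_t
sk_internal_check (int arg)
{
  return arg < 0 ? SK_ERR_EXAMPLE_ARG : 0;
}

/* Public entry point: attach the source exactly once, at the boundary. */
sk_error_t
sk_public_check (int arg)
{
  return sk_err_make (SK_SOURCE_EXAMPLE, sk_internal_check (arg));
}
```

In this model, wrapping an already composed value a second time is harmless, because the code mask drops the old source bits. The real hazard is the opposite direction: a caller that tests `rc == SK_ERR_EXAMPLE_ARG` on a composed value silently takes the wrong branch.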
    --
    GnuPG-bug-id: 2074

diff --git a/cipher/dsa.c b/cipher/dsa.c
index 09cd969..723f690 100644
--- a/cipher/dsa.c
+++ b/cipher/dsa.c
@@ -968,12 +968,14 @@ dsa_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey)
 static gcry_err_code_t
 dsa_check_secret_key (gcry_sexp_t keyparms)
 {
+  gcry_error_t err;
   gcry_err_code_t rc;
   DSA_secret_key sk = {NULL, NULL, NULL, NULL, NULL};

-  rc = _gcry_sexp_extract_param (keyparms, NULL, "pqgyx",
-                                 &sk.p, &sk.q, &sk.g, &sk.y, &sk.x,
-                                 NULL);
+  err = _gcry_sexp_extract_param (keyparms, NULL, "pqgyx",
+                                  &sk.p, &sk.q, &sk.g, &sk.y, &sk.x,
+                                  NULL);
+  rc = gpg_err_code (err);
   if (rc)
     goto leave;

diff --git a/src/g10lib.h b/src/g10lib.h
index 50a08ec..d1f9426 100644
--- a/src/g10lib.h
+++ b/src/g10lib.h
@@ -92,7 +92,7 @@

 /*-- src/global.c -*/
 int _gcry_global_is_operational (void);
-gcry_error_t _gcry_vcontrol (enum gcry_ctl_cmds cmd, va_list arg_ptr);
+gcry_err_code_t _gcry_vcontrol (enum gcry_ctl_cmds cmd, va_list arg_ptr);
 void  _gcry_check_heap (const void *a);
 int _gcry_get_debug_flag (unsigned int mask);

diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h
index 29d4fd3..ea3c8d5 100644
--- a/src/gcrypt-int.h
+++ b/src/gcrypt-int.h
@@ -329,10 +329,10 @@ void *_gcry_sexp_nth_buffer (const gcry_sexp_t list, int number,
                              size_t *rlength);
 char *_gcry_sexp_nth_string (gcry_sexp_t list, int number);
 gcry_mpi_t _gcry_sexp_nth_mpi (gcry_sexp_t list, int number, int mpifmt);
-gpg_err_code_t _gcry_sexp_extract_param (gcry_sexp_t sexp,
-                                         const char *path,
-                                         const char *list,
-                                         ...) _GCRY_GCC_ATTR_SENTINEL(0);
+gpg_error_t _gcry_sexp_extract_param (gcry_sexp_t sexp,
+                                      const char *path,
+                                      const char *list,
+                                      ...) _GCRY_GCC_ATTR_SENTINEL(0);

 #define sexp_new(a, b, c, d)         _gcry_sexp_new ((a), (b), (c), (d))
 #define sexp_create(a, b, c, d, e)   _gcry_sexp_create ((a), (b), (c), (d), (e))
diff --git a/src/visibility.c b/src/visibility.c
index fa23e53..3e1f28b 100644
--- a/src/visibility.c
+++ b/src/visibility.c
@@ -292,7 +292,7 @@ gcry_mpi_copy (const gcry_mpi_t a)
 void
 gcry_mpi_snatch (gcry_mpi_t w, const gcry_mpi_t u)
 {
-  return _gcry_mpi_snatch (w, u);
+  _gcry_mpi_snatch (w, u);
 }

 gcry_mpi_t
-----------------------------------------------------------------------

Summary of changes:
 cipher/dsa.c     | 8 +++++---
 src/g10lib.h     | 2 +-
 src/gcrypt-int.h | 8 ++++----
 src/visibility.c | 2 +-
 4 files changed, 11 insertions(+), 9 deletions(-)


hooks/post-receive
-- 
The GNU crypto library
http://git.gnupg.org


_______________________________________________
Gnupg-commits mailing list
Gnupg-commits at gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits

From gniibe at fsij.org  Tue Oct 13 05:26:15 2015
From: gniibe at fsij.org (NIIBE Yutaka)
Date: Tue, 13 Oct 2015 12:26:15 +0900
Subject: GCC 5 compiling libgcrypt/cipher/rijndael-aesni.c
Message-ID: <561C79D7.40506@fsij.org>

Hello,

With the master branch, I encountered a failure building
libgcrypt/cipher/rijndael-aesni.c for i686.

$ gcc --version
gcc (Debian 5.2.1-17) 5.2.1 20150911

=====================================
/bin/bash ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../libgcrypt/cipher -I..  -I../src -I../../libgcrypt/src   -g -O2 -fvisibility=hidden -Wall -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast -Wwrite-strings -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -MT rijndael-aesni.lo -MD -MP -MF .deps/rijndael-aesni.Tpo -c -o rijndael-aesni.lo ../../libgcrypt/cipher/rijndael-aesni.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../libgcrypt/cipher -I..
-I../src -I../../libgcrypt/src -g -O2 -fvisibility=hidden -Wall -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast -Wwrite-strings -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -MT rijndael-aesni.lo -MD -MP -MF .deps/rijndael-aesni.Tpo -c ../../libgcrypt/cipher/rijndael-aesni.c  -fPIC -DPIC -o .libs/rijndael-aesni.o
../../libgcrypt/cipher/rijndael-aesni.c: In function '_gcry_aes_aesni_ctr_enc':
../../libgcrypt/cipher/rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints
   asm volatile (/* detect if 8-bit carry handling is needed */
   ^
Makefile:639: recipe for target 'rijndael-aesni.lo' failed
make[2]: *** [rijndael-aesni.lo] Error 1
=====================================
-- 

From cvs at cvs.gnupg.org  Tue Oct 13 07:48:04 2015
From: cvs at cvs.gnupg.org (by Jussi Kivilinna)
Date: Tue, 13 Oct 2015 07:48:04 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-264-gfa94b61

The branch, master has been updated
       via  fa94b6111948a614ebdcb67f7942eced8b84c579 (commit)
      from  73374fdd27c7ba28b19f9672c68a6f5b72252fe5 (commit)

- Log -----------------------------------------------------------------
commit fa94b6111948a614ebdcb67f7942eced8b84c579
Author: Jussi Kivilinna
Date:   Tue Oct 13 08:33:00 2015 +0300

    Fix compiling AES/AES-NI implementation on linux-i386

    * cipher/rijndael-aesni.c (do_aesni_ctr_4): Split assembly block in two
    parts to reduce number of register constraints needed.
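The build error is reported in _gcry_aes_aesni_ctr_enc, while the fix splits the assembly in do_aesni_ctr_4 (presumably inlined there), the path that encrypts four consecutive counter values per call; the bige_addb_const operands appear to hold byte increments for a fast path that applies when no 8-bit carry occurs. As a portable reference for that counter arithmetic, here is a minimal sketch assuming the standard 128-bit big-endian CTR counter. The names `ctr_add_be`, `ctr_low_byte_no_carry`, and `ctr_next_4` are hypothetical; this is not libgcrypt's code.

```c
#include <stdint.h>
#include <string.h>

#define CTR_BLK 16

/* Add N to a 128-bit big-endian counter in place (wraps modulo 2^128). */
void
ctr_add_be (uint8_t ctr[CTR_BLK], unsigned int n)
{
  unsigned int carry = n;

  for (int i = CTR_BLK - 1; i >= 0 && carry; i--)
    {
      unsigned int sum = ctr[i] + (carry & 0xff);
      ctr[i] = (uint8_t) sum;
      carry = (carry >> 8) + (sum >> 8);
    }
}

/* Nonzero if adding N changes only the last byte, i.e. the cheap
   "no 8-bit carry" case a fast path can exploit. */
int
ctr_low_byte_no_carry (const uint8_t ctr[CTR_BLK], unsigned int n)
{
  return ctr[CTR_BLK - 1] + n <= 0xff;
}

/* Produce the four counter blocks ctr, ctr+1, ctr+2, ctr+3 that a
   four-block CTR step encrypts, and advance CTR by 4. */
void
ctr_next_4 (uint8_t ctr[CTR_BLK], uint8_t out[4][CTR_BLK])
{
  for (int b = 0; b < 4; b++)
    {
      memcpy (out[b], ctr, CTR_BLK);
      ctr_add_be (ctr, 1);
    }
}
```

A plain-C model like this can serve as a cross-check for an optimized path, since both must produce identical counter blocks on either side of a carry.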
    --
    Signed-off-by: Jussi Kivilinna

diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c
index 5c85903..97e0ad0 100644
--- a/cipher/rijndael-aesni.c
+++ b/cipher/rijndael-aesni.c
@@ -961,8 +961,17 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx,
                 aesenclast_xmm1_xmm2
                 aesenclast_xmm1_xmm3
                 aesenclast_xmm1_xmm4
+                :
+                : [ctr] "r" (ctr),
+                  [key] "r" (ctx->keyschenc),
+                  [rounds] "g" (ctx->rounds),
+                  [addb_1] "m" (bige_addb_const[0][0]),
+                  [addb_2] "m" (bige_addb_const[1][0]),
+                  [addb_3] "m" (bige_addb_const[2][0]),
+                  [addb_4] "m" (bige_addb_const[3][0])
+                : "%esi", "cc", "memory");

-                "movdqu (%[src]), %%xmm1\n\t"    /* Get block 1.      */
+  asm volatile ("movdqu (%[src]), %%xmm1\n\t"    /* Get block 1.      */
                 "pxor   %%xmm1, %%xmm0\n\t"      /* EncCTR-1 ^= input */
                 "movdqu %%xmm0, (%[dst])\n\t"    /* Store block 1     */

@@ -977,18 +986,10 @@ do_aesni_ctr_4 (const RIJNDAEL_context *ctx,
                 "movdqu 48(%[src]), %%xmm1\n\t"  /* Get block 4.      */
                 "pxor   %%xmm1, %%xmm4\n\t"      /* EncCTR-4 ^= input */
                 "movdqu %%xmm4, 48(%[dst])"      /* Store block 4.   */
                 :
-                : [ctr] "r" (ctr),
-                  [src] "r" (a),
-                  [dst] "r" (b),
-                  [key] "r" (ctx->keyschenc),
-                  [rounds] "g" (ctx->rounds),
-                  [addb_1] "m" (bige_addb_const[0][0]),
-                  [addb_2] "m" (bige_addb_const[1][0]),
-                  [addb_3] "m" (bige_addb_const[2][0]),
-                  [addb_4] "m" (bige_addb_const[3][0])
-                : "%esi", "cc", "memory");
+                : [src] "r" (a),
+                  [dst] "r" (b)
+                : "memory");
 #undef aesenc_xmm1_xmm0
 #undef aesenc_xmm1_xmm2
 #undef aesenc_xmm1_xmm3
-----------------------------------------------------------------------

Summary of changes:
 cipher/rijndael-aesni.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

From jussi.kivilinna at iki.fi  Tue Oct 13 07:50:08 2015
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Tue, 13 Oct 2015 08:50:08 +0300
Subject: GCC 5 compiling libgcrypt/cipher/rijndael-aesni.c
In-Reply-To: <561C79D7.40506@fsij.org>
References: <561C79D7.40506@fsij.org>
Message-ID: <561C9B90.7070002@iki.fi>

Hello,

On 13.10.2015 06:26, NIIBE Yutaka wrote:
> Hello,
>
> With the master branch, I encountered a failure building
> libgcrypt/cipher/rijndael-aesni.c for i686.

This happens with gcc-4.9 too. That assembly block is trying to use too
many register constraints for SysV/i386. I pushed a fix for this to
libgcrypt/master.

-Jussi

>
> $ gcc --version
> gcc (Debian 5.2.1-17) 5.2.1 20150911
>
> =====================================
> /bin/bash ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../libgcrypt/cipher -I..
> -I../src -I../../libgcrypt/src   -g -O2
> -fvisibility=hidden -Wall -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast
> -Wwrite-strings -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -MT rijndael-aesni.lo -MD -MP
> -MF .deps/rijndael-aesni.Tpo -c -o rijndael-aesni.lo ../../libgcrypt/cipher/rijndael-aesni.c
> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../libgcrypt/cipher -I.. -I../src -I../../libgcrypt/src -g -O2 -fvisibility=hidden -Wall
> -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast -Wwrite-strings
> -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -MT rijndael-aesni.lo -MD -MP -MF
> .deps/rijndael-aesni.Tpo -c ../../libgcrypt/cipher/rijndael-aesni.c  -fPIC -DPIC -o .libs/rijndael-aesni.o
> ../../libgcrypt/cipher/rijndael-aesni.c: In function '_gcry_aes_aesni_ctr_enc':
> ../../libgcrypt/cipher/rijndael-aesni.c:817:3: error: 'asm' operand has impossible constraints
>    asm volatile (/* detect if 8-bit carry handling is needed */
>    ^
> Makefile:639: recipe for target 'rijndael-aesni.lo' failed
> make[2]: *** [rijndael-aesni.lo] Error 1
> =====================================
> --
>
> _______________________________________________
> Gcrypt-devel mailing list
> Gcrypt-devel at gnupg.org
> http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
>

From gilles.vanassche at st.com  Tue Oct 13 10:59:41 2015
From: gilles.vanassche at st.com (Gilles Van Assche)
Date: Tue, 13 Oct 2015 10:59:41 +0200
Subject: Keccak in libgcrypt
Message-ID: <561CC7FD.1010905@st.com>

Dear all,

As an everyday user of GPG and co-designer of Keccak, I am happy to see
that you recently added Keccak/SHA-3 to libgcrypt's development branch. :-)

As I went through your code, I had some comments; please see below.
Anyway, please let me know if I can help.

Best wishes and happy coding,
Gilles


My comments are the following:

1) The piece of code that you started from (i.e.,
Keccak-readable-and-compact.c) is not meant to be fast. There are faster
alternatives available in the Keccak Code Package. In particular, you
can find implementations aiming for speed on 32-bit platforms (including
ARM assembly code) and on 64-bit platforms. These implementations are
organized so that only the code for the Keccak-f[1600] permutation and
the state input/output depends on the target platform. The rest is
implemented using platform-independent code.

2) It would be great to add support for the SHAKEs. I understand that
they are a new kind of function ("extendable-output functions") and so
require a different interface compared to traditional hash functions,
but they can become really useful in the context of RSA (replacing the
MGF) and ECDSA (or EdDSA).

3) The type KECCAK_CONTEXT contains a buffer (byte
buf[MD_BLOCK_MAX_BLOCKSIZE]) that is quite large. In fact, the sponge
construction does not need such a buffer: the input bytes can be XORed
into the state as they arrive. The Keccak Code Package readily
implements this.

From cvs at cvs.gnupg.org  Wed Oct 14 05:09:26 2015
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Wed, 14 Oct 2015 05:09:26 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-265-g813565a

The branch, master has been updated
       via  813565a07ca575c87e1252c6ed26018653ecd338 (commit)
      from  fa94b6111948a614ebdcb67f7942eced8b84c579 (commit)

- Log -----------------------------------------------------------------
commit 813565a07ca575c87e1252c6ed26018653ecd338
Author: NIIBE Yutaka
Date:   Wed Oct 14 11:52:40 2015 +0900

    Fix gpg_error_t and gpg_err_code_t confusion.

    * src/gcrypt-int.h (_gcry_sexp_extract_param): Revert the change.
    * cipher/dsa.c (dsa_check_secret_key): Ditto.
    * src/sexp.c (_gcry_sexp_extract_param): Return gpg_err_code_t.
    * src/gcrypt-int.h (_gcry_err_make_from_errno)
    (_gcry_error_from_errno): Return gpg_error_t.
    * cipher/cipher.c (_gcry_cipher_open_internal)
    (_gcry_cipher_ctl, _gcry_cipher_ctl): Don't use gcry_error.
    * src/global.c (_gcry_vcontrol): Likewise.
    * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Use
    gpg_err_code_from_syserror.
    * cipher/mac.c (mac_reset, mac_setkey, mac_setiv, mac_write)
    (mac_read, mac_verify): Return gcry_err_code_t.
    * cipher/rsa-common.c (mgf1): Use gcry_err_code_t for ERR.
    * src/visibility.c (gcry_error_from_errno): Return gpg_error_t.

    --

    Revert a part of 73374fdd and fix the return type of
    _gcry_sexp_extract_param instead.  Fix similar coding mistakes
    throughout.

diff --git a/cipher/cipher.c b/cipher/cipher.c
index 30c2f48..ab9f0dc 100644
--- a/cipher/cipher.c
+++ b/cipher/cipher.c
@@ -590,7 +590,7 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle,

   *handle = err ? NULL : h;

-  return gcry_error (err);
+  return err;
 }


@@ -1271,10 +1271,10 @@ _gcry_cipher_ctl (gcry_cipher_hd_t h, int cmd, void *buffer, size_t buflen)
         size_t authtaglen;

         if (h->mode != GCRY_CIPHER_MODE_CCM)
-          return gcry_error (GPG_ERR_INV_CIPHER_MODE);
+          return GPG_ERR_INV_CIPHER_MODE;
         if (!buffer || buflen != 3 * sizeof(u64))
-          return gcry_error (GPG_ERR_INV_ARG);
+          return GPG_ERR_INV_ARG;

         /* This command is used to pass additional length parameters needed
            by CCM mode to initialize CBC-MAC.  */
@@ -1317,7 +1317,7 @@ _gcry_cipher_ctl (gcry_cipher_hd_t h, int cmd, void *buffer, size_t buflen)
       /* This command expects NULL for H and BUFFER to point to an
          integer with the algo number.  */
       if( h || !buffer || buflen != sizeof(int) )
-        return gcry_error (GPG_ERR_CIPHER_ALGO);
+        return GPG_ERR_CIPHER_ALGO;
       disable_cipher_algo( *(int*)buffer );
       break;

diff --git a/cipher/dsa.c b/cipher/dsa.c
index 723f690..01d153f 100644
--- a/cipher/dsa.c
+++ b/cipher/dsa.c
@@ -968,14 +968,12 @@ dsa_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey)
 static gcry_err_code_t
 dsa_check_secret_key (gcry_sexp_t keyparms)
 {
-  gcry_error_t err;
   gcry_err_code_t rc;
   DSA_secret_key sk = {NULL, NULL, NULL, NULL, NULL};

-  err = _gcry_sexp_extract_param (keyparms, NULL, "pqgyx",
+  rc = _gcry_sexp_extract_param (keyparms, NULL, "pqgyx",
                                   &sk.p, &sk.q, &sk.g, &sk.y, &sk.x,
                                   NULL);
-  rc = gpg_err_code (err);
   if (rc)
     goto leave;

diff --git a/cipher/ecc-eddsa.c b/cipher/ecc-eddsa.c
index 1e95489..2a52b78 100644
--- a/cipher/ecc-eddsa.c
+++ b/cipher/ecc-eddsa.c
@@ -508,7 +508,7 @@ _gcry_ecc_eddsa_genkey (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx,
   hash_d = xtrymalloc_secure (2*b);
   if (!hash_d)
     {
-      rc = gpg_error_from_syserror ();
+      rc = gpg_err_code_from_syserror ();
       goto leave;
     }
   dlen = b;
diff --git a/cipher/mac.c b/cipher/mac.c
index 9bb360c..b8a5534 100644
--- a/cipher/mac.c
+++ b/cipher/mac.c
@@ -241,7 +241,7 @@ mac_open (gcry_mac_hd_t * hd, int algo, int secure, gcry_ctx_t ctx)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_reset (gcry_mac_hd_t hd)
 {
   if (hd->spec->ops->reset)
@@ -263,7 +263,7 @@ mac_close (gcry_mac_hd_t hd)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_setkey (gcry_mac_hd_t hd, const void *key, size_t keylen)
 {
   if (!hd->spec->ops->setkey)
@@ -275,7 +275,7 @@ mac_setkey (gcry_mac_hd_t hd, const void *key, size_t keylen)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_setiv (gcry_mac_hd_t hd, const void *iv, size_t ivlen)
 {
   if (!hd->spec->ops->setiv)
@@ -287,7 +287,7 @@ mac_setiv (gcry_mac_hd_t hd, const void *iv, size_t ivlen)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_write (gcry_mac_hd_t hd, const void *inbuf, size_t inlen)
 {
   if (!hd->spec->ops->write)
@@ -299,7 +299,7 @@ mac_write (gcry_mac_hd_t hd, const void *inbuf, size_t inlen)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_read (gcry_mac_hd_t hd, void *outbuf, size_t * outlen)
 {
   if (!outbuf || !outlen || *outlen == 0 || !hd->spec->ops->read)
@@ -309,7 +309,7 @@ mac_read (gcry_mac_hd_t hd, void *outbuf, size_t * outlen)
 }


-static gcry_error_t
+static gcry_err_code_t
 mac_verify (gcry_mac_hd_t hd, const void *buf, size_t buflen)
 {
   if (!buf || buflen == 0 || !hd->spec->ops->verify)
diff --git a/cipher/rsa-common.c b/cipher/rsa-common.c
index f56e989..b260142 100644
--- a/cipher/rsa-common.c
+++ b/cipher/rsa-common.c
@@ -393,7 +393,7 @@ mgf1 (unsigned char *output, size_t outlen, unsigned char *seed, size_t seedlen,
   size_t dlen, nbytes, n;
   int idx;
   gcry_md_hd_t hd;
-  gcry_error_t err;
+  gcry_err_code_t err;

   err = _gcry_md_open (&hd, algo, 0);
   if (err)
diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h
index ea3c8d5..8014d61 100644
--- a/src/gcrypt-int.h
+++ b/src/gcrypt-int.h
@@ -277,7 +277,7 @@ _gcry_err_code_to_errno (gcry_err_code_t code)

 /* Return an error value with the error source SOURCE and the system
    error ERR.  */
-static inline gcry_err_code_t
+static inline gcry_error_t
 _gcry_err_make_from_errno (gpg_err_source_t source, int err)
 {
   return gpg_err_make_from_errno (source, err);
@@ -285,7 +285,7 @@ _gcry_err_make_from_errno (gpg_err_source_t source, int err)


 /* Return an error value with the system error ERR.  */
-static inline gcry_err_code_t
+static inline gcry_error_t
 _gcry_error_from_errno (int err)
 {
   return gpg_error (gpg_err_code_from_errno (err));
@@ -329,10 +329,10 @@ void *_gcry_sexp_nth_buffer (const gcry_sexp_t list, int number,
                              size_t *rlength);
 char *_gcry_sexp_nth_string (gcry_sexp_t list, int number);
 gcry_mpi_t _gcry_sexp_nth_mpi (gcry_sexp_t list, int number, int mpifmt);
-gpg_error_t _gcry_sexp_extract_param (gcry_sexp_t sexp,
-                                      const char *path,
-                                      const char *list,
-                                      ...) _GCRY_GCC_ATTR_SENTINEL(0);
+gpg_err_code_t _gcry_sexp_extract_param (gcry_sexp_t sexp,
+                                         const char *path,
+                                         const char *list,
+                                         ...) _GCRY_GCC_ATTR_SENTINEL(0);

 #define sexp_new(a, b, c, d)         _gcry_sexp_new ((a), (b), (c), (d))
 #define sexp_create(a, b, c, d, e)   _gcry_sexp_create ((a), (b), (c), (d), (e))
diff --git a/src/global.c b/src/global.c
index 4e8df86..2290393 100644
--- a/src/global.c
+++ b/src/global.c
@@ -490,7 +490,7 @@ _gcry_vcontrol (enum gcry_ctl_cmds cmd, va_list arg_ptr)
       _gcry_set_preferred_rng_type (0);
       rc = _gcry_rndegd_set_socket_name (va_arg (arg_ptr, const char *));
 #else
-      rc = gpg_error (GPG_ERR_NOT_SUPPORTED);
+      rc = GPG_ERR_NOT_SUPPORTED;
 #endif
       break;

diff --git a/src/sexp.c b/src/sexp.c
index 1c014e0..f1bbffa 100644
--- a/src/sexp.c
+++ b/src/sexp.c
@@ -2423,7 +2423,7 @@ _gcry_sexp_vextract_param (gcry_sexp_t sexp, const char *path,
   return rc;
 }

-gpg_error_t
+gpg_err_code_t
 _gcry_sexp_extract_param (gcry_sexp_t sexp, const char *path,
                           const char *list, ...)
 {
@@ -2433,5 +2433,5 @@ _gcry_sexp_extract_param (gcry_sexp_t sexp, const char *path,
   va_start (arg_ptr, list);
   rc = _gcry_sexp_vextract_param (sexp, path, list, arg_ptr);
   va_end (arg_ptr);
-  return gpg_error (rc);
+  return rc;
 }
diff --git a/src/visibility.c b/src/visibility.c
index 3e1f28b..cbf24e7 100644
--- a/src/visibility.c
+++ b/src/visibility.c
@@ -57,7 +57,7 @@ gcry_err_make_from_errno (gcry_err_source_t source, int err)
   return _gcry_err_make_from_errno (source, err);
 }

-gcry_err_code_t
+gcry_error_t
 gcry_error_from_errno (int err)
 {
   return _gcry_error_from_errno (err);
-----------------------------------------------------------------------

Summary of changes:
 cipher/cipher.c     |  8 ++++----
 cipher/dsa.c        |  4 +---
 cipher/ecc-eddsa.c  |  2 +-
 cipher/mac.c        | 12 ++++++------
 cipher/rsa-common.c |  2 +-
 src/gcrypt-int.h    | 12 ++++++------
 src/global.c        |  2 +-
 src/sexp.c          |  4 ++--
 src/visibility.c    |  2 +-
 9 files changed, 23 insertions(+), 25 deletions(-)

From xxiao8 at fosiao.com  Wed Oct 14 05:45:05 2015
From: xxiao8 at fosiao.com (xxiao8)
Date: Tue, 13 Oct 2015 22:45:05 -0500
Subject: libgcrypt FSM shutdown state -- what does this mean?
Message-ID: <561DCFC1.9060501@fosiao.com>

Based on https://gnupg.org/documentation/manuals/gcrypt/FIPS-Finite-State-Machine.html,
if anything went wrong it enters the Shutdown state; what does this
mean for a library? I have cryptsetup, which uses this library; how do I
know libgcrypt has shut down? Is there a way to monitor libgcrypt to check that
it's still running OK and not in the Fatal-Error --> Shutdown state?

Thanks,
xxiao

From wk at gnupg.org  Wed Oct 14 11:19:39 2015
From: wk at gnupg.org (Werner Koch)
Date: Wed, 14 Oct 2015 11:19:39 +0200
Subject: libgcrypt FSM shutdown state -- what does this mean?
In-Reply-To: <561DCFC1.9060501@fosiao.com> (xxiao8@fosiao.com's message of "Tue, 13 Oct 2015 22:45:05 -0500")
References: <561DCFC1.9060501@fosiao.com>
Message-ID: <87pp0hu7ro.fsf@vigenere.g10code.de>

On Wed, 14 Oct 2015 05:45, xxiao8 at fosiao.com said:

> if anything went wrong it enters the Shutdown state; what does this
> mean for a library? I have cryptsetup, which uses this library; how do I
> know libgcrypt has shut down? Is there a way to monitor libgcrypt to check that

The whole state machinery is more for documentary purposes because we
can't map the FIPS required states onto our processing model.  Thus you
won't see all states and in particular not the shutdown state.  Instead
we end up here

  _gcry_fips_noreturn (void)
  {
  #ifdef HAVE_SYSLOG
    syslog (LOG_USER|LOG_ERR, "Libgcrypt terminated the application");
  #endif /*HAVE_SYSLOG*/
    fflush (NULL);
    abort ();
    /*NOTREACHED*/
  }

for severe errors.

What you can do is to check whether libgcrypt is in operational mode
(selftests done and not in an error state or not running in FIPS mode):

  if (gcry_control (GCRYCTL_OPERATIONAL_P, 0))
    puts ("Libgcrypt is operational");


Salam-Shalom,

   Werner

-- 
Thoughts are free. Exceptions are regulated by a federal law.

From cvs at cvs.gnupg.org  Thu Oct 15 04:30:24 2015
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Thu, 15 Oct 2015 04:30:24 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-266-g1c6d269

The branch, master has been updated
       via  1c6d2698a84e4bf82735287c1d64954bfc1a1982 (commit)
      from  813565a07ca575c87e1252c6ed26018653ecd338 (commit)
- Log -----------------------------------------------------------------
commit 1c6d2698a84e4bf82735287c1d64954bfc1a1982
Author: NIIBE Yutaka
Date:   Thu Oct 15 11:28:54 2015 +0900

    Fix double free on error.

    * src/hmac256.c (_gcry_hmac256_finalize): Don't free HD.

diff --git a/src/hmac256.c b/src/hmac256.c
index 94a26da..6b62ed3 100644
--- a/src/hmac256.c
+++ b/src/hmac256.c
@@ -426,10 +426,8 @@ _gcry_hmac256_finalize (hmac256_context_t hd, size_t *r_dlen)

   tmphd = _gcry_hmac256_new (NULL, 0);
   if (!tmphd)
-    {
-      free (hd);
-      return NULL;
-    }
+    return NULL;
+
   _gcry_hmac256_update (tmphd, hd->opad, 64);
   _gcry_hmac256_update (tmphd, hd->buf, 32);
   finalize (tmphd);
-----------------------------------------------------------------------

Summary of changes:
 src/hmac256.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

From xxiao8 at fosiao.com  Thu Oct 15 05:09:44 2015
From: xxiao8 at fosiao.com (xxiao8)
Date: Wed, 14 Oct 2015 22:09:44 -0500
Subject: libgcrypt FSM shutdown state -- what does this mean?
In-Reply-To: <87pp0hu7ro.fsf@vigenere.g10code.de>
References: <561DCFC1.9060501@fosiao.com> <87pp0hu7ro.fsf@vigenere.g10code.de>
Message-ID: <561F18F8.8000109@fosiao.com>

I don't have a self-written program; instead, I use cryptsetup, which
uses libgcrypt. Is it fair to say:

1. If cryptsetup runs successfully in FIPS mode, that means libgcrypt
   passed its self-tests internally? I don't see how to invoke a self-test
   in FIPS mode externally (I have a kernel running in FIPS mode).

2. If, for whatever reason, libgcrypt enters the 'fatal-error/shutdown'
   state, cryptsetup will quit because it can no longer use the library;
   is this correct? Will libgcrypt "disable" itself so that other programs
   using it can no longer function?
Otherwise, how can I detect that libgcrypt went bad?

Thanks,
xxiao

On 10/14/2015 04:19 AM, Werner Koch wrote:
> On Wed, 14 Oct 2015 05:45, xxiao8 at fosiao.com said:
>
>> if anything went wrong it enters the Shutdown state; what does this
>> mean for a library? I have cryptsetup, which uses this library; how do I
>> know libgcrypt has shut down? Is there a way to monitor libgcrypt to check that
>
> The whole state machinery is more for documentary purposes because we
> can't map the FIPS required states onto our processing model.  Thus you
> won't see all states and in particular not the shutdown state.  Instead
> we end up here
>
>   _gcry_fips_noreturn (void)
>   {
>   #ifdef HAVE_SYSLOG
>     syslog (LOG_USER|LOG_ERR, "Libgcrypt terminated the application");
>   #endif /*HAVE_SYSLOG*/
>     fflush (NULL);
>     abort ();
>     /*NOTREACHED*/
>   }
>
> for severe errors.
>
> What you can do is to check whether libgcrypt is in operational mode
> (selftests done and not in an error state or not running in FIPS mode):
>
>   if (gcry_control (GCRYCTL_OPERATIONAL_P, 0))
>     puts ("Libgcrypt is operational");
>
>
> Salam-Shalom,
>
>    Werner
>
>

From cvs at cvs.gnupg.org  Thu Oct 22 03:38:10 2015
From: cvs at cvs.gnupg.org (by NIIBE Yutaka)
Date: Thu, 22 Oct 2015 03:38:10 +0200
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-267-gf7505b5

The branch, master has been updated
       via  f7505b550dd591e33d3a3fab9277c43c460f1bad (commit)
      from  1c6d2698a84e4bf82735287c1d64954bfc1a1982 (commit)
- Log ----------------------------------------------------------------- commit f7505b550dd591e33d3a3fab9277c43c460f1bad Author: NIIBE Yutaka Date: Thu Oct 22 09:58:24 2015 +0900 md: keep contexts for HMAC in GcryDigestEntry. * cipher/md.c (struct gcry_md_context): Add flags.hmac. Remove macpads and macpads_Bsize. (md_open): Initialize flags.hmac. Remove macpads initialization. (md_enable): Allocate contexts when flags.hmac is enabled. (md_copy): Remove macpads copying. Add copying of contexts. (_gcry_md_reset): When flags.hmac is enabled, restore the precomputed context with the input pad. (md_close): Remove macpads wiping. (md_final): When flags.hmac is enabled, compute the HMAC from the precomputed context with the output pad. (prepare_macpads): Prepare precomputed contexts with input pad and output pad for each registered digest entry. (_gcry_md_setkey): Just call prepare_macpads. -- This change straightens out the HMAC computation and allows HMAC to be computed for multiple algorithms in the future. Libgcrypt's code has the potential to compute digests for multiple algorithms at once (currently, this is not enabled). The HMAC code didn't work well with multiple algorithms, because the macpads were only allocated for one algorithm. Now, they are allocated for each algorithm. We now precompute hash contexts instead of keeping the input pad and output pad. This can be a performance improvement, as described in RFC 2104. Thanks to: Andrea Visconti, Simone Bossi, Hany Ragab and Alexandro Calò for the discussion and their CANS 2015 paper, titled: On the weaknesses of PBKDF2 diff --git a/cipher/md.c b/cipher/md.c index 19b2c9b..c6bf90d 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -108,10 +108,9 @@ struct gcry_md_context unsigned int secure: 1; unsigned int finalized:1; unsigned int bugemu1:1; + unsigned int hmac:1; } flags; GcryDigestEntry *list; - byte *macpads; - int macpads_Bsize; /* Blocksize as used for the HMAC pads.
*/ }; @@ -331,43 +330,8 @@ md_open (gcry_md_hd_t *h, int algo, unsigned int flags) ctx->magic = secure ? CTX_MAGIC_SECURE : CTX_MAGIC_NORMAL; ctx->actual_handle_size = n + sizeof (struct gcry_md_context); ctx->flags.secure = secure; + ctx->flags.hmac = hmac; ctx->flags.bugemu1 = !!(flags & GCRY_MD_FLAG_BUGEMU1); - - if (hmac) - { - switch (algo) - { - case GCRY_MD_SHA3_224: - ctx->macpads_Bsize = 1152 / 8; - break; - case GCRY_MD_SHA3_256: - ctx->macpads_Bsize = 1088 / 8; - break; - case GCRY_MD_SHA3_384: - ctx->macpads_Bsize = 832 / 8; - break; - case GCRY_MD_SHA3_512: - ctx->macpads_Bsize = 576 / 8; - break; - case GCRY_MD_SHA384: - case GCRY_MD_SHA512: - ctx->macpads_Bsize = 128; - break; - case GCRY_MD_GOSTR3411_94: - case GCRY_MD_GOSTR3411_CP: - ctx->macpads_Bsize = 32; - break; - default: - ctx->macpads_Bsize = 64; - break; - } - ctx->macpads = xtrymalloc_secure (2*(ctx->macpads_Bsize)); - if (!ctx->macpads) - { - err = gpg_err_code_from_errno (errno); - md_close (hd); - } - } } if (! err) @@ -447,7 +411,7 @@ md_enable (gcry_md_hd_t hd, int algorithm) if (!err) { size_t size = (sizeof (*entry) - + spec->contextsize + + spec->contextsize * (h->flags.hmac? 3 : 1) - sizeof (entry->context)); /* And allocate a new list entry. */ @@ -515,17 +479,6 @@ md_copy (gcry_md_hd_t ahd, gcry_md_hd_t *b_hd) memcpy (b, a, sizeof *a); b->list = NULL; b->debug = NULL; - if (a->macpads) - { - b->macpads = xtrymalloc_secure (2*(a->macpads_Bsize)); - if (! b->macpads) - { - err = gpg_err_code_from_errno (errno); - md_close (bhd); - } - else - memcpy (b->macpads, a->macpads, (2*(a->macpads_Bsize))); - } } /* Copy the complete list of algorithms. 
The copied list is @@ -535,13 +488,9 @@ md_copy (gcry_md_hd_t ahd, gcry_md_hd_t *b_hd) for (ar = a->list; ar; ar = ar->next) { if (a->flags.secure) - br = xtrymalloc_secure (sizeof *br - + ar->spec->contextsize - - sizeof(ar->context)); + br = xtrymalloc_secure (ar->actual_struct_size); else - br = xtrymalloc (sizeof *br - + ar->spec->contextsize - - sizeof (ar->context)); + br = xtrymalloc (ar->actual_struct_size); if (!br) { err = gpg_err_code_from_errno (errno); @@ -549,8 +498,7 @@ md_copy (gcry_md_hd_t ahd, gcry_md_hd_t *b_hd) break; } - memcpy (br, ar, (sizeof (*br) + ar->spec->contextsize - - sizeof (ar->context))); + memcpy (br, ar, ar->actual_struct_size); br->next = b->list; b->list = br; } @@ -591,14 +539,19 @@ _gcry_md_reset (gcry_md_hd_t a) a->bufpos = a->ctx->flags.finalized = 0; - for (r = a->ctx->list; r; r = r->next) - { - memset (r->context.c, 0, r->spec->contextsize); - (*r->spec->init) (&r->context.c, - a->ctx->flags.bugemu1? GCRY_MD_FLAG_BUGEMU1:0); - } - if (a->ctx->macpads) - md_write (a, a->ctx->macpads, a->ctx->macpads_Bsize); /* inner pad */ + if (a->ctx->flags.hmac) + for (r = a->ctx->list; r; r = r->next) + { + memcpy (r->context.c, r->context.c + r->spec->contextsize, + r->spec->contextsize); + } + else + for (r = a->ctx->list; r; r = r->next) + { + memset (r->context.c, 0, r->spec->contextsize); + (*r->spec->init) (&r->context.c, + a->ctx->flags.bugemu1? GCRY_MD_FLAG_BUGEMU1:0); + } } @@ -618,12 +571,6 @@ md_close (gcry_md_hd_t a) xfree (r); } - if (a->ctx->macpads) - { - wipememory (a->ctx->macpads, 2*(a->ctx->macpads_Bsize)); - xfree(a->ctx->macpads); - } - wipememory (a, a->ctx->actual_handle_size); xfree(a); } @@ -686,66 +633,120 @@ md_final (gcry_md_hd_t a) a->ctx->flags.finalized = 1; - if (a->ctx->macpads) + if (!a->ctx->flags.hmac) + return; + + for (r = a->ctx->list; r; r = r->next) { - /* Finish the hmac. 
*/ - int algo = md_get_algo (a); - byte *p = md_read (a, algo); - size_t dlen = md_digest_length (algo); - gcry_md_hd_t om; + byte *p = r->spec->read (&r->context.c); + size_t dlen = r->spec->mdlen; + byte *hash; gcry_err_code_t err; - err = md_open (&om, algo, - ((a->ctx->flags.secure? GCRY_MD_FLAG_SECURE:0) - | (a->ctx->flags.bugemu1? GCRY_MD_FLAG_BUGEMU1:0))); - if (err) - _gcry_fatal_error (err, NULL); - md_write (om, - (a->ctx->macpads)+(a->ctx->macpads_Bsize), - a->ctx->macpads_Bsize); - md_write (om, p, dlen); - md_final (om); - /* Replace our digest with the mac (they have the same size). */ - memcpy (p, md_read (om, algo), dlen); - md_close (om); + if (a->ctx->flags.secure) + hash = xtrymalloc_secure (dlen); + else + hash = xtrymalloc (dlen); + if (!hash) + { + err = gpg_err_code_from_errno (errno); + _gcry_fatal_error (err, NULL); + } + + memcpy (hash, p, dlen); + memcpy (r->context.c, r->context.c + r->spec->contextsize * 2, + r->spec->contextsize); + (*r->spec->write) (&r->context.c, hash, dlen); + (*r->spec->final) (&r->context.c); + xfree (hash); } } static gcry_err_code_t -prepare_macpads (gcry_md_hd_t hd, const unsigned char *key, size_t keylen) +prepare_macpads (gcry_md_hd_t a, const unsigned char *key, size_t keylen) { - int i; - int algo = md_get_algo (hd); - unsigned char *helpkey = NULL; - unsigned char *ipad, *opad; + GcryDigestEntry *r; - if (!algo) + if (!a->ctx->list) return GPG_ERR_DIGEST_ALGO; /* Might happen if no algo is enabled. 
*/ - if ( keylen > hd->ctx->macpads_Bsize ) + for (r = a->ctx->list; r; r = r->next) { - helpkey = xtrymalloc_secure (md_digest_length (algo)); - if (!helpkey) - return gpg_err_code_from_errno (errno); - _gcry_md_hash_buffer (algo, helpkey, key, keylen); - key = helpkey; - keylen = md_digest_length (algo); - gcry_assert ( keylen <= hd->ctx->macpads_Bsize ); - } + const unsigned char *k; + size_t k_len; + unsigned char *key_allocated = NULL; + int macpad_Bsize; + int i; - memset ( hd->ctx->macpads, 0, 2*(hd->ctx->macpads_Bsize) ); - ipad = hd->ctx->macpads; - opad = (hd->ctx->macpads)+(hd->ctx->macpads_Bsize); - memcpy ( ipad, key, keylen ); - memcpy ( opad, key, keylen ); - for (i=0; i < hd->ctx->macpads_Bsize; i++ ) - { - ipad[i] ^= 0x36; - opad[i] ^= 0x5c; + switch (r->spec->algo) + { + case GCRY_MD_SHA3_224: + macpad_Bsize = 1152 / 8; + break; + case GCRY_MD_SHA3_256: + macpad_Bsize = 1088 / 8; + break; + case GCRY_MD_SHA3_384: + macpad_Bsize = 832 / 8; + break; + case GCRY_MD_SHA3_512: + macpad_Bsize = 576 / 8; + break; + case GCRY_MD_SHA384: + case GCRY_MD_SHA512: + macpad_Bsize = 128; + break; + case GCRY_MD_GOSTR3411_94: + case GCRY_MD_GOSTR3411_CP: + macpad_Bsize = 32; + break; + default: + macpad_Bsize = 64; + break; + } + + if ( keylen > macpad_Bsize ) + { + k = key_allocated = xtrymalloc_secure (r->spec->mdlen); + if (!k) + return gpg_err_code_from_errno (errno); + _gcry_md_hash_buffer (r->spec->algo, key_allocated, key, keylen); + k_len = r->spec->mdlen; + gcry_assert ( k_len <= macpad_Bsize ); + } + else + { + k = key; + k_len = keylen; + } + + (*r->spec->init) (&r->context.c, + a->ctx->flags.bugemu1? 
GCRY_MD_FLAG_BUGEMU1:0); + a->bufpos = 0; + for (i=0; i < k_len; i++ ) + _gcry_md_putc (a, k[i] ^ 0x36); + for (; i < macpad_Bsize; i++ ) + _gcry_md_putc (a, 0x36); + (*r->spec->write) (&r->context.c, a->buf, a->bufpos); + memcpy (r->context.c + r->spec->contextsize, r->context.c, + r->spec->contextsize); + + (*r->spec->init) (&r->context.c, + a->ctx->flags.bugemu1? GCRY_MD_FLAG_BUGEMU1:0); + a->bufpos = 0; + for (i=0; i < k_len; i++ ) + _gcry_md_putc (a, k[i] ^ 0x5c); + for (; i < macpad_Bsize; i++ ) + _gcry_md_putc (a, 0x5c); + (*r->spec->write) (&r->context.c, a->buf, a->bufpos); + memcpy (r->context.c + r->spec->contextsize*2, r->context.c, + r->spec->contextsize); + + xfree (key_allocated); } - xfree (helpkey); + a->bufpos = 0; return 0; } @@ -780,14 +781,9 @@ _gcry_md_setkey (gcry_md_hd_t hd, const void *key, size_t keylen) { gcry_err_code_t rc; - if (!hd->ctx->macpads) - rc = GPG_ERR_CONFLICT; - else - { - rc = prepare_macpads (hd, key, keylen); - if (!rc) - _gcry_md_reset (hd); - } + rc = prepare_macpads (hd, key, keylen); + if (!rc) + _gcry_md_reset (hd); return rc; } ----------------------------------------------------------------------- Summary of changes: cipher/md.c | 244 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 120 insertions(+), 124 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Fri Oct 23 21:28:03 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 23 Oct 2015 22:28:03 +0300 Subject: [PATCH] bench-slope: add KDF/PBKDF2 benchmark Message-ID: <20151023192803.5460.47179.stgit@localhost6.localdomain6> * tests/bench-slope.c (bench_kdf_mode, bench_kdf_init, bench_kdf_free) (bench_kdf_do_bench, kdf_ops, kdf_bench_one, kdf_bench): New. (print_help): Add 'kdf'. (main): Add KDF benchmarks. 
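The bench-slope patch that follows reports nanosecs/iter as a slope rather than as a single timing. Below is a minimal sketch of that idea, assuming an ordinary least-squares fit (an assumption; this is not the code of do_slope_benchmark). The KDF is timed at several iteration counts (bench_kdf_init sweeps 2 to 64 in steps of 2, and bench_kdf_do_bench passes that count to gcry_kdf_derive). Fitting a line then gives the per-iteration cost as the slope, while fixed per-call work lands in the intercept. The helper names ols_slope and ns_to_cycles are hypothetical; the cycle conversion mirrors the patch's `nsecs_per_iteration * cpu_ghz`, assuming cpu_ghz is the --cpu-mhz value divided by 1000.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares slope of y over x:
   sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2).
   x: workload sizes (e.g. PBKDF2 iteration counts), y: measured ns. */
static double ols_slope (const double *x, const double *y, size_t n)
{
  double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0;

  assert (n >= 2);  /* need at least two points (with distinct x) */
  for (size_t i = 0; i < n; i++)
    {
      mx += x[i];
      my += y[i];
    }
  mx /= (double) n;
  my /= (double) n;

  for (size_t i = 0; i < n; i++)
    {
      sxy += (x[i] - mx) * (y[i] - my);
      sxx += (x[i] - mx) * (x[i] - mx);
    }
  return sxy / sxx;  /* ns per unit of workload; the intercept is dropped */
}

/* Same conversion as in the patch: cycles/iter = ns/iter * GHz. */
static double ns_to_cycles (double ns_per_iter, double cpu_ghz)
{
  return ns_per_iter * cpu_ghz;
}
```

As a check against the posted numbers: the "Before" PBKDF2-HMAC-MD5 row gives 882.4 ns/iter x 3.201 GHz = 2824.56 cycles/iter, which agrees with the reported 2824.7 up to the rounding of the displayed nanosecond figure.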
--
Introduce KDF benchmarking to bench-slope. Output is given as nanosecs/iter
(and cycles/iter if --cpu-mhz is used). Only PBKDF2 is supported with this
initial patch.

For example, below is the output of the KDF bench-slope before and after
commit "md: keep contexts for HMAC in GcryDigestEntry", on an Intel Core
i5-4570 @ 3.2 GHz:

Before:

$ tests/bench-slope --cpu-mhz 3201 kdf
KDF:
                          |  nanosecs/iter   cycles/iter
 PBKDF2-HMAC-MD5          |          882.4        2824.7
 PBKDF2-HMAC-SHA1         |          832.6        2665.0
 PBKDF2-HMAC-RIPEMD160    |         1148.3        3675.6
 PBKDF2-HMAC-TIGER192     |         1339.6        4288.2
 PBKDF2-HMAC-SHA256       |         1460.5        4675.1
 PBKDF2-HMAC-SHA384       |         1723.2        5515.8
 PBKDF2-HMAC-SHA512       |         1729.1        5534.7
 PBKDF2-HMAC-SHA224       |         1424.0        4558.3
 PBKDF2-HMAC-WHIRLPOOL    |         2459.7        7873.5
 PBKDF2-HMAC-TIGER        |         1350.2        4322.1
 PBKDF2-HMAC-TIGER2       |         1348.7        4317.3
 PBKDF2-HMAC-GOSTR3411_94 |         7374.1       23604.4
 PBKDF2-HMAC-STRIBOG256   |         6060.0       19398.1
 PBKDF2-HMAC-STRIBOG512   |         7512.8       24048.3
 PBKDF2-HMAC-GOSTR3411_CP |         7378.3       23618.0
 PBKDF2-HMAC-SHA3-224     |         2789.6        8929.5
 PBKDF2-HMAC-SHA3-256     |         2785.1        8915.0
 PBKDF2-HMAC-SHA3-384     |         2955.5        9460.5
 PBKDF2-HMAC-SHA3-512     |         2859.7        9153.9
                          =

After:

$ tests/bench-slope --cpu-mhz 3201 kdf
KDF:
                          |  nanosecs/iter   cycles/iter
 PBKDF2-HMAC-MD5          |          405.9        1299.2
 PBKDF2-HMAC-SHA1         |          392.1        1255.0
 PBKDF2-HMAC-RIPEMD160    |          540.9        1731.5
 PBKDF2-HMAC-TIGER192     |          637.1        2039.4
 PBKDF2-HMAC-SHA256       |          691.8        2214.3
 PBKDF2-HMAC-SHA384       |          848.0        2714.3
 PBKDF2-HMAC-SHA512       |          875.7        2803.1
 PBKDF2-HMAC-SHA224       |          689.2        2206.0
 PBKDF2-HMAC-WHIRLPOOL    |         1535.6        4915.5
 PBKDF2-HMAC-TIGER        |          636.3        2036.7
 PBKDF2-HMAC-TIGER2       |          636.6        2037.7
 PBKDF2-HMAC-GOSTR3411_94 |         5311.5       17002.2
 PBKDF2-HMAC-STRIBOG256   |         4308.0       13790.0
 PBKDF2-HMAC-STRIBOG512   |         5767.4       18461.4
 PBKDF2-HMAC-GOSTR3411_CP |         5309.4       16995.4
 PBKDF2-HMAC-SHA3-224     |         1333.1        4267.2
 PBKDF2-HMAC-SHA3-256     |         1327.8        4250.4
 PBKDF2-HMAC-SHA3-384     |         1392.8        4458.3
 PBKDF2-HMAC-SHA3-512     |         1428.5        4572.7
                          =

Signed-off-by: Jussi Kivilinna
---
 tests/bench-slope.c | 174 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed,
173 insertions(+), 1 deletion(-) diff --git a/tests/bench-slope.c b/tests/bench-slope.c index 394d7fc..2679556 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -1571,13 +1571,176 @@ mac_bench (char **argv, int argc) } +/************************************************************ KDF benchmarks. */ + +struct bench_kdf_mode +{ + struct bench_ops *ops; + + int algo; + int subalgo; +}; + + +static int +bench_kdf_init (struct bench_obj *obj) +{ + struct bench_kdf_mode *mode = obj->priv; + + if (mode->algo == GCRY_KDF_PBKDF2) + { + obj->min_bufsize = 2; + obj->max_bufsize = 2 * 32; + obj->step_size = 2; + } + + obj->num_measure_repetitions = num_measurement_repetitions; + + return 0; +} + +static void +bench_kdf_free (struct bench_obj *obj) +{ + (void)obj; +} + +static void +bench_kdf_do_bench (struct bench_obj *obj, void *buf, size_t buflen) +{ + struct bench_kdf_mode *mode = obj->priv; + char keybuf[16]; + + (void)buf; + + if (mode->algo == GCRY_KDF_PBKDF2) + { + gcry_kdf_derive("qwerty", 6, mode->algo, mode->subalgo, "01234567", 8, + buflen, sizeof(keybuf), keybuf); + } +} + +static struct bench_ops kdf_ops = { + &bench_kdf_init, + &bench_kdf_free, + &bench_kdf_do_bench +}; + + +static void +kdf_bench_one (int algo, int subalgo) +{ + struct bench_kdf_mode mode = { &kdf_ops }; + struct bench_obj obj = { 0 }; + double nsecs_per_iteration; + double cycles_per_iteration; + char algo_name[32]; + char nsecpiter_buf[16]; + char cpiter_buf[16]; + + mode.algo = algo; + mode.subalgo = subalgo; + + switch (subalgo) + { + case GCRY_MD_CRC32: + case GCRY_MD_CRC32_RFC1510: + case GCRY_MD_CRC24_RFC2440: + case GCRY_MD_MD4: + /* Skip CRC32s. 
*/ + return; + } + + *algo_name = 0; + + if (algo == GCRY_KDF_PBKDF2) + { + snprintf (algo_name, sizeof(algo_name), "PBKDF2-HMAC-%s", + gcry_md_algo_name (subalgo)); + } + + bench_print_algo (-24, algo_name); + + obj.ops = mode.ops; + obj.priv = &mode; + + nsecs_per_iteration = do_slope_benchmark (&obj); + + strcpy(cpiter_buf, csv_mode ? "" : "-"); + + double_to_str (nsecpiter_buf, sizeof (nsecpiter_buf), nsecs_per_iteration); + + /* If user didn't provide CPU speed, we cannot show cycles/iter results. */ + if (cpu_ghz > 0.0) + { + cycles_per_iteration = nsecs_per_iteration * cpu_ghz; + double_to_str (cpiter_buf, sizeof (cpiter_buf), cycles_per_iteration); + } + + if (csv_mode) + { + printf ("%s,%s,%s,,,,,,,,,%s,ns/iter,%s,c/iter\n", + current_section_name, + current_algo_name ? current_algo_name : "", + current_mode_name ? current_mode_name : "", + nsecpiter_buf, + cpiter_buf); + } + else + { + printf ("%14s %13s\n", nsecpiter_buf, cpiter_buf); + } +} + +void +kdf_bench (char **argv, int argc) +{ + char algo_name[32]; + int i, j; + + bench_print_section ("kdf", "KDF"); + + if (!csv_mode) + { + printf (" %-*s | ", 24, ""); + printf ("%14s %13s\n", "nanosecs/iter", "cycles/iter"); + } + + if (argv && argc) + { + for (i = 0; i < argc; i++) + { + for (j = 1; j < 400; j++) + { + if (gcry_md_test_algo (j)) + continue; + + snprintf (algo_name, sizeof(algo_name), "PBKDF2-HMAC-%s", + gcry_md_algo_name (j)); + + if (!strcmp(argv[i], algo_name)) + kdf_bench_one (GCRY_KDF_PBKDF2, j); + } + } + } + else + { + for (i = 1; i < 400; i++) + if (!gcry_md_test_algo (i)) + kdf_bench_one (GCRY_KDF_PBKDF2, i); + } + + bench_print_footer (24); +} + + /************************************************************** Main program. 
*/ void print_help (void) { static const char *help_lines[] = { - "usage: bench-slope [options] [hash|mac|cipher [algonames]]", + "usage: bench-slope [options] [hash|mac|cipher|kdf [algonames]]", "", " options:", " --cpu-mhz Set CPU speed for calculating cycles", @@ -1744,6 +1907,7 @@ main (int argc, char **argv) hash_bench (NULL, 0); mac_bench (NULL, 0); cipher_bench (NULL, 0); + kdf_bench (NULL, 0); } else if (!strcmp (*argv, "hash")) { @@ -1769,6 +1933,14 @@ main (int argc, char **argv) warm_up_cpu (); cipher_bench ((argc == 0) ? NULL : argv, argc); } + else if (!strcmp (*argv, "kdf")) + { + argc--; + argv++; + + warm_up_cpu (); + kdf_bench ((argc == 0) ? NULL : argv, argc); + } else { fprintf (stderr, PGM ": unknown argument: %s\n", *argv); From jussi.kivilinna at iki.fi Sat Oct 24 15:26:13 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 24 Oct 2015 16:26:13 +0300 Subject: Keccak in libgcrypt In-Reply-To: <561CC7FD.1010905@st.com> References: <561CC7FD.1010905@st.com> Message-ID: <562B86F5.6080709@iki.fi> Hello, On 2015-10-13 11:59, Gilles Van Assche wrote: > Dear all, > > As an everyday user of GPG and co-designer of Keccak, I am happy to see > that you recently added Keccak/SHA-3 to libgcrypt's development branch. :-) > > As I went through your code, I had some comments: please see below. Thanks for looking through it and for the comments. > > Anyway, please let me know if I can help. > > Best wishes and happy coding, > Gilles > > > My comments are the following: > > 1) The piece of code that you started from (i.e., > Keccak-readable-and-compact.c) is not meant to be fast. There are faster > alternatives available in the Keccak Code Package. In particular, you > can find implementations aiming for speed on 32-bit platforms (including > ARM assembly code) and on 64-bit platforms. These implementations are > organized in a way that only the code for the Keccak-f[1600] permutation > and state input/output depends on the target platform.
The rest is > implemented using platform-independent code. I've worked on a rewrite of Keccak in libgcrypt. As a starting point, I picked the 'simple' and 'simple32bi' implementations from the SUPERCOP package, as they seem to give reasonable performance and were easy to integrate. The refactored code will also simplify adding new (assembly) implementations to libgcrypt. > > 2) It would be great to add support for the SHAKEs. I understand that > they are a new kind of functions ("extendable-output functions"), so > requiring a different interface compared to traditional hash functions, > but they can become really useful in the context of RSA (replacing the > MGF) and ECDSA (or EdDSA). > > 3) The type KECCAK_CONTEXT contains a buffer (byte > buf[MD_BLOCK_MAX_BLOCKSIZE]) that is quite large. In fact, the sponge > construction does not need such a buffer, namely, the input bytes can be > XORed into the state as they arrive. The Keccak Code Package readily > implements this. > OK, I've removed the use of the extra input buffer from the Keccak code. -Jussi > > _______________________________________________ > Gcrypt-devel mailing list > Gcrypt-devel at gnupg.org > http://lists.gnupg.org/mailman/listinfo/gcrypt-devel > From jussi.kivilinna at iki.fi Sat Oct 24 15:29:57 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 24 Oct 2015 16:29:57 +0300 Subject: [PATCH 1/2] Fix OCB amd64 assembly implementations for x32 Message-ID: <20151024132957.14704.59465.stgit@localhost6.localdomain6> * cipher/camellia-glue.c (_gcry_camellia_aesni_avx_ocb_enc) (_gcry_camellia_aesni_avx_ocb_dec, _gcry_camellia_aesni_avx_ocb_auth) (_gcry_camellia_aesni_avx2_ocb_enc, _gcry_camellia_aesni_avx2_ocb_dec) (_gcry_camellia_aesni_avx2_ocb_auth, _gcry_camellia_ocb_crypt) (_gcry_camellia_ocb_auth): Change 'Ls' from pointer array to u64 array.
* cipher/serpent.c (_gcry_serpent_sse2_ocb_enc) (_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth) (_gcry_serpent_avx2_ocb_enc, _gcry_serpent_avx2_ocb_dec) (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Ditto. * cipher/twofish.c (_gcry_twofish_amd64_ocb_enc) (_gcry_twofish_amd64_ocb_dec, _gcry_twofish_amd64_ocb_auth) (twofish_amd64_ocb_enc, twofish_amd64_ocb_dec, twofish_amd64_ocb_auth) (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Ditto. -- Pointers on x32 are 32-bit, but the amd64 assembly implementations expect 64-bit pointers. Convert the 'Ls' arrays to arrays of 64-bit integers so that the input arrays have the correct format for the assembly functions. Signed-off-by: Jussi Kivilinna --- cipher/camellia-glue.c | 116 ++++++++++++++++++++++++---------------------- cipher/serpent.c | 104 +++++++++++++++++++-------------------- cipher/twofish.c | 32 +++++++------ 3 files changed, 136 insertions(+), 116 deletions(-) diff --git a/cipher/camellia-glue.c b/cipher/camellia-glue.c index dee0169..dfddb4a 100644 --- a/cipher/camellia-glue.c +++ b/cipher/camellia-glue.c @@ -141,20 +141,20 @@ extern void _gcry_camellia_aesni_avx_ocb_enc(CAMELLIA_context *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_ocb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_ocb_auth(CAMELLIA_context *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, const unsigned char *key, @@ -185,20 +185,20 @@ extern void _gcry_camellia_aesni_avx2_ocb_enc(CAMELLIA_context *ctx, const unsigned char *in, unsigned char *offset, unsigned char
*checksum, - const void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_ocb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_ocb_auth(CAMELLIA_context *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; #endif static const char *selftest(void); @@ -630,27 +630,29 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_aesni_avx2) { int did_use_aesni_avx2 = 0; - const void *Ls[32]; + u64 Ls[32]; unsigned int n = 32 - (blkn % 32); - const void **l; + u64 *l; int i; if (nblocks >= 32) { for (i = 0; i < 32; i += 8) { - Ls[(i + 0 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 32] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 32] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(i + 0 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 32] = c->u_mode.ocb.L[3]; - Ls[(15 + n) % 32] = c->u_mode.ocb.L[4]; - Ls[(23 + n) % 32] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(31 + n) % 32]; /* Process data in 32 block chunks. */ @@ -658,7 +660,7 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 32; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 32); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 32); if (encrypt) _gcry_camellia_aesni_avx2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -691,25 +693,27 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_aesni_avx) { int did_use_aesni_avx = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -717,7 +721,7 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); if (encrypt) _gcry_camellia_aesni_avx_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -780,27 +784,29 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_aesni_avx2) { int did_use_aesni_avx2 = 0; - const void *Ls[32]; + u64 Ls[32]; unsigned int n = 32 - (blkn % 32); - const void **l; + u64 *l; int i; if (nblocks >= 32) { for (i = 0; i < 32; i += 8) { - Ls[(i + 0 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 32] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 32] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(i + 0 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 32] = c->u_mode.ocb.L[3]; - Ls[(15 + n) % 32] = c->u_mode.ocb.L[4]; - Ls[(23 + n) % 32] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(31 + n) % 32]; /* Process data in 32 block chunks. */ @@ -808,7 +814,7 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 32; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 32); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 32); _gcry_camellia_aesni_avx2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, @@ -837,25 +843,27 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_aesni_avx) { int did_use_aesni_avx = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -863,7 +871,7 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); _gcry_camellia_aesni_avx_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, diff --git a/cipher/serpent.c b/cipher/serpent.c index fc3afa6..4ef7f52 100644 --- a/cipher/serpent.c +++ b/cipher/serpent.c @@ -125,20 +125,20 @@ extern void _gcry_serpent_sse2_ocb_enc(serpent_context_t *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_ocb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_ocb_auth(serpent_context_t *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 @@ -165,20 +165,20 @@ extern void _gcry_serpent_avx2_ocb_enc(serpent_context_t *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void 
_gcry_serpent_avx2_ocb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_ocb_auth(serpent_context_t *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; #endif #ifdef USE_NEON @@ -1249,25 +1249,27 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_avx2) { int did_use_avx2 = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -1275,7 +1277,7 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); if (encrypt) _gcry_serpent_avx2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -1305,19 +1307,21 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, #ifdef USE_SSE2 { int did_use_sse2 = 0; - const void *Ls[8]; + u64 Ls[8]; unsigned int n = 8 - (blkn % 8); - const void **l; + u64 *l; if (nblocks >= 8) { - Ls[(0 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(1 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(2 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(3 + n) % 8] = c->u_mode.ocb.L[2]; - Ls[(4 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(5 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(6 + n) % 8] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(0 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(1 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(2 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(3 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(4 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(5 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(6 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; l = &Ls[(7 + n) % 8]; /* Process data in 8 block chunks. */ @@ -1325,7 +1329,7 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 8; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 8); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 8); if (encrypt) _gcry_serpent_sse2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -1435,25 +1439,27 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_avx2) { int did_use_avx2 = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -1461,7 +1467,7 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); _gcry_serpent_avx2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, c->u_mode.ocb.aad_sum, Ls); @@ -1486,19 +1492,21 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, #ifdef USE_SSE2 { int did_use_sse2 = 0; - const void *Ls[8]; + u64 Ls[8]; unsigned int n = 8 - (blkn % 8); - const void **l; + u64 *l; if (nblocks >= 8) { - Ls[(0 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(1 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(2 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(3 + n) % 8] = c->u_mode.ocb.L[2]; - Ls[(4 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(5 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(6 + n) % 8] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(0 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(1 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(2 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(3 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(4 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(5 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(6 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; l = &Ls[(7 + n) % 8]; /* Process data in 8 block chunks. */ @@ -1506,7 +1514,7 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 8; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 8); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 8); _gcry_serpent_sse2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, c->u_mode.ocb.aad_sum, Ls); diff --git a/cipher/twofish.c b/cipher/twofish.c index 7f361c9..f6ecd67 100644 --- a/cipher/twofish.c +++ b/cipher/twofish.c @@ -734,15 +734,15 @@ extern void _gcry_twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, extern void _gcry_twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); extern void _gcry_twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); extern void _gcry_twofish_amd64_ocb_auth(const TWOFISH_context *ctx, const byte *abuf, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS static inline void @@ -854,7 +854,7 @@ twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, static inline void twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS call_sysv_fn6(_gcry_twofish_amd64_ocb_enc, ctx, out, in, offset, checksum, Ls); @@ -865,7 +865,7 @@ twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, static inline void twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS call_sysv_fn6(_gcry_twofish_amd64_ocb_dec, ctx, out, in, offset, checksum, Ls); @@ -876,7 +876,7 @@ twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, static inline void 
twofish_amd64_ocb_auth(const TWOFISH_context *ctx, const byte *abuf, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS call_sysv_fn5(_gcry_twofish_amd64_ocb_auth, ctx, abuf, offset, checksum, Ls); @@ -1261,15 +1261,17 @@ _gcry_twofish_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, u64 blkn = c->u_mode.ocb.data_nblocks; { - const void *Ls[3]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + u64 Ls[3]; /* Process data in 3 block chunks. */ while (nblocks >= 3) { /* l_tmp will be used only every 65536-th block. */ - Ls[0] = ocb_get_l(c, l_tmp, blkn + 1); - Ls[1] = ocb_get_l(c, l_tmp, blkn + 2); - Ls[2] = ocb_get_l(c, l_tmp, blkn + 3); + Ls[0] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 1); + Ls[1] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 2); + Ls[2] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 3); blkn += 3; if (encrypt) @@ -1320,15 +1322,17 @@ _gcry_twofish_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, u64 blkn = c->u_mode.ocb.aad_nblocks; { - const void *Ls[3]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + u64 Ls[3]; /* Process data in 3 block chunks. */ while (nblocks >= 3) { /* l_tmp will be used only every 65536-th block. 
*/ - Ls[0] = ocb_get_l(c, l_tmp, blkn + 1); - Ls[1] = ocb_get_l(c, l_tmp, blkn + 2); - Ls[2] = ocb_get_l(c, l_tmp, blkn + 3); + Ls[0] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 1); + Ls[1] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 2); + Ls[2] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 3); blkn += 3; twofish_amd64_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, From jussi.kivilinna at iki.fi Sat Oct 24 15:30:02 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 24 Oct 2015 16:30:02 +0300 Subject: [PATCH 2/2] hwf-x86: add detection for Intel CPUs with fast SHLD instruction In-Reply-To: <20151024132957.14704.59465.stgit@localhost6.localdomain6> References: <20151024132957.14704.59465.stgit@localhost6.localdomain6> Message-ID: <20151024133002.14704.75048.stgit@localhost6.localdomain6> * cipher/sha1.c (sha1_init): Use HWF_INTEL_FAST_SHLD instead of HWF_INTEL_CPU. * cipher/sha256.c (sha256_init, sha224_init): Ditto. * cipher/sha512.c (sha512_init, sha384_init): Ditto. * src/g10lib.h (HWF_INTEL_FAST_SHLD): New. (HWF_INTEL_BMI2, HWF_INTEL_SSSE3, HWF_INTEL_PCLMUL, HWF_INTEL_AESNI) (HWF_INTEL_RDRAND, HWF_INTEL_AVX, HWF_INTEL_AVX2) (HWF_ARM_NEON): Update. * src/hwf-x86.c (detect_x86_gnuc): Add detection of Intel Core CPUs with fast SHLD/SHRD instruction. * src/hwfeatures.c (hwflist): Add "intel-fast-shld". -- Intel Core CPUs since the Sandy Bridge generation have been able to execute the SHLD/SHRD instructions faster than the rotate instructions ROL/ROR. Since SHLD/SHRD can be used to do rotation, some optimized implementations (SHA1/SHA256/SHA512) use SHLD/SHRD instructions in place of ROL/ROR. This patch provides more accurate detection of CPUs with a fast SHLD implementation.
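A minimal sketch of why the substitution works, modelling the 64-bit SHLD in portable C. The helper names below (shld64, rol64, rol64_via_shld, rotate_identity_holds) are hypothetical illustrations, not libgcrypt functions, and the model assumes the shift count is masked to 0..63 as the hardware does for 64-bit operands. When the same register is both source and destination, the bits shifted out at the top are exactly the bits shifted back in at the bottom, which is a rotate-left:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical portable model of 64-bit SHLD: shift 'dst' left by n bits
 * and fill the vacated low bits with the top n bits of 'src'.
 * The count is masked to 0..63 (assumed to mirror the 64-bit hardware). */
static uint64_t shld64(uint64_t dst, uint64_t src, unsigned int n)
{
  n &= 63;
  if (n == 0)
    return dst;                 /* avoid an undefined 64-bit shift */
  return (dst << n) | (src >> (64 - n));
}

/* Conventional rotate-left, i.e. what ROL computes. */
static uint64_t rol64(uint64_t x, unsigned int n)
{
  n &= 63;
  return (x << n) | (x >> ((64 - n) & 63));
}

/* SHLD with source == destination behaves as a rotate-left. */
static uint64_t rol64_via_shld(uint64_t x, unsigned int n)
{
  return shld64(x, x, n);
}

/* Returns 1 if both forms agree for every count 0..63 on input x. */
static int rotate_identity_holds(uint64_t x)
{
  for (unsigned int n = 0; n < 64; n++)
    if (rol64_via_shld(x, n) != rol64(x, n))
      return 0;
  return 1;
}
```

This only demonstrates functional equivalence; the speed advantage that motivates HWF_INTEL_FAST_SHLD is a property of the listed Intel Core models and cannot be observed from C code.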
Signed-off-by: Jussi Kivilinna --- cipher/sha1.c | 2 +- cipher/sha256.c | 4 ++-- cipher/sha512.c | 4 ++-- src/g10lib.h | 21 +++++++++++---------- src/hwf-x86.c | 34 ++++++++++++++++++++++++++++++++-- src/hwfeatures.c | 27 ++++++++++++++------------- 6 files changed, 62 insertions(+), 30 deletions(-) diff --git a/cipher/sha1.c b/cipher/sha1.c index eb42883..554d55c 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -136,7 +136,7 @@ sha1_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only. */ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_BMI2 hd->use_bmi2 = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_BMI2); diff --git a/cipher/sha256.c b/cipher/sha256.c index 59ffa43..63869d5 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -124,7 +124,7 @@ sha256_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only. */ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); @@ -162,7 +162,7 @@ sha224_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only. 
*/ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); diff --git a/cipher/sha512.c b/cipher/sha512.c index 029f8f0..4be1cab 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -154,7 +154,7 @@ sha512_init (void *context, unsigned int flags) ctx->use_ssse3 = (features & HWF_INTEL_SSSE3) != 0; #endif #ifdef USE_AVX - ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 ctx->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); @@ -194,7 +194,7 @@ sha384_init (void *context, unsigned int flags) ctx->use_ssse3 = (features & HWF_INTEL_SSSE3) != 0; #endif #ifdef USE_AVX - ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 ctx->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); diff --git a/src/g10lib.h b/src/g10lib.h index d1f9426..a579e94 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -197,16 +197,17 @@ int _gcry_log_verbosity( int level ); #define HWF_PADLOCK_SHA 4 #define HWF_PADLOCK_MMUL 8 -#define HWF_INTEL_CPU 16 -#define HWF_INTEL_BMI2 32 -#define HWF_INTEL_SSSE3 64 -#define HWF_INTEL_PCLMUL 128 -#define HWF_INTEL_AESNI 256 -#define HWF_INTEL_RDRAND 512 -#define HWF_INTEL_AVX 1024 -#define HWF_INTEL_AVX2 2048 - -#define HWF_ARM_NEON 4096 +#define HWF_INTEL_CPU 16 +#define HWF_INTEL_FAST_SHLD 32 +#define HWF_INTEL_BMI2 64 +#define HWF_INTEL_SSSE3 128 +#define HWF_INTEL_PCLMUL 256 +#define HWF_INTEL_AESNI 512 +#define HWF_INTEL_RDRAND 1024 +#define HWF_INTEL_AVX 2048 +#define HWF_INTEL_AVX2 4096 + +#define HWF_ARM_NEON 8192 gpg_err_code_t _gcry_disable_hw_feature (const char *name); 
diff --git a/src/hwf-x86.c b/src/hwf-x86.c index 399952c..fbd6331 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -174,6 +174,7 @@ detect_x86_gnuc (void) unsigned int features; unsigned int os_supports_avx_avx2_registers = 0; unsigned int max_cpuid_level; + unsigned int fms, family, model; unsigned int result = 0; (void)os_supports_avx_avx2_registers; @@ -236,8 +237,37 @@ detect_x86_gnuc (void) /* Detect Intel features, that might also be supported by other vendors. */ - /* Get CPU info and Intel feature flags (ECX). */ - get_cpuid(1, NULL, NULL, &features, NULL); + /* Get CPU family/model/stepping (EAX) and Intel feature flags (ECX). */ + get_cpuid(1, &fms, NULL, &features, NULL); + + family = ((fms & 0xf00) >> 8) + ((fms & 0xff00000) >> 20); + model = ((fms & 0xf0) >> 4) + ((fms & 0xf0000) >> 12); + + if ((result & HWF_INTEL_CPU) && family == 6) + { + /* These Intel Core processor models have SHLD/SHRD instructions that + * can do integer rotation faster than the actual ROL/ROR instructions. */ + switch (model) + { + case 0x2A: + case 0x2D: + case 0x3A: + case 0x3C: + case 0x3F: + case 0x45: + case 0x46: + case 0x3D: + case 0x4F: + case 0x56: + case 0x47: + case 0x4E: + case 0x5E: + case 0x55: + case 0x66: + result |= HWF_INTEL_FAST_SHLD; + break; + } + } #ifdef ENABLE_PCLMUL_SUPPORT /* Test bit 1 for PCLMUL.
*/ diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 58099c4..e7c55cc 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -42,19 +42,20 @@ static struct const char *desc; } hwflist[] = { - { HWF_PADLOCK_RNG, "padlock-rng" }, - { HWF_PADLOCK_AES, "padlock-aes" }, - { HWF_PADLOCK_SHA, "padlock-sha" }, - { HWF_PADLOCK_MMUL,"padlock-mmul"}, - { HWF_INTEL_CPU, "intel-cpu" }, - { HWF_INTEL_BMI2, "intel-bmi2" }, - { HWF_INTEL_SSSE3, "intel-ssse3" }, - { HWF_INTEL_PCLMUL,"intel-pclmul" }, - { HWF_INTEL_AESNI, "intel-aesni" }, - { HWF_INTEL_RDRAND,"intel-rdrand" }, - { HWF_INTEL_AVX, "intel-avx" }, - { HWF_INTEL_AVX2, "intel-avx2" }, - { HWF_ARM_NEON, "arm-neon" } + { HWF_PADLOCK_RNG, "padlock-rng" }, + { HWF_PADLOCK_AES, "padlock-aes" }, + { HWF_PADLOCK_SHA, "padlock-sha" }, + { HWF_PADLOCK_MMUL, "padlock-mmul"}, + { HWF_INTEL_CPU, "intel-cpu" }, + { HWF_INTEL_FAST_SHLD, "intel-fast-shld" }, + { HWF_INTEL_BMI2, "intel-bmi2" }, + { HWF_INTEL_SSSE3, "intel-ssse3" }, + { HWF_INTEL_PCLMUL, "intel-pclmul" }, + { HWF_INTEL_AESNI, "intel-aesni" }, + { HWF_INTEL_RDRAND, "intel-rdrand" }, + { HWF_INTEL_AVX, "intel-avx" }, + { HWF_INTEL_AVX2, "intel-avx2" }, + { HWF_ARM_NEON, "arm-neon" } }; /* A bit vector with the hardware features which shall not be used. From jussi.kivilinna at iki.fi Sat Oct 24 16:12:00 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 24 Oct 2015 17:12:00 +0300 Subject: [PATCH] keccak: rewrite for improved performance Message-ID: <20151024141200.29930.17820.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'keccak_permute_32.h' and 'keccak_permute_64.h'. * cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove. * cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2) (USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI) (keccak_ops_t): New. (KECCAK_STATE): Add 'state64' and 'state32bi' members. (KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'. (rol64, keccak_f1600_state_permute): Remove. 
[NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New. [NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi) (keccak_absorb_lane32bi): New. [USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64) (keccak_absorb_lanes64, keccak_generic64_ops): New. [USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld) (keccak_absorb_lanes64_shld, keccak_shld_64_ops): New. [USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2) (keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New. [USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi) (keccak_absorb_lanes32bi, keccak_generic32bi_ops): New. [USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2) (pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2) (keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New. (keccak_write): New. (keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation selection based on HWF features. (keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops' for state manipulation. (keccak_read): Adjust to KECCAK_CONTEXT changes. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Use 'keccak_write' instead of '_gcry_md_block_write'. * cipher/keccak_permute_32.h: New. * cipher/keccak_permute_64.h: New.
--
Patch adds new generic 64-bit and 32-bit implementations and optimized implementations for SHA3:
- Generic 64-bit implementation based on the 'simple' implementation from the SUPERCOP package.
- Generic 32-bit bit-interleaved implementation based on the 'simple32bi' implementation from the SUPERCOP package.
- Intel BMI2 optimized variants of the 64-bit and 32-bit BI implementations.
- Intel SHLD optimized variant of the 64-bit implementation.
Patch also makes proper use of the sponge construction to avoid the use of an additional input buffer.
Below are bench-slope benchmarks for the new 64-bit implementations made on Intel Core i5-4570 (no turbo, 3.2 GHz, gcc-4.9.2).
Before (amd64):
  SHA3-224 |   3.92 ns/B   243.2 MiB/s   12.55 c/B
  SHA3-256 |   4.15 ns/B   230.0 MiB/s   13.27 c/B
  SHA3-384 |   5.40 ns/B   176.6 MiB/s   17.29 c/B
  SHA3-512 |   7.77 ns/B   122.7 MiB/s   24.87 c/B

After (generic 64-bit, amd64, 1.10x faster):
  SHA3-224 |   3.57 ns/B   267.4 MiB/s   11.42 c/B
  SHA3-256 |   3.77 ns/B   252.8 MiB/s   12.07 c/B
  SHA3-384 |   4.91 ns/B   194.1 MiB/s   15.72 c/B
  SHA3-512 |   7.06 ns/B   135.0 MiB/s   22.61 c/B

After (Intel SHLD 64-bit, amd64, 1.13x faster):
  SHA3-224 |   3.48 ns/B   273.7 MiB/s   11.15 c/B
  SHA3-256 |   3.68 ns/B   258.9 MiB/s   11.79 c/B
  SHA3-384 |   4.80 ns/B   198.7 MiB/s   15.36 c/B
  SHA3-512 |   6.89 ns/B   138.4 MiB/s   22.05 c/B

After (Intel BMI2 64-bit, amd64, 1.45x faster):
  SHA3-224 |   2.71 ns/B   352.1 MiB/s    8.67 c/B
  SHA3-256 |   2.86 ns/B   333.2 MiB/s    9.16 c/B
  SHA3-384 |   3.72 ns/B   256.2 MiB/s   11.91 c/B
  SHA3-512 |   5.34 ns/B   178.5 MiB/s   17.10 c/B

Benchmarks of the new 32-bit implementations on Intel Core i5-4570 (no turbo, 3.2 GHz, gcc-4.9.2):

Before (win32):
  SHA3-224 |  12.05 ns/B   79.16 MiB/s   38.56 c/B
  SHA3-256 |  12.75 ns/B   74.78 MiB/s   40.82 c/B
  SHA3-384 |  16.63 ns/B   57.36 MiB/s   53.22 c/B
  SHA3-512 |  23.97 ns/B   39.79 MiB/s   76.72 c/B

After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
  SHA3-224 |   9.76 ns/B   97.69 MiB/s   31.25 c/B
  SHA3-256 |  10.27 ns/B   92.82 MiB/s   32.89 c/B
  SHA3-384 |  13.22 ns/B   72.16 MiB/s   42.31 c/B
  SHA3-512 |  18.65 ns/B   51.13 MiB/s   59.70 c/B

After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
  SHA3-224 |   7.26 ns/B   131.4 MiB/s   23.23 c/B
  SHA3-256 |   7.65 ns/B   124.7 MiB/s   24.47 c/B
  SHA3-384 |   9.87 ns/B   96.67 MiB/s   31.58 c/B
  SHA3-512 |  14.05 ns/B   67.85 MiB/s   44.99 c/B

Benchmarks of the new 32-bit implementation on ARM Cortex-A8 (1008 MHz, gcc-4.9.1):

Before:
  SHA3-224 |  148.6 ns/B    6.42 MiB/s   149.8 c/B
  SHA3-256 |  157.2 ns/B    6.07 MiB/s   158.4 c/B
  SHA3-384 |  205.3 ns/B    4.65 MiB/s   206.9 c/B
  SHA3-512 |  296.3 ns/B    3.22 MiB/s   298.6 c/B

After (1.56x faster):
  SHA3-224 |  96.12 ns/B    9.92 MiB/s   96.89 c/B
  SHA3-256 |  101.5 ns/B    9.40 MiB/s   102.3 c/B
  SHA3-384 |  131.4 ns/B    7.26 MiB/s   132.5 c/B
  SHA3-512 |  188.2 ns/B    5.07 MiB/s   189.7 c/B

Signed-off-by: Jussi Kivilinna --- cipher/Makefile.am | 2 cipher/hash-common.h | 12 - cipher/keccak.c | 807 +++++++++++++++++++++++++++++++------------- cipher/keccak_permute_32.h | 535 +++++++++++++++++++++++++++++ cipher/keccak_permute_64.h | 290 ++++++++++++++++ 5 files changed, 1403 insertions(+), 243 deletions(-) create mode 100644 cipher/keccak_permute_32.h create mode 100644 cipher/keccak_permute_64.h diff --git a/cipher/Makefile.am b/cipher/Makefile.am index b08c9a9..be03d06 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -90,7 +90,7 @@ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S \ -keccak.c \ +keccak.c keccak_permute_32.h keccak_permute_64.h \ stribog.c \ tiger.c \ whirlpool.c whirlpool-sse2-amd64.S \ diff --git a/cipher/hash-common.h b/cipher/hash-common.h index e1ae5a2..27d670d 100644 --- a/cipher/hash-common.h +++ b/cipher/hash-common.h @@ -33,15 +33,9 @@ typedef unsigned int (*_gcry_md_block_write_t) (void *c, const unsigned char *blks, size_t nblks); -#if defined(HAVE_U64_TYPEDEF) && (defined(USE_SHA512) || defined(USE_SHA3) || \ - defined(USE_WHIRLPOOL)) -/* SHA-512, SHA-3 and Whirlpool needs u64. SHA-512 and SHA3 need larger - * buffer. */ -# ifdef USE_SHA3 -# define MD_BLOCK_MAX_BLOCKSIZE (1152 / 8) -# else -# define MD_BLOCK_MAX_BLOCKSIZE 128 -# endif +#if defined(HAVE_U64_TYPEDEF) && (defined(USE_SHA512) || defined(USE_WHIRLPOOL)) +/* SHA-512 and Whirlpool needs u64. SHA-512 needs larger buffer.
*/ +# define MD_BLOCK_MAX_BLOCKSIZE 128 # define MD_NBLOCKS_TYPE u64 #else # define MD_BLOCK_MAX_BLOCKSIZE 64 diff --git a/cipher/keccak.c b/cipher/keccak.c index 4a9c1f2..efcd813 100644 --- a/cipher/keccak.c +++ b/cipher/keccak.c @@ -27,11 +27,45 @@ #include "hash-common.h" -/* The code is based on public-domain/CC0 "Keccak-readable-and-compact.c" - * implementation by the Keccak, Keyak and Ketje Teams, namely, Guido Bertoni, - * Joan Daemen, Michaël Peeters, Gilles Van Assche and Ronny Van Keer. From: - * https://github.com/gvanas/KeccakCodePackage - */ + +/* USE_64BIT indicates whether to use 64-bit generic implementation. + * USE_32BIT indicates whether to use 32-bit generic implementation. */ +#undef USE_64BIT +#if defined(__x86_64__) || SIZEOF_UNSIGNED_LONG == 8 +# define USE_64BIT 1 +#else +# define USE_32BIT 1 +#endif + + +/* USE_64BIT_BMI2 indicates whether to compile with 64-bit Intel BMI2 code. */ +#undef USE_64BIT_BMI2 +#if defined(USE_64BIT) && defined(HAVE_GCC_INLINE_ASM_BMI2) +# define USE_64BIT_BMI2 1 +#endif + + +/* USE_64BIT_SHLD indicates whether to compile with 64-bit Intel SHLD code. */ +#undef USE_64BIT_SHLD +#if defined(USE_64BIT) && defined (__GNUC__) && defined(__x86_64__) +# define USE_64BIT_SHLD 1 +#endif + + +/* USE_32BIT_BMI2 indicates whether to compile with 32-bit Intel BMI2 code.
*/ +#undef USE_32BIT_BMI2 +#if defined(USE_32BIT) && defined(HAVE_GCC_INLINE_ASM_BMI2) +# define USE_32BIT_BMI2 1 +#endif + + +#ifdef USE_64BIT +# define NEED_COMMON64 1 +#endif + +#ifdef USE_32BIT +# define NEED_COMMON32BI 1 +#endif #define SHA3_DELIMITED_SUFFIX 0x06 @@ -40,220 +74,527 @@ typedef struct { - u64 state[5][5]; + union { +#ifdef NEED_COMMON64 + u64 state64[25]; +#endif +#ifdef NEED_COMMON32BI + u32 state32bi[50]; +#endif + } u; } KECCAK_STATE; typedef struct { - gcry_md_block_ctx_t bctx; + unsigned int (*permute)(KECCAK_STATE *hd); + unsigned int (*absorb)(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes); + unsigned int (*extract_inplace) (KECCAK_STATE *hd, unsigned int outlen); +} keccak_ops_t; + + +typedef struct KECCAK_CONTEXT_S +{ KECCAK_STATE state; unsigned int outlen; + unsigned int blocksize; + unsigned int count; + const keccak_ops_t *ops; } KECCAK_CONTEXT; -static inline u64 -rol64 (u64 x, unsigned int n) + +#ifdef NEED_COMMON64 + +static const u64 round_consts_64bit[24] = { - return ((x << n) | (x >> (64 - n))); -} + U64_C(0x0000000000000001), U64_C(0x0000000000008082), + U64_C(0x800000000000808A), U64_C(0x8000000080008000), + U64_C(0x000000000000808B), U64_C(0x0000000080000001), + U64_C(0x8000000080008081), U64_C(0x8000000000008009), + U64_C(0x000000000000008A), U64_C(0x0000000000000088), + U64_C(0x0000000080008009), U64_C(0x000000008000000A), + U64_C(0x000000008000808B), U64_C(0x800000000000008B), + U64_C(0x8000000000008089), U64_C(0x8000000000008003), + U64_C(0x8000000000008002), U64_C(0x8000000000000080), + U64_C(0x000000000000800A), U64_C(0x800000008000000A), + U64_C(0x8000000080008081), U64_C(0x8000000000008080), + U64_C(0x0000000080000001), U64_C(0x8000000080008008) +}; -/* Function that computes the Keccak-f[1600] permutation on the given state. 
*/ -static unsigned int keccak_f1600_state_permute(KECCAK_STATE *hd) +static unsigned int +keccak_extract_inplace64(KECCAK_STATE *hd, unsigned int outlen) { - static const u64 round_consts[24] = - { - U64_C(0x0000000000000001), U64_C(0x0000000000008082), - U64_C(0x800000000000808A), U64_C(0x8000000080008000), - U64_C(0x000000000000808B), U64_C(0x0000000080000001), - U64_C(0x8000000080008081), U64_C(0x8000000000008009), - U64_C(0x000000000000008A), U64_C(0x0000000000000088), - U64_C(0x0000000080008009), U64_C(0x000000008000000A), - U64_C(0x000000008000808B), U64_C(0x800000000000008B), - U64_C(0x8000000000008089), U64_C(0x8000000000008003), - U64_C(0x8000000000008002), U64_C(0x8000000000000080), - U64_C(0x000000000000800A), U64_C(0x800000008000000A), - U64_C(0x8000000080008081), U64_C(0x8000000000008080), - U64_C(0x0000000080000001), U64_C(0x8000000080008008) - }; - unsigned int round; + unsigned int i; - for (round = 0; round < 24; round++) + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) { - { - /* θ step (see [Keccak Reference, Section 2.3.2]) === */ - u64 C[5], D[5]; - - /* Compute the parity of the columns */ - C[0] = hd->state[0][0] ^ hd->state[1][0] ^ hd->state[2][0] - ^ hd->state[3][0] ^ hd->state[4][0]; - C[1] = hd->state[0][1] ^ hd->state[1][1] ^ hd->state[2][1] - ^ hd->state[3][1] ^ hd->state[4][1]; - C[2] = hd->state[0][2] ^ hd->state[1][2] ^ hd->state[2][2] - ^ hd->state[3][2] ^ hd->state[4][2]; - C[3] = hd->state[0][3] ^ hd->state[1][3] ^ hd->state[2][3] - ^ hd->state[3][3] ^ hd->state[4][3]; - C[4] = hd->state[0][4] ^ hd->state[1][4] ^ hd->state[2][4] - ^ hd->state[3][4] ^ hd->state[4][4]; - - /* Compute the θ effect for a given column */ - D[0] = C[4] ^ rol64(C[1], 1); - D[1] = C[0] ^ rol64(C[2], 1); - D[2] = C[1] ^ rol64(C[3], 1); - D[3] = C[2] ^ rol64(C[4], 1); - D[4] = C[3] ^ rol64(C[0], 1); - - /* Add the θ
effect to the whole column */ - hd->state[0][0] ^= D[0]; - hd->state[1][0] ^= D[0]; - hd->state[2][0] ^= D[0]; - hd->state[3][0] ^= D[0]; - hd->state[4][0] ^= D[0]; - - /* Add the θ effect to the whole column */ - hd->state[0][1] ^= D[1]; - hd->state[1][1] ^= D[1]; - hd->state[2][1] ^= D[1]; - hd->state[3][1] ^= D[1]; - hd->state[4][1] ^= D[1]; - - /* Add the θ effect to the whole column */ - hd->state[0][2] ^= D[2]; - hd->state[1][2] ^= D[2]; - hd->state[2][2] ^= D[2]; - hd->state[3][2] ^= D[2]; - hd->state[4][2] ^= D[2]; - - /* Add the θ effect to the whole column */ - hd->state[0][3] ^= D[3]; - hd->state[1][3] ^= D[3]; - hd->state[2][3] ^= D[3]; - hd->state[3][3] ^= D[3]; - hd->state[4][3] ^= D[3]; - - /* Add the θ effect to the whole column */ - hd->state[0][4] ^= D[4]; - hd->state[1][4] ^= D[4]; - hd->state[2][4] ^= D[4]; - hd->state[3][4] ^= D[4]; - hd->state[4][4] ^= D[4]; - } - - { - /* ρ and π steps (see [Keccak Reference, Sections 2.3.3 and 2.3.4]) */ - u64 current, temp; - -#define do_swap_n_rol(x, y, r) \ - temp = hd->state[y][x]; \ - hd->state[y][x] = rol64(current, r); \ - current = temp; - - /* Start at coordinates (1 0) */ - current = hd->state[0][1]; - - /* Iterate over ((0 1)(2 3))^t * (1 0) for 0 ≤ t ≤ 23 */ - do_swap_n_rol(0, 2, 1); - do_swap_n_rol(2, 1, 3); - do_swap_n_rol(1, 2, 6); - do_swap_n_rol(2, 3, 10); - do_swap_n_rol(3, 3, 15); - do_swap_n_rol(3, 0, 21); - do_swap_n_rol(0, 1, 28); - do_swap_n_rol(1, 3, 36); - do_swap_n_rol(3, 1, 45); - do_swap_n_rol(1, 4, 55); - do_swap_n_rol(4, 4, 2); - do_swap_n_rol(4, 0, 14); - do_swap_n_rol(0, 3, 27); - do_swap_n_rol(3, 4, 41); - do_swap_n_rol(4, 3, 56); - do_swap_n_rol(3, 2, 8); - do_swap_n_rol(2, 2, 25); - do_swap_n_rol(2, 0, 43); - do_swap_n_rol(0, 4, 62); - do_swap_n_rol(4, 2, 18); - do_swap_n_rol(2, 4, 39); - do_swap_n_rol(4, 1, 61); - do_swap_n_rol(1, 1, 20); - do_swap_n_rol(1, 0, 44); - -#undef do_swap_n_rol - } - - { - /* χ
step (see [Keccak Reference, Section 2.3.1]) */ - u64 temp[5]; - -#define do_x_step_for_plane(y) \ - /* Take a copy of the plane */ \ - temp[0] = hd->state[y][0]; \ - temp[1] = hd->state[y][1]; \ - temp[2] = hd->state[y][2]; \ - temp[3] = hd->state[y][3]; \ - temp[4] = hd->state[y][4]; \ - \ - /* Compute χ on the plane */ \ - hd->state[y][0] = temp[0] ^ ((~temp[1]) & temp[2]); \ - hd->state[y][1] = temp[1] ^ ((~temp[2]) & temp[3]); \ - hd->state[y][2] = temp[2] ^ ((~temp[3]) & temp[4]); \ - hd->state[y][3] = temp[3] ^ ((~temp[4]) & temp[0]); \ - hd->state[y][4] = temp[4] ^ ((~temp[0]) & temp[1]); - - do_x_step_for_plane(0); - do_x_step_for_plane(1); - do_x_step_for_plane(2); - do_x_step_for_plane(3); - do_x_step_for_plane(4); - -#undef do_x_step_for_plane - } - - { - /* ι step (see [Keccak Reference, Section 2.3.5]) */ - - hd->state[0][0] ^= round_consts[round]; - } + hd->u.state64[i] = le_bswap64(hd->u.state64[i]); } - return sizeof(void *) * 4 + sizeof(u64) * 10; + return 0; } +#endif /* NEED_COMMON64 */ + + +#ifdef NEED_COMMON32BI + +static const u32 round_consts_32bit[2 * 24] = +{ + 0x00000001UL, 0x00000000UL, 0x00000000UL, 0x00000089UL, + 0x00000000UL, 0x8000008bUL, 0x00000000UL, 0x80008080UL, + 0x00000001UL, 0x0000008bUL, 0x00000001UL, 0x00008000UL, + 0x00000001UL, 0x80008088UL, 0x00000001UL, 0x80000082UL, + 0x00000000UL, 0x0000000bUL, 0x00000000UL, 0x0000000aUL, + 0x00000001UL, 0x00008082UL, 0x00000000UL, 0x00008003UL, + 0x00000001UL, 0x0000808bUL, 0x00000001UL, 0x8000000bUL, + 0x00000001UL, 0x8000008aUL, 0x00000001UL, 0x80000081UL, + 0x00000000UL, 0x80000081UL, 0x00000000UL, 0x80000008UL, + 0x00000000UL, 0x00000083UL, 0x00000000UL, 0x80008003UL, + 0x00000001UL, 0x80008088UL, 0x00000000UL, 0x80000088UL, + 0x00000001UL, 0x00008000UL, 0x00000000UL, 0x80008082UL +}; static unsigned int -transform_blk (void *context, const unsigned char *data) +keccak_extract_inplace32bi(KECCAK_STATE *hd, unsigned int outlen) { - KECCAK_CONTEXT *ctx = context; - KECCAK_STATE *hd
= &ctx->state; - u64 *state = (u64 *)hd->state; - const size_t bsize = ctx->bctx.blocksize; unsigned int i; + u32 x0; + u32 x1; + u32 t; + + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + { + x0 = hd->u.state32bi[i * 2 + 0]; + x1 = hd->u.state32bi[i * 2 + 1]; + + t = (x0 & 0x0000FFFFUL) + (x1 << 16); + x1 = (x0 >> 16) + (x1 & 0xFFFF0000UL); + x0 = t; + t = (x0 ^ (x0 >> 8)) & 0x0000FF00UL; x0 = x0 ^ t ^ (t << 8); + t = (x0 ^ (x0 >> 4)) & 0x00F000F0UL; x0 = x0 ^ t ^ (t << 4); + t = (x0 ^ (x0 >> 2)) & 0x0C0C0C0CUL; x0 = x0 ^ t ^ (t << 2); + t = (x0 ^ (x0 >> 1)) & 0x22222222UL; x0 = x0 ^ t ^ (t << 1); + t = (x1 ^ (x1 >> 8)) & 0x0000FF00UL; x1 = x1 ^ t ^ (t << 8); + t = (x1 ^ (x1 >> 4)) & 0x00F000F0UL; x1 = x1 ^ t ^ (t << 4); + t = (x1 ^ (x1 >> 2)) & 0x0C0C0C0CUL; x1 = x1 ^ t ^ (t << 2); + t = (x1 ^ (x1 >> 1)) & 0x22222222UL; x1 = x1 ^ t ^ (t << 1); + + hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); + hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + } - /* Absorb input block. */ - for (i = 0; i < bsize / 8; i++) - state[i] ^= buf_get_le64(data + i * 8); + return 0; +} - return keccak_f1600_state_permute(hd) + 4 * sizeof(void *); +static inline void +keccak_absorb_lane32bi(u32 *lane, u32 x0, u32 x1) +{ + u32 t; + + t = (x0 ^ (x0 >> 1)) & 0x22222222UL; x0 = x0 ^ t ^ (t << 1); + t = (x0 ^ (x0 >> 2)) & 0x0C0C0C0CUL; x0 = x0 ^ t ^ (t << 2); + t = (x0 ^ (x0 >> 4)) & 0x00F000F0UL; x0 = x0 ^ t ^ (t << 4); + t = (x0 ^ (x0 >> 8)) & 0x0000FF00UL; x0 = x0 ^ t ^ (t << 8); + t = (x1 ^ (x1 >> 1)) & 0x22222222UL; x1 = x1 ^ t ^ (t << 1); + t = (x1 ^ (x1 >> 2)) & 0x0C0C0C0CUL; x1 = x1 ^ t ^ (t << 2); + t = (x1 ^ (x1 >> 4)) & 0x00F000F0UL; x1 = x1 ^ t ^ (t << 4); + t = (x1 ^ (x1 >> 8)) & 0x0000FF00UL; x1 = x1 ^ t ^ (t << 8); + lane[0] ^= (x0 & 0x0000FFFFUL) + (x1 << 16); + lane[1] ^= (x0 >> 16) + (x1 & 0xFFFF0000UL); } +#endif /* NEED_COMMON32BI */ + + +/* Construct generic 64-bit implementation. 
*/ +#ifdef USE_64BIT + +# define ANDN64(x, y) (~(x) & (y)) +# define ROL64(x, n) (((x) << ((unsigned int)n & 63)) | \ + ((x) >> ((64 - (unsigned int)(n)) & 63))) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64 +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME static unsigned int -transform (void *context, const unsigned char *data, size_t nblks) +keccak_absorb_lanes64(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) { - KECCAK_CONTEXT *ctx = context; - const size_t bsize = ctx->bctx.blocksize; - unsigned int burn; + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_generic64_ops = +{ + .permute = keccak_f1600_state_permute64, + .absorb = keccak_absorb_lanes64, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT */ + + +/* Construct 64-bit Intel SHLD implementation. 
*/ +#ifdef USE_64BIT_SHLD + +# define ANDN64(x, y) (~(x) & (y)) +# define ROL64(x, n) ({ \ + u64 tmp = (x); \ + asm ("shldq %1, %0, %0" \ + : "+r" (tmp) \ + : "J" ((n) & 63)); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64_shld +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes64_shld(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64_shld(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_shld_64_ops = +{ + .permute = keccak_f1600_state_permute64_shld, + .absorb = keccak_absorb_lanes64_shld, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT_SHLD */ + + +/* Construct 64-bit Intel BMI2 implementation. 
*/ +#ifdef USE_64BIT_BMI2 + +# define ANDN64(x, y) ({ \ + u64 tmp; \ + asm ("andnq %2, %1, %0" \ + : "=r" (tmp) \ + : "r0" (x), "rm" (y)); \ + tmp; }) + +# define ROL64(x, n) ({ \ + u64 tmp; \ + asm ("rorxq %2, %1, %0" \ + : "=r" (tmp) \ + : "rm0" (x), "J" (64 - ((n) & 63))); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64_bmi2 +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes64_bmi2(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64_bmi2(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_bmi2_64_ops = +{ + .permute = keccak_f1600_state_permute64_bmi2, + .absorb = keccak_absorb_lanes64_bmi2, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT_BMI2 */ + + +/* Construct generic 32-bit implementation. */ +#ifdef USE_32BIT + +# define ANDN32(x, y) (~(x) & (y)) +# define ROL32(x, n) (((x) << ((unsigned int)n & 31)) | \ + ((x) >> ((32 - (unsigned int)(n)) & 31))) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute32bi +# include "keccak_permute_32.h" + +# undef ANDN32 +# undef ROL32 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes32bi(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; - /* Absorb full blocks. 
*/ - do + while (nlanes) { - burn = transform_blk (context, data); - data += bsize; + keccak_absorb_lane32bi(&hd->u.state32bi[pos * 2], + buf_get_le32(lanes + 0), + buf_get_le32(lanes + 4)); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute32bi(hd); + pos = 0; + } } - while (--nblks); return burn; } +static const keccak_ops_t keccak_generic32bi_ops = +{ + .permute = keccak_f1600_state_permute32bi, + .absorb = keccak_absorb_lanes32bi, + .extract_inplace = keccak_extract_inplace32bi, +}; + +#endif /* USE_32BIT */ + + +/* Construct 32-bit Intel BMI2 implementation. */ +#ifdef USE_32BIT_BMI2 + +# define ANDN32(x, y) ({ \ + u32 tmp; \ + asm ("andnl %2, %1, %0" \ + : "=r" (tmp) \ + : "r0" (x), "rm" (y)); \ + tmp; }) + +# define ROL32(x, n) ({ \ + u32 tmp; \ + asm ("rorxl %2, %1, %0" \ + : "=r" (tmp) \ + : "rm0" (x), "J" (32 - ((n) & 31))); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute32bi_bmi2 +# include "keccak_permute_32.h" + +# undef ANDN32 +# undef ROL32 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static inline u32 pext(u32 x, u32 mask) +{ + u32 tmp; + asm ("pextl %2, %1, %0" : "=r" (tmp) : "r0" (x), "rm" (mask)); + return tmp; +} + +static inline u32 pdep(u32 x, u32 mask) +{ + u32 tmp; + asm ("pdepl %2, %1, %0" : "=r" (tmp) : "r0" (x), "rm" (mask)); + return tmp; +} + +static inline void +keccak_absorb_lane32bi_bmi2(u32 *lane, u32 x0, u32 x1) +{ + x0 = pdep(pext(x0, 0x55555555), 0x0000ffff) | (pext(x0, 0xaaaaaaaa) << 16); + x1 = pdep(pext(x1, 0x55555555), 0x0000ffff) | (pext(x1, 0xaaaaaaaa) << 16); + + lane[0] ^= (x0 & 0x0000FFFFUL) + (x1 << 16); + lane[1] ^= (x0 >> 16) + (x1 & 0xFFFF0000UL); +} + +static unsigned int +keccak_absorb_lanes32bi_bmi2(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + keccak_absorb_lane32bi_bmi2(&hd->u.state32bi[pos * 2], + buf_get_le32(lanes + 0), + buf_get_le32(lanes + 
4)); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute32bi_bmi2(hd); + pos = 0; + } + } + + return burn; +} + +static unsigned int +keccak_extract_inplace32bi_bmi2(KECCAK_STATE *hd, unsigned int outlen) +{ + unsigned int i; + u32 x0; + u32 x1; + u32 t; + + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + { + x0 = hd->u.state32bi[i * 2 + 0]; + x1 = hd->u.state32bi[i * 2 + 1]; + + t = (x0 & 0x0000FFFFUL) + (x1 << 16); + x1 = (x0 >> 16) + (x1 & 0xFFFF0000UL); + x0 = t; + + x0 = pdep(pext(x0, 0xffff0001), 0xaaaaaaab) | pdep(x0 >> 1, 0x55555554); + x1 = pdep(pext(x1, 0xffff0001), 0xaaaaaaab) | pdep(x1 >> 1, 0x55555554); + + hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); + hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + } + + return 0; +} + +static const keccak_ops_t keccak_bmi2_32bi_ops = +{ + .permute = keccak_f1600_state_permute32bi_bmi2, + .absorb = keccak_absorb_lanes32bi_bmi2, + .extract_inplace = keccak_extract_inplace32bi_bmi2, +}; + +#endif /* USE_32BIT_BMI2 */ + + +static void +keccak_write (void *context, const void *inbuf_arg, size_t inlen) +{ + KECCAK_CONTEXT *ctx = context; + const size_t bsize = ctx->blocksize; + const size_t blocklanes = bsize / 8; + const byte *inbuf = inbuf_arg; + unsigned int nburn, burn = 0; + unsigned int count, i; + unsigned int pos, nlanes; + + count = ctx->count; + + if (inlen && (count % 8)) + { + byte lane[8] = { 0, }; + + /* Complete absorbing partial input lane. */ + + pos = count / 8; + + for (i = count % 8; inlen && i < 8; i++) + { + lane[i] = *inbuf++; + inlen--; + count++; + } + + if (count == bsize) + count = 0; + + nburn = ctx->ops->absorb(&ctx->state, pos, lane, 1, + (count % 8) ? -1 : blocklanes); + burn = nburn > burn ? nburn : burn; + } + + /* Absorb full input lanes. */ + + pos = count / 8; + nlanes = inlen / 8; + if (nlanes > 0) + { + nburn = ctx->ops->absorb(&ctx->state, pos, inbuf, nlanes, blocklanes); + burn = nburn > burn ?
nburn : burn; + inlen -= nlanes * 8; + inbuf += nlanes * 8; + count += nlanes * 8; + count = count % bsize; + } + + if (inlen) + { + byte lane[8] = { 0, }; + + /* Absorb remaining partial input lane. */ + + pos = count / 8; + + for (i = count % 8; inlen && i < 8; i++) + { + lane[i] = *inbuf++; + inlen--; + count++; + } + + nburn = ctx->ops->absorb(&ctx->state, pos, lane, 1, -1); + burn = nburn > burn ? nburn : burn; + + gcry_assert(count < bsize); + } + + ctx->count = count; + + if (burn) + _gcry_burn_stack (burn); +} + static void keccak_init (int algo, void *context, unsigned int flags) @@ -267,29 +608,48 @@ keccak_init (int algo, void *context, unsigned int flags) memset (hd, 0, sizeof *hd); - ctx->bctx.nblocks = 0; - ctx->bctx.nblocks_high = 0; - ctx->bctx.count = 0; - ctx->bctx.bwrite = transform; + ctx->count = 0; + + /* Select generic implementation. */ +#ifdef USE_64BIT + ctx->ops = &keccak_generic64_ops; +#elif defined USE_32BIT + ctx->ops = &keccak_generic32bi_ops; +#endif + + /* Select optimized implementation based on hw features. */ + if (0) {} +#ifdef USE_64BIT_BMI2 + else if (features & HWF_INTEL_BMI2) + ctx->ops = &keccak_bmi2_64_ops; +#endif +#ifdef USE_32BIT_BMI2 + else if (features & HWF_INTEL_BMI2) + ctx->ops = &keccak_bmi2_32bi_ops; +#endif +#ifdef USE_64BIT_SHLD + else if (features & HWF_INTEL_FAST_SHLD) + ctx->ops = &keccak_shld_64_ops; +#endif /* Set input block size, in Keccak terms this is called 'rate'.
*/ switch (algo) { case GCRY_MD_SHA3_224: - ctx->bctx.blocksize = 1152 / 8; + ctx->blocksize = 1152 / 8; ctx->outlen = 224 / 8; break; case GCRY_MD_SHA3_256: - ctx->bctx.blocksize = 1088 / 8; + ctx->blocksize = 1088 / 8; ctx->outlen = 256 / 8; break; case GCRY_MD_SHA3_384: - ctx->bctx.blocksize = 832 / 8; + ctx->blocksize = 832 / 8; ctx->outlen = 384 / 8; break; case GCRY_MD_SHA3_512: - ctx->bctx.blocksize = 576 / 8; + ctx->blocksize = 576 / 8; ctx->outlen = 512 / 8; break; default: @@ -334,59 +694,37 @@ keccak_final (void *context) { KECCAK_CONTEXT *ctx = context; KECCAK_STATE *hd = &ctx->state; - const size_t bsize = ctx->bctx.blocksize; + const size_t bsize = ctx->blocksize; const byte suffix = SHA3_DELIMITED_SUFFIX; - u64 *state = (u64 *)hd->state; - unsigned int stack_burn_depth; + unsigned int nburn, burn = 0; unsigned int lastbytes; - unsigned int i; - byte *buf; + byte lane[8]; - _gcry_md_block_write (context, NULL, 0); /* flush */ - - buf = ctx->bctx.buf; - lastbytes = ctx->bctx.count; - - /* Absorb remaining bytes. */ - for (i = 0; i < lastbytes / 8; i++) - { - state[i] ^= buf_get_le64(buf); - buf += 8; - } - - for (i = 0; i < lastbytes % 8; i++) - { - state[lastbytes / 8] ^= (u64)*buf << (i * 8); - buf++; - } + lastbytes = ctx->count; /* Do the padding and switch to the squeezing phase */ /* Absorb the last few bits and add the first bit of padding (which coincides with the delimiter in delimited suffix) */ - state[lastbytes / 8] ^= (u64)suffix << ((lastbytes % 8) * 8); + buf_put_le64(lane, (u64)suffix << ((lastbytes % 8) * 8)); + nburn = ctx->ops->absorb(&ctx->state, lastbytes / 8, lane, 1, -1); + burn = nburn > burn ? nburn : burn; /* Add the second bit of padding. */ - state[(bsize - 1) / 8] ^= (u64)0x80 << (((bsize - 1) % 8) * 8); + buf_put_le64(lane, (u64)0x80 << (((bsize - 1) % 8) * 8)); + nburn = ctx->ops->absorb(&ctx->state, (bsize - 1) / 8, lane, 1, -1); + burn = nburn > burn ? nburn : burn; /* Switch to the squeezing phase. 
*/ - stack_burn_depth = keccak_f1600_state_permute(hd); + nburn = ctx->ops->permute(hd); + burn = nburn > burn ? nburn : burn; /* Squeeze out all the output blocks */ if (ctx->outlen < bsize) { /* Output SHA3 digest. */ - buf = ctx->bctx.buf; - for (i = 0; i < ctx->outlen / 8; i++) - { - buf_put_le64(buf, state[i]); - buf += 8; - } - for (i = 0; i < ctx->outlen % 8; i++) - { - *buf = state[ctx->outlen / 8] >> (i * 8); - buf++; - } + nburn = ctx->ops->extract_inplace(hd, ctx->outlen); + burn = nburn > burn ? nburn : burn; } else { @@ -394,15 +732,18 @@ keccak_final (void *context) BUG(); } - _gcry_burn_stack (stack_burn_depth); + wipememory(lane, sizeof(lane)); + if (burn) + _gcry_burn_stack (burn); } static byte * keccak_read (void *context) { - KECCAK_CONTEXT *hd = (KECCAK_CONTEXT *) context; - return hd->bctx.buf; + KECCAK_CONTEXT *ctx = (KECCAK_CONTEXT *) context; + KECCAK_STATE *hd = &ctx->state; + return (byte *)&hd->u; } @@ -585,7 +926,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_224 = { GCRY_MD_SHA3_224, {0, 1}, "SHA3-224", sha3_224_asn, DIM (sha3_224_asn), oid_spec_sha3_224, 28, - sha3_224_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_224_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -593,7 +934,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_256 = { GCRY_MD_SHA3_256, {0, 1}, "SHA3-256", sha3_256_asn, DIM (sha3_256_asn), oid_spec_sha3_256, 32, - sha3_256_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_256_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -601,7 +942,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_384 = { GCRY_MD_SHA3_384, {0, 1}, "SHA3-384", sha3_384_asn, DIM (sha3_384_asn), oid_spec_sha3_384, 48, - sha3_384_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_384_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -609,7 +950,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_512 = { 
GCRY_MD_SHA3_512, {0, 1}, "SHA3-512", sha3_512_asn, DIM (sha3_512_asn), oid_spec_sha3_512, 64, - sha3_512_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_512_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; diff --git a/cipher/keccak_permute_32.h b/cipher/keccak_permute_32.h new file mode 100644 index 0000000..fed9383 --- /dev/null +++ b/cipher/keccak_permute_32.h @@ -0,0 +1,535 @@ +/* keccak_permute_32.h - Keccak permute function (simple 32bit bit-interleaved) + * Copyright (C) 2015 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser general Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +/* The code is based on public-domain/CC0 "keccakc1024/simple32bi/ + * Keccak-simple32BI.c" implementation by Ronny Van Keer from SUPERCOP toolkit + * package. + */ + +/* Function that computes the Keccak-f[1600] permutation on the given state. 
*/ +static unsigned int +KECCAK_F1600_PERMUTE_FUNC_NAME(KECCAK_STATE *hd) +{ + const u32 *round_consts = round_consts_32bit; + u32 Aba0, Abe0, Abi0, Abo0, Abu0; + u32 Aba1, Abe1, Abi1, Abo1, Abu1; + u32 Aga0, Age0, Agi0, Ago0, Agu0; + u32 Aga1, Age1, Agi1, Ago1, Agu1; + u32 Aka0, Ake0, Aki0, Ako0, Aku0; + u32 Aka1, Ake1, Aki1, Ako1, Aku1; + u32 Ama0, Ame0, Ami0, Amo0, Amu0; + u32 Ama1, Ame1, Ami1, Amo1, Amu1; + u32 Asa0, Ase0, Asi0, Aso0, Asu0; + u32 Asa1, Ase1, Asi1, Aso1, Asu1; + u32 BCa0, BCe0, BCi0, BCo0, BCu0; + u32 BCa1, BCe1, BCi1, BCo1, BCu1; + u32 Da0, De0, Di0, Do0, Du0; + u32 Da1, De1, Di1, Do1, Du1; + u32 Eba0, Ebe0, Ebi0, Ebo0, Ebu0; + u32 Eba1, Ebe1, Ebi1, Ebo1, Ebu1; + u32 Ega0, Ege0, Egi0, Ego0, Egu0; + u32 Ega1, Ege1, Egi1, Ego1, Egu1; + u32 Eka0, Eke0, Eki0, Eko0, Eku0; + u32 Eka1, Eke1, Eki1, Eko1, Eku1; + u32 Ema0, Eme0, Emi0, Emo0, Emu0; + u32 Ema1, Eme1, Emi1, Emo1, Emu1; + u32 Esa0, Ese0, Esi0, Eso0, Esu0; + u32 Esa1, Ese1, Esi1, Eso1, Esu1; + u32 *state = hd->u.state32bi; + unsigned int round; + + Aba0 = state[0]; + Aba1 = state[1]; + Abe0 = state[2]; + Abe1 = state[3]; + Abi0 = state[4]; + Abi1 = state[5]; + Abo0 = state[6]; + Abo1 = state[7]; + Abu0 = state[8]; + Abu1 = state[9]; + Aga0 = state[10]; + Aga1 = state[11]; + Age0 = state[12]; + Age1 = state[13]; + Agi0 = state[14]; + Agi1 = state[15]; + Ago0 = state[16]; + Ago1 = state[17]; + Agu0 = state[18]; + Agu1 = state[19]; + Aka0 = state[20]; + Aka1 = state[21]; + Ake0 = state[22]; + Ake1 = state[23]; + Aki0 = state[24]; + Aki1 = state[25]; + Ako0 = state[26]; + Ako1 = state[27]; + Aku0 = state[28]; + Aku1 = state[29]; + Ama0 = state[30]; + Ama1 = state[31]; + Ame0 = state[32]; + Ame1 = state[33]; + Ami0 = state[34]; + Ami1 = state[35]; + Amo0 = state[36]; + Amo1 = state[37]; + Amu0 = state[38]; + Amu1 = state[39]; + Asa0 = state[40]; + Asa1 = state[41]; + Ase0 = state[42]; + Ase1 = state[43]; + Asi0 = state[44]; + Asi1 = state[45]; + Aso0 = state[46]; + Aso1 = state[47]; + Asu0 = 
state[48]; + Asu1 = state[49]; + + for (round = 0; round < 24; round += 2) + { + /* prepareTheta */ + BCa0 = Aba0 ^ Aga0 ^ Aka0 ^ Ama0 ^ Asa0; + BCa1 = Aba1 ^ Aga1 ^ Aka1 ^ Ama1 ^ Asa1; + BCe0 = Abe0 ^ Age0 ^ Ake0 ^ Ame0 ^ Ase0; + BCe1 = Abe1 ^ Age1 ^ Ake1 ^ Ame1 ^ Ase1; + BCi0 = Abi0 ^ Agi0 ^ Aki0 ^ Ami0 ^ Asi0; + BCi1 = Abi1 ^ Agi1 ^ Aki1 ^ Ami1 ^ Asi1; + BCo0 = Abo0 ^ Ago0 ^ Ako0 ^ Amo0 ^ Aso0; + BCo1 = Abo1 ^ Ago1 ^ Ako1 ^ Amo1 ^ Aso1; + BCu0 = Abu0 ^ Agu0 ^ Aku0 ^ Amu0 ^ Asu0; + BCu1 = Abu1 ^ Agu1 ^ Aku1 ^ Amu1 ^ Asu1; + + /* thetaRhoPiChiIota(round , A, E) */ + Da0 = BCu0 ^ ROL32(BCe1, 1); + Da1 = BCu1 ^ BCe0; + De0 = BCa0 ^ ROL32(BCi1, 1); + De1 = BCa1 ^ BCi0; + Di0 = BCe0 ^ ROL32(BCo1, 1); + Di1 = BCe1 ^ BCo0; + Do0 = BCi0 ^ ROL32(BCu1, 1); + Do1 = BCi1 ^ BCu0; + Du0 = BCo0 ^ ROL32(BCa1, 1); + Du1 = BCo1 ^ BCa0; + + Aba0 ^= Da0; + BCa0 = Aba0; + Age0 ^= De0; + BCe0 = ROL32(Age0, 22); + Aki1 ^= Di1; + BCi0 = ROL32(Aki1, 22); + Amo1 ^= Do1; + BCo0 = ROL32(Amo1, 11); + Asu0 ^= Du0; + BCu0 = ROL32(Asu0, 7); + Eba0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eba0 ^= round_consts[round * 2 + 0]; + Ebe0 = BCe0 ^ ANDN32(BCi0, BCo0); + Ebi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ebo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Ebu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Aba1 ^= Da1; + BCa1 = Aba1; + Age1 ^= De1; + BCe1 = ROL32(Age1, 22); + Aki0 ^= Di0; + BCi1 = ROL32(Aki0, 21); + Amo0 ^= Do0; + BCo1 = ROL32(Amo0, 10); + Asu1 ^= Du1; + BCu1 = ROL32(Asu1, 7); + Eba1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eba1 ^= round_consts[round * 2 + 1]; + Ebe1 = BCe1 ^ ANDN32(BCi1, BCo1); + Ebi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ebo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Ebu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abo0 ^= Do0; + BCa0 = ROL32(Abo0, 14); + Agu0 ^= Du0; + BCe0 = ROL32(Agu0, 10); + Aka1 ^= Da1; + BCi0 = ROL32(Aka1, 2); + Ame1 ^= De1; + BCo0 = ROL32(Ame1, 23); + Asi1 ^= Di1; + BCu0 = ROL32(Asi1, 31); + Ega0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ege0 = BCe0 ^ ANDN32(BCi0, BCo0); + Egi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ego0 = BCo0 ^ 
ANDN32(BCu0, BCa0); + Egu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abo1 ^= Do1; + BCa1 = ROL32(Abo1, 14); + Agu1 ^= Du1; + BCe1 = ROL32(Agu1, 10); + Aka0 ^= Da0; + BCi1 = ROL32(Aka0, 1); + Ame0 ^= De0; + BCo1 = ROL32(Ame0, 22); + Asi0 ^= Di0; + BCu1 = ROL32(Asi0, 30); + Ega1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ege1 = BCe1 ^ ANDN32(BCi1, BCo1); + Egi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ego1 = BCo1 ^ ANDN32(BCu1, BCa1); + Egu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abe1 ^= De1; + BCa0 = ROL32(Abe1, 1); + Agi0 ^= Di0; + BCe0 = ROL32(Agi0, 3); + Ako1 ^= Do1; + BCi0 = ROL32(Ako1, 13); + Amu0 ^= Du0; + BCo0 = ROL32(Amu0, 4); + Asa0 ^= Da0; + BCu0 = ROL32(Asa0, 9); + Eka0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eke0 = BCe0 ^ ANDN32(BCi0, BCo0); + Eki0 = BCi0 ^ ANDN32(BCo0, BCu0); + Eko0 = BCo0 ^ ANDN32(BCu0, BCa0); + Eku0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abe0 ^= De0; + BCa1 = Abe0; + Agi1 ^= Di1; + BCe1 = ROL32(Agi1, 3); + Ako0 ^= Do0; + BCi1 = ROL32(Ako0, 12); + Amu1 ^= Du1; + BCo1 = ROL32(Amu1, 4); + Asa1 ^= Da1; + BCu1 = ROL32(Asa1, 9); + Eka1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eke1 = BCe1 ^ ANDN32(BCi1, BCo1); + Eki1 = BCi1 ^ ANDN32(BCo1, BCu1); + Eko1 = BCo1 ^ ANDN32(BCu1, BCa1); + Eku1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abu1 ^= Du1; + BCa0 = ROL32(Abu1, 14); + Aga0 ^= Da0; + BCe0 = ROL32(Aga0, 18); + Ake0 ^= De0; + BCi0 = ROL32(Ake0, 5); + Ami1 ^= Di1; + BCo0 = ROL32(Ami1, 8); + Aso0 ^= Do0; + BCu0 = ROL32(Aso0, 28); + Ema0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eme0 = BCe0 ^ ANDN32(BCi0, BCo0); + Emi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Emo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Emu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abu0 ^= Du0; + BCa1 = ROL32(Abu0, 13); + Aga1 ^= Da1; + BCe1 = ROL32(Aga1, 18); + Ake1 ^= De1; + BCi1 = ROL32(Ake1, 5); + Ami0 ^= Di0; + BCo1 = ROL32(Ami0, 7); + Aso1 ^= Do1; + BCu1 = ROL32(Aso1, 28); + Ema1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eme1 = BCe1 ^ ANDN32(BCi1, BCo1); + Emi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Emo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Emu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abi0 ^= Di0; + BCa0 = 
ROL32(Abi0, 31); + Ago1 ^= Do1; + BCe0 = ROL32(Ago1, 28); + Aku1 ^= Du1; + BCi0 = ROL32(Aku1, 20); + Ama1 ^= Da1; + BCo0 = ROL32(Ama1, 21); + Ase0 ^= De0; + BCu0 = ROL32(Ase0, 1); + Esa0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ese0 = BCe0 ^ ANDN32(BCi0, BCo0); + Esi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Eso0 = BCo0 ^ ANDN32(BCu0, BCa0); + Esu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abi1 ^= Di1; + BCa1 = ROL32(Abi1, 31); + Ago0 ^= Do0; + BCe1 = ROL32(Ago0, 27); + Aku0 ^= Du0; + BCi1 = ROL32(Aku0, 19); + Ama0 ^= Da0; + BCo1 = ROL32(Ama0, 20); + Ase1 ^= De1; + BCu1 = ROL32(Ase1, 1); + Esa1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ese1 = BCe1 ^ ANDN32(BCi1, BCo1); + Esi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Eso1 = BCo1 ^ ANDN32(BCu1, BCa1); + Esu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + /* prepareTheta */ + BCa0 = Eba0 ^ Ega0 ^ Eka0 ^ Ema0 ^ Esa0; + BCa1 = Eba1 ^ Ega1 ^ Eka1 ^ Ema1 ^ Esa1; + BCe0 = Ebe0 ^ Ege0 ^ Eke0 ^ Eme0 ^ Ese0; + BCe1 = Ebe1 ^ Ege1 ^ Eke1 ^ Eme1 ^ Ese1; + BCi0 = Ebi0 ^ Egi0 ^ Eki0 ^ Emi0 ^ Esi0; + BCi1 = Ebi1 ^ Egi1 ^ Eki1 ^ Emi1 ^ Esi1; + BCo0 = Ebo0 ^ Ego0 ^ Eko0 ^ Emo0 ^ Eso0; + BCo1 = Ebo1 ^ Ego1 ^ Eko1 ^ Emo1 ^ Eso1; + BCu0 = Ebu0 ^ Egu0 ^ Eku0 ^ Emu0 ^ Esu0; + BCu1 = Ebu1 ^ Egu1 ^ Eku1 ^ Emu1 ^ Esu1; + + /* thetaRhoPiChiIota(round+1, E, A) */ + Da0 = BCu0 ^ ROL32(BCe1, 1); + Da1 = BCu1 ^ BCe0; + De0 = BCa0 ^ ROL32(BCi1, 1); + De1 = BCa1 ^ BCi0; + Di0 = BCe0 ^ ROL32(BCo1, 1); + Di1 = BCe1 ^ BCo0; + Do0 = BCi0 ^ ROL32(BCu1, 1); + Do1 = BCi1 ^ BCu0; + Du0 = BCo0 ^ ROL32(BCa1, 1); + Du1 = BCo1 ^ BCa0; + + Eba0 ^= Da0; + BCa0 = Eba0; + Ege0 ^= De0; + BCe0 = ROL32(Ege0, 22); + Eki1 ^= Di1; + BCi0 = ROL32(Eki1, 22); + Emo1 ^= Do1; + BCo0 = ROL32(Emo1, 11); + Esu0 ^= Du0; + BCu0 = ROL32(Esu0, 7); + Aba0 = BCa0 ^ ANDN32(BCe0, BCi0); + Aba0 ^= round_consts[round * 2 + 2]; + Abe0 = BCe0 ^ ANDN32(BCi0, BCo0); + Abi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Abo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Abu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Eba1 ^= Da1; + BCa1 = Eba1; + Ege1 ^= De1; + BCe1 = ROL32(Ege1, 22); + Eki0 ^= 
Di0; + BCi1 = ROL32(Eki0, 21); + Emo0 ^= Do0; + BCo1 = ROL32(Emo0, 10); + Esu1 ^= Du1; + BCu1 = ROL32(Esu1, 7); + Aba1 = BCa1 ^ ANDN32(BCe1, BCi1); + Aba1 ^= round_consts[round * 2 + 3]; + Abe1 = BCe1 ^ ANDN32(BCi1, BCo1); + Abi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Abo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Abu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebo0 ^= Do0; + BCa0 = ROL32(Ebo0, 14); + Egu0 ^= Du0; + BCe0 = ROL32(Egu0, 10); + Eka1 ^= Da1; + BCi0 = ROL32(Eka1, 2); + Eme1 ^= De1; + BCo0 = ROL32(Eme1, 23); + Esi1 ^= Di1; + BCu0 = ROL32(Esi1, 31); + Aga0 = BCa0 ^ ANDN32(BCe0, BCi0); + Age0 = BCe0 ^ ANDN32(BCi0, BCo0); + Agi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ago0 = BCo0 ^ ANDN32(BCu0, BCa0); + Agu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebo1 ^= Do1; + BCa1 = ROL32(Ebo1, 14); + Egu1 ^= Du1; + BCe1 = ROL32(Egu1, 10); + Eka0 ^= Da0; + BCi1 = ROL32(Eka0, 1); + Eme0 ^= De0; + BCo1 = ROL32(Eme0, 22); + Esi0 ^= Di0; + BCu1 = ROL32(Esi0, 30); + Aga1 = BCa1 ^ ANDN32(BCe1, BCi1); + Age1 = BCe1 ^ ANDN32(BCi1, BCo1); + Agi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ago1 = BCo1 ^ ANDN32(BCu1, BCa1); + Agu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebe1 ^= De1; + BCa0 = ROL32(Ebe1, 1); + Egi0 ^= Di0; + BCe0 = ROL32(Egi0, 3); + Eko1 ^= Do1; + BCi0 = ROL32(Eko1, 13); + Emu0 ^= Du0; + BCo0 = ROL32(Emu0, 4); + Esa0 ^= Da0; + BCu0 = ROL32(Esa0, 9); + Aka0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ake0 = BCe0 ^ ANDN32(BCi0, BCo0); + Aki0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ako0 = BCo0 ^ ANDN32(BCu0, BCa0); + Aku0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebe0 ^= De0; + BCa1 = Ebe0; + Egi1 ^= Di1; + BCe1 = ROL32(Egi1, 3); + Eko0 ^= Do0; + BCi1 = ROL32(Eko0, 12); + Emu1 ^= Du1; + BCo1 = ROL32(Emu1, 4); + Esa1 ^= Da1; + BCu1 = ROL32(Esa1, 9); + Aka1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ake1 = BCe1 ^ ANDN32(BCi1, BCo1); + Aki1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ako1 = BCo1 ^ ANDN32(BCu1, BCa1); + Aku1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebu1 ^= Du1; + BCa0 = ROL32(Ebu1, 14); + Ega0 ^= Da0; + BCe0 = ROL32(Ega0, 18); + Eke0 ^= De0; + BCi0 = ROL32(Eke0, 5); + Emi1 ^= Di1; + 
BCo0 = ROL32(Emi1, 8); + Eso0 ^= Do0; + BCu0 = ROL32(Eso0, 28); + Ama0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ame0 = BCe0 ^ ANDN32(BCi0, BCo0); + Ami0 = BCi0 ^ ANDN32(BCo0, BCu0); + Amo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Amu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebu0 ^= Du0; + BCa1 = ROL32(Ebu0, 13); + Ega1 ^= Da1; + BCe1 = ROL32(Ega1, 18); + Eke1 ^= De1; + BCi1 = ROL32(Eke1, 5); + Emi0 ^= Di0; + BCo1 = ROL32(Emi0, 7); + Eso1 ^= Do1; + BCu1 = ROL32(Eso1, 28); + Ama1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ame1 = BCe1 ^ ANDN32(BCi1, BCo1); + Ami1 = BCi1 ^ ANDN32(BCo1, BCu1); + Amo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Amu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebi0 ^= Di0; + BCa0 = ROL32(Ebi0, 31); + Ego1 ^= Do1; + BCe0 = ROL32(Ego1, 28); + Eku1 ^= Du1; + BCi0 = ROL32(Eku1, 20); + Ema1 ^= Da1; + BCo0 = ROL32(Ema1, 21); + Ese0 ^= De0; + BCu0 = ROL32(Ese0, 1); + Asa0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ase0 = BCe0 ^ ANDN32(BCi0, BCo0); + Asi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Aso0 = BCo0 ^ ANDN32(BCu0, BCa0); + Asu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebi1 ^= Di1; + BCa1 = ROL32(Ebi1, 31); + Ego0 ^= Do0; + BCe1 = ROL32(Ego0, 27); + Eku0 ^= Du0; + BCi1 = ROL32(Eku0, 19); + Ema0 ^= Da0; + BCo1 = ROL32(Ema0, 20); + Ese1 ^= De1; + BCu1 = ROL32(Ese1, 1); + Asa1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ase1 = BCe1 ^ ANDN32(BCi1, BCo1); + Asi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Aso1 = BCo1 ^ ANDN32(BCu1, BCa1); + Asu1 = BCu1 ^ ANDN32(BCa1, BCe1); + } + + state[0] = Aba0; + state[1] = Aba1; + state[2] = Abe0; + state[3] = Abe1; + state[4] = Abi0; + state[5] = Abi1; + state[6] = Abo0; + state[7] = Abo1; + state[8] = Abu0; + state[9] = Abu1; + state[10] = Aga0; + state[11] = Aga1; + state[12] = Age0; + state[13] = Age1; + state[14] = Agi0; + state[15] = Agi1; + state[16] = Ago0; + state[17] = Ago1; + state[18] = Agu0; + state[19] = Agu1; + state[20] = Aka0; + state[21] = Aka1; + state[22] = Ake0; + state[23] = Ake1; + state[24] = Aki0; + state[25] = Aki1; + state[26] = Ako0; + state[27] = Ako1; + state[28] = Aku0; + state[29] = Aku1; + 
state[30] = Ama0; + state[31] = Ama1; + state[32] = Ame0; + state[33] = Ame1; + state[34] = Ami0; + state[35] = Ami1; + state[36] = Amo0; + state[37] = Amo1; + state[38] = Amu0; + state[39] = Amu1; + state[40] = Asa0; + state[41] = Asa1; + state[42] = Ase0; + state[43] = Ase1; + state[44] = Asi0; + state[45] = Asi1; + state[46] = Aso0; + state[47] = Aso1; + state[48] = Asu0; + state[49] = Asu1; + + return sizeof(void *) * 4 + sizeof(u32) * 12 * 5 * 2; +} diff --git a/cipher/keccak_permute_64.h b/cipher/keccak_permute_64.h new file mode 100644 index 0000000..1264f19 --- /dev/null +++ b/cipher/keccak_permute_64.h @@ -0,0 +1,290 @@ +/* keccak_permute_64.h - Keccak permute function (simple 64bit) + * Copyright (C) 2015 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser general Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +/* The code is based on public-domain/CC0 "keccakc1024/simple/Keccak-simple.c" + * implementation by Ronny Van Keer from SUPERCOP toolkit package. + */ + +/* Function that computes the Keccak-f[1600] permutation on the given state. 
*/ +static unsigned int +KECCAK_F1600_PERMUTE_FUNC_NAME(KECCAK_STATE *hd) +{ + const u64 *round_consts = round_consts_64bit; + u64 Aba, Abe, Abi, Abo, Abu; + u64 Aga, Age, Agi, Ago, Agu; + u64 Aka, Ake, Aki, Ako, Aku; + u64 Ama, Ame, Ami, Amo, Amu; + u64 Asa, Ase, Asi, Aso, Asu; + u64 BCa, BCe, BCi, BCo, BCu; + u64 Da, De, Di, Do, Du; + u64 Eba, Ebe, Ebi, Ebo, Ebu; + u64 Ega, Ege, Egi, Ego, Egu; + u64 Eka, Eke, Eki, Eko, Eku; + u64 Ema, Eme, Emi, Emo, Emu; + u64 Esa, Ese, Esi, Eso, Esu; + u64 *state = hd->u.state64; + unsigned int round; + + Aba = state[0]; + Abe = state[1]; + Abi = state[2]; + Abo = state[3]; + Abu = state[4]; + Aga = state[5]; + Age = state[6]; + Agi = state[7]; + Ago = state[8]; + Agu = state[9]; + Aka = state[10]; + Ake = state[11]; + Aki = state[12]; + Ako = state[13]; + Aku = state[14]; + Ama = state[15]; + Ame = state[16]; + Ami = state[17]; + Amo = state[18]; + Amu = state[19]; + Asa = state[20]; + Ase = state[21]; + Asi = state[22]; + Aso = state[23]; + Asu = state[24]; + + for (round = 0; round < 24; round += 2) + { + /* prepareTheta */ + BCa = Aba ^ Aga ^ Aka ^ Ama ^ Asa; + BCe = Abe ^ Age ^ Ake ^ Ame ^ Ase; + BCi = Abi ^ Agi ^ Aki ^ Ami ^ Asi; + BCo = Abo ^ Ago ^ Ako ^ Amo ^ Aso; + BCu = Abu ^ Agu ^ Aku ^ Amu ^ Asu; + + /* thetaRhoPiChiIotaPrepareTheta(round , A, E) */ + Da = BCu ^ ROL64(BCe, 1); + De = BCa ^ ROL64(BCi, 1); + Di = BCe ^ ROL64(BCo, 1); + Do = BCi ^ ROL64(BCu, 1); + Du = BCo ^ ROL64(BCa, 1); + + Aba ^= Da; + BCa = Aba; + Age ^= De; + BCe = ROL64(Age, 44); + Aki ^= Di; + BCi = ROL64(Aki, 43); + Amo ^= Do; + BCo = ROL64(Amo, 21); + Asu ^= Du; + BCu = ROL64(Asu, 14); + Eba = BCa ^ ANDN64(BCe, BCi); + Eba ^= (u64)round_consts[round]; + Ebe = BCe ^ ANDN64(BCi, BCo); + Ebi = BCi ^ ANDN64(BCo, BCu); + Ebo = BCo ^ ANDN64(BCu, BCa); + Ebu = BCu ^ ANDN64(BCa, BCe); + + Abo ^= Do; + BCa = ROL64(Abo, 28); + Agu ^= Du; + BCe = ROL64(Agu, 20); + Aka ^= Da; + BCi = ROL64(Aka, 3); + Ame ^= De; + BCo = ROL64(Ame, 45); + Asi ^= Di; + BCu = 
ROL64(Asi, 61); + Ega = BCa ^ ANDN64(BCe, BCi); + Ege = BCe ^ ANDN64(BCi, BCo); + Egi = BCi ^ ANDN64(BCo, BCu); + Ego = BCo ^ ANDN64(BCu, BCa); + Egu = BCu ^ ANDN64(BCa, BCe); + + Abe ^= De; + BCa = ROL64(Abe, 1); + Agi ^= Di; + BCe = ROL64(Agi, 6); + Ako ^= Do; + BCi = ROL64(Ako, 25); + Amu ^= Du; + BCo = ROL64(Amu, 8); + Asa ^= Da; + BCu = ROL64(Asa, 18); + Eka = BCa ^ ANDN64(BCe, BCi); + Eke = BCe ^ ANDN64(BCi, BCo); + Eki = BCi ^ ANDN64(BCo, BCu); + Eko = BCo ^ ANDN64(BCu, BCa); + Eku = BCu ^ ANDN64(BCa, BCe); + + Abu ^= Du; + BCa = ROL64(Abu, 27); + Aga ^= Da; + BCe = ROL64(Aga, 36); + Ake ^= De; + BCi = ROL64(Ake, 10); + Ami ^= Di; + BCo = ROL64(Ami, 15); + Aso ^= Do; + BCu = ROL64(Aso, 56); + Ema = BCa ^ ANDN64(BCe, BCi); + Eme = BCe ^ ANDN64(BCi, BCo); + Emi = BCi ^ ANDN64(BCo, BCu); + Emo = BCo ^ ANDN64(BCu, BCa); + Emu = BCu ^ ANDN64(BCa, BCe); + + Abi ^= Di; + BCa = ROL64(Abi, 62); + Ago ^= Do; + BCe = ROL64(Ago, 55); + Aku ^= Du; + BCi = ROL64(Aku, 39); + Ama ^= Da; + BCo = ROL64(Ama, 41); + Ase ^= De; + BCu = ROL64(Ase, 2); + Esa = BCa ^ ANDN64(BCe, BCi); + Ese = BCe ^ ANDN64(BCi, BCo); + Esi = BCi ^ ANDN64(BCo, BCu); + Eso = BCo ^ ANDN64(BCu, BCa); + Esu = BCu ^ ANDN64(BCa, BCe); + + /* prepareTheta */ + BCa = Eba ^ Ega ^ Eka ^ Ema ^ Esa; + BCe = Ebe ^ Ege ^ Eke ^ Eme ^ Ese; + BCi = Ebi ^ Egi ^ Eki ^ Emi ^ Esi; + BCo = Ebo ^ Ego ^ Eko ^ Emo ^ Eso; + BCu = Ebu ^ Egu ^ Eku ^ Emu ^ Esu; + + /* thetaRhoPiChiIotaPrepareTheta(round+1, E, A) */ + Da = BCu ^ ROL64(BCe, 1); + De = BCa ^ ROL64(BCi, 1); + Di = BCe ^ ROL64(BCo, 1); + Do = BCi ^ ROL64(BCu, 1); + Du = BCo ^ ROL64(BCa, 1); + + Eba ^= Da; + BCa = Eba; + Ege ^= De; + BCe = ROL64(Ege, 44); + Eki ^= Di; + BCi = ROL64(Eki, 43); + Emo ^= Do; + BCo = ROL64(Emo, 21); + Esu ^= Du; + BCu = ROL64(Esu, 14); + Aba = BCa ^ ANDN64(BCe, BCi); + Aba ^= (u64)round_consts[round + 1]; + Abe = BCe ^ ANDN64(BCi, BCo); + Abi = BCi ^ ANDN64(BCo, BCu); + Abo = BCo ^ ANDN64(BCu, BCa); + Abu = BCu ^ ANDN64(BCa, BCe); + + Ebo 
^= Do; + BCa = ROL64(Ebo, 28); + Egu ^= Du; + BCe = ROL64(Egu, 20); + Eka ^= Da; + BCi = ROL64(Eka, 3); + Eme ^= De; + BCo = ROL64(Eme, 45); + Esi ^= Di; + BCu = ROL64(Esi, 61); + Aga = BCa ^ ANDN64(BCe, BCi); + Age = BCe ^ ANDN64(BCi, BCo); + Agi = BCi ^ ANDN64(BCo, BCu); + Ago = BCo ^ ANDN64(BCu, BCa); + Agu = BCu ^ ANDN64(BCa, BCe); + + Ebe ^= De; + BCa = ROL64(Ebe, 1); + Egi ^= Di; + BCe = ROL64(Egi, 6); + Eko ^= Do; + BCi = ROL64(Eko, 25); + Emu ^= Du; + BCo = ROL64(Emu, 8); + Esa ^= Da; + BCu = ROL64(Esa, 18); + Aka = BCa ^ ANDN64(BCe, BCi); + Ake = BCe ^ ANDN64(BCi, BCo); + Aki = BCi ^ ANDN64(BCo, BCu); + Ako = BCo ^ ANDN64(BCu, BCa); + Aku = BCu ^ ANDN64(BCa, BCe); + + Ebu ^= Du; + BCa = ROL64(Ebu, 27); + Ega ^= Da; + BCe = ROL64(Ega, 36); + Eke ^= De; + BCi = ROL64(Eke, 10); + Emi ^= Di; + BCo = ROL64(Emi, 15); + Eso ^= Do; + BCu = ROL64(Eso, 56); + Ama = BCa ^ ANDN64(BCe, BCi); + Ame = BCe ^ ANDN64(BCi, BCo); + Ami = BCi ^ ANDN64(BCo, BCu); + Amo = BCo ^ ANDN64(BCu, BCa); + Amu = BCu ^ ANDN64(BCa, BCe); + + Ebi ^= Di; + BCa = ROL64(Ebi, 62); + Ego ^= Do; + BCe = ROL64(Ego, 55); + Eku ^= Du; + BCi = ROL64(Eku, 39); + Ema ^= Da; + BCo = ROL64(Ema, 41); + Ese ^= De; + BCu = ROL64(Ese, 2); + Asa = BCa ^ ANDN64(BCe, BCi); + Ase = BCe ^ ANDN64(BCi, BCo); + Asi = BCi ^ ANDN64(BCo, BCu); + Aso = BCo ^ ANDN64(BCu, BCa); + Asu = BCu ^ ANDN64(BCa, BCe); + } + + state[0] = Aba; + state[1] = Abe; + state[2] = Abi; + state[3] = Abo; + state[4] = Abu; + state[5] = Aga; + state[6] = Age; + state[7] = Agi; + state[8] = Ago; + state[9] = Agu; + state[10] = Aka; + state[11] = Ake; + state[12] = Aki; + state[13] = Ako; + state[14] = Aku; + state[15] = Ama; + state[16] = Ame; + state[17] = Ami; + state[18] = Amo; + state[19] = Amu; + state[20] = Asa; + state[21] = Ase; + state[22] = Asi; + state[23] = Aso; + state[24] = Asu; + + return sizeof(void *) * 4 + sizeof(u64) * 12 * 5; +} From ametzler at bebt.de Tue Oct 27 20:08:27 2015 From: ametzler at bebt.de (Andreas Metzler) 
Date: Tue, 27 Oct 2015 20:08:27 +0100
Subject: build-aux/texinfo.tex in relaese tarball is old
Message-ID: <9mu4gc-gvo.ln1@argenau.downhill.at.eu.org>

Hello,

build-aux/texinfo.tex as shipped in 1.6.4 (and GIT master) is pretty old:

  \def\texinfoversion{2007-05-03.09}
  %
  % Copyright (C) 1985, 1986, 1988, 1990, 1991, 1992, 1993, 1994, 1995,
  % 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
  % 2007 Free Software Foundation, Inc.

This causes problems in UTF-8 locales, at least when a current texinfo
toolchain is used. (The file gcrypt.pdf is only generated once, which
leads to a missing table of contents and undefined references within the
document; see http://bugs.debian.org/803081.)

In Debian I will simply delete build-aux/texinfo.tex at build time to
let the texinfo toolchain use the system copy. I am not sure whether
this is the correct fix, but having it posted here might be helpful
anyway.

cu Andreas
--
`What a good friend you are to him, Dr. Maturin. His other friends are
so grateful to you.'
`I sew his ears on from time to time, sure'

From wk at gnupg.org Wed Oct 28 16:57:14 2015
From: wk at gnupg.org (Werner Koch)
Date: Wed, 28 Oct 2015 16:57:14 +0100
Subject: build-aux/texinfo.tex in relaese tarball is old
In-Reply-To: <9mu4gc-gvo.ln1@argenau.downhill.at.eu.org> (Andreas Metzler's message of "Tue, 27 Oct 2015 20:08:27 +0100")
References: <9mu4gc-gvo.ln1@argenau.downhill.at.eu.org>
Message-ID: <87h9lb6l5x.fsf@vigenere.g10code.de>

On Tue, 27 Oct 2015 20:08, ametzler at bebt.de said:

> In Debian I will simply delete build-aux/texinfo.tex at build time to
> let the texinfo toolchain use the system copy.

Good. I hesitate to update this stuff for 1.6, but I will certainly use
fresh scripts for the 1.7 release.

Salam-Shalom,

   Werner

--
Thoughts are free. Exceptions are regulated by federal law.
From jussi.kivilinna at iki.fi Wed Oct 28 18:55:42 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 28 Oct 2015 19:55:42 +0200 Subject: [PATCH 1/4] md: check hmac flag in prepare_macpads Message-ID: <20151028175542.25783.44927.stgit@localhost6.localdomain6> * cipher/md.c (prepare_macpads): Check hmac flag. -- Signed-off-by: Jussi Kivilinna --- cipher/md.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/cipher/md.c b/cipher/md.c index c6bf90d..948d269 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -671,6 +671,9 @@ prepare_macpads (gcry_md_hd_t a, const unsigned char *key, size_t keylen) if (!a->ctx->list) return GPG_ERR_DIGEST_ALGO; /* Might happen if no algo is enabled. */ + if (!a->ctx->flags.hmac) + return GPG_ERR_DIGEST_ALGO; /* Tried setkey for non-HMAC md. */ + for (r = a->ctx->list; r; r = r->next) { const unsigned char *k; From jussi.kivilinna at iki.fi Wed Oct 28 18:55:47 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 28 Oct 2015 19:55:47 +0200 Subject: [PATCH 2/4] md: add variable length output interface In-Reply-To: <20151028175542.25783.44927.stgit@localhost6.localdomain6> References: <20151028175542.25783.44927.stgit@localhost6.localdomain6> Message-ID: <20151028175547.25783.11167.stgit@localhost6.localdomain6> * cipher/crc.c (_gcry_digest_spec_crc32) (_gcry_digest_spec_crc32_rfc1510, _gcry_digest_spec_crc24_rfc2440): Set 'extract' NULL. * cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_94) (_gcry_digest_spec_gost3411_cp): Ditto. * cipher/keccak.c (_gcry_digest_spec_sha3_224) (_gcry_digest_spec_sha3_256, _gcry_digest_spec_sha3_384) (_gcry_digest_spec_sha3_512): Ditto. * cipher/md2.c (_gcry_digest_spec_md2): Ditto. * cipher/md4.c (_gcry_digest_spec_md4): Ditto. * cipher/md5.c (_gcry_digest_spec_md5): Ditto. * cipher/rmd160.c (_gcry_digest_spec_rmd160): Ditto. * cipher/sha1.c (_gcry_digest_spec_sha1): Ditto. * cipher/sha256.c (_gcry_digest_spec_sha224) (_gcry_digest_spec_sha256): Ditto. 
* cipher/sha512.c (_gcry_digest_spec_sha384) (_gcry_digest_spec_sha512): Ditto. * cipher/stribog.c (_gcry_digest_spec_stribog_256) (_gcry_digest_spec_stribog_512): Ditto. * cipher/tiger.c (_gcry_digest_spec_tiger) (_gcry_digest_spec_tiger1, _gcry_digest_spec_tiger2): Ditto. * cipher/whirlpool.c (_gcry_digest_spec_whirlpool): Ditto. * cipher/md.c (md_enable): Do not allow combination of HMAC and 'extendable-output function'. (md_final): Check if spec->read is NULL before calling. (md_read): Ditto. (md_extract, _gcry_md_extract): New. * doc/gcrypt.texi: Add SHA3 algorithms and gcry_md_extract. * src/cipher-proto.h (gcry_md_extract_t): New. (gcry_md_spec_t): Add 'extract'. * src/gcrypt.h.in (gcry_md_extract): New. * src/libgcrypt.def: Add gcry_md_extract. * src/libgcrypt.vers: Add gcry_md_extract. * src/visibility.c (gcry_md_extract): New. * src/visibility.h (gcry_md_extract): New. -- Patch adds a new interface for reading output from 'extendable-output function' (XOF) MD algorithms that can produce variable-length output (i.e. the SHAKE algorithms from FIPS-202). The new function for reading output is gpg_error_t gcry_md_extract(gcry_md_hd_t md, int algo, void *buffer, size_t length); The function implicitly finalizes the algorithm, so no further input can be given. Subsequent calls of the function return more output bytes from the algorithm.
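The extraction semantics described in this commit message (the first call finalizes the hash, later calls keep returning fresh bytes of the same output stream) can be illustrated with a small standalone SHAKE128 sponge in C. This is a sketch under stated assumptions, not libgcrypt code: it does not call libgcrypt at all, since no XOF algorithm is wired up in this patch series yet, and the names shake128_ctx, shake128_write and shake128_extract are hypothetical. The permutation uses the common compact loop formulation of Keccak-f[1600] rather than the unrolled version in cipher/keccak.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, standalone SHAKE128 (FIPS 202) for illustration only;
 * none of these names are libgcrypt API. */
#define SHAKE128_RATE 168 /* bytes absorbed or squeezed per permutation */

static const uint64_t keccak_rc[24] = {
  0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
  0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
  0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
  0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
  0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
  0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
  0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
  0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL };
static const unsigned keccak_rotc[24] = { 1, 3, 6, 10, 15, 21, 28, 36,
  45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
static const unsigned keccak_piln[24] = { 10, 7, 11, 17, 18, 3, 5, 16,
  8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };

static uint64_t rol64 (uint64_t x, unsigned n)
{
  return (x << n) | (x >> (64 - n)); /* n is 1..62 here, so both shifts are defined */
}

/* Keccak-f[1600] permutation: theta, rho+pi, chi, iota for 24 rounds. */
static void keccak_f1600 (uint64_t st[25])
{
  uint64_t bc[5], t;
  for (int r = 0; r < 24; r++) {
    for (int i = 0; i < 5; i++)                    /* theta */
      bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (int i = 0; i < 5; i++) {
      t = bc[(i + 4) % 5] ^ rol64 (bc[(i + 1) % 5], 1);
      for (int j = 0; j < 25; j += 5)
        st[j + i] ^= t;
    }
    t = st[1];                                     /* rho and pi */
    for (int i = 0; i < 24; i++) {
      unsigned j = keccak_piln[i];
      bc[0] = st[j];
      st[j] = rol64 (t, keccak_rotc[i]);
      t = bc[0];
    }
    for (int j = 0; j < 25; j += 5) {              /* chi */
      for (int i = 0; i < 5; i++)
        bc[i] = st[j + i];
      for (int i = 0; i < 5; i++)
        st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
    }
    st[0] ^= keccak_rc[r];                         /* iota */
  }
}

typedef struct {
  uint64_t st[25];
  size_t pos;    /* byte offset inside the current rate block */
  int finalized; /* set once output has been requested */
} shake128_ctx;

/* Lanes are little-endian: state byte i lives in lane i / 8. */
static void xor_byte (uint64_t st[25], size_t i, uint8_t b)
{
  st[i / 8] ^= (uint64_t) b << (8 * (i % 8));
}

static void shake128_init (shake128_ctx *c) { memset (c, 0, sizeof *c); }

static void shake128_write (shake128_ctx *c, const void *data, size_t len)
{
  const uint8_t *p = data;
  assert (!c->finalized); /* no new input once output was extracted */
  for (; len; len--, p++) {
    xor_byte (c->st, c->pos, *p);
    if (++c->pos == SHAKE128_RATE) { keccak_f1600 (c->st); c->pos = 0; }
  }
}

/* The first call pads and finalizes; every call continues the stream. */
static void shake128_extract (shake128_ctx *c, void *out, size_t len)
{
  uint8_t *o = out;
  if (!c->finalized) {
    xor_byte (c->st, c->pos, 0x1F);            /* SHAKE suffix + pad start */
    xor_byte (c->st, SHAKE128_RATE - 1, 0x80); /* pad10*1 final bit */
    keccak_f1600 (c->st);
    c->pos = 0;
    c->finalized = 1;
  }
  for (; len; len--) {
    if (c->pos == SHAKE128_RATE) { keccak_f1600 (c->st); c->pos = 0; }
    *o++ = (uint8_t) (c->st[c->pos / 8] >> (8 * (c->pos % 8)));
    c->pos++;
  }
}
```

Requesting 400 bytes in one call or as 7, 200 and 193 bytes yields identical output, because the extract step only tracks a position inside the rate block and permutes again whenever that block is exhausted. The same property is what allows repeated gcry_md_extract calls on one handle to continue a single output stream.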
Signed-off-by: Jussi Kivilinna --- cipher/crc.c | 8 ++---- cipher/gostr3411-94.c | 4 +-- cipher/keccak.c | 8 +++--- cipher/md.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++- cipher/md2.c | 2 + cipher/md4.c | 2 + cipher/md5.c | 2 + cipher/rmd160.c | 2 + cipher/sha1.c | 2 + cipher/sha256.c | 4 +-- cipher/sha512.c | 4 +-- cipher/stribog.c | 2 + cipher/tiger.c | 6 ++-- cipher/whirlpool.c | 2 + doc/gcrypt.texi | 67 +++++++++++++++++++++++++++++++++++++++++++------ src/cipher-proto.h | 4 +++ src/gcrypt.h.in | 9 +++++-- src/libgcrypt.def | 1 + src/libgcrypt.vers | 2 + src/visibility.c | 6 ++++ src/visibility.h | 2 + 21 files changed, 169 insertions(+), 37 deletions(-) diff --git a/cipher/crc.c b/cipher/crc.c index 9105dfe..46a185a 100644 --- a/cipher/crc.c +++ b/cipher/crc.c @@ -785,7 +785,7 @@ gcry_md_spec_t _gcry_digest_spec_crc32 = { GCRY_MD_CRC32, {0, 1}, "CRC32", NULL, 0, NULL, 4, - crc32_init, crc32_write, crc32_final, crc32_read, + crc32_init, crc32_write, crc32_final, crc32_read, NULL, sizeof (CRC_CONTEXT) }; @@ -793,8 +793,7 @@ gcry_md_spec_t _gcry_digest_spec_crc32_rfc1510 = { GCRY_MD_CRC32_RFC1510, {0, 1}, "CRC32RFC1510", NULL, 0, NULL, 4, - crc32rfc1510_init, crc32_write, - crc32rfc1510_final, crc32_read, + crc32rfc1510_init, crc32_write, crc32rfc1510_final, crc32_read, NULL, sizeof (CRC_CONTEXT) }; @@ -802,7 +801,6 @@ gcry_md_spec_t _gcry_digest_spec_crc24_rfc2440 = { GCRY_MD_CRC24_RFC2440, {0, 1}, "CRC24RFC2440", NULL, 0, NULL, 3, - crc24rfc2440_init, crc24rfc2440_write, - crc24rfc2440_final, crc32_read, + crc24rfc2440_init, crc24rfc2440_write, crc24rfc2440_final, crc32_read, NULL, sizeof (CRC_CONTEXT) }; diff --git a/cipher/gostr3411-94.c b/cipher/gostr3411-94.c index 7b16e61..a782427 100644 --- a/cipher/gostr3411-94.c +++ b/cipher/gostr3411-94.c @@ -343,13 +343,13 @@ gcry_md_spec_t _gcry_digest_spec_gost3411_94 = { GCRY_MD_GOSTR3411_94, {0, 0}, "GOSTR3411_94", NULL, 0, NULL, 32, - gost3411_init, _gcry_md_block_write, gost3411_final, gost3411_read, 
+ gost3411_init, _gcry_md_block_write, gost3411_final, gost3411_read, NULL, sizeof (GOSTR3411_CONTEXT) }; gcry_md_spec_t _gcry_digest_spec_gost3411_cp = { GCRY_MD_GOSTR3411_CP, {0, 0}, "GOSTR3411_CP", asn, DIM (asn), oid_spec_gostr3411, 32, - gost3411_cp_init, _gcry_md_block_write, gost3411_final, gost3411_read, + gost3411_cp_init, _gcry_md_block_write, gost3411_final, gost3411_read, NULL, sizeof (GOSTR3411_CONTEXT) }; diff --git a/cipher/keccak.c b/cipher/keccak.c index 3a72294..d46d9cb 100644 --- a/cipher/keccak.c +++ b/cipher/keccak.c @@ -927,7 +927,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_224 = { GCRY_MD_SHA3_224, {0, 1}, "SHA3-224", sha3_224_asn, DIM (sha3_224_asn), oid_spec_sha3_224, 28, - sha3_224_init, keccak_write, keccak_final, keccak_read, + sha3_224_init, keccak_write, keccak_final, keccak_read, NULL, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -935,7 +935,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_256 = { GCRY_MD_SHA3_256, {0, 1}, "SHA3-256", sha3_256_asn, DIM (sha3_256_asn), oid_spec_sha3_256, 32, - sha3_256_init, keccak_write, keccak_final, keccak_read, + sha3_256_init, keccak_write, keccak_final, keccak_read, NULL, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -943,7 +943,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_384 = { GCRY_MD_SHA3_384, {0, 1}, "SHA3-384", sha3_384_asn, DIM (sha3_384_asn), oid_spec_sha3_384, 48, - sha3_384_init, keccak_write, keccak_final, keccak_read, + sha3_384_init, keccak_write, keccak_final, keccak_read, NULL, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -951,7 +951,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_512 = { GCRY_MD_SHA3_512, {0, 1}, "SHA3-512", sha3_512_asn, DIM (sha3_512_asn), oid_spec_sha3_512, 64, - sha3_512_init, keccak_write, keccak_final, keccak_read, + sha3_512_init, keccak_write, keccak_final, keccak_read, NULL, sizeof (KECCAK_CONTEXT), run_selftests }; diff --git a/cipher/md.c b/cipher/md.c index 948d269..6ef8fee 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -408,6 +408,12 @@ md_enable (gcry_md_hd_t hd, 
int algorithm) } } + if (!err && h->flags.hmac && spec->read == NULL) + { + /* An extendable-output function cannot be used as part of HMAC. */ + err = GPG_ERR_DIGEST_ALGO; + } + if (!err) { size_t size = (sizeof (*entry) @@ -638,11 +644,16 @@ md_final (gcry_md_hd_t a) for (r = a->ctx->list; r; r = r->next) { - byte *p = r->spec->read (&r->context.c); + byte *p; size_t dlen = r->spec->mdlen; byte *hash; gcry_err_code_t err; + if (r->spec->read == NULL) + continue; + + p = r->spec->read (&r->context.c); + if (a->ctx->flags.secure) hash = xtrymalloc_secure (dlen); else @@ -821,6 +832,8 @@ md_read( gcry_md_hd_t a, int algo ) { if (r->next) log_debug ("more than one algorithm in md_read(0)\n"); + if (r->spec->read == NULL) + return NULL; return r->spec->read (&r->context.c); } } @@ -828,7 +841,11 @@ md_read( gcry_md_hd_t a, int algo ) { for (r = a->ctx->list; r; r = r->next) if (r->spec->algo == algo) - return r->spec->read (&r->context.c); + { + if (r->spec->read == NULL) + return NULL; + return r->spec->read (&r->context.c); + } } BUG(); return NULL; @@ -850,6 +867,52 @@ _gcry_md_read (gcry_md_hd_t hd, int algo) } +/**************** + * If ALGO is 0, extract output from the only enabled algorithm; + * otherwise from the enabled algorithm ALGO. + */ +static gcry_err_code_t +md_extract(gcry_md_hd_t a, int algo, void *out, size_t outlen) +{ + GcryDigestEntry *r = a->ctx->list; + + if (!algo) + { + /* Return the first algorithm */ + if (r && r->spec->extract) + { + if (r->next) + log_debug ("more than one algorithm in md_extract(0)\n"); + r->spec->extract (&r->context.c, out, outlen); + return 0; + } + } + else + { + for (r = a->ctx->list; r; r = r->next) + if (r->spec->algo == algo && r->spec->extract) + { + r->spec->extract (&r->context.c, out, outlen); + return 0; + } + } + + return GPG_ERR_DIGEST_ALGO; +} + + +/* + * Extract output from an XOF-class digest; this function implicitly + * finalizes the hash.
+ */ +gcry_err_code_t +_gcry_md_extract (gcry_md_hd_t hd, int algo, void *out, size_t outlen) +{ + _gcry_md_ctl (hd, GCRYCTL_FINALIZE, NULL, 0); + return md_extract (hd, algo, out, outlen); +} + + /* * Read out an intermediate digest. Not yet functional. */ diff --git a/cipher/md2.c b/cipher/md2.c index 97682e5..e339b28 100644 --- a/cipher/md2.c +++ b/cipher/md2.c @@ -177,6 +177,6 @@ gcry_md_spec_t _gcry_digest_spec_md2 = { GCRY_MD_MD2, {0, 0}, "MD2", asn, DIM (asn), oid_spec_md2, 16, - md2_init, _gcry_md_block_write, md2_final, md2_read, + md2_init, _gcry_md_block_write, md2_final, md2_read, NULL, sizeof (MD2_CONTEXT) }; diff --git a/cipher/md4.c b/cipher/md4.c index c9b4154..afa6382 100644 --- a/cipher/md4.c +++ b/cipher/md4.c @@ -286,6 +286,6 @@ gcry_md_spec_t _gcry_digest_spec_md4 = { GCRY_MD_MD4, {0, 0}, "MD4", asn, DIM (asn), oid_spec_md4,16, - md4_init, _gcry_md_block_write, md4_final, md4_read, + md4_init, _gcry_md_block_write, md4_final, md4_read, NULL, sizeof (MD4_CONTEXT) }; diff --git a/cipher/md5.c b/cipher/md5.c index f17af7a..66cc5f6 100644 --- a/cipher/md5.c +++ b/cipher/md5.c @@ -312,6 +312,6 @@ gcry_md_spec_t _gcry_digest_spec_md5 = { GCRY_MD_MD5, {0, 1}, "MD5", asn, DIM (asn), oid_spec_md5, 16, - md5_init, _gcry_md_block_write, md5_final, md5_read, + md5_init, _gcry_md_block_write, md5_final, md5_read, NULL, sizeof (MD5_CONTEXT) }; diff --git a/cipher/rmd160.c b/cipher/rmd160.c index 2695db2..cf7531e 100644 --- a/cipher/rmd160.c +++ b/cipher/rmd160.c @@ -526,6 +526,6 @@ gcry_md_spec_t _gcry_digest_spec_rmd160 = { GCRY_MD_RMD160, {0, 0}, "RIPEMD160", asn, DIM (asn), oid_spec_rmd160, 20, - rmd160_init, _gcry_md_block_write, rmd160_final, rmd160_read, + rmd160_init, _gcry_md_block_write, rmd160_final, rmd160_read, NULL, sizeof (RMD160_CONTEXT) }; diff --git a/cipher/sha1.c b/cipher/sha1.c index 554d55c..0de8412 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -573,7 +573,7 @@ gcry_md_spec_t _gcry_digest_spec_sha1 = { GCRY_MD_SHA1, {0, 1}, "SHA1", 
asn, DIM (asn), oid_spec_sha1, 20, - sha1_init, _gcry_md_block_write, sha1_final, sha1_read, + sha1_init, _gcry_md_block_write, sha1_final, sha1_read, NULL, sizeof (SHA1_CONTEXT), run_selftests }; diff --git a/cipher/sha256.c b/cipher/sha256.c index 63869d5..bc326e0 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -633,7 +633,7 @@ gcry_md_spec_t _gcry_digest_spec_sha224 = { GCRY_MD_SHA224, {0, 1}, "SHA224", asn224, DIM (asn224), oid_spec_sha224, 28, - sha224_init, _gcry_md_block_write, sha256_final, sha256_read, + sha224_init, _gcry_md_block_write, sha256_final, sha256_read, NULL, sizeof (SHA256_CONTEXT), run_selftests }; @@ -642,7 +642,7 @@ gcry_md_spec_t _gcry_digest_spec_sha256 = { GCRY_MD_SHA256, {0, 1}, "SHA256", asn256, DIM (asn256), oid_spec_sha256, 32, - sha256_init, _gcry_md_block_write, sha256_final, sha256_read, + sha256_init, _gcry_md_block_write, sha256_final, sha256_read, NULL, sizeof (SHA256_CONTEXT), run_selftests }; diff --git a/cipher/sha512.c b/cipher/sha512.c index 4be1cab..1196db9 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -877,7 +877,7 @@ gcry_md_spec_t _gcry_digest_spec_sha512 = { GCRY_MD_SHA512, {0, 1}, "SHA512", sha512_asn, DIM (sha512_asn), oid_spec_sha512, 64, - sha512_init, _gcry_md_block_write, sha512_final, sha512_read, + sha512_init, _gcry_md_block_write, sha512_final, sha512_read, NULL, sizeof (SHA512_CONTEXT), run_selftests }; @@ -903,7 +903,7 @@ gcry_md_spec_t _gcry_digest_spec_sha384 = { GCRY_MD_SHA384, {0, 1}, "SHA384", sha384_asn, DIM (sha384_asn), oid_spec_sha384, 48, - sha384_init, _gcry_md_block_write, sha512_final, sha512_read, + sha384_init, _gcry_md_block_write, sha512_final, sha512_read, NULL, sizeof (SHA512_CONTEXT), run_selftests }; diff --git a/cipher/stribog.c b/cipher/stribog.c index de167a7..7f38e6f 100644 --- a/cipher/stribog.c +++ b/cipher/stribog.c @@ -1326,6 +1326,7 @@ gcry_md_spec_t _gcry_digest_spec_stribog_256 = GCRY_MD_STRIBOG256, {0, 0}, "STRIBOG256", NULL, 0, NULL, 32, stribog_init_256, 
_gcry_md_block_write, stribog_final, stribog_read_256, + NULL, sizeof (STRIBOG_CONTEXT) }; @@ -1334,5 +1335,6 @@ gcry_md_spec_t _gcry_digest_spec_stribog_512 = GCRY_MD_STRIBOG512, {0, 0}, "STRIBOG512", NULL, 0, NULL, 64, stribog_init_512, _gcry_md_block_write, stribog_final, stribog_read_512, + NULL, sizeof (STRIBOG_CONTEXT) }; diff --git a/cipher/tiger.c b/cipher/tiger.c index 8a08953..078133a 100644 --- a/cipher/tiger.c +++ b/cipher/tiger.c @@ -840,7 +840,7 @@ gcry_md_spec_t _gcry_digest_spec_tiger = { GCRY_MD_TIGER, {0, 0}, "TIGER192", NULL, 0, NULL, 24, - tiger_init, _gcry_md_block_write, tiger_final, tiger_read, + tiger_init, _gcry_md_block_write, tiger_final, tiger_read, NULL, sizeof (TIGER_CONTEXT) }; @@ -863,7 +863,7 @@ gcry_md_spec_t _gcry_digest_spec_tiger1 = { GCRY_MD_TIGER1, {0, 0}, "TIGER", asn1, DIM (asn1), oid_spec_tiger1, 24, - tiger1_init, _gcry_md_block_write, tiger_final, tiger_read, + tiger1_init, _gcry_md_block_write, tiger_final, tiger_read, NULL, sizeof (TIGER_CONTEXT) }; @@ -874,7 +874,7 @@ gcry_md_spec_t _gcry_digest_spec_tiger2 = { GCRY_MD_TIGER2, {0, 0}, "TIGER2", NULL, 0, NULL, 24, - tiger2_init, _gcry_md_block_write, tiger_final, tiger_read, + tiger2_init, _gcry_md_block_write, tiger_final, tiger_read, NULL, sizeof (TIGER_CONTEXT) }; diff --git a/cipher/whirlpool.c b/cipher/whirlpool.c index 5f224a1..8a06939 100644 --- a/cipher/whirlpool.c +++ b/cipher/whirlpool.c @@ -1525,6 +1525,6 @@ gcry_md_spec_t _gcry_digest_spec_whirlpool = { GCRY_MD_WHIRLPOOL, {0, 0}, "WHIRLPOOL", NULL, 0, NULL, 64, - whirlpool_init, whirlpool_write, whirlpool_final, whirlpool_read, + whirlpool_init, whirlpool_write, whirlpool_final, whirlpool_read, NULL, sizeof (whirlpool_context_t) }; diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index f13695a..3450bb2 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -3036,6 +3036,7 @@ are also supported. 
@c begin table of hash algorithms @cindex SHA-1 @cindex SHA-224, SHA-256, SHA-384, SHA-512 +@cindex SHA3-224, SHA3-256, SHA3-384, SHA3-512 @cindex RIPE-MD-160 @cindex MD2, MD4, MD5 @cindex TIGER, TIGER1, TIGER2 @@ -3108,6 +3109,22 @@ See FIPS 180-2 for the specification. This is the SHA-384 algorithm which yields a message digest of 64 bytes. See FIPS 180-2 for the specification. +@item GCRY_MD_SHA3_224 +This is the SHA3-224 algorithm which yields a message digest of 28 bytes. +See FIPS 202 for the specification. + +@item GCRY_MD_SHA3_256 +This is the SHA3-256 algorithm which yields a message digest of 32 bytes. +See FIPS 202 for the specification. + +@item GCRY_MD_SHA3_384 +This is the SHA3-384 algorithm which yields a message digest of 48 bytes. +See FIPS 202 for the specification. + +@item GCRY_MD_SHA3_512 +This is the SHA3-512 algorithm which yields a message digest of 64 bytes. +See FIPS 202 for the specification. + @item GCRY_MD_CRC32 This is the ISO 3309 and ITU-T V.42 cyclic redundancy check. It yields an output of 4 bytes. Note that this is not a hash algorithm in the @@ -3170,11 +3187,12 @@ this is the hashed data is highly confidential. @item GCRY_MD_FLAG_HMAC @cindex HMAC Turn the algorithm into a HMAC message authentication algorithm. This -only works if just one algorithm is enabled for the handle. Note that -the function @code{gcry_md_setkey} must be used to set the MAC key. -The size of the MAC is equal to the message digest of the underlying -hash algorithm. If you want CBC message authentication codes based on -a cipher, see @xref{Working with cipher handles}. +only works if just one algorithm is enabled for the handle and that +algorithm is not an extendable-output function. Note that the function +@code{gcry_md_setkey} must be used to set the MAC key. The size of the +MAC is equal to the message digest of the underlying hash algorithm.
+If you want CBC message authentication codes based on a cipher, +see @xref{Working with cipher handles}. @item GCRY_MD_FLAG_BUGEMU1 @cindex bug emulation @@ -3293,9 +3311,9 @@ message digest or some padding. @deftypefun void gcry_md_final (gcry_md_hd_t @var{h}) Finalize the message digest calculation. This is not really needed -because @code{gcry_md_read} does this implicitly. After this has been -done no further updates (by means of @code{gcry_md_write} or -@code{gcry_md_putc} should be done; However, to mitigate timing +because @code{gcry_md_read} and @code{gcry_md_extract} do this implicitly. +After this has been done no further updates (by means of @code{gcry_md_write} +or @code{gcry_md_putc}) should be done; however, to mitigate timing attacks it is sometimes useful to keep on updating the context after having stored away the actual digest. Only the first call to this function has an effect. It is implemented as a macro. @@ -3318,6 +3336,22 @@ The function does return @code{NULL} if the requested algorithm has not been enabled. @end deftypefun +The output of an extendable-output function is read using the +function: + +@deftypefun gpg_error_t gcry_md_extract (gcry_md_hd_t @var{h}, @ + int @var{algo}, void *@var{buffer}, size_t @var{length}) + +@code{gcry_md_extract} returns output from an extendable-output function. +This function may be called as often as required to obtain more of the +output byte stream from the algorithm. Each call writes @var{length} new +output bytes to @var{buffer}, so the buffer is always fully +populated with new output. @var{algo} may be given as 0 to select the only +enabled message digest or it may specify one of the enabled algorithms. +The function returns a non-zero value if the requested algorithm has not +been enabled or is not an extendable-output function.
+@end deftypefun + Because it is often necessary to get the message digest of blocks of memory, two fast convenience function are available for this task: @@ -3493,6 +3527,7 @@ provided by Libgcrypt. @c begin table of MAC algorithms @cindex HMAC-SHA-1 @cindex HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, HMAC-SHA-512 +@cindex HMAC-SHA3-224, HMAC-SHA3-256, HMAC-SHA3-384, HMAC-SHA3-512 @cindex HMAC-RIPE-MD-160 @cindex HMAC-MD2, HMAC-MD4, HMAC-MD5 @cindex HMAC-TIGER1 @@ -3520,6 +3555,22 @@ algorithm. This is HMAC message authentication algorithm based on the SHA-384 hash algorithm. +@item GCRY_MAC_HMAC_SHA3_256 +This is HMAC message authentication algorithm based on the SHA3-256 hash +algorithm. + +@item GCRY_MAC_HMAC_SHA3_224 +This is HMAC message authentication algorithm based on the SHA3-224 hash +algorithm. + +@item GCRY_MAC_HMAC_SHA3_512 +This is HMAC message authentication algorithm based on the SHA3-512 hash +algorithm. + +@item GCRY_MAC_HMAC_SHA3_384 +This is HMAC message authentication algorithm based on the SHA3-384 hash +algorithm. + @item GCRY_MAC_HMAC_SHA1 This is HMAC message authentication algorithm based on the SHA-1 hash algorithm. diff --git a/src/cipher-proto.h b/src/cipher-proto.h index 8267791..3bca9c7 100644 --- a/src/cipher-proto.h +++ b/src/cipher-proto.h @@ -215,6 +215,9 @@ typedef void (*gcry_md_final_t) (void *c); /* Type for the md_read function. */ typedef unsigned char *(*gcry_md_read_t) (void *c); +/* Type for the md_extract function.
*/ +typedef void (*gcry_md_extract_t) (void *c, void *outbuf, size_t nbytes); + typedef struct gcry_md_oid_spec { const char *oidstring; @@ -237,6 +240,7 @@ typedef struct gcry_md_spec gcry_md_write_t write; gcry_md_final_t final; gcry_md_read_t read; + gcry_md_extract_t extract; size_t contextsize; /* allocate this amount of context */ selftest_func_t selftest; } gcry_md_spec_t; diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index 585da6a..39be37a 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -473,7 +473,7 @@ char *gcry_sexp_nth_string (gcry_sexp_t list, int number); value can't be converted to an MPI, `NULL' is returned. */ gcry_mpi_t gcry_sexp_nth_mpi (gcry_sexp_t list, int number, int mpifmt); -/* Convenience fucntion to extract parameters from an S-expression +/* Convenience function to extract parameters from an S-expression * using a list of single letter parameters. */ gpg_error_t gcry_sexp_extract_param (gcry_sexp_t sexp, const char *path, @@ -1170,7 +1170,7 @@ enum gcry_md_algos GCRY_MD_GOSTR3411_94 = 308, /* GOST R 34.11-94. */ GCRY_MD_STRIBOG256 = 309, /* GOST R 34.11-2012, 256 bit. */ GCRY_MD_STRIBOG512 = 310, /* GOST R 34.11-2012, 512 bit. */ - GCRY_MD_GOSTR3411_CP = 311, /* GOST R 34.11-94 with CryptoPro-A S-Box. */ + GCRY_MD_GOSTR3411_CP = 311, /* GOST R 34.11-94 with CryptoPro-A S-Box. */ GCRY_MD_SHA3_224 = 312, GCRY_MD_SHA3_256 = 313, GCRY_MD_SHA3_384 = 314, @@ -1239,6 +1239,11 @@ void gcry_md_write (gcry_md_hd_t hd, const void *buffer, size_t length); algorithm ALGO. */ unsigned char *gcry_md_read (gcry_md_hd_t hd, int algo); +/* Read more output from algorithm ALGO to BUFFER of size LENGTH from + * digest object HD. The algorithm must be an extendable-output function. */ +gpg_error_t gcry_md_extract (gcry_md_hd_t hd, int algo, void *buffer, + size_t length); + /* Convenience function to calculate the hash from the data in BUFFER of size LENGTH using the algorithm ALGO avoiding the creating of a hash object.
The hash is returned in the caller provided buffer diff --git a/src/libgcrypt.def b/src/libgcrypt.def index 924f17f..f3e074b 100644 --- a/src/libgcrypt.def +++ b/src/libgcrypt.def @@ -278,5 +278,6 @@ EXPORTS gcry_mpi_ec_sub @244 + gcry_md_extract @245 ;; end of file with public symbols for Windows. diff --git a/src/libgcrypt.vers b/src/libgcrypt.vers index 7e8df3f..5b3d419 100644 --- a/src/libgcrypt.vers +++ b/src/libgcrypt.vers @@ -41,7 +41,7 @@ GCRYPT_1.6 { gcry_md_get_algo; gcry_md_get_algo_dlen; gcry_md_hash_buffer; gcry_md_hash_buffers; gcry_md_info; gcry_md_is_enabled; gcry_md_is_secure; - gcry_md_map_name; gcry_md_open; gcry_md_read; + gcry_md_map_name; gcry_md_open; gcry_md_read; gcry_md_extract; gcry_md_reset; gcry_md_setkey; gcry_md_write; gcry_md_debug; diff --git a/src/visibility.c b/src/visibility.c index cbf24e7..23a2705 100644 --- a/src/visibility.c +++ b/src/visibility.c @@ -1165,6 +1165,12 @@ gcry_md_read (gcry_md_hd_t hd, int algo) return _gcry_md_read (hd, algo); } +gpg_error_t +gcry_md_extract (gcry_md_hd_t hd, int algo, void *buffer, size_t length) +{ + return gpg_error (_gcry_md_extract (hd, algo, buffer, length)); +} + void gcry_md_hash_buffer (int algo, void *digest, const void *buffer, size_t length) diff --git a/src/visibility.h b/src/visibility.h index fa3c763..bb25de0 100644 --- a/src/visibility.h +++ b/src/visibility.h @@ -110,6 +110,7 @@ MARK_VISIBLEX (gcry_md_is_secure) MARK_VISIBLEX (gcry_md_map_name) MARK_VISIBLEX (gcry_md_open) MARK_VISIBLEX (gcry_md_read) +MARK_VISIBLEX (gcry_md_extract) MARK_VISIBLEX (gcry_md_reset) MARK_VISIBLEX (gcry_md_setkey) MARK_VISIBLEX (gcry_md_write) @@ -374,6 +375,7 @@ MARK_VISIBLEX (_gcry_mpi_get_const) #define gcry_md_map_name _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_md_open _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_md_read _gcry_USE_THE_UNDERSCORED_FUNCTION +#define gcry_md_extract _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_md_reset _gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_md_setkey
_gcry_USE_THE_UNDERSCORED_FUNCTION #define gcry_md_write _gcry_USE_THE_UNDERSCORED_FUNCTION From jussi.kivilinna at iki.fi Wed Oct 28 18:55:57 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 28 Oct 2015 19:55:57 +0200 Subject: [PATCH 4/4] Few updates to documentation In-Reply-To: <20151028175542.25783.44927.stgit@localhost6.localdomain6> References: <20151028175542.25783.44927.stgit@localhost6.localdomain6> Message-ID: <20151028175557.25783.78616.stgit@localhost6.localdomain6> * doc/gcrypt.texi: Add mention of new 'intel-fast-shld' hw feature flag; Add mention of x86 RDRAND support in rndhw. -- Signed-off-by: Jussi Kivilinna --- doc/gcrypt.texi | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 3450bb2..facdf65 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -556,6 +556,7 @@ are @item padlock-sha @item padlock-mmul @item intel-cpu +@item intel-fast-shld @item intel-bmi2 @item intel-ssse3 @item intel-pclmul @@ -5610,9 +5611,9 @@ that system and is the only gathering module available for that OS. @item rndhw Extra module to collect additional entropy by utilizing a hardware -random number generator. As of now the only supported hardware RNG is -the Padlock engine of VIA (Centaur) CPUs. It is not available in FIPS -mode. +random number generator. As of now the supported hardware RNGs are +the Padlock engine of VIA (Centaur) CPUs and x86 CPUs with the RDRAND +instruction. It is not available in FIPS mode. @end table From jussi.kivilinna at iki.fi Wed Oct 28 18:55:52 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 28 Oct 2015 19:55:52 +0200 Subject: [PATCH 3/4] Add HMAC-SHA3 test vectors In-Reply-To: <20151028175542.25783.44927.stgit@localhost6.localdomain6> References: <20151028175542.25783.44927.stgit@localhost6.localdomain6> Message-ID: <20151028175552.25783.85864.stgit@localhost6.localdomain6> * tests/basic.c (check_mac): Add HMAC_SHA3 test vectors.
-- Signed-off-by: Jussi Kivilinna --- tests/basic.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 163 insertions(+) diff --git a/tests/basic.c b/tests/basic.c index 4ea91a9..75ff349 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -6784,6 +6784,169 @@ check_mac (void) "\xde\xbd\x71\xf8\x86\x72\x89\x86\x5d\xf5\xa3\x2d\x20\xcd\xc9\x44" "\xb6\x02\x2c\xac\x3c\x49\x82\xb1\x0d\x5e\xeb\x55\xc3\xe4\xde\x15" "\x13\x46\x76\xfb\x6d\xe0\x44\x60\x65\xc9\x74\x40\xfa\x8c\x6a\x58" }, + /* HMAC-SHA3 test vectors from + * http://wolfgang-ehrhardt.de/hmac-sha3-testvectors.html */ + { GCRY_MAC_HMAC_SHA3_224, + "Hi There", + "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" + "\x0b\x0b\x0b", + "\x3b\x16\x54\x6b\xbc\x7b\xe2\x70\x6a\x03\x1d\xca\xfd\x56\x37\x3d" + "\x98\x84\x36\x76\x41\xd8\xc5\x9a\xf3\xc8\x60\xf7" }, + { GCRY_MAC_HMAC_SHA3_256, + "Hi There", + "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" + "\x0b\x0b\x0b", + "\xba\x85\x19\x23\x10\xdf\xfa\x96\xe2\xa3\xa4\x0e\x69\x77\x43\x51" + "\x14\x0b\xb7\x18\x5e\x12\x02\xcd\xcc\x91\x75\x89\xf9\x5e\x16\xbb" }, + { GCRY_MAC_HMAC_SHA3_512, + "Hi There", + "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" + "\x0b\x0b\x0b", + "\xeb\x3f\xbd\x4b\x2e\xaa\xb8\xf5\xc5\x04\xbd\x3a\x41\x46\x5a\xac" + "\xec\x15\x77\x0a\x7c\xab\xac\x53\x1e\x48\x2f\x86\x0b\x5e\xc7\xba" + "\x47\xcc\xb2\xc6\xf2\xaf\xce\x8f\x88\xd2\x2b\x6d\xc6\x13\x80\xf2" + "\x3a\x66\x8f\xd3\x88\x8b\xb8\x05\x37\xc0\xa0\xb8\x64\x07\x68\x9e" }, + { GCRY_MAC_HMAC_SHA3_224, "what do ya want for nothing?", "Jefe", + "\x7f\xdb\x8d\xd8\x8b\xd2\xf6\x0d\x1b\x79\x86\x34\xad\x38\x68\x11" + "\xc2\xcf\xc8\x5b\xfa\xf5\xd5\x2b\xba\xce\x5e\x66" }, + { GCRY_MAC_HMAC_SHA3_256, "what do ya want for nothing?", "Jefe", + "\xc7\xd4\x07\x2e\x78\x88\x77\xae\x35\x96\xbb\xb0\xda\x73\xb8\x87" + "\xc9\x17\x1f\x93\x09\x5b\x29\x4a\xe8\x57\xfb\xe2\x64\x5e\x1b\xa5" }, + { GCRY_MAC_HMAC_SHA3_384, "what do ya want for 
nothing?", "Jefe", + "\xf1\x10\x1f\x8c\xbf\x97\x66\xfd\x67\x64\xd2\xed\x61\x90\x3f\x21" + "\xca\x9b\x18\xf5\x7c\xf3\xe1\xa2\x3c\xa1\x35\x08\xa9\x32\x43\xce" + "\x48\xc0\x45\xdc\x00\x7f\x26\xa2\x1b\x3f\x5e\x0e\x9d\xf4\xc2\x0a" }, + { GCRY_MAC_HMAC_SHA3_512, "what do ya want for nothing?", "Jefe", + "\x5a\x4b\xfe\xab\x61\x66\x42\x7c\x7a\x36\x47\xb7\x47\x29\x2b\x83" + "\x84\x53\x7c\xdb\x89\xaf\xb3\xbf\x56\x65\xe4\xc5\xe7\x09\x35\x0b" + "\x28\x7b\xae\xc9\x21\xfd\x7c\xa0\xee\x7a\x0c\x31\xd0\x22\xa9\x5e" + "\x1f\xc9\x2b\xa9\xd7\x7d\xf8\x83\x96\x02\x75\xbe\xb4\xe6\x20\x24" }, + { GCRY_MAC_HMAC_SHA3_224, + "Test Using Larger Than Block-Size Key - Hash Key First", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xb9\x6d\x73\x0c\x14\x8c\x2d\xaa\xd8\x64\x9d\x83\xde\xfa\xa3\x71" + "\x97\x38\xd3\x47\x75\x39\x7b\x75\x71\xc3\x85\x15" }, + { GCRY_MAC_HMAC_SHA3_256, + "Test Using Larger Than Block-Size Key - Hash Key First", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" 
+ "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xa6\x07\x2f\x86\xde\x52\xb3\x8b\xb3\x49\xfe\x84\xcd\x6d\x97\xfb" + "\x6a\x37\xc4\xc0\xf6\x2a\xae\x93\x98\x11\x93\xa7\x22\x9d\x34\x67" }, + { GCRY_MAC_HMAC_SHA3_384, + "Test Using Larger Than Block-Size Key - Hash Key First", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\x71\x3d\xff\x03\x02\xc8\x50\x86\xec\x5a\xd0\x76\x8d\xd6\x5a\x13" + "\xdd\xd7\x90\x68\xd8\xd4\xc6\x21\x2b\x71\x2e\x41\x64\x94\x49\x11" + "\x14\x80\x23\x00\x44\x18\x5a\x99\x10\x3e\xd8\x20\x04\xdd\xbf\xcc" }, + { GCRY_MAC_HMAC_SHA3_512, + "Test Using Larger Than Block-Size Key - Hash Key First", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + 
"\xb1\x48\x35\xc8\x19\xa2\x90\xef\xb0\x10\xac\xe6\xd8\x56\x8d\xc6" + "\xb8\x4d\xe6\x0b\xc4\x9b\x00\x4c\x3b\x13\xed\xa7\x63\x58\x94\x51" + "\xe5\xdd\x74\x29\x28\x84\xd1\xbd\xce\x64\xe6\xb9\x19\xdd\x61\xdc" + "\x9c\x56\xa2\x82\xa8\x1c\x0b\xd1\x4f\x1f\x36\x5b\x49\xb8\x3a\x5b" }, + { GCRY_MAC_HMAC_SHA3_224, + "This is a test using a larger than block-size key and a larger " + "than block-size data. The key needs to be hashed before being " + "used by the HMAC algorithm.", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xc7\x9c\x9b\x09\x34\x24\xe5\x88\xa9\x87\x8b\xbc\xb0\x89\xe0\x18" + "\x27\x00\x96\xe9\xb4\xb1\xa9\xe8\x22\x0c\x86\x6a" }, + { GCRY_MAC_HMAC_SHA3_256, + "This is a test using a larger than block-size key and a larger " + "than block-size data. 
The key needs to be hashed before being " + "used by the HMAC algorithm.", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xe6\xa3\x6d\x9b\x91\x5f\x86\xa0\x93\xca\xc7\xd1\x10\xe9\xe0\x4c" + "\xf1\xd6\x10\x0d\x30\x47\x55\x09\xc2\x47\x5f\x57\x1b\x75\x8b\x5a" }, + { GCRY_MAC_HMAC_SHA3_384, + "This is a test using a larger than block-size key and a larger " + "than block-size data. The key needs to be hashed before being " + "used by the HMAC algorithm.", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xca\xd1\x8a\x8f\xf6\xc4\xcc\x3a\xd4\x87\xb9\x5f\x97\x69\xe9\xb6" + "\x1c\x06\x2a\xef\xd6\x95\x25\x69\xe6\xe6\x42\x18\x97\x05\x4c\xfc" + "\x70\xb5\xfd\xc6\x60\x5c\x18\x45\x71\x12\xfc\x6a\xaa\xd4\x55\x85" }, + { GCRY_MAC_HMAC_SHA3_512, + "This is a test using a larger than block-size key and a larger " + 
"than block-size data. The key needs to be hashed before being " + "used by the HMAC algorithm.", + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa" + "\xaa\xaa\xaa", + "\xdc\x03\x0e\xe7\x88\x70\x34\xf3\x2c\xf4\x02\xdf\x34\x62\x2f\x31" + "\x1f\x3e\x6c\xf0\x48\x60\xc6\xbb\xd7\xfa\x48\x86\x74\x78\x2b\x46" + "\x59\xfd\xbd\xf3\xfd\x87\x78\x52\x88\x5c\xfe\x6e\x22\x18\x5f\xe7" + "\xb2\xee\x95\x20\x43\x62\x9b\xc9\xd5\xf3\x29\x8a\x41\xd0\x2c\x66" }, /* CMAC AES and DES test vectors from http://web.archive.org/web/20130930212819/http://csrc.nist.gov/publica\ tions/nistpubs/800-38B/Updated_CMAC_Examples.pdf */ From wk at gnupg.org Wed Oct 28 20:09:03 2015 From: wk at gnupg.org (Werner Koch) Date: Wed, 28 Oct 2015 20:09:03 +0100 Subject: [PATCH 2/4] md: add variable length output interface In-Reply-To: <20151028175547.25783.11167.stgit@localhost6.localdomain6> (Jussi Kivilinna's message of "Wed, 28 Oct 2015 19:55:47 +0200") References: <20151028175542.25783.44927.stgit@localhost6.localdomain6> <20151028175547.25783.11167.stgit@localhost6.localdomain6> Message-ID: <87si4u6ca8.fsf@vigenere.g10code.de> On Wed, 28 Oct 2015 18:55, jussi.kivilinna at iki.fi said: > Patch adds new interface for reading output from 'expandable-output > function' MD algorithms that can give variable length output (ie. > SHAKE algorithms from FIPS-202). 
New function to read output is > > gpg_error_t gcry_md_extract(gcry_md_hd_t md, int algo, > void *buffer, size_t length); It is good to see this new API for 1.7. Thanks. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From cvs at cvs.gnupg.org Wed Oct 28 19:11:30 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Wed, 28 Oct 2015 19:11:30 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-271-g74184c2 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 74184c28fbe7ff58cf57f0094ef957d94045da7d (commit) via 909644ef5883927262366c356eed530e55aba478 (commit) via 16fd540f4d01eb6dc23d9509ae549353617c7a67 (commit) via ae40af427fd2a856b24ec2a41323ec8b80ffc9c0 (commit) from f7505b550dd591e33d3a3fab9277c43c460f1bad (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 74184c28fbe7ff58cf57f0094ef957d94045da7d Author: Jussi Kivilinna Date: Fri Oct 23 22:30:48 2015 +0300 keccak: rewrite for improved performance * cipher/Makefile.am: Add 'keccak_permute_32.h' and 'keccak_permute_64.h'. * cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove. * cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2) (USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI) (keccak_ops_t): New. (KECCAK_STATE): Add 'state64' and 'state32bi' members. (KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'. (rol64, keccak_f1600_state_permute): Remove. [NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New. [NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi) (keccak_absorb_lane32bi): New. 
[USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64) (keccak_absorb_lanes64, keccak_generic64_ops): New. [USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld) (keccak_absorb_lanes64_shld, keccak_shld_64_ops): New. [USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2) (keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New. [USE_32BIT] (ANDN32, ROL32, keccak_f1600_state_permute32bi) (keccak_absorb_lanes32bi, keccak_generic32bi_ops): New. [USE_32BIT_BMI2] (ANDN32, ROL32, keccak_f1600_state_permute32bi_bmi2) (pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2) (keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New. (keccak_write): New. (keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation selection based on HWF features. (keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops' for state manipulation. (keccak_read): Adjust to KECCAK_CONTEXT changes. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Use 'keccak_write' instead of '_gcry_md_block_write'. * cipher/keccak_permute_32.h: New. * cipher/keccak_permute_64.h: New.
--
Patch adds new generic 64-bit and 32-bit implementations and optimized implementations for SHA3:
- Generic 64-bit implementation based on the 'simple' implementation from the SUPERCOP package.
- Generic 32-bit bit-interleaved implementation based on the 'simple32bi' implementation from the SUPERCOP package.
- Intel BMI2 optimized variants of the 64-bit and 32-bit BI implementations.
- Intel SHLD optimized variant of the 64-bit implementation.

Patch also makes proper use of the sponge construction, absorbing input directly into the state instead of going through an additional input buffer.

Below are bench-slope benchmarks for the new 64-bit implementations, made on an Intel Core i5-4570 (no turbo, 3.2 GHz, gcc-4.9.2).
Before (amd64):
  SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B
  SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B
  SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B
  SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B

After (generic 64-bit, amd64, 1.10x faster):
  SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B
  SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B
  SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B
  SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B

After (Intel SHLD 64-bit, amd64, 1.13x faster):
  SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B
  SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B
  SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B
  SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B

After (Intel BMI2 64-bit, amd64, 1.45x faster):
  SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B
  SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B
  SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B
  SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B

Benchmarks of new 32-bit implementations on Intel Core i5-4570 (no turbo, 3.2 GHz, gcc-4.9.2):

Before (win32):
  SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B
  SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B
  SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B
  SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B

After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
  SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B
  SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B
  SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B
  SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B

After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
  SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B
  SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B
  SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B
  SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B

Benchmarks of new 32-bit implementation on ARM Cortex-A8 (1008 MHz, gcc-4.9.1):

Before:
  SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B
  SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B
  SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B
  SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B

After (1.56x faster):
  SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B
  SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
  SHA3-384 | 131.4 ns/B 7.26 MiB/s
132.5 c/B SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B Signed-off-by: Jussi Kivilinna diff --git a/cipher/Makefile.am b/cipher/Makefile.am index b08c9a9..be03d06 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -90,7 +90,7 @@ sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \ sha256.c sha256-ssse3-amd64.S sha256-avx-amd64.S sha256-avx2-bmi2-amd64.S \ sha512.c sha512-ssse3-amd64.S sha512-avx-amd64.S sha512-avx2-bmi2-amd64.S \ sha512-armv7-neon.S \ -keccak.c \ +keccak.c keccak_permute_32.h keccak_permute_64.h \ stribog.c \ tiger.c \ whirlpool.c whirlpool-sse2-amd64.S \ diff --git a/cipher/hash-common.h b/cipher/hash-common.h index e1ae5a2..27d670d 100644 --- a/cipher/hash-common.h +++ b/cipher/hash-common.h @@ -33,15 +33,9 @@ typedef unsigned int (*_gcry_md_block_write_t) (void *c, const unsigned char *blks, size_t nblks); -#if defined(HAVE_U64_TYPEDEF) && (defined(USE_SHA512) || defined(USE_SHA3) || \ - defined(USE_WHIRLPOOL)) -/* SHA-512, SHA-3 and Whirlpool needs u64. SHA-512 and SHA3 need larger - * buffer. */ -# ifdef USE_SHA3 -# define MD_BLOCK_MAX_BLOCKSIZE (1152 / 8) -# else -# define MD_BLOCK_MAX_BLOCKSIZE 128 -# endif +#if defined(HAVE_U64_TYPEDEF) && (defined(USE_SHA512) || defined(USE_WHIRLPOOL)) +/* SHA-512 and Whirlpool needs u64. SHA-512 needs larger buffer. */ +# define MD_BLOCK_MAX_BLOCKSIZE 128 # define MD_NBLOCKS_TYPE u64 #else # define MD_BLOCK_MAX_BLOCKSIZE 64 diff --git a/cipher/keccak.c b/cipher/keccak.c index 4a9c1f2..3a72294 100644 --- a/cipher/keccak.c +++ b/cipher/keccak.c @@ -27,11 +27,45 @@ #include "hash-common.h" -/* The code is based on public-domain/CC0 "Keccak-readable-and-compact.c" - * implementation by the Keccak, Keyak and Ketje Teams, namely, Guido Bertoni, - * Joan Daemen, Michaël Peeters, Gilles Van Assche and Ronny Van Keer. From: - * https://github.com/gvanas/KeccakCodePackage - */ + +/* USE_64BIT indicates whether to use 64-bit generic implementation. 
+ * USE_32BIT indicates whether to use 32-bit generic implementation. */ +#undef USE_64BIT +#if defined(__x86_64__) || SIZEOF_UNSIGNED_LONG == 8 +# define USE_64BIT 1 +#else +# define USE_32BIT 1 +#endif + + +/* USE_64BIT_BMI2 indicates whether to compile with 64-bit Intel BMI2 code. */ +#undef USE_64BIT_BMI2 +#if defined(USE_64BIT) && defined(HAVE_GCC_INLINE_ASM_BMI2) +# define USE_64BIT_BMI2 1 +#endif + + +/* USE_64BIT_SHLD indicates whether to compile with 64-bit Intel SHLD code. */ +#undef USE_64BIT_SHLD +#if defined(USE_64BIT) && defined (__GNUC__) && defined(__x86_64__) +# define USE_64BIT_SHLD 1 +#endif + + +/* USE_32BIT_BMI2 indicates whether to compile with 32-bit Intel BMI2 code. */ +#undef USE_32BIT_BMI2 +#if defined(USE_32BIT) && defined(HAVE_GCC_INLINE_ASM_BMI2) +# define USE_32BIT_BMI2 1 +#endif + + +#ifdef USE_64BIT +# define NEED_COMMON64 1 +#endif + +#ifdef USE_32BIT +# define NEED_COMMON32BI 1 +#endif #define SHA3_DELIMITED_SUFFIX 0x06 @@ -40,220 +74,528 @@ typedef struct { - u64 state[5][5]; + union { +#ifdef NEED_COMMON64 + u64 state64[25]; +#endif +#ifdef NEED_COMMON32BI + u32 state32bi[50]; +#endif + } u; } KECCAK_STATE; typedef struct { - gcry_md_block_ctx_t bctx; + unsigned int (*permute)(KECCAK_STATE *hd); + unsigned int (*absorb)(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes); + unsigned int (*extract_inplace) (KECCAK_STATE *hd, unsigned int outlen); +} keccak_ops_t; + + +typedef struct KECCAK_CONTEXT_S +{ KECCAK_STATE state; unsigned int outlen; + unsigned int blocksize; + unsigned int count; + const keccak_ops_t *ops; } KECCAK_CONTEXT; -static inline u64 -rol64 (u64 x, unsigned int n) + +#ifdef NEED_COMMON64 + +static const u64 round_consts_64bit[24] = { - return ((x << n) | (x >> (64 - n))); -} + U64_C(0x0000000000000001), U64_C(0x0000000000008082), + U64_C(0x800000000000808A), U64_C(0x8000000080008000), + U64_C(0x000000000000808B), U64_C(0x0000000080000001), + U64_C(0x8000000080008081), 
U64_C(0x8000000000008009), + U64_C(0x000000000000008A), U64_C(0x0000000000000088), + U64_C(0x0000000080008009), U64_C(0x000000008000000A), + U64_C(0x000000008000808B), U64_C(0x800000000000008B), + U64_C(0x8000000000008089), U64_C(0x8000000000008003), + U64_C(0x8000000000008002), U64_C(0x8000000000000080), + U64_C(0x000000000000800A), U64_C(0x800000008000000A), + U64_C(0x8000000080008081), U64_C(0x8000000000008080), + U64_C(0x0000000080000001), U64_C(0x8000000080008008) +}; -/* Function that computes the Keccak-f[1600] permutation on the given state. */ -static unsigned int keccak_f1600_state_permute(KECCAK_STATE *hd) +static unsigned int +keccak_extract_inplace64(KECCAK_STATE *hd, unsigned int outlen) { - static const u64 round_consts[24] = - { - U64_C(0x0000000000000001), U64_C(0x0000000000008082), - U64_C(0x800000000000808A), U64_C(0x8000000080008000), - U64_C(0x000000000000808B), U64_C(0x0000000080000001), - U64_C(0x8000000080008081), U64_C(0x8000000000008009), - U64_C(0x000000000000008A), U64_C(0x0000000000000088), - U64_C(0x0000000080008009), U64_C(0x000000008000000A), - U64_C(0x000000008000808B), U64_C(0x800000000000008B), - U64_C(0x8000000000008089), U64_C(0x8000000000008003), - U64_C(0x8000000000008002), U64_C(0x8000000000000080), - U64_C(0x000000000000800A), U64_C(0x800000008000000A), - U64_C(0x8000000080008081), U64_C(0x8000000000008080), - U64_C(0x0000000080000001), U64_C(0x8000000080008008) - }; - unsigned int round; + unsigned int i; - for (round = 0; round < 24; round++) + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) { - { - /* θ
step (see [Keccak Reference, Section 2.3.2]) === */ - u64 C[5], D[5]; - - /* Compute the parity of the columns */ - C[0] = hd->state[0][0] ^ hd->state[1][0] ^ hd->state[2][0] - ^ hd->state[3][0] ^ hd->state[4][0]; - C[1] = hd->state[0][1] ^ hd->state[1][1] ^ hd->state[2][1] - ^ hd->state[3][1] ^ hd->state[4][1]; - C[2] = hd->state[0][2] ^ hd->state[1][2] ^ hd->state[2][2] - ^ hd->state[3][2] ^ hd->state[4][2]; - C[3] = hd->state[0][3] ^ hd->state[1][3] ^ hd->state[2][3] - ^ hd->state[3][3] ^ hd->state[4][3]; - C[4] = hd->state[0][4] ^ hd->state[1][4] ^ hd->state[2][4] - ^ hd->state[3][4] ^ hd->state[4][4]; - - /* Compute the θ effect for a given column */ - D[0] = C[4] ^ rol64(C[1], 1); - D[1] = C[0] ^ rol64(C[2], 1); - D[2] = C[1] ^ rol64(C[3], 1); - D[3] = C[2] ^ rol64(C[4], 1); - D[4] = C[3] ^ rol64(C[0], 1); - - /* Add the θ effect to the whole column */ - hd->state[0][0] ^= D[0]; - hd->state[1][0] ^= D[0]; - hd->state[2][0] ^= D[0]; - hd->state[3][0] ^= D[0]; - hd->state[4][0] ^= D[0]; - - /* Add the θ effect to the whole column */ - hd->state[0][1] ^= D[1]; - hd->state[1][1] ^= D[1]; - hd->state[2][1] ^= D[1]; - hd->state[3][1] ^= D[1]; - hd->state[4][1] ^= D[1]; - - /* Add the θ effect to the whole column */ - hd->state[0][2] ^= D[2]; - hd->state[1][2] ^= D[2]; - hd->state[2][2] ^= D[2]; - hd->state[3][2] ^= D[2]; - hd->state[4][2] ^= D[2]; - - /* Add the θ effect to the whole column */ - hd->state[0][3] ^= D[3]; - hd->state[1][3] ^= D[3]; - hd->state[2][3] ^= D[3]; - hd->state[3][3] ^= D[3]; - hd->state[4][3] ^= D[3]; - - /* Add the θ effect to the whole column */ - hd->state[0][4] ^= D[4]; - hd->state[1][4] ^= D[4]; - hd->state[2][4] ^= D[4]; - hd->state[3][4] ^= D[4]; - hd->state[4][4] ^= D[4]; - } - - { - /* ρ and π
steps (see [Keccak Reference, Sections 2.3.3 and 2.3.4]) */ - u64 current, temp; - -#define do_swap_n_rol(x, y, r) \ - temp = hd->state[y][x]; \ - hd->state[y][x] = rol64(current, r); \ - current = temp; - - /* Start at coordinates (1 0) */ - current = hd->state[0][1]; - - /* Iterate over ((0 1)(2 3))^t * (1 0) for 0 ≤ t ≤ 23 */ - do_swap_n_rol(0, 2, 1); - do_swap_n_rol(2, 1, 3); - do_swap_n_rol(1, 2, 6); - do_swap_n_rol(2, 3, 10); - do_swap_n_rol(3, 3, 15); - do_swap_n_rol(3, 0, 21); - do_swap_n_rol(0, 1, 28); - do_swap_n_rol(1, 3, 36); - do_swap_n_rol(3, 1, 45); - do_swap_n_rol(1, 4, 55); - do_swap_n_rol(4, 4, 2); - do_swap_n_rol(4, 0, 14); - do_swap_n_rol(0, 3, 27); - do_swap_n_rol(3, 4, 41); - do_swap_n_rol(4, 3, 56); - do_swap_n_rol(3, 2, 8); - do_swap_n_rol(2, 2, 25); - do_swap_n_rol(2, 0, 43); - do_swap_n_rol(0, 4, 62); - do_swap_n_rol(4, 2, 18); - do_swap_n_rol(2, 4, 39); - do_swap_n_rol(4, 1, 61); - do_swap_n_rol(1, 1, 20); - do_swap_n_rol(1, 0, 44); - -#undef do_swap_n_rol - } - - { - /* χ step (see [Keccak Reference, Section 2.3.1]) */ - u64 temp[5]; - -#define do_x_step_for_plane(y) \ - /* Take a copy of the plane */ \ - temp[0] = hd->state[y][0]; \ - temp[1] = hd->state[y][1]; \ - temp[2] = hd->state[y][2]; \ - temp[3] = hd->state[y][3]; \ - temp[4] = hd->state[y][4]; \ - \ - /* Compute χ on the plane */ \ - hd->state[y][0] = temp[0] ^ ((~temp[1]) & temp[2]); \ - hd->state[y][1] = temp[1] ^ ((~temp[2]) & temp[3]); \ - hd->state[y][2] = temp[2] ^ ((~temp[3]) & temp[4]); \ - hd->state[y][3] = temp[3] ^ ((~temp[4]) & temp[0]); \ - hd->state[y][4] = temp[4] ^ ((~temp[0]) & temp[1]); - - do_x_step_for_plane(0); - do_x_step_for_plane(1); - do_x_step_for_plane(2); - do_x_step_for_plane(3); - do_x_step_for_plane(4); - -#undef do_x_step_for_plane - } - - { - /* ι
step (see [Keccak Reference, Section 2.3.5]) */ - - hd->state[0][0] ^= round_consts[round]; - } + hd->u.state64[i] = le_bswap64(hd->u.state64[i]); } - return sizeof(void *) * 4 + sizeof(u64) * 10; + return 0; } +#endif /* NEED_COMMON64 */ + + +#ifdef NEED_COMMON32BI + +static const u32 round_consts_32bit[2 * 24] = +{ + 0x00000001UL, 0x00000000UL, 0x00000000UL, 0x00000089UL, + 0x00000000UL, 0x8000008bUL, 0x00000000UL, 0x80008080UL, + 0x00000001UL, 0x0000008bUL, 0x00000001UL, 0x00008000UL, + 0x00000001UL, 0x80008088UL, 0x00000001UL, 0x80000082UL, + 0x00000000UL, 0x0000000bUL, 0x00000000UL, 0x0000000aUL, + 0x00000001UL, 0x00008082UL, 0x00000000UL, 0x00008003UL, + 0x00000001UL, 0x0000808bUL, 0x00000001UL, 0x8000000bUL, + 0x00000001UL, 0x8000008aUL, 0x00000001UL, 0x80000081UL, + 0x00000000UL, 0x80000081UL, 0x00000000UL, 0x80000008UL, + 0x00000000UL, 0x00000083UL, 0x00000000UL, 0x80008003UL, + 0x00000001UL, 0x80008088UL, 0x00000000UL, 0x80000088UL, + 0x00000001UL, 0x00008000UL, 0x00000000UL, 0x80008082UL +}; static unsigned int -transform_blk (void *context, const unsigned char *data) +keccak_extract_inplace32bi(KECCAK_STATE *hd, unsigned int outlen) { - KECCAK_CONTEXT *ctx = context; - KECCAK_STATE *hd = &ctx->state; - u64 *state = (u64 *)hd->state; - const size_t bsize = ctx->bctx.blocksize; unsigned int i; + u32 x0; + u32 x1; + u32 t; + + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + { + x0 = hd->u.state32bi[i * 2 + 0]; + x1 = hd->u.state32bi[i * 2 + 1]; + + t = (x0 & 0x0000FFFFUL) + (x1 << 16); + x1 = (x0 >> 16) + (x1 & 0xFFFF0000UL); + x0 = t; + t = (x0 ^ (x0 >> 8)) & 0x0000FF00UL; x0 = x0 ^ t ^ (t << 8); + t = (x0 ^ (x0 >> 4)) & 0x00F000F0UL; x0 = x0 ^ t ^ (t << 4); + t = (x0 ^ (x0 >> 2)) & 0x0C0C0C0CUL; x0 = x0 ^ t ^ (t << 2); + t = (x0 ^ (x0 >> 1)) & 0x22222222UL; x0 = x0 ^ t ^ (t << 1); + t = (x1 ^ (x1 >> 8)) & 0x0000FF00UL; x1 = x1 ^ t ^ (t << 8); + t = (x1 ^ (x1 >> 4)) & 0x00F000F0UL; x1 = x1 ^ t ^ (t << 4); + t = (x1 ^ (x1 >> 2)) & 0x0C0C0C0CUL; x1 = x1 
^ t ^ (t << 2); + t = (x1 ^ (x1 >> 1)) & 0x22222222UL; x1 = x1 ^ t ^ (t << 1); + + hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); + hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + } - /* Absorb input block. */ - for (i = 0; i < bsize / 8; i++) - state[i] ^= buf_get_le64(data + i * 8); + return 0; +} - return keccak_f1600_state_permute(hd) + 4 * sizeof(void *); +static inline void +keccak_absorb_lane32bi(u32 *lane, u32 x0, u32 x1) +{ + u32 t; + + t = (x0 ^ (x0 >> 1)) & 0x22222222UL; x0 = x0 ^ t ^ (t << 1); + t = (x0 ^ (x0 >> 2)) & 0x0C0C0C0CUL; x0 = x0 ^ t ^ (t << 2); + t = (x0 ^ (x0 >> 4)) & 0x00F000F0UL; x0 = x0 ^ t ^ (t << 4); + t = (x0 ^ (x0 >> 8)) & 0x0000FF00UL; x0 = x0 ^ t ^ (t << 8); + t = (x1 ^ (x1 >> 1)) & 0x22222222UL; x1 = x1 ^ t ^ (t << 1); + t = (x1 ^ (x1 >> 2)) & 0x0C0C0C0CUL; x1 = x1 ^ t ^ (t << 2); + t = (x1 ^ (x1 >> 4)) & 0x00F000F0UL; x1 = x1 ^ t ^ (t << 4); + t = (x1 ^ (x1 >> 8)) & 0x0000FF00UL; x1 = x1 ^ t ^ (t << 8); + lane[0] ^= (x0 & 0x0000FFFFUL) + (x1 << 16); + lane[1] ^= (x0 >> 16) + (x1 & 0xFFFF0000UL); } +#endif /* NEED_COMMON32BI */ + + +/* Construct generic 64-bit implementation. 
*/ +#ifdef USE_64BIT + +# define ANDN64(x, y) (~(x) & (y)) +# define ROL64(x, n) (((x) << ((unsigned int)n & 63)) | \ + ((x) >> ((64 - (unsigned int)(n)) & 63))) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64 +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME static unsigned int -transform (void *context, const unsigned char *data, size_t nblks) +keccak_absorb_lanes64(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) { - KECCAK_CONTEXT *ctx = context; - const size_t bsize = ctx->bctx.blocksize; - unsigned int burn; + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_generic64_ops = +{ + .permute = keccak_f1600_state_permute64, + .absorb = keccak_absorb_lanes64, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT */ + + +/* Construct 64-bit Intel SHLD implementation. 
*/ +#ifdef USE_64BIT_SHLD + +# define ANDN64(x, y) (~(x) & (y)) +# define ROL64(x, n) ({ \ + u64 tmp = (x); \ + asm ("shldq %1, %0, %0" \ + : "+r" (tmp) \ + : "J" ((n) & 63) \ + : "cc"); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64_shld +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes64_shld(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64_shld(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_shld_64_ops = +{ + .permute = keccak_f1600_state_permute64_shld, + .absorb = keccak_absorb_lanes64_shld, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT_SHLD */ + + +/* Construct 64-bit Intel BMI2 implementation. 
*/ +#ifdef USE_64BIT_BMI2 + +# define ANDN64(x, y) ({ \ + u64 tmp; \ + asm ("andnq %2, %1, %0" \ + : "=r" (tmp) \ + : "r0" (x), "rm" (y)); \ + tmp; }) + +# define ROL64(x, n) ({ \ + u64 tmp; \ + asm ("rorxq %2, %1, %0" \ + : "=r" (tmp) \ + : "rm0" (x), "J" (64 - ((n) & 63))); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute64_bmi2 +# include "keccak_permute_64.h" + +# undef ANDN64 +# undef ROL64 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes64_bmi2(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + hd->u.state64[pos] ^= buf_get_le64(lanes); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute64_bmi2(hd); + pos = 0; + } + } + + return burn; +} + +static const keccak_ops_t keccak_bmi2_64_ops = +{ + .permute = keccak_f1600_state_permute64_bmi2, + .absorb = keccak_absorb_lanes64_bmi2, + .extract_inplace = keccak_extract_inplace64, +}; + +#endif /* USE_64BIT_BMI2 */ + + +/* Construct generic 32-bit implementation. */ +#ifdef USE_32BIT + +# define ANDN32(x, y) (~(x) & (y)) +# define ROL32(x, n) (((x) << ((unsigned int)n & 31)) | \ + ((x) >> ((32 - (unsigned int)(n)) & 31))) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute32bi +# include "keccak_permute_32.h" + +# undef ANDN32 +# undef ROL32 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static unsigned int +keccak_absorb_lanes32bi(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; - /* Absorb full blocks. 
*/ - do + while (nlanes) { - burn = transform_blk (context, data); - data += bsize; + keccak_absorb_lane32bi(&hd->u.state32bi[pos * 2], + buf_get_le32(lanes + 0), + buf_get_le32(lanes + 4)); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute32bi(hd); + pos = 0; + } } - while (--nblks); return burn; } +static const keccak_ops_t keccak_generic32bi_ops = +{ + .permute = keccak_f1600_state_permute32bi, + .absorb = keccak_absorb_lanes32bi, + .extract_inplace = keccak_extract_inplace32bi, +}; + +#endif /* USE_32BIT */ + + +/* Construct 32-bit Intel BMI2 implementation. */ +#ifdef USE_32BIT_BMI2 + +# define ANDN32(x, y) ({ \ + u32 tmp; \ + asm ("andnl %2, %1, %0" \ + : "=r" (tmp) \ + : "r0" (x), "rm" (y)); \ + tmp; }) + +# define ROL32(x, n) ({ \ + u32 tmp; \ + asm ("rorxl %2, %1, %0" \ + : "=r" (tmp) \ + : "rm0" (x), "J" (32 - ((n) & 31))); \ + tmp; }) + +# define KECCAK_F1600_PERMUTE_FUNC_NAME keccak_f1600_state_permute32bi_bmi2 +# include "keccak_permute_32.h" + +# undef ANDN32 +# undef ROL32 +# undef KECCAK_F1600_PERMUTE_FUNC_NAME + +static inline u32 pext(u32 x, u32 mask) +{ + u32 tmp; + asm ("pextl %2, %1, %0" : "=r" (tmp) : "r0" (x), "rm" (mask)); + return tmp; +} + +static inline u32 pdep(u32 x, u32 mask) +{ + u32 tmp; + asm ("pdepl %2, %1, %0" : "=r" (tmp) : "r0" (x), "rm" (mask)); + return tmp; +} + +static inline void +keccak_absorb_lane32bi_bmi2(u32 *lane, u32 x0, u32 x1) +{ + x0 = pdep(pext(x0, 0x55555555), 0x0000ffff) | (pext(x0, 0xaaaaaaaa) << 16); + x1 = pdep(pext(x1, 0x55555555), 0x0000ffff) | (pext(x1, 0xaaaaaaaa) << 16); + + lane[0] ^= (x0 & 0x0000FFFFUL) + (x1 << 16); + lane[1] ^= (x0 >> 16) + (x1 & 0xFFFF0000UL); +} + +static unsigned int +keccak_absorb_lanes32bi_bmi2(KECCAK_STATE *hd, int pos, const byte *lanes, + unsigned int nlanes, int blocklanes) +{ + unsigned int burn = 0; + + while (nlanes) + { + keccak_absorb_lane32bi_bmi2(&hd->u.state32bi[pos * 2], + buf_get_le32(lanes + 0), + buf_get_le32(lanes + 
4)); + lanes += 8; + nlanes--; + + if (++pos == blocklanes) + { + burn = keccak_f1600_state_permute32bi_bmi2(hd); + pos = 0; + } + } + + return burn; +} + +static unsigned int +keccak_extract_inplace32bi_bmi2(KECCAK_STATE *hd, unsigned int outlen) +{ + unsigned int i; + u32 x0; + u32 x1; + u32 t; + + for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + { + x0 = hd->u.state32bi[i * 2 + 0]; + x1 = hd->u.state32bi[i * 2 + 1]; + + t = (x0 & 0x0000FFFFUL) + (x1 << 16); + x1 = (x0 >> 16) + (x1 & 0xFFFF0000UL); + x0 = t; + + x0 = pdep(pext(x0, 0xffff0001), 0xaaaaaaab) | pdep(x0 >> 1, 0x55555554); + x1 = pdep(pext(x1, 0xffff0001), 0xaaaaaaab) | pdep(x1 >> 1, 0x55555554); + + hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); + hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + } + + return 0; +} + +static const keccak_ops_t keccak_bmi2_32bi_ops = +{ + .permute = keccak_f1600_state_permute32bi_bmi2, + .absorb = keccak_absorb_lanes32bi_bmi2, + .extract_inplace = keccak_extract_inplace32bi_bmi2, +}; + +#endif /* USE_32BIT_BMI2 */ + + +static void +keccak_write (void *context, const void *inbuf_arg, size_t inlen) +{ + KECCAK_CONTEXT *ctx = context; + const size_t bsize = ctx->blocksize; + const size_t blocklanes = bsize / 8; + const byte *inbuf = inbuf_arg; + unsigned int nburn, burn = 0; + unsigned int count, i; + unsigned int pos, nlanes; + + count = ctx->count; + + if (inlen && (count % 8)) + { + byte lane[8] = { 0, }; + + /* Complete absorbing partial input lane. */ + + pos = count / 8; + + for (i = count % 8; inlen && i < 8; i++) + { + lane[i] = *inbuf++; + inlen--; + count++; + } + + if (count == bsize) + count = 0; + + nburn = ctx->ops->absorb(&ctx->state, pos, lane, 1, + (count % 8) ? -1 : blocklanes); + burn = nburn > burn ? nburn : burn; + } + + /* Absorb full input lanes. */ + + pos = count / 8; + nlanes = inlen / 8; + if (nlanes > 0) + { + nburn = ctx->ops->absorb(&ctx->state, pos, inbuf, nlanes, blocklanes); + burn = nburn > burn ?
nburn : burn; + inlen -= nlanes * 8; + inbuf += nlanes * 8; + count += nlanes * 8; + count = count % bsize; + } + + if (inlen) + { + byte lane[8] = { 0, }; + + /* Absorb remaining partial input lane. */ + + pos = count / 8; + + for (i = count % 8; inlen && i < 8; i++) + { + lane[i] = *inbuf++; + inlen--; + count++; + } + + nburn = ctx->ops->absorb(&ctx->state, pos, lane, 1, -1); + burn = nburn > burn ? nburn : burn; + + gcry_assert(count < bsize); + } + + ctx->count = count; + + if (burn) + _gcry_burn_stack (burn); +} + static void keccak_init (int algo, void *context, unsigned int flags) @@ -267,29 +609,48 @@ keccak_init (int algo, void *context, unsigned int flags) memset (hd, 0, sizeof *hd); - ctx->bctx.nblocks = 0; - ctx->bctx.nblocks_high = 0; - ctx->bctx.count = 0; - ctx->bctx.bwrite = transform; + ctx->count = 0; + + /* Select generic implementation. */ +#ifdef USE_64BIT + ctx->ops = &keccak_generic64_ops; +#elif defined USE_32BIT + ctx->ops = &keccak_generic32bi_ops; +#endif + + /* Select optimized implementation based on hw features. */ + if (0) {} +#ifdef USE_64BIT_BMI2 + else if (features & HWF_INTEL_BMI2) + ctx->ops = &keccak_bmi2_64_ops; +#endif +#ifdef USE_32BIT_BMI2 + else if (features & HWF_INTEL_BMI2) + ctx->ops = &keccak_bmi2_32bi_ops; +#endif +#ifdef USE_64BIT_SHLD + else if (features & HWF_INTEL_FAST_SHLD) + ctx->ops = &keccak_shld_64_ops; +#endif /* Set input block size, in Keccak terms this is called 'rate'.
*/ switch (algo) { case GCRY_MD_SHA3_224: - ctx->bctx.blocksize = 1152 / 8; + ctx->blocksize = 1152 / 8; ctx->outlen = 224 / 8; break; case GCRY_MD_SHA3_256: - ctx->bctx.blocksize = 1088 / 8; + ctx->blocksize = 1088 / 8; ctx->outlen = 256 / 8; break; case GCRY_MD_SHA3_384: - ctx->bctx.blocksize = 832 / 8; + ctx->blocksize = 832 / 8; ctx->outlen = 384 / 8; break; case GCRY_MD_SHA3_512: - ctx->bctx.blocksize = 576 / 8; + ctx->blocksize = 576 / 8; ctx->outlen = 512 / 8; break; default: @@ -334,59 +695,37 @@ keccak_final (void *context) { KECCAK_CONTEXT *ctx = context; KECCAK_STATE *hd = &ctx->state; - const size_t bsize = ctx->bctx.blocksize; + const size_t bsize = ctx->blocksize; const byte suffix = SHA3_DELIMITED_SUFFIX; - u64 *state = (u64 *)hd->state; - unsigned int stack_burn_depth; + unsigned int nburn, burn = 0; unsigned int lastbytes; - unsigned int i; - byte *buf; + byte lane[8]; - _gcry_md_block_write (context, NULL, 0); /* flush */ - - buf = ctx->bctx.buf; - lastbytes = ctx->bctx.count; - - /* Absorb remaining bytes. */ - for (i = 0; i < lastbytes / 8; i++) - { - state[i] ^= buf_get_le64(buf); - buf += 8; - } - - for (i = 0; i < lastbytes % 8; i++) - { - state[lastbytes / 8] ^= (u64)*buf << (i * 8); - buf++; - } + lastbytes = ctx->count; /* Do the padding and switch to the squeezing phase */ /* Absorb the last few bits and add the first bit of padding (which coincides with the delimiter in delimited suffix) */ - state[lastbytes / 8] ^= (u64)suffix << ((lastbytes % 8) * 8); + buf_put_le64(lane, (u64)suffix << ((lastbytes % 8) * 8)); + nburn = ctx->ops->absorb(&ctx->state, lastbytes / 8, lane, 1, -1); + burn = nburn > burn ? nburn : burn; /* Add the second bit of padding. */ - state[(bsize - 1) / 8] ^= (u64)0x80 << (((bsize - 1) % 8) * 8); + buf_put_le64(lane, (u64)0x80 << (((bsize - 1) % 8) * 8)); + nburn = ctx->ops->absorb(&ctx->state, (bsize - 1) / 8, lane, 1, -1); + burn = nburn > burn ? nburn : burn; /* Switch to the squeezing phase. 
*/ - stack_burn_depth = keccak_f1600_state_permute(hd); + nburn = ctx->ops->permute(hd); + burn = nburn > burn ? nburn : burn; /* Squeeze out all the output blocks */ if (ctx->outlen < bsize) { /* Output SHA3 digest. */ - buf = ctx->bctx.buf; - for (i = 0; i < ctx->outlen / 8; i++) - { - buf_put_le64(buf, state[i]); - buf += 8; - } - for (i = 0; i < ctx->outlen % 8; i++) - { - *buf = state[ctx->outlen / 8] >> (i * 8); - buf++; - } + nburn = ctx->ops->extract_inplace(hd, ctx->outlen); + burn = nburn > burn ? nburn : burn; } else { @@ -394,15 +733,18 @@ keccak_final (void *context) BUG(); } - _gcry_burn_stack (stack_burn_depth); + wipememory(lane, sizeof(lane)); + if (burn) + _gcry_burn_stack (burn); } static byte * keccak_read (void *context) { - KECCAK_CONTEXT *hd = (KECCAK_CONTEXT *) context; - return hd->bctx.buf; + KECCAK_CONTEXT *ctx = (KECCAK_CONTEXT *) context; + KECCAK_STATE *hd = &ctx->state; + return (byte *)&hd->u; } @@ -585,7 +927,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_224 = { GCRY_MD_SHA3_224, {0, 1}, "SHA3-224", sha3_224_asn, DIM (sha3_224_asn), oid_spec_sha3_224, 28, - sha3_224_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_224_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -593,7 +935,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_256 = { GCRY_MD_SHA3_256, {0, 1}, "SHA3-256", sha3_256_asn, DIM (sha3_256_asn), oid_spec_sha3_256, 32, - sha3_256_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_256_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -601,7 +943,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_384 = { GCRY_MD_SHA3_384, {0, 1}, "SHA3-384", sha3_384_asn, DIM (sha3_384_asn), oid_spec_sha3_384, 48, - sha3_384_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_384_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; @@ -609,7 +951,7 @@ gcry_md_spec_t _gcry_digest_spec_sha3_512 = { 
GCRY_MD_SHA3_512, {0, 1}, "SHA3-512", sha3_512_asn, DIM (sha3_512_asn), oid_spec_sha3_512, 64, - sha3_512_init, _gcry_md_block_write, keccak_final, keccak_read, + sha3_512_init, keccak_write, keccak_final, keccak_read, sizeof (KECCAK_CONTEXT), run_selftests }; diff --git a/cipher/keccak_permute_32.h b/cipher/keccak_permute_32.h new file mode 100644 index 0000000..fed9383 --- /dev/null +++ b/cipher/keccak_permute_32.h @@ -0,0 +1,535 @@ +/* keccak_permute_32.h - Keccak permute function (simple 32bit bit-interleaved) + * Copyright (C) 2015 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser general Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +/* The code is based on public-domain/CC0 "keccakc1024/simple32bi/ + * Keccak-simple32BI.c" implementation by Ronny Van Keer from SUPERCOP toolkit + * package. + */ + +/* Function that computes the Keccak-f[1600] permutation on the given state. 
*/ +static unsigned int +KECCAK_F1600_PERMUTE_FUNC_NAME(KECCAK_STATE *hd) +{ + const u32 *round_consts = round_consts_32bit; + u32 Aba0, Abe0, Abi0, Abo0, Abu0; + u32 Aba1, Abe1, Abi1, Abo1, Abu1; + u32 Aga0, Age0, Agi0, Ago0, Agu0; + u32 Aga1, Age1, Agi1, Ago1, Agu1; + u32 Aka0, Ake0, Aki0, Ako0, Aku0; + u32 Aka1, Ake1, Aki1, Ako1, Aku1; + u32 Ama0, Ame0, Ami0, Amo0, Amu0; + u32 Ama1, Ame1, Ami1, Amo1, Amu1; + u32 Asa0, Ase0, Asi0, Aso0, Asu0; + u32 Asa1, Ase1, Asi1, Aso1, Asu1; + u32 BCa0, BCe0, BCi0, BCo0, BCu0; + u32 BCa1, BCe1, BCi1, BCo1, BCu1; + u32 Da0, De0, Di0, Do0, Du0; + u32 Da1, De1, Di1, Do1, Du1; + u32 Eba0, Ebe0, Ebi0, Ebo0, Ebu0; + u32 Eba1, Ebe1, Ebi1, Ebo1, Ebu1; + u32 Ega0, Ege0, Egi0, Ego0, Egu0; + u32 Ega1, Ege1, Egi1, Ego1, Egu1; + u32 Eka0, Eke0, Eki0, Eko0, Eku0; + u32 Eka1, Eke1, Eki1, Eko1, Eku1; + u32 Ema0, Eme0, Emi0, Emo0, Emu0; + u32 Ema1, Eme1, Emi1, Emo1, Emu1; + u32 Esa0, Ese0, Esi0, Eso0, Esu0; + u32 Esa1, Ese1, Esi1, Eso1, Esu1; + u32 *state = hd->u.state32bi; + unsigned int round; + + Aba0 = state[0]; + Aba1 = state[1]; + Abe0 = state[2]; + Abe1 = state[3]; + Abi0 = state[4]; + Abi1 = state[5]; + Abo0 = state[6]; + Abo1 = state[7]; + Abu0 = state[8]; + Abu1 = state[9]; + Aga0 = state[10]; + Aga1 = state[11]; + Age0 = state[12]; + Age1 = state[13]; + Agi0 = state[14]; + Agi1 = state[15]; + Ago0 = state[16]; + Ago1 = state[17]; + Agu0 = state[18]; + Agu1 = state[19]; + Aka0 = state[20]; + Aka1 = state[21]; + Ake0 = state[22]; + Ake1 = state[23]; + Aki0 = state[24]; + Aki1 = state[25]; + Ako0 = state[26]; + Ako1 = state[27]; + Aku0 = state[28]; + Aku1 = state[29]; + Ama0 = state[30]; + Ama1 = state[31]; + Ame0 = state[32]; + Ame1 = state[33]; + Ami0 = state[34]; + Ami1 = state[35]; + Amo0 = state[36]; + Amo1 = state[37]; + Amu0 = state[38]; + Amu1 = state[39]; + Asa0 = state[40]; + Asa1 = state[41]; + Ase0 = state[42]; + Ase1 = state[43]; + Asi0 = state[44]; + Asi1 = state[45]; + Aso0 = state[46]; + Aso1 = state[47]; + Asu0 = 
state[48]; + Asu1 = state[49]; + + for (round = 0; round < 24; round += 2) + { + /* prepareTheta */ + BCa0 = Aba0 ^ Aga0 ^ Aka0 ^ Ama0 ^ Asa0; + BCa1 = Aba1 ^ Aga1 ^ Aka1 ^ Ama1 ^ Asa1; + BCe0 = Abe0 ^ Age0 ^ Ake0 ^ Ame0 ^ Ase0; + BCe1 = Abe1 ^ Age1 ^ Ake1 ^ Ame1 ^ Ase1; + BCi0 = Abi0 ^ Agi0 ^ Aki0 ^ Ami0 ^ Asi0; + BCi1 = Abi1 ^ Agi1 ^ Aki1 ^ Ami1 ^ Asi1; + BCo0 = Abo0 ^ Ago0 ^ Ako0 ^ Amo0 ^ Aso0; + BCo1 = Abo1 ^ Ago1 ^ Ako1 ^ Amo1 ^ Aso1; + BCu0 = Abu0 ^ Agu0 ^ Aku0 ^ Amu0 ^ Asu0; + BCu1 = Abu1 ^ Agu1 ^ Aku1 ^ Amu1 ^ Asu1; + + /* thetaRhoPiChiIota(round , A, E) */ + Da0 = BCu0 ^ ROL32(BCe1, 1); + Da1 = BCu1 ^ BCe0; + De0 = BCa0 ^ ROL32(BCi1, 1); + De1 = BCa1 ^ BCi0; + Di0 = BCe0 ^ ROL32(BCo1, 1); + Di1 = BCe1 ^ BCo0; + Do0 = BCi0 ^ ROL32(BCu1, 1); + Do1 = BCi1 ^ BCu0; + Du0 = BCo0 ^ ROL32(BCa1, 1); + Du1 = BCo1 ^ BCa0; + + Aba0 ^= Da0; + BCa0 = Aba0; + Age0 ^= De0; + BCe0 = ROL32(Age0, 22); + Aki1 ^= Di1; + BCi0 = ROL32(Aki1, 22); + Amo1 ^= Do1; + BCo0 = ROL32(Amo1, 11); + Asu0 ^= Du0; + BCu0 = ROL32(Asu0, 7); + Eba0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eba0 ^= round_consts[round * 2 + 0]; + Ebe0 = BCe0 ^ ANDN32(BCi0, BCo0); + Ebi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ebo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Ebu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Aba1 ^= Da1; + BCa1 = Aba1; + Age1 ^= De1; + BCe1 = ROL32(Age1, 22); + Aki0 ^= Di0; + BCi1 = ROL32(Aki0, 21); + Amo0 ^= Do0; + BCo1 = ROL32(Amo0, 10); + Asu1 ^= Du1; + BCu1 = ROL32(Asu1, 7); + Eba1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eba1 ^= round_consts[round * 2 + 1]; + Ebe1 = BCe1 ^ ANDN32(BCi1, BCo1); + Ebi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ebo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Ebu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abo0 ^= Do0; + BCa0 = ROL32(Abo0, 14); + Agu0 ^= Du0; + BCe0 = ROL32(Agu0, 10); + Aka1 ^= Da1; + BCi0 = ROL32(Aka1, 2); + Ame1 ^= De1; + BCo0 = ROL32(Ame1, 23); + Asi1 ^= Di1; + BCu0 = ROL32(Asi1, 31); + Ega0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ege0 = BCe0 ^ ANDN32(BCi0, BCo0); + Egi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ego0 = BCo0 ^ 
ANDN32(BCu0, BCa0); + Egu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abo1 ^= Do1; + BCa1 = ROL32(Abo1, 14); + Agu1 ^= Du1; + BCe1 = ROL32(Agu1, 10); + Aka0 ^= Da0; + BCi1 = ROL32(Aka0, 1); + Ame0 ^= De0; + BCo1 = ROL32(Ame0, 22); + Asi0 ^= Di0; + BCu1 = ROL32(Asi0, 30); + Ega1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ege1 = BCe1 ^ ANDN32(BCi1, BCo1); + Egi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ego1 = BCo1 ^ ANDN32(BCu1, BCa1); + Egu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abe1 ^= De1; + BCa0 = ROL32(Abe1, 1); + Agi0 ^= Di0; + BCe0 = ROL32(Agi0, 3); + Ako1 ^= Do1; + BCi0 = ROL32(Ako1, 13); + Amu0 ^= Du0; + BCo0 = ROL32(Amu0, 4); + Asa0 ^= Da0; + BCu0 = ROL32(Asa0, 9); + Eka0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eke0 = BCe0 ^ ANDN32(BCi0, BCo0); + Eki0 = BCi0 ^ ANDN32(BCo0, BCu0); + Eko0 = BCo0 ^ ANDN32(BCu0, BCa0); + Eku0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abe0 ^= De0; + BCa1 = Abe0; + Agi1 ^= Di1; + BCe1 = ROL32(Agi1, 3); + Ako0 ^= Do0; + BCi1 = ROL32(Ako0, 12); + Amu1 ^= Du1; + BCo1 = ROL32(Amu1, 4); + Asa1 ^= Da1; + BCu1 = ROL32(Asa1, 9); + Eka1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eke1 = BCe1 ^ ANDN32(BCi1, BCo1); + Eki1 = BCi1 ^ ANDN32(BCo1, BCu1); + Eko1 = BCo1 ^ ANDN32(BCu1, BCa1); + Eku1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abu1 ^= Du1; + BCa0 = ROL32(Abu1, 14); + Aga0 ^= Da0; + BCe0 = ROL32(Aga0, 18); + Ake0 ^= De0; + BCi0 = ROL32(Ake0, 5); + Ami1 ^= Di1; + BCo0 = ROL32(Ami1, 8); + Aso0 ^= Do0; + BCu0 = ROL32(Aso0, 28); + Ema0 = BCa0 ^ ANDN32(BCe0, BCi0); + Eme0 = BCe0 ^ ANDN32(BCi0, BCo0); + Emi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Emo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Emu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abu0 ^= Du0; + BCa1 = ROL32(Abu0, 13); + Aga1 ^= Da1; + BCe1 = ROL32(Aga1, 18); + Ake1 ^= De1; + BCi1 = ROL32(Ake1, 5); + Ami0 ^= Di0; + BCo1 = ROL32(Ami0, 7); + Aso1 ^= Do1; + BCu1 = ROL32(Aso1, 28); + Ema1 = BCa1 ^ ANDN32(BCe1, BCi1); + Eme1 = BCe1 ^ ANDN32(BCi1, BCo1); + Emi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Emo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Emu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Abi0 ^= Di0; + BCa0 = 
ROL32(Abi0, 31); + Ago1 ^= Do1; + BCe0 = ROL32(Ago1, 28); + Aku1 ^= Du1; + BCi0 = ROL32(Aku1, 20); + Ama1 ^= Da1; + BCo0 = ROL32(Ama1, 21); + Ase0 ^= De0; + BCu0 = ROL32(Ase0, 1); + Esa0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ese0 = BCe0 ^ ANDN32(BCi0, BCo0); + Esi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Eso0 = BCo0 ^ ANDN32(BCu0, BCa0); + Esu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Abi1 ^= Di1; + BCa1 = ROL32(Abi1, 31); + Ago0 ^= Do0; + BCe1 = ROL32(Ago0, 27); + Aku0 ^= Du0; + BCi1 = ROL32(Aku0, 19); + Ama0 ^= Da0; + BCo1 = ROL32(Ama0, 20); + Ase1 ^= De1; + BCu1 = ROL32(Ase1, 1); + Esa1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ese1 = BCe1 ^ ANDN32(BCi1, BCo1); + Esi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Eso1 = BCo1 ^ ANDN32(BCu1, BCa1); + Esu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + /* prepareTheta */ + BCa0 = Eba0 ^ Ega0 ^ Eka0 ^ Ema0 ^ Esa0; + BCa1 = Eba1 ^ Ega1 ^ Eka1 ^ Ema1 ^ Esa1; + BCe0 = Ebe0 ^ Ege0 ^ Eke0 ^ Eme0 ^ Ese0; + BCe1 = Ebe1 ^ Ege1 ^ Eke1 ^ Eme1 ^ Ese1; + BCi0 = Ebi0 ^ Egi0 ^ Eki0 ^ Emi0 ^ Esi0; + BCi1 = Ebi1 ^ Egi1 ^ Eki1 ^ Emi1 ^ Esi1; + BCo0 = Ebo0 ^ Ego0 ^ Eko0 ^ Emo0 ^ Eso0; + BCo1 = Ebo1 ^ Ego1 ^ Eko1 ^ Emo1 ^ Eso1; + BCu0 = Ebu0 ^ Egu0 ^ Eku0 ^ Emu0 ^ Esu0; + BCu1 = Ebu1 ^ Egu1 ^ Eku1 ^ Emu1 ^ Esu1; + + /* thetaRhoPiChiIota(round+1, E, A) */ + Da0 = BCu0 ^ ROL32(BCe1, 1); + Da1 = BCu1 ^ BCe0; + De0 = BCa0 ^ ROL32(BCi1, 1); + De1 = BCa1 ^ BCi0; + Di0 = BCe0 ^ ROL32(BCo1, 1); + Di1 = BCe1 ^ BCo0; + Do0 = BCi0 ^ ROL32(BCu1, 1); + Do1 = BCi1 ^ BCu0; + Du0 = BCo0 ^ ROL32(BCa1, 1); + Du1 = BCo1 ^ BCa0; + + Eba0 ^= Da0; + BCa0 = Eba0; + Ege0 ^= De0; + BCe0 = ROL32(Ege0, 22); + Eki1 ^= Di1; + BCi0 = ROL32(Eki1, 22); + Emo1 ^= Do1; + BCo0 = ROL32(Emo1, 11); + Esu0 ^= Du0; + BCu0 = ROL32(Esu0, 7); + Aba0 = BCa0 ^ ANDN32(BCe0, BCi0); + Aba0 ^= round_consts[round * 2 + 2]; + Abe0 = BCe0 ^ ANDN32(BCi0, BCo0); + Abi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Abo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Abu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Eba1 ^= Da1; + BCa1 = Eba1; + Ege1 ^= De1; + BCe1 = ROL32(Ege1, 22); + Eki0 ^= 
Di0; + BCi1 = ROL32(Eki0, 21); + Emo0 ^= Do0; + BCo1 = ROL32(Emo0, 10); + Esu1 ^= Du1; + BCu1 = ROL32(Esu1, 7); + Aba1 = BCa1 ^ ANDN32(BCe1, BCi1); + Aba1 ^= round_consts[round * 2 + 3]; + Abe1 = BCe1 ^ ANDN32(BCi1, BCo1); + Abi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Abo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Abu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebo0 ^= Do0; + BCa0 = ROL32(Ebo0, 14); + Egu0 ^= Du0; + BCe0 = ROL32(Egu0, 10); + Eka1 ^= Da1; + BCi0 = ROL32(Eka1, 2); + Eme1 ^= De1; + BCo0 = ROL32(Eme1, 23); + Esi1 ^= Di1; + BCu0 = ROL32(Esi1, 31); + Aga0 = BCa0 ^ ANDN32(BCe0, BCi0); + Age0 = BCe0 ^ ANDN32(BCi0, BCo0); + Agi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ago0 = BCo0 ^ ANDN32(BCu0, BCa0); + Agu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebo1 ^= Do1; + BCa1 = ROL32(Ebo1, 14); + Egu1 ^= Du1; + BCe1 = ROL32(Egu1, 10); + Eka0 ^= Da0; + BCi1 = ROL32(Eka0, 1); + Eme0 ^= De0; + BCo1 = ROL32(Eme0, 22); + Esi0 ^= Di0; + BCu1 = ROL32(Esi0, 30); + Aga1 = BCa1 ^ ANDN32(BCe1, BCi1); + Age1 = BCe1 ^ ANDN32(BCi1, BCo1); + Agi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ago1 = BCo1 ^ ANDN32(BCu1, BCa1); + Agu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebe1 ^= De1; + BCa0 = ROL32(Ebe1, 1); + Egi0 ^= Di0; + BCe0 = ROL32(Egi0, 3); + Eko1 ^= Do1; + BCi0 = ROL32(Eko1, 13); + Emu0 ^= Du0; + BCo0 = ROL32(Emu0, 4); + Esa0 ^= Da0; + BCu0 = ROL32(Esa0, 9); + Aka0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ake0 = BCe0 ^ ANDN32(BCi0, BCo0); + Aki0 = BCi0 ^ ANDN32(BCo0, BCu0); + Ako0 = BCo0 ^ ANDN32(BCu0, BCa0); + Aku0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebe0 ^= De0; + BCa1 = Ebe0; + Egi1 ^= Di1; + BCe1 = ROL32(Egi1, 3); + Eko0 ^= Do0; + BCi1 = ROL32(Eko0, 12); + Emu1 ^= Du1; + BCo1 = ROL32(Emu1, 4); + Esa1 ^= Da1; + BCu1 = ROL32(Esa1, 9); + Aka1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ake1 = BCe1 ^ ANDN32(BCi1, BCo1); + Aki1 = BCi1 ^ ANDN32(BCo1, BCu1); + Ako1 = BCo1 ^ ANDN32(BCu1, BCa1); + Aku1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebu1 ^= Du1; + BCa0 = ROL32(Ebu1, 14); + Ega0 ^= Da0; + BCe0 = ROL32(Ega0, 18); + Eke0 ^= De0; + BCi0 = ROL32(Eke0, 5); + Emi1 ^= Di1; + 
BCo0 = ROL32(Emi1, 8); + Eso0 ^= Do0; + BCu0 = ROL32(Eso0, 28); + Ama0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ame0 = BCe0 ^ ANDN32(BCi0, BCo0); + Ami0 = BCi0 ^ ANDN32(BCo0, BCu0); + Amo0 = BCo0 ^ ANDN32(BCu0, BCa0); + Amu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebu0 ^= Du0; + BCa1 = ROL32(Ebu0, 13); + Ega1 ^= Da1; + BCe1 = ROL32(Ega1, 18); + Eke1 ^= De1; + BCi1 = ROL32(Eke1, 5); + Emi0 ^= Di0; + BCo1 = ROL32(Emi0, 7); + Eso1 ^= Do1; + BCu1 = ROL32(Eso1, 28); + Ama1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ame1 = BCe1 ^ ANDN32(BCi1, BCo1); + Ami1 = BCi1 ^ ANDN32(BCo1, BCu1); + Amo1 = BCo1 ^ ANDN32(BCu1, BCa1); + Amu1 = BCu1 ^ ANDN32(BCa1, BCe1); + + Ebi0 ^= Di0; + BCa0 = ROL32(Ebi0, 31); + Ego1 ^= Do1; + BCe0 = ROL32(Ego1, 28); + Eku1 ^= Du1; + BCi0 = ROL32(Eku1, 20); + Ema1 ^= Da1; + BCo0 = ROL32(Ema1, 21); + Ese0 ^= De0; + BCu0 = ROL32(Ese0, 1); + Asa0 = BCa0 ^ ANDN32(BCe0, BCi0); + Ase0 = BCe0 ^ ANDN32(BCi0, BCo0); + Asi0 = BCi0 ^ ANDN32(BCo0, BCu0); + Aso0 = BCo0 ^ ANDN32(BCu0, BCa0); + Asu0 = BCu0 ^ ANDN32(BCa0, BCe0); + + Ebi1 ^= Di1; + BCa1 = ROL32(Ebi1, 31); + Ego0 ^= Do0; + BCe1 = ROL32(Ego0, 27); + Eku0 ^= Du0; + BCi1 = ROL32(Eku0, 19); + Ema0 ^= Da0; + BCo1 = ROL32(Ema0, 20); + Ese1 ^= De1; + BCu1 = ROL32(Ese1, 1); + Asa1 = BCa1 ^ ANDN32(BCe1, BCi1); + Ase1 = BCe1 ^ ANDN32(BCi1, BCo1); + Asi1 = BCi1 ^ ANDN32(BCo1, BCu1); + Aso1 = BCo1 ^ ANDN32(BCu1, BCa1); + Asu1 = BCu1 ^ ANDN32(BCa1, BCe1); + } + + state[0] = Aba0; + state[1] = Aba1; + state[2] = Abe0; + state[3] = Abe1; + state[4] = Abi0; + state[5] = Abi1; + state[6] = Abo0; + state[7] = Abo1; + state[8] = Abu0; + state[9] = Abu1; + state[10] = Aga0; + state[11] = Aga1; + state[12] = Age0; + state[13] = Age1; + state[14] = Agi0; + state[15] = Agi1; + state[16] = Ago0; + state[17] = Ago1; + state[18] = Agu0; + state[19] = Agu1; + state[20] = Aka0; + state[21] = Aka1; + state[22] = Ake0; + state[23] = Ake1; + state[24] = Aki0; + state[25] = Aki1; + state[26] = Ako0; + state[27] = Ako1; + state[28] = Aku0; + state[29] = Aku1; + 
state[30] = Ama0; + state[31] = Ama1; + state[32] = Ame0; + state[33] = Ame1; + state[34] = Ami0; + state[35] = Ami1; + state[36] = Amo0; + state[37] = Amo1; + state[38] = Amu0; + state[39] = Amu1; + state[40] = Asa0; + state[41] = Asa1; + state[42] = Ase0; + state[43] = Ase1; + state[44] = Asi0; + state[45] = Asi1; + state[46] = Aso0; + state[47] = Aso1; + state[48] = Asu0; + state[49] = Asu1; + + return sizeof(void *) * 4 + sizeof(u32) * 12 * 5 * 2; +} diff --git a/cipher/keccak_permute_64.h b/cipher/keccak_permute_64.h new file mode 100644 index 0000000..1264f19 --- /dev/null +++ b/cipher/keccak_permute_64.h @@ -0,0 +1,290 @@ +/* keccak_permute_64.h - Keccak permute function (simple 64bit) + * Copyright (C) 2015 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser general Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +/* The code is based on public-domain/CC0 "keccakc1024/simple/Keccak-simple.c" + * implementation by Ronny Van Keer from SUPERCOP toolkit package. + */ + +/* Function that computes the Keccak-f[1600] permutation on the given state. 
*/ +static unsigned int +KECCAK_F1600_PERMUTE_FUNC_NAME(KECCAK_STATE *hd) +{ + const u64 *round_consts = round_consts_64bit; + u64 Aba, Abe, Abi, Abo, Abu; + u64 Aga, Age, Agi, Ago, Agu; + u64 Aka, Ake, Aki, Ako, Aku; + u64 Ama, Ame, Ami, Amo, Amu; + u64 Asa, Ase, Asi, Aso, Asu; + u64 BCa, BCe, BCi, BCo, BCu; + u64 Da, De, Di, Do, Du; + u64 Eba, Ebe, Ebi, Ebo, Ebu; + u64 Ega, Ege, Egi, Ego, Egu; + u64 Eka, Eke, Eki, Eko, Eku; + u64 Ema, Eme, Emi, Emo, Emu; + u64 Esa, Ese, Esi, Eso, Esu; + u64 *state = hd->u.state64; + unsigned int round; + + Aba = state[0]; + Abe = state[1]; + Abi = state[2]; + Abo = state[3]; + Abu = state[4]; + Aga = state[5]; + Age = state[6]; + Agi = state[7]; + Ago = state[8]; + Agu = state[9]; + Aka = state[10]; + Ake = state[11]; + Aki = state[12]; + Ako = state[13]; + Aku = state[14]; + Ama = state[15]; + Ame = state[16]; + Ami = state[17]; + Amo = state[18]; + Amu = state[19]; + Asa = state[20]; + Ase = state[21]; + Asi = state[22]; + Aso = state[23]; + Asu = state[24]; + + for (round = 0; round < 24; round += 2) + { + /* prepareTheta */ + BCa = Aba ^ Aga ^ Aka ^ Ama ^ Asa; + BCe = Abe ^ Age ^ Ake ^ Ame ^ Ase; + BCi = Abi ^ Agi ^ Aki ^ Ami ^ Asi; + BCo = Abo ^ Ago ^ Ako ^ Amo ^ Aso; + BCu = Abu ^ Agu ^ Aku ^ Amu ^ Asu; + + /* thetaRhoPiChiIotaPrepareTheta(round , A, E) */ + Da = BCu ^ ROL64(BCe, 1); + De = BCa ^ ROL64(BCi, 1); + Di = BCe ^ ROL64(BCo, 1); + Do = BCi ^ ROL64(BCu, 1); + Du = BCo ^ ROL64(BCa, 1); + + Aba ^= Da; + BCa = Aba; + Age ^= De; + BCe = ROL64(Age, 44); + Aki ^= Di; + BCi = ROL64(Aki, 43); + Amo ^= Do; + BCo = ROL64(Amo, 21); + Asu ^= Du; + BCu = ROL64(Asu, 14); + Eba = BCa ^ ANDN64(BCe, BCi); + Eba ^= (u64)round_consts[round]; + Ebe = BCe ^ ANDN64(BCi, BCo); + Ebi = BCi ^ ANDN64(BCo, BCu); + Ebo = BCo ^ ANDN64(BCu, BCa); + Ebu = BCu ^ ANDN64(BCa, BCe); + + Abo ^= Do; + BCa = ROL64(Abo, 28); + Agu ^= Du; + BCe = ROL64(Agu, 20); + Aka ^= Da; + BCi = ROL64(Aka, 3); + Ame ^= De; + BCo = ROL64(Ame, 45); + Asi ^= Di; + BCu = 
ROL64(Asi, 61); + Ega = BCa ^ ANDN64(BCe, BCi); + Ege = BCe ^ ANDN64(BCi, BCo); + Egi = BCi ^ ANDN64(BCo, BCu); + Ego = BCo ^ ANDN64(BCu, BCa); + Egu = BCu ^ ANDN64(BCa, BCe); + + Abe ^= De; + BCa = ROL64(Abe, 1); + Agi ^= Di; + BCe = ROL64(Agi, 6); + Ako ^= Do; + BCi = ROL64(Ako, 25); + Amu ^= Du; + BCo = ROL64(Amu, 8); + Asa ^= Da; + BCu = ROL64(Asa, 18); + Eka = BCa ^ ANDN64(BCe, BCi); + Eke = BCe ^ ANDN64(BCi, BCo); + Eki = BCi ^ ANDN64(BCo, BCu); + Eko = BCo ^ ANDN64(BCu, BCa); + Eku = BCu ^ ANDN64(BCa, BCe); + + Abu ^= Du; + BCa = ROL64(Abu, 27); + Aga ^= Da; + BCe = ROL64(Aga, 36); + Ake ^= De; + BCi = ROL64(Ake, 10); + Ami ^= Di; + BCo = ROL64(Ami, 15); + Aso ^= Do; + BCu = ROL64(Aso, 56); + Ema = BCa ^ ANDN64(BCe, BCi); + Eme = BCe ^ ANDN64(BCi, BCo); + Emi = BCi ^ ANDN64(BCo, BCu); + Emo = BCo ^ ANDN64(BCu, BCa); + Emu = BCu ^ ANDN64(BCa, BCe); + + Abi ^= Di; + BCa = ROL64(Abi, 62); + Ago ^= Do; + BCe = ROL64(Ago, 55); + Aku ^= Du; + BCi = ROL64(Aku, 39); + Ama ^= Da; + BCo = ROL64(Ama, 41); + Ase ^= De; + BCu = ROL64(Ase, 2); + Esa = BCa ^ ANDN64(BCe, BCi); + Ese = BCe ^ ANDN64(BCi, BCo); + Esi = BCi ^ ANDN64(BCo, BCu); + Eso = BCo ^ ANDN64(BCu, BCa); + Esu = BCu ^ ANDN64(BCa, BCe); + + /* prepareTheta */ + BCa = Eba ^ Ega ^ Eka ^ Ema ^ Esa; + BCe = Ebe ^ Ege ^ Eke ^ Eme ^ Ese; + BCi = Ebi ^ Egi ^ Eki ^ Emi ^ Esi; + BCo = Ebo ^ Ego ^ Eko ^ Emo ^ Eso; + BCu = Ebu ^ Egu ^ Eku ^ Emu ^ Esu; + + /* thetaRhoPiChiIotaPrepareTheta(round+1, E, A) */ + Da = BCu ^ ROL64(BCe, 1); + De = BCa ^ ROL64(BCi, 1); + Di = BCe ^ ROL64(BCo, 1); + Do = BCi ^ ROL64(BCu, 1); + Du = BCo ^ ROL64(BCa, 1); + + Eba ^= Da; + BCa = Eba; + Ege ^= De; + BCe = ROL64(Ege, 44); + Eki ^= Di; + BCi = ROL64(Eki, 43); + Emo ^= Do; + BCo = ROL64(Emo, 21); + Esu ^= Du; + BCu = ROL64(Esu, 14); + Aba = BCa ^ ANDN64(BCe, BCi); + Aba ^= (u64)round_consts[round + 1]; + Abe = BCe ^ ANDN64(BCi, BCo); + Abi = BCi ^ ANDN64(BCo, BCu); + Abo = BCo ^ ANDN64(BCu, BCa); + Abu = BCu ^ ANDN64(BCa, BCe); + + Ebo 
^= Do; + BCa = ROL64(Ebo, 28); + Egu ^= Du; + BCe = ROL64(Egu, 20); + Eka ^= Da; + BCi = ROL64(Eka, 3); + Eme ^= De; + BCo = ROL64(Eme, 45); + Esi ^= Di; + BCu = ROL64(Esi, 61); + Aga = BCa ^ ANDN64(BCe, BCi); + Age = BCe ^ ANDN64(BCi, BCo); + Agi = BCi ^ ANDN64(BCo, BCu); + Ago = BCo ^ ANDN64(BCu, BCa); + Agu = BCu ^ ANDN64(BCa, BCe); + + Ebe ^= De; + BCa = ROL64(Ebe, 1); + Egi ^= Di; + BCe = ROL64(Egi, 6); + Eko ^= Do; + BCi = ROL64(Eko, 25); + Emu ^= Du; + BCo = ROL64(Emu, 8); + Esa ^= Da; + BCu = ROL64(Esa, 18); + Aka = BCa ^ ANDN64(BCe, BCi); + Ake = BCe ^ ANDN64(BCi, BCo); + Aki = BCi ^ ANDN64(BCo, BCu); + Ako = BCo ^ ANDN64(BCu, BCa); + Aku = BCu ^ ANDN64(BCa, BCe); + + Ebu ^= Du; + BCa = ROL64(Ebu, 27); + Ega ^= Da; + BCe = ROL64(Ega, 36); + Eke ^= De; + BCi = ROL64(Eke, 10); + Emi ^= Di; + BCo = ROL64(Emi, 15); + Eso ^= Do; + BCu = ROL64(Eso, 56); + Ama = BCa ^ ANDN64(BCe, BCi); + Ame = BCe ^ ANDN64(BCi, BCo); + Ami = BCi ^ ANDN64(BCo, BCu); + Amo = BCo ^ ANDN64(BCu, BCa); + Amu = BCu ^ ANDN64(BCa, BCe); + + Ebi ^= Di; + BCa = ROL64(Ebi, 62); + Ego ^= Do; + BCe = ROL64(Ego, 55); + Eku ^= Du; + BCi = ROL64(Eku, 39); + Ema ^= Da; + BCo = ROL64(Ema, 41); + Ese ^= De; + BCu = ROL64(Ese, 2); + Asa = BCa ^ ANDN64(BCe, BCi); + Ase = BCe ^ ANDN64(BCi, BCo); + Asi = BCi ^ ANDN64(BCo, BCu); + Aso = BCo ^ ANDN64(BCu, BCa); + Asu = BCu ^ ANDN64(BCa, BCe); + } + + state[0] = Aba; + state[1] = Abe; + state[2] = Abi; + state[3] = Abo; + state[4] = Abu; + state[5] = Aga; + state[6] = Age; + state[7] = Agi; + state[8] = Ago; + state[9] = Agu; + state[10] = Aka; + state[11] = Ake; + state[12] = Aki; + state[13] = Ako; + state[14] = Aku; + state[15] = Ama; + state[16] = Ame; + state[17] = Ami; + state[18] = Amo; + state[19] = Amu; + state[20] = Asa; + state[21] = Ase; + state[22] = Asi; + state[23] = Aso; + state[24] = Asu; + + return sizeof(void *) * 4 + sizeof(u64) * 12 * 5; +} commit 909644ef5883927262366c356eed530e55aba478 Author: Jussi Kivilinna Date: Fri Oct 23 
22:39:47 2015 +0300 hwf-x86: add detection for Intel CPUs with fast SHLD instruction * cipher/sha1.c (sha1_init): Use HWF_INTEL_FAST_SHLD instead of HWF_INTEL_CPU. * cipher/sha256.c (sha256_init, sha224_init): Ditto. * cipher/sha512.c (sha512_init, sha384_init): Ditto. * src/g10lib.h (HWF_INTEL_FAST_SHLD): New. (HWF_INTEL_BMI2, HWF_INTEL_SSSE3, HWF_INTEL_PCLMUL, HWF_INTEL_AESNI) (HWF_INTEL_RDRAND, HWF_INTEL_AVX, HWF_INTEL_AVX2) (HWF_ARM_NEON): Update. * src/hwf-x86.c (detect_x86_gnuc): Add detection of Intel Core CPUs with fast SHLD/SHRD instruction. * src/hwfeatures.c (hwflist): Add "intel-fast-shld". -- Intel Core CPUs since the Sandy Bridge generation have been able to execute the SHLD/SHRD instructions faster than the rotate instructions ROL/ROR. Since SHLD/SHRD can be used to do rotation, some optimized implementations (SHA1/SHA256/SHA512) use SHLD/SHRD instructions in place of ROL/ROR. This patch provides more accurate detection of CPUs with a fast SHLD implementation. Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha1.c b/cipher/sha1.c index eb42883..554d55c 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -136,7 +136,7 @@ sha1_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only. */ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_BMI2 hd->use_bmi2 = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_BMI2); diff --git a/cipher/sha256.c b/cipher/sha256.c index 59ffa43..63869d5 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -124,7 +124,7 @@ sha256_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only.
*/ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); @@ -162,7 +162,7 @@ sha224_init (void *context, unsigned int flags) #ifdef USE_AVX /* AVX implementation uses SHLD which is known to be slow on non-Intel CPUs. * Therefore use this implementation on Intel CPUs only. */ - hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + hd->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 hd->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); diff --git a/cipher/sha512.c b/cipher/sha512.c index 029f8f0..4be1cab 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -154,7 +154,7 @@ sha512_init (void *context, unsigned int flags) ctx->use_ssse3 = (features & HWF_INTEL_SSSE3) != 0; #endif #ifdef USE_AVX - ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 ctx->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); @@ -194,7 +194,7 @@ sha384_init (void *context, unsigned int flags) ctx->use_ssse3 = (features & HWF_INTEL_SSSE3) != 0; #endif #ifdef USE_AVX - ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_CPU); + ctx->use_avx = (features & HWF_INTEL_AVX) && (features & HWF_INTEL_FAST_SHLD); #endif #ifdef USE_AVX2 ctx->use_avx2 = (features & HWF_INTEL_AVX2) && (features & HWF_INTEL_BMI2); diff --git a/src/g10lib.h b/src/g10lib.h index d1f9426..a579e94 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -197,16 +197,17 @@ int _gcry_log_verbosity( int level ); #define HWF_PADLOCK_SHA 4 #define HWF_PADLOCK_MMUL 8 -#define HWF_INTEL_CPU 16 -#define HWF_INTEL_BMI2 32 -#define HWF_INTEL_SSSE3 64 -#define HWF_INTEL_PCLMUL 128 -#define HWF_INTEL_AESNI 
256 -#define HWF_INTEL_RDRAND 512 -#define HWF_INTEL_AVX 1024 -#define HWF_INTEL_AVX2 2048 - -#define HWF_ARM_NEON 4096 +#define HWF_INTEL_CPU 16 +#define HWF_INTEL_FAST_SHLD 32 +#define HWF_INTEL_BMI2 64 +#define HWF_INTEL_SSSE3 128 +#define HWF_INTEL_PCLMUL 256 +#define HWF_INTEL_AESNI 512 +#define HWF_INTEL_RDRAND 1024 +#define HWF_INTEL_AVX 2048 +#define HWF_INTEL_AVX2 4096 + +#define HWF_ARM_NEON 8192 gpg_err_code_t _gcry_disable_hw_feature (const char *name); diff --git a/src/hwf-x86.c b/src/hwf-x86.c index 399952c..fbd6331 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -174,6 +174,7 @@ detect_x86_gnuc (void) unsigned int features; unsigned int os_supports_avx_avx2_registers = 0; unsigned int max_cpuid_level; + unsigned int fms, family, model; unsigned int result = 0; (void)os_supports_avx_avx2_registers; @@ -236,8 +237,37 @@ detect_x86_gnuc (void) /* Detect Intel features, that might also be supported by other vendors. */ - /* Get CPU info and Intel feature flags (ECX). */ - get_cpuid(1, NULL, NULL, &features, NULL); + /* Get CPU family/model/stepping (EAX) and Intel feature flags (ECX). */ + get_cpuid(1, &fms, NULL, &features, NULL); + + family = ((fms & 0xf00) >> 8) + ((fms & 0xff00000) >> 20); + model = ((fms & 0xf0) >> 4) + ((fms & 0xf0000) >> 12); + + if ((result & HWF_INTEL_CPU) && family == 6) + { + /* These Intel Core processor models have SHLD/SHRD instructions that + * can do integer rotation faster than actual ROL/ROR instructions. */ + switch (model) + { + case 0x2A: + case 0x2D: + case 0x3A: + case 0x3C: + case 0x3F: + case 0x45: + case 0x46: + case 0x3D: + case 0x4F: + case 0x56: + case 0x47: + case 0x4E: + case 0x5E: + case 0x55: + case 0x66: + result |= HWF_INTEL_FAST_SHLD; + break; + } + } #ifdef ENABLE_PCLMUL_SUPPORT /* Test bit 1 for PCLMUL.
*/ diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 58099c4..e7c55cc 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -42,19 +42,20 @@ static struct const char *desc; } hwflist[] = { - { HWF_PADLOCK_RNG, "padlock-rng" }, - { HWF_PADLOCK_AES, "padlock-aes" }, - { HWF_PADLOCK_SHA, "padlock-sha" }, - { HWF_PADLOCK_MMUL,"padlock-mmul"}, - { HWF_INTEL_CPU, "intel-cpu" }, - { HWF_INTEL_BMI2, "intel-bmi2" }, - { HWF_INTEL_SSSE3, "intel-ssse3" }, - { HWF_INTEL_PCLMUL,"intel-pclmul" }, - { HWF_INTEL_AESNI, "intel-aesni" }, - { HWF_INTEL_RDRAND,"intel-rdrand" }, - { HWF_INTEL_AVX, "intel-avx" }, - { HWF_INTEL_AVX2, "intel-avx2" }, - { HWF_ARM_NEON, "arm-neon" } + { HWF_PADLOCK_RNG, "padlock-rng" }, + { HWF_PADLOCK_AES, "padlock-aes" }, + { HWF_PADLOCK_SHA, "padlock-sha" }, + { HWF_PADLOCK_MMUL, "padlock-mmul"}, + { HWF_INTEL_CPU, "intel-cpu" }, + { HWF_INTEL_FAST_SHLD, "intel-fast-shld" }, + { HWF_INTEL_BMI2, "intel-bmi2" }, + { HWF_INTEL_SSSE3, "intel-ssse3" }, + { HWF_INTEL_PCLMUL, "intel-pclmul" }, + { HWF_INTEL_AESNI, "intel-aesni" }, + { HWF_INTEL_RDRAND, "intel-rdrand" }, + { HWF_INTEL_AVX, "intel-avx" }, + { HWF_INTEL_AVX2, "intel-avx2" }, + { HWF_ARM_NEON, "arm-neon" } }; /* A bit vector with the hardware features which shall not be used. commit 16fd540f4d01eb6dc23d9509ae549353617c7a67 Author: Jussi Kivilinna Date: Sat Oct 24 12:41:23 2015 +0300 Fix OCB amd64 assembly implementations for x32 * cipher/camellia-glue.c (_gcry_camellia_aesni_avx_ocb_enc) (_gcry_camellia_aesni_avx_ocb_dec, _gcry_camellia_aesni_avx_ocb_auth) (_gcry_camellia_aesni_avx2_ocb_enc, _gcry_camellia_aesni_avx2_ocb_dec) (_gcry_camellia_aesni_avx2_ocb_auth, _gcry_camellia_ocb_crypt) (_gcry_camellia_ocb_auth): Change 'Ls' from pointer array to u64 array. * cipher/serpent.c (_gcry_serpent_sse2_ocb_enc) (_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth) (_gcry_serpent_avx2_ocb_enc, _gcry_serpent_avx2_ocb_dec) (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Ditto. 
* cipher/twofish.c (_gcry_twofish_amd64_ocb_enc) (_gcry_twofish_amd64_ocb_dec, _gcry_twofish_amd64_ocb_auth) (twofish_amd64_ocb_enc, twofish_amd64_ocb_dec, twofish_amd64_ocb_auth) (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Ditto. -- Pointers on x32 are 32-bit, but the amd64 assembly implementations expect 64-bit pointers. Change the 'Ls' arrays to arrays of 64-bit integers so that the input arrays have the correct format for the assembly functions. Signed-off-by: Jussi Kivilinna diff --git a/cipher/camellia-glue.c b/cipher/camellia-glue.c index dee0169..dfddb4a 100644 --- a/cipher/camellia-glue.c +++ b/cipher/camellia-glue.c @@ -141,20 +141,20 @@ extern void _gcry_camellia_aesni_avx_ocb_enc(CAMELLIA_context *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_ocb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_ocb_auth(CAMELLIA_context *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, const unsigned char *key, @@ -185,20 +185,20 @@ extern void _gcry_camellia_aesni_avx2_ocb_enc(CAMELLIA_context *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_ocb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_ocb_auth(CAMELLIA_context *ctx, const unsigned char *abuf, unsigned char 
*offset, unsigned char *checksum, - const
void *Ls[32]) ASM_FUNC_ABI; + const u64 Ls[32]) ASM_FUNC_ABI; #endif static const char *selftest(void); @@ -630,27 +630,29 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_aesni_avx2) { int did_use_aesni_avx2 = 0; - const void *Ls[32]; + u64 Ls[32]; unsigned int n = 32 - (blkn % 32); - const void **l; + u64 *l; int i; if (nblocks >= 32) { for (i = 0; i < 32; i += 8) { - Ls[(i + 0 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 32] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 32] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 32] = c->u_mode.ocb.L[3]; - Ls[(15 + n) % 32] = c->u_mode.ocb.L[4]; - Ls[(23 + n) % 32] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(31 + n) % 32]; /* Process data in 32 block chunks. */ @@ -658,7 +660,7 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 32; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 32); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 32); if (encrypt) _gcry_camellia_aesni_avx2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -691,25 +693,27 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_aesni_avx) { int did_use_aesni_avx = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -717,7 +721,7 @@ _gcry_camellia_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); if (encrypt) _gcry_camellia_aesni_avx_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -780,27 +784,29 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_aesni_avx2) { int did_use_aesni_avx2 = 0; - const void *Ls[32]; + u64 Ls[32]; unsigned int n = 32 - (blkn % 32); - const void **l; + u64 *l; int i; if (nblocks >= 32) { for (i = 0; i < 32; i += 8) { - Ls[(i + 0 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 32] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 32] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 32] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 32] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 32] = c->u_mode.ocb.L[3]; - Ls[(15 + n) % 32] = c->u_mode.ocb.L[4]; - Ls[(23 + n) % 32] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + Ls[(15 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[4]; + Ls[(23 + n) % 32] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(31 + n) % 32]; /* Process data in 32 block chunks. */ @@ -808,7 +814,7 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 32; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 32); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 32); _gcry_camellia_aesni_avx2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, @@ -837,25 +843,27 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_aesni_avx) { int did_use_aesni_avx = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -863,7 +871,7 @@ _gcry_camellia_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. 
*/ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); _gcry_camellia_aesni_avx_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, diff --git a/cipher/serpent.c b/cipher/serpent.c index fc3afa6..4ef7f52 100644 --- a/cipher/serpent.c +++ b/cipher/serpent.c @@ -125,20 +125,20 @@ extern void _gcry_serpent_sse2_ocb_enc(serpent_context_t *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_ocb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_ocb_auth(serpent_context_t *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[8]) ASM_FUNC_ABI; + const u64 Ls[8]) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 @@ -165,20 +165,20 @@ extern void _gcry_serpent_avx2_ocb_enc(serpent_context_t *ctx, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_ocb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_ocb_auth(serpent_context_t *ctx, const unsigned char *abuf, unsigned char *offset, unsigned char *checksum, - const void *Ls[16]) ASM_FUNC_ABI; + const u64 Ls[16]) ASM_FUNC_ABI; #endif #ifdef USE_NEON @@ -1249,25 +1249,27 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, if (ctx->use_avx2) { int did_use_avx2 = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = 
c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -1275,7 +1277,7 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); if (encrypt) _gcry_serpent_avx2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -1305,19 +1307,21 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, #ifdef USE_SSE2 { int did_use_sse2 = 0; - const void *Ls[8]; + u64 Ls[8]; unsigned int n = 8 - (blkn % 8); - const void **l; + u64 *l; if (nblocks >= 8) { - Ls[(0 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(1 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(2 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(3 + n) % 8] = c->u_mode.ocb.L[2]; - Ls[(4 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(5 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(6 + n) % 8] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(0 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(1 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(2 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(3 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(4 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(5 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(6 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; l = &Ls[(7 + n) % 8]; /* Process data in 8 block chunks. */ @@ -1325,7 +1329,7 @@ _gcry_serpent_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 8; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 8); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 8); if (encrypt) _gcry_serpent_sse2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, @@ -1435,25 +1439,27 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (ctx->use_avx2) { int did_use_avx2 = 0; - const void *Ls[16]; + u64 Ls[16]; unsigned int n = 16 - (blkn % 16); - const void **l; + u64 *l; int i; if (nblocks >= 16) { for (i = 0; i < 16; i += 8) { - Ls[(i + 0 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 1 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 2 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 3 + n) % 16] = c->u_mode.ocb.L[2]; - Ls[(i + 4 + n) % 16] = c->u_mode.ocb.L[0]; - Ls[(i + 5 + n) % 16] = c->u_mode.ocb.L[1]; - Ls[(i + 6 + n) % 16] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; } - Ls[(7 + n) % 16] = c->u_mode.ocb.L[3]; + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; l = &Ls[(15 + n) % 16]; /* Process data in 16 block chunks. */ @@ -1461,7 +1467,7 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 16; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 16); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 16); _gcry_serpent_avx2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, c->u_mode.ocb.aad_sum, Ls); @@ -1486,19 +1492,21 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, #ifdef USE_SSE2 { int did_use_sse2 = 0; - const void *Ls[8]; + u64 Ls[8]; unsigned int n = 8 - (blkn % 8); - const void **l; + u64 *l; if (nblocks >= 8) { - Ls[(0 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(1 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(2 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(3 + n) % 8] = c->u_mode.ocb.L[2]; - Ls[(4 + n) % 8] = c->u_mode.ocb.L[0]; - Ls[(5 + n) % 8] = c->u_mode.ocb.L[1]; - Ls[(6 + n) % 8] = c->u_mode.ocb.L[0]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). 
*/ + Ls[(0 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(1 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(2 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(3 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(4 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(5 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(6 + n) % 8] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; l = &Ls[(7 + n) % 8]; /* Process data in 8 block chunks. */ @@ -1506,7 +1514,7 @@ _gcry_serpent_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, { /* l_tmp will be used only every 65536-th block. */ blkn += 8; - *l = ocb_get_l(c, l_tmp, blkn - blkn % 8); + *l = (uintptr_t)(void *)ocb_get_l(c, l_tmp, blkn - blkn % 8); _gcry_serpent_sse2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, c->u_mode.ocb.aad_sum, Ls); diff --git a/cipher/twofish.c b/cipher/twofish.c index 7f361c9..f6ecd67 100644 --- a/cipher/twofish.c +++ b/cipher/twofish.c @@ -734,15 +734,15 @@ extern void _gcry_twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, extern void _gcry_twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); extern void _gcry_twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); extern void _gcry_twofish_amd64_ocb_auth(const TWOFISH_context *ctx, const byte *abuf, byte *offset, - byte *checksum, const void *Ls[3]); + byte *checksum, const u64 Ls[3]); #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS static inline void @@ -854,7 +854,7 @@ twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, static inline void twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS 
call_sysv_fn6(_gcry_twofish_amd64_ocb_enc, ctx, out, in, offset, checksum, Ls); @@ -865,7 +865,7 @@ twofish_amd64_ocb_enc(const TWOFISH_context *ctx, byte *out, const byte *in, static inline void twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS call_sysv_fn6(_gcry_twofish_amd64_ocb_dec, ctx, out, in, offset, checksum, Ls); @@ -876,7 +876,7 @@ twofish_amd64_ocb_dec(const TWOFISH_context *ctx, byte *out, const byte *in, static inline void twofish_amd64_ocb_auth(const TWOFISH_context *ctx, const byte *abuf, - byte *offset, byte *checksum, const void *Ls[3]) + byte *offset, byte *checksum, const u64 Ls[3]) { #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS call_sysv_fn5(_gcry_twofish_amd64_ocb_auth, ctx, abuf, offset, checksum, Ls); @@ -1261,15 +1261,17 @@ _gcry_twofish_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, u64 blkn = c->u_mode.ocb.data_nblocks; { - const void *Ls[3]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + u64 Ls[3]; /* Process data in 3 block chunks. */ while (nblocks >= 3) { /* l_tmp will be used only every 65536-th block. */ - Ls[0] = ocb_get_l(c, l_tmp, blkn + 1); - Ls[1] = ocb_get_l(c, l_tmp, blkn + 2); - Ls[2] = ocb_get_l(c, l_tmp, blkn + 3); + Ls[0] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 1); + Ls[1] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 2); + Ls[2] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 3); blkn += 3; if (encrypt) @@ -1320,15 +1322,17 @@ _gcry_twofish_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, u64 blkn = c->u_mode.ocb.aad_nblocks; { - const void *Ls[3]; + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + u64 Ls[3]; /* Process data in 3 block chunks. 
*/ while (nblocks >= 3) { /* l_tmp will be used only every 65536-th block. */ - Ls[0] = ocb_get_l(c, l_tmp, blkn + 1); - Ls[1] = ocb_get_l(c, l_tmp, blkn + 2); - Ls[2] = ocb_get_l(c, l_tmp, blkn + 3); + Ls[0] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 1); + Ls[1] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 2); + Ls[2] = (uintptr_t)(const void *)ocb_get_l(c, l_tmp, blkn + 3); blkn += 3; twofish_amd64_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, commit ae40af427fd2a856b24ec2a41323ec8b80ffc9c0 Author: Jussi Kivilinna Date: Fri Oct 23 22:24:47 2015 +0300 bench-slope: add KDF/PBKDF2 benchmark * tests/bench-slope.c (bench_kdf_mode, bench_kdf_init, bench_kdf_free) (bench_kdf_do_bench, kdf_ops, kdf_bench_one, kdf_bench): New. (print_help): Add 'kdf'. (main): Add KDF benchmarks. -- Introduce KDF benchmarking to bench-slope. Output is given in nanosecs/iter (and cycles/iter if --cpu-mhz is used). Only PBKDF2 is supported by this initial patch. For example, below is the KDF bench-slope output before and after commit "md: keep contexts for HMAC in GcryDigestEntry", on an Intel Core i5-4570 @ 3.2 GHz:
Before:
$ tests/bench-slope --cpu-mhz 3201 kdf
KDF:
                          |  nanosecs/iter   cycles/iter
 PBKDF2-HMAC-MD5          |          882.4        2824.7
 PBKDF2-HMAC-SHA1         |          832.6        2665.0
 PBKDF2-HMAC-RIPEMD160    |         1148.3        3675.6
 PBKDF2-HMAC-TIGER192     |         1339.6        4288.2
 PBKDF2-HMAC-SHA256       |         1460.5        4675.1
 PBKDF2-HMAC-SHA384       |         1723.2        5515.8
 PBKDF2-HMAC-SHA512       |         1729.1        5534.7
 PBKDF2-HMAC-SHA224       |         1424.0        4558.3
 PBKDF2-HMAC-WHIRLPOOL    |         2459.7        7873.5
 PBKDF2-HMAC-TIGER        |         1350.2        4322.1
 PBKDF2-HMAC-TIGER2       |         1348.7        4317.3
 PBKDF2-HMAC-GOSTR3411_94 |         7374.1       23604.4
 PBKDF2-HMAC-STRIBOG256   |         6060.0       19398.1
 PBKDF2-HMAC-STRIBOG512   |         7512.8       24048.3
 PBKDF2-HMAC-GOSTR3411_CP |         7378.3       23618.0
 PBKDF2-HMAC-SHA3-224     |         2789.6        8929.5
 PBKDF2-HMAC-SHA3-256     |         2785.1        8915.0
 PBKDF2-HMAC-SHA3-384     |         2955.5        9460.5
 PBKDF2-HMAC-SHA3-512     |         2859.7        9153.9
=
After:
$ tests/bench-slope --cpu-mhz 3201 kdf
KDF:
                          |  nanosecs/iter   cycles/iter
 PBKDF2-HMAC-MD5          |          405.9        1299.2
 PBKDF2-HMAC-SHA1         |          392.1        1255.0
 PBKDF2-HMAC-RIPEMD160    |          540.9        1731.5
 PBKDF2-HMAC-TIGER192     |          637.1        2039.4
 PBKDF2-HMAC-SHA256       |          691.8        2214.3
 PBKDF2-HMAC-SHA384       |          848.0        2714.3
 PBKDF2-HMAC-SHA512       |          875.7        2803.1
 PBKDF2-HMAC-SHA224       |          689.2        2206.0
 PBKDF2-HMAC-WHIRLPOOL    |         1535.6        4915.5
 PBKDF2-HMAC-TIGER        |          636.3        2036.7
 PBKDF2-HMAC-TIGER2       |          636.6        2037.7
 PBKDF2-HMAC-GOSTR3411_94 |         5311.5       17002.2
 PBKDF2-HMAC-STRIBOG256   |         4308.0       13790.0
 PBKDF2-HMAC-STRIBOG512   |         5767.4       18461.4
 PBKDF2-HMAC-GOSTR3411_CP |         5309.4       16995.4
 PBKDF2-HMAC-SHA3-224     |         1333.1        4267.2
 PBKDF2-HMAC-SHA3-256     |         1327.8        4250.4
 PBKDF2-HMAC-SHA3-384     |         1392.8        4458.3
 PBKDF2-HMAC-SHA3-512     |         1428.5        4572.7
=
Signed-off-by: Jussi Kivilinna diff --git a/tests/bench-slope.c b/tests/bench-slope.c index 394d7fc..2679556 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -1571,13 +1571,176 @@ mac_bench (char **argv, int argc) } +/************************************************************ KDF benchmarks. */ + +struct bench_kdf_mode +{ + struct bench_ops *ops; + + int algo; + int subalgo; +}; + + +static int +bench_kdf_init (struct bench_obj *obj) +{ + struct bench_kdf_mode *mode = obj->priv; + + if (mode->algo == GCRY_KDF_PBKDF2) + { + obj->min_bufsize = 2; + obj->max_bufsize = 2 * 32; + obj->step_size = 2; + } + + obj->num_measure_repetitions = num_measurement_repetitions; + + return 0; +} + +static void +bench_kdf_free (struct bench_obj *obj) +{ + (void)obj; +} + +static void +bench_kdf_do_bench (struct bench_obj *obj, void *buf, size_t buflen) +{ + struct bench_kdf_mode *mode = obj->priv; + char keybuf[16]; + + (void)buf; + + if (mode->algo == GCRY_KDF_PBKDF2) + { + gcry_kdf_derive("qwerty", 6, mode->algo, mode->subalgo, "01234567", 8, + buflen, sizeof(keybuf), keybuf); + } +} + +static struct bench_ops kdf_ops = { + &bench_kdf_init, + &bench_kdf_free, + &bench_kdf_do_bench +}; + + +static void +kdf_bench_one (int algo, int subalgo) +{ + struct bench_kdf_mode mode = { &kdf_ops }; + struct bench_obj
obj = { 0 }; + double nsecs_per_iteration; + double cycles_per_iteration; + char algo_name[32]; + char nsecpiter_buf[16]; + char cpiter_buf[16]; + + mode.algo = algo; + mode.subalgo = subalgo; + + switch (subalgo) + { + case GCRY_MD_CRC32: + case GCRY_MD_CRC32_RFC1510: + case GCRY_MD_CRC24_RFC2440: + case GCRY_MD_MD4: + /* Skip CRC32s. */ + return; + } + + *algo_name = 0; + + if (algo == GCRY_KDF_PBKDF2) + { + snprintf (algo_name, sizeof(algo_name), "PBKDF2-HMAC-%s", + gcry_md_algo_name (subalgo)); + } + + bench_print_algo (-24, algo_name); + + obj.ops = mode.ops; + obj.priv = &mode; + + nsecs_per_iteration = do_slope_benchmark (&obj); + + strcpy(cpiter_buf, csv_mode ? "" : "-"); + + double_to_str (nsecpiter_buf, sizeof (nsecpiter_buf), nsecs_per_iteration); + + /* If user didn't provide CPU speed, we cannot show cycles/iter results. */ + if (cpu_ghz > 0.0) + { + cycles_per_iteration = nsecs_per_iteration * cpu_ghz; + double_to_str (cpiter_buf, sizeof (cpiter_buf), cycles_per_iteration); + } + + if (csv_mode) + { + printf ("%s,%s,%s,,,,,,,,,%s,ns/iter,%s,c/iter\n", + current_section_name, + current_algo_name ? current_algo_name : "", + current_mode_name ? 
current_mode_name : "", + nsecpiter_buf, + cpiter_buf); + } + else + { + printf ("%14s %13s\n", nsecpiter_buf, cpiter_buf); + } +} + +void +kdf_bench (char **argv, int argc) +{ + char algo_name[32]; + int i, j; + + bench_print_section ("kdf", "KDF"); + + if (!csv_mode) + { + printf (" %-*s | ", 24, ""); + printf ("%14s %13s\n", "nanosecs/iter", "cycles/iter"); + } + + if (argv && argc) + { + for (i = 0; i < argc; i++) + { + for (j = 1; j < 400; j++) + { + if (gcry_md_test_algo (j)) + continue; + + snprintf (algo_name, sizeof(algo_name), "PBKDF2-HMAC-%s", + gcry_md_algo_name (j)); + + if (!strcmp(argv[i], algo_name)) + kdf_bench_one (GCRY_KDF_PBKDF2, j); + } + } + } + else + { + for (i = 1; i < 400; i++) + if (!gcry_md_test_algo (i)) + kdf_bench_one (GCRY_KDF_PBKDF2, i); + } + + bench_print_footer (24); +} + + /************************************************************** Main program. */ void print_help (void) { static const char *help_lines[] = { - "usage: bench-slope [options] [hash|mac|cipher [algonames]]", + "usage: bench-slope [options] [hash|mac|cipher|kdf [algonames]]", "", " options:", " --cpu-mhz Set CPU speed for calculating cycles", @@ -1744,6 +1907,7 @@ main (int argc, char **argv) hash_bench (NULL, 0); mac_bench (NULL, 0); cipher_bench (NULL, 0); + kdf_bench (NULL, 0); } else if (!strcmp (*argv, "hash")) { @@ -1769,6 +1933,14 @@ main (int argc, char **argv) warm_up_cpu (); cipher_bench ((argc == 0) ? NULL : argv, argc); } + else if (!strcmp (*argv, "kdf")) + { + argc--; + argv++; + + warm_up_cpu (); + kdf_bench ((argc == 0) ? 
NULL : argv, argc); + } else { fprintf (stderr, PGM ": unknown argument: %s\n", *argv); ----------------------------------------------------------------------- Summary of changes: cipher/Makefile.am | 2 +- cipher/camellia-glue.c | 116 ++++--- cipher/hash-common.h | 12 +- cipher/keccak.c | 808 ++++++++++++++++++++++++++++++++------------- cipher/keccak_permute_32.h | 535 ++++++++++++++++++++++++++++++ cipher/keccak_permute_64.h | 290 ++++++++++++++++ cipher/serpent.c | 104 +++--- cipher/sha1.c | 2 +- cipher/sha256.c | 4 +- cipher/sha512.c | 4 +- cipher/twofish.c | 32 +- src/g10lib.h | 21 +- src/hwf-x86.c | 34 +- src/hwfeatures.c | 27 +- tests/bench-slope.c | 174 +++++++++- 15 files changed, 1775 insertions(+), 390 deletions(-) create mode 100644 cipher/keccak_permute_32.h create mode 100644 cipher/keccak_permute_64.h hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sat Oct 31 13:39:34 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 31 Oct 2015 14:39:34 +0200 Subject: [PATCH] Keccak: Add SHAKE Extendable-Output Functions Message-ID: <20151031123934.21336.19559.stgit@localhost6.localdomain6> * src/hash-common.c (_gcry_hash_selftest_check_one): Add handling for XOFs. * src/keccak.c (keccak_ops_t): Rename 'extract_inplace' to 'extract' and add 'pos' argument. (KECCAK_CONTEXT): Add 'suffix'. (keccak_extract_inplace64): Rename to... (keccak_extract64): ...this; Add handling for 'pos' argument. (keccak_extract_inplace32bi): Rename to... (keccak_extract32bi): ...this; Add handling for 'pos' argument. (keccak_extract_inplace64): Rename to... (keccak_extract64): ...this; Add handling for 'pos' argument. (keccak_extract_inplace32bi_bmi2): Rename to... (keccak_extract32bi_bmi2): ...this; Add handling for 'pos' argument. 
(keccak_init): Setup 'suffix'; add SHAKE128 & SHAKE256. (shake128_init, shake256_init): New. (keccak_final): Do not initial permute for SHAKE output; use correct suffix for SHAKE. (keccak_extract): New. (keccak_selftests_keccak): Add SHAKE128 & SHAKE256 test-vectors. (run_selftests): Add SHAKE128 & SHAKE256. (shake128_asn, oid_spec_shake128, shake256_asn, oid_spec_shake256) (_gcry_digest_spec_shake128, _gcry_digest_spec_shake256): New. * cipher/md.c (digest_list): Add SHAKE128 & SHAKE256. * doc/gcrypt.texi: Ditto. * src/cipher.h (_gcry_digest_spec_shake128) (_gcry_digest_spec_shake256): New. * src/gcrypt.h.in (GCRY_MD_SHAKE128, GCRY_MD_SHAKE256): New. * tests/basic.c (check_one_md): Add XOF check; Add 'elen' argument. (check_one_md_multi): Skip if algo is XOF. (check_digests): Add SHAKE128 & SHAKE256 test vectors. * tests/bench-slope.c (kdf_bench_one): Skip XOFs. -- Signed-off-by: Jussi Kivilinna --- cipher/hash-common.c | 28 +++ cipher/keccak.c | 275 +++++++++++++++++++++++++++++---- cipher/md.c | 2 doc/gcrypt.texi | 12 + src/cipher.h | 2 src/gcrypt.h.in | 4 tests/basic.c | 423 ++++++++++++++++++++++++++++++++++++++++++++++++-- tests/bench-slope.c | 6 + 8 files changed, 700 insertions(+), 52 deletions(-) diff --git a/cipher/hash-common.c b/cipher/hash-common.c index 6743f09..a750d644 100644 --- a/cipher/hash-common.c +++ b/cipher/hash-common.c @@ -49,8 +49,12 @@ _gcry_hash_selftest_check_one (int algo, gcry_error_t err = 0; gcry_md_hd_t hd; unsigned char *digest; + char aaa[1000]; + int xof = 0; - if (_gcry_md_get_algo_dlen (algo) != expectlen) + if (_gcry_md_get_algo_dlen (algo) == 0) + xof = 1; + else if (_gcry_md_get_algo_dlen (algo) != expectlen) return "digest size does not match expected size"; err = _gcry_md_open (&hd, algo, 0); @@ -65,7 +69,6 @@ _gcry_hash_selftest_check_one (int algo, case 1: /* Hash one million times an "a". */ { - char aaa[1000]; int i; /* Write in odd size chunks so that we test the buffering. 
*/ @@ -81,10 +84,23 @@ _gcry_hash_selftest_check_one (int algo, if (!result) { - digest = _gcry_md_read (hd, algo); - - if ( memcmp (digest, expect, expectlen) ) - result = "digest mismatch"; + if (!xof) + { + digest = _gcry_md_read (hd, algo); + + if ( memcmp (digest, expect, expectlen) ) + result = "digest mismatch"; + } + else + { + gcry_assert(expectlen <= sizeof(aaa)); + + err = _gcry_md_extract (hd, algo, aaa, expectlen); + if (err) + result = "error extracting output from XOF"; + else if ( memcmp (aaa, expect, expectlen) ) + result = "digest mismatch"; + } } _gcry_md_close (hd); diff --git a/cipher/keccak.c b/cipher/keccak.c index d46d9cb..f4f0ef3 100644 --- a/cipher/keccak.c +++ b/cipher/keccak.c @@ -90,7 +90,8 @@ typedef struct unsigned int (*permute)(KECCAK_STATE *hd); unsigned int (*absorb)(KECCAK_STATE *hd, int pos, const byte *lanes, unsigned int nlanes, int blocklanes); - unsigned int (*extract_inplace) (KECCAK_STATE *hd, unsigned int outlen); + unsigned int (*extract) (KECCAK_STATE *hd, unsigned int pos, byte *outbuf, + unsigned int outlen); } keccak_ops_t; @@ -100,6 +101,7 @@ typedef struct KECCAK_CONTEXT_S unsigned int outlen; unsigned int blocksize; unsigned int count; + unsigned int suffix; const keccak_ops_t *ops; } KECCAK_CONTEXT; @@ -124,13 +126,18 @@ static const u64 round_consts_64bit[24] = }; static unsigned int -keccak_extract_inplace64(KECCAK_STATE *hd, unsigned int outlen) +keccak_extract64(KECCAK_STATE *hd, unsigned int pos, byte *outbuf, + unsigned int outlen) { unsigned int i; - for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + /* NOTE: when pos == 0, hd and outbuf may point to same memory (SHA-3). 
*/ + + for (i = pos; i < pos + outlen / 8 + !!(outlen % 8); i++) { - hd->u.state64[i] = le_bswap64(hd->u.state64[i]); + u64 tmp = hd->u.state64[i]; + buf_put_le64(outbuf, tmp); + outbuf += 8; } return 0; @@ -158,14 +165,17 @@ static const u32 round_consts_32bit[2 * 24] = }; static unsigned int -keccak_extract_inplace32bi(KECCAK_STATE *hd, unsigned int outlen) +keccak_extract32bi(KECCAK_STATE *hd, unsigned int pos, byte *outbuf, + unsigned int outlen) { unsigned int i; u32 x0; u32 x1; u32 t; - for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + /* NOTE: when pos == 0, hd and outbuf may point to same memory (SHA-3). */ + + for (i = pos; i < pos + outlen / 8 + !!(outlen % 8); i++) { x0 = hd->u.state32bi[i * 2 + 0]; x1 = hd->u.state32bi[i * 2 + 1]; @@ -182,8 +192,9 @@ keccak_extract_inplace32bi(KECCAK_STATE *hd, unsigned int outlen) t = (x1 ^ (x1 >> 2)) & 0x0C0C0C0CUL; x1 = x1 ^ t ^ (t << 2); t = (x1 ^ (x1 >> 1)) & 0x22222222UL; x1 = x1 ^ t ^ (t << 1); - hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); - hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + buf_put_le32(&outbuf[0], x0); + buf_put_le32(&outbuf[4], x1); + outbuf += 8; } return 0; @@ -249,7 +260,7 @@ static const keccak_ops_t keccak_generic64_ops = { .permute = keccak_f1600_state_permute64, .absorb = keccak_absorb_lanes64, - .extract_inplace = keccak_extract_inplace64, + .extract = keccak_extract64, }; #endif /* USE_64BIT */ @@ -300,7 +311,7 @@ static const keccak_ops_t keccak_shld_64_ops = { .permute = keccak_f1600_state_permute64_shld, .absorb = keccak_absorb_lanes64_shld, - .extract_inplace = keccak_extract_inplace64, + .extract = keccak_extract64, }; #endif /* USE_64BIT_SHLD */ @@ -356,7 +367,7 @@ static const keccak_ops_t keccak_bmi2_64_ops = { .permute = keccak_f1600_state_permute64_bmi2, .absorb = keccak_absorb_lanes64_bmi2, - .extract_inplace = keccak_extract_inplace64, + .extract = keccak_extract64, }; #endif /* USE_64BIT_BMI2 */ @@ -404,7 +415,7 @@ static const keccak_ops_t keccak_generic32bi_ops = { 
.permute = keccak_f1600_state_permute32bi, .absorb = keccak_absorb_lanes32bi, - .extract_inplace = keccak_extract_inplace32bi, + .extract = keccak_extract32bi, }; #endif /* USE_32BIT */ @@ -483,14 +494,17 @@ keccak_absorb_lanes32bi_bmi2(KECCAK_STATE *hd, int pos, const byte *lanes, } static unsigned int -keccak_extract_inplace32bi_bmi2(KECCAK_STATE *hd, unsigned int outlen) +keccak_extract32bi_bmi2(KECCAK_STATE *hd, unsigned int pos, byte *outbuf, + unsigned int outlen) { unsigned int i; u32 x0; u32 x1; u32 t; - for (i = 0; i < outlen / 8 + !!(outlen % 8); i++) + /* NOTE: when pos == 0, hd and outbuf may point to same memory (SHA-3). */ + + for (i = pos; i < pos + outlen / 8 + !!(outlen % 8); i++) { x0 = hd->u.state32bi[i * 2 + 0]; x1 = hd->u.state32bi[i * 2 + 1]; @@ -502,8 +516,9 @@ keccak_extract_inplace32bi_bmi2(KECCAK_STATE *hd, unsigned int outlen) x0 = pdep(pext(x0, 0xffff0001), 0xaaaaaaab) | pdep(x0 >> 1, 0x55555554); x1 = pdep(pext(x1, 0xffff0001), 0xaaaaaaab) | pdep(x1 >> 1, 0x55555554); - hd->u.state32bi[i * 2 + 0] = le_bswap32(x0); - hd->u.state32bi[i * 2 + 1] = le_bswap32(x1); + buf_put_le32(&outbuf[0], x0); + buf_put_le32(&outbuf[4], x1); + outbuf += 8; } return 0; @@ -513,7 +528,7 @@ static const keccak_ops_t keccak_bmi2_32bi_ops = { .permute = keccak_f1600_state_permute32bi_bmi2, .absorb = keccak_absorb_lanes32bi_bmi2, - .extract_inplace = keccak_extract_inplace32bi_bmi2, + .extract = keccak_extract32bi_bmi2, }; #endif /* USE_32BIT */ @@ -638,21 +653,35 @@ keccak_init (int algo, void *context, unsigned int flags) switch (algo) { case GCRY_MD_SHA3_224: + ctx->suffix = SHA3_DELIMITED_SUFFIX; ctx->blocksize = 1152 / 8; ctx->outlen = 224 / 8; break; case GCRY_MD_SHA3_256: + ctx->suffix = SHA3_DELIMITED_SUFFIX; ctx->blocksize = 1088 / 8; ctx->outlen = 256 / 8; break; case GCRY_MD_SHA3_384: + ctx->suffix = SHA3_DELIMITED_SUFFIX; ctx->blocksize = 832 / 8; ctx->outlen = 384 / 8; break; case GCRY_MD_SHA3_512: + ctx->suffix = SHA3_DELIMITED_SUFFIX; 
ctx->blocksize = 576 / 8; ctx->outlen = 512 / 8; break; + case GCRY_MD_SHAKE128: + ctx->suffix = SHAKE_DELIMITED_SUFFIX; + ctx->blocksize = 1344 / 8; + ctx->outlen = 0; + break; + case GCRY_MD_SHAKE256: + ctx->suffix = SHAKE_DELIMITED_SUFFIX; + ctx->blocksize = 1088 / 8; + ctx->outlen = 0; + break; default: BUG(); } @@ -682,6 +711,17 @@ sha3_512_init (void *context, unsigned int flags) keccak_init (GCRY_MD_SHA3_512, context, flags); } +static void +shake128_init (void *context, unsigned int flags) +{ + keccak_init (GCRY_MD_SHAKE128, context, flags); +} + +static void +shake256_init (void *context, unsigned int flags) +{ + keccak_init (GCRY_MD_SHAKE256, context, flags); +} /* The routine final terminates the computation and * returns the digest. @@ -696,7 +736,7 @@ keccak_final (void *context) KECCAK_CONTEXT *ctx = context; KECCAK_STATE *hd = &ctx->state; const size_t bsize = ctx->blocksize; - const byte suffix = SHA3_DELIMITED_SUFFIX; + const byte suffix = ctx->suffix; unsigned int nburn, burn = 0; unsigned int lastbytes; byte lane[8]; @@ -716,21 +756,21 @@ keccak_final (void *context) nburn = ctx->ops->absorb(&ctx->state, (bsize - 1) / 8, lane, 1, -1); burn = nburn > burn ? nburn : burn; - /* Switch to the squeezing phase. */ - nburn = ctx->ops->permute(hd); - burn = nburn > burn ? nburn : burn; - - /* Squeeze out all the output blocks */ - if (ctx->outlen < bsize) + if (suffix == SHA3_DELIMITED_SUFFIX) { - /* Output SHA3 digest. */ - nburn = ctx->ops->extract_inplace(hd, ctx->outlen); + /* Switch to the squeezing phase. */ + nburn = ctx->ops->permute(hd); + burn = nburn > burn ? nburn : burn; + + /* Squeeze out the SHA3 digest. */ + nburn = ctx->ops->extract(hd, 0, (void *)hd, ctx->outlen); burn = nburn > burn ? nburn : burn; } else { - /* Output SHAKE digest. */ - BUG(); + /* Output for SHAKE can now be read with md_extract(). 
*/ + + ctx->count = 0; } wipememory(lane, sizeof(lane)); @@ -748,6 +788,124 @@ keccak_read (void *context) } +static void +keccak_extract (void *context, void *out, size_t outlen) +{ + KECCAK_CONTEXT *ctx = context; + KECCAK_STATE *hd = &ctx->state; + const size_t bsize = ctx->blocksize; + unsigned int nburn, burn = 0; + byte *outbuf = out; + unsigned int nlanes; + unsigned int nleft; + unsigned int count; + unsigned int i; + byte lane[8]; + + count = ctx->count; + + while (count && outlen && (outlen < 8 || count % 8)) + { + /* Extract partial lane. */ + nburn = ctx->ops->extract(hd, count / 8, lane, 8); + burn = nburn > burn ? nburn : burn; + + for (i = count % 8; outlen && i < 8; i++) + { + *outbuf++ = lane[i]; + outlen--; + count++; + } + + gcry_assert(count <= bsize); + + if (count == bsize) + count = 0; + } + + if (outlen >= 8 && count) + { + /* Extract tail of partial block. */ + nlanes = outlen / 8; + nleft = (bsize - count) / 8; + nlanes = nlanes < nleft ? nlanes : nleft; + + nburn = ctx->ops->extract(hd, count / 8, outbuf, nlanes * 8); + burn = nburn > burn ? nburn : burn; + outlen -= nlanes * 8; + outbuf += nlanes * 8; + count += nlanes * 8; + + gcry_assert(count <= bsize); + + if (count == bsize) + count = 0; + } + + while (outlen >= bsize) + { + gcry_assert(count == 0); + + /* Squeeze more. */ + nburn = ctx->ops->permute(hd); + burn = nburn > burn ? nburn : burn; + + /* Extract full block. */ + nburn = ctx->ops->extract(hd, 0, outbuf, bsize); + burn = nburn > burn ? nburn : burn; + + outlen -= bsize; + outbuf += bsize; + } + + if (outlen) + { + gcry_assert(outlen < bsize); + + if (count == 0) + { + /* Squeeze more. */ + nburn = ctx->ops->permute(hd); + burn = nburn > burn ? nburn : burn; + } + + if (outlen >= 8) + { + /* Extract head of partial block. */ + nlanes = outlen / 8; + nburn = ctx->ops->extract(hd, count / 8, outbuf, nlanes * 8); + burn = nburn > burn ? 
nburn : burn; + outlen -= nlanes * 8; + outbuf += nlanes * 8; + count += nlanes * 8; + + gcry_assert(count < bsize); + } + + if (outlen) + { + /* Extract head of partial lane. */ + nburn = ctx->ops->extract(hd, count / 8, lane, 8); + burn = nburn > burn ? nburn : burn; + + for (i = count % 8; outlen && i < 8; i++) + { + *outbuf++ = lane[i]; + outlen--; + count++; + } + + gcry_assert(count < bsize); + } + } + + ctx->count = count; + + if (burn) + _gcry_burn_stack (burn); +} + + /* Self-test section. @@ -829,6 +987,32 @@ selftests_keccak (int algo, int extended, selftest_report_func_t report) "\xa8\xaa\x18\xac\xe8\x28\x2a\x0e\x0d\xb5\x96\xc9\x0b\x0a\x7b\x87"; hash_len = 64; break; + + case GCRY_MD_SHAKE128: + short_hash = + "\x58\x81\x09\x2d\xd8\x18\xbf\x5c\xf8\xa3\xdd\xb7\x93\xfb\xcb\xa7" + "\x40\x97\xd5\xc5\x26\xa6\xd3\x5f\x97\xb8\x33\x51\x94\x0f\x2c\xc8"; + long_hash = + "\x7b\x6d\xf6\xff\x18\x11\x73\xb6\xd7\x89\x8d\x7f\xf6\x3f\xb0\x7b" + "\x7c\x23\x7d\xaf\x47\x1a\x5a\xe5\x60\x2a\xdb\xcc\xef\x9c\xcf\x4b"; + one_million_a_hash = + "\x9d\x22\x2c\x79\xc4\xff\x9d\x09\x2c\xf6\xca\x86\x14\x3a\xa4\x11" + "\xe3\x69\x97\x38\x08\xef\x97\x09\x32\x55\x82\x6c\x55\x72\xef\x58"; + hash_len = 32; + break; + + case GCRY_MD_SHAKE256: + short_hash = + "\x48\x33\x66\x60\x13\x60\xa8\x77\x1c\x68\x63\x08\x0c\xc4\x11\x4d" + "\x8d\xb4\x45\x30\xf8\xf1\xe1\xee\x4f\x94\xea\x37\xe7\x8b\x57\x39"; + long_hash = + "\x98\xbe\x04\x51\x6c\x04\xcc\x73\x59\x3f\xef\x3e\xd0\x35\x2e\xa9" + "\xf6\x44\x39\x42\xd6\x95\x0e\x29\xa3\x72\xa6\x81\xc3\xde\xaf\x45"; + one_million_a_hash = + "\x35\x78\xa7\xa4\xca\x91\x37\x56\x9c\xdf\x76\xed\x61\x7d\x31\xbb" + "\x99\x4f\xca\x9c\x1b\xbf\x8b\x18\x40\x13\xde\x82\x34\xdf\xd1\x3a"; + hash_len = 32; + break; } what = "short string"; @@ -876,6 +1060,8 @@ run_selftests (int algo, int extended, selftest_report_func_t report) case GCRY_MD_SHA3_256: case GCRY_MD_SHA3_384: case GCRY_MD_SHA3_512: + case GCRY_MD_SHAKE128: + case GCRY_MD_SHAKE256: ec = selftests_keccak (algo, 
extended, report); break; default: @@ -921,7 +1107,22 @@ static gcry_md_oid_spec_t oid_spec_sha3_512[] = { "?" }, { NULL } }; - +static byte shake128_asn[] = { 0x30 }; +static gcry_md_oid_spec_t oid_spec_shake128[] = + { + { "2.16.840.1.101.3.4.2.11" }, + /* PKCS#1 shake128WithRSAEncryption */ + { "?" }, + { NULL } + }; +static byte shake256_asn[] = { 0x30 }; +static gcry_md_oid_spec_t oid_spec_shake256[] = + { + { "2.16.840.1.101.3.4.2.12" }, + /* PKCS#1 shake256WithRSAEncryption */ + { "?" }, + { NULL } + }; gcry_md_spec_t _gcry_digest_spec_sha3_224 = { @@ -955,3 +1156,19 @@ gcry_md_spec_t _gcry_digest_spec_sha3_512 = sizeof (KECCAK_CONTEXT), run_selftests }; +gcry_md_spec_t _gcry_digest_spec_shake128 = + { + GCRY_MD_SHAKE128, {0, 1}, + "SHAKE128", shake128_asn, DIM (shake128_asn), oid_spec_shake128, 0, + shake128_init, keccak_write, keccak_final, NULL, keccak_extract, + sizeof (KECCAK_CONTEXT), + run_selftests + }; +gcry_md_spec_t _gcry_digest_spec_shake256 = + { + GCRY_MD_SHAKE256, {0, 1}, + "SHAKE256", shake256_asn, DIM (shake256_asn), oid_spec_shake256, 0, + shake256_init, keccak_write, keccak_final, NULL, keccak_extract, + sizeof (KECCAK_CONTEXT), + run_selftests + }; diff --git a/cipher/md.c b/cipher/md.c index 6ef8fee..15d944d 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -56,6 +56,8 @@ static gcry_md_spec_t *digest_list[] = &_gcry_digest_spec_sha3_256, &_gcry_digest_spec_sha3_384, &_gcry_digest_spec_sha3_512, + &_gcry_digest_spec_shake128, + &_gcry_digest_spec_shake256, #endif #ifdef USE_GOST_R_3411_94 &_gcry_digest_spec_gost3411_94, diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index facdf65..cdb7644 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -3037,7 +3037,7 @@ are also supported. 
@c begin table of hash algorithms @cindex SHA-1 @cindex SHA-224, SHA-256, SHA-384, SHA-512 - at cindex SHA3-224, SHA3-256, SHA3-384, SHA3-512 + at cindex SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, SHAKE256 @cindex RIPE-MD-160 @cindex MD2, MD4, MD5 @cindex TIGER, TIGER1, TIGER2 @@ -3126,6 +3126,16 @@ See FIPS 202 for the specification. This is the SHA3-384 algorithm which yields a message digest of 64 bytes. See FIPS 202 for the specification. + at item GCRY_MD_SHAKE128 +This is the SHAKE128 extendable-output function (XOF) algorithm with 128 bit +security strength. +See FIPS 202 for the specification. + + at item GCRY_MD_SHAKE256 +This is the SHAKE256 extendable-output function (XOF) algorithm with 256 bit +security strength. +See FIPS 202 for the specification. + @item GCRY_MD_CRC32 This is the ISO 3309 and ITU-T V.42 cyclic redundancy check. It yields an output of 4 bytes. Note that this is not a hash algorithm in the diff --git a/src/cipher.h b/src/cipher.h index d96fdb9..c4b306a 100644 --- a/src/cipher.h +++ b/src/cipher.h @@ -295,6 +295,8 @@ extern gcry_md_spec_t _gcry_digest_spec_sha3_224; extern gcry_md_spec_t _gcry_digest_spec_sha3_256; extern gcry_md_spec_t _gcry_digest_spec_sha3_512; extern gcry_md_spec_t _gcry_digest_spec_sha3_384; +extern gcry_md_spec_t _gcry_digest_spec_shake128; +extern gcry_md_spec_t _gcry_digest_spec_shake256; extern gcry_md_spec_t _gcry_digest_spec_tiger; extern gcry_md_spec_t _gcry_digest_spec_tiger1; extern gcry_md_spec_t _gcry_digest_spec_tiger2; diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index 39be37a..5ddeee3 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -1174,7 +1174,9 @@ enum gcry_md_algos GCRY_MD_SHA3_224 = 312, GCRY_MD_SHA3_256 = 313, GCRY_MD_SHA3_384 = 314, - GCRY_MD_SHA3_512 = 315 + GCRY_MD_SHA3_512 = 315, + GCRY_MD_SHAKE128 = 316, + GCRY_MD_SHAKE256 = 317 }; /* Flags used with the open function. 
*/ diff --git a/tests/basic.c b/tests/basic.c index 75ff349..0762a89 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5265,13 +5265,15 @@ check_cipher_modes(void) fprintf (stderr, "Completed Cipher Mode checks.\n"); } + static void -check_one_md (int algo, const char *data, int len, const char *expect) +check_one_md (int algo, const char *data, int len, const char *expect, int elen) { gcry_md_hd_t hd, hd2; unsigned char *p; int mdlen; int i; + int xof = 0; gcry_error_t err = 0; err = gcry_md_open (&hd, algo, 0); @@ -5284,8 +5286,15 @@ check_one_md (int algo, const char *data, int len, const char *expect) mdlen = gcry_md_get_algo_dlen (algo); if (mdlen < 1 || mdlen > 500) { - fail ("algo %d, gcry_md_get_algo_dlen failed: %d\n", algo, mdlen); - return; + if (mdlen == 0 && (algo == GCRY_MD_SHAKE128 || algo == GCRY_MD_SHAKE256)) + { + xof = 1; + } + else + { + fail ("algo %d, gcry_md_get_algo_dlen failed: %d\n", algo, mdlen); + return; + } } if (*data == '!' && !data[1]) @@ -5326,19 +5335,168 @@ check_one_md (int algo, const char *data, int len, const char *expect) gcry_md_close (hd); - p = gcry_md_read (hd2, algo); + if (!xof) + { + p = gcry_md_read (hd2, algo); - if (memcmp (p, expect, mdlen)) + if (memcmp (p, expect, mdlen)) + { + printf ("computed: "); + for (i = 0; i < mdlen; i++) + printf ("%02x ", p[i] & 0xFF); + printf ("\nexpected: "); + for (i = 0; i < mdlen; i++) + printf ("%02x ", expect[i] & 0xFF); + printf ("\n"); + + fail ("algo %d, digest mismatch\n", algo); + } + + } + else { - printf ("computed: "); - for (i = 0; i < mdlen; i++) - printf ("%02x ", p[i] & 0xFF); - printf ("\nexpected: "); - for (i = 0; i < mdlen; i++) - printf ("%02x ", expect[i] & 0xFF); - printf ("\n"); + char buf[1000]; + int outmax = sizeof(buf) > elen ? 
elen : sizeof(buf); - fail ("algo %d, digest mismatch\n", algo); + err = gcry_md_copy (&hd, hd2); + if (err) + { + fail ("algo %d, gcry_md_copy failed: %s\n", algo, gpg_strerror (err)); + } + + err = gcry_md_extract(hd2, algo, buf, outmax); + if (err) + { + fail ("algo %d, gcry_md_extract failed: %s\n", algo, gpg_strerror (err)); + } + + if (memcmp (buf, expect, outmax)) + { + printf ("computed: "); + for (i = 0; i < outmax; i++) + printf ("%02x ", buf[i] & 0xFF); + printf ("\nexpected: "); + for (i = 0; i < outmax; i++) + printf ("%02x ", expect[i] & 0xFF); + printf ("\n"); + + fail ("algo %d, digest mismatch\n", algo); + } + + memset(buf, 0, sizeof(buf)); + + /* Extract one byte at time. */ + for (i = 0; i < outmax && !err; i++) + err = gcry_md_extract(hd, algo, &buf[i], 1); + if (err) + { + fail ("algo %d, gcry_md_extract failed: %s\n", algo, gpg_strerror (err)); + } + + if (memcmp (buf, expect, outmax)) + { + printf ("computed: "); + for (i = 0; i < outmax; i++) + printf ("%02x ", buf[i] & 0xFF); + printf ("\nexpected: "); + for (i = 0; i < outmax; i++) + printf ("%02x ", expect[i] & 0xFF); + printf ("\n"); + + fail ("algo %d, digest mismatch\n", algo); + } + + if (*data == '!' && !data[1]) + { + int crcalgo = GCRY_MD_RMD160; + gcry_md_hd_t crc1, crc2; + size_t startlen; + size_t piecelen; + size_t left; + const unsigned char *p1, *p2; + int crclen; + + crclen = gcry_md_get_algo_dlen (crcalgo); + + err = gcry_md_open (&crc1, crcalgo, 0); + if (err) + { + fail ("algo %d, crcalgo: %d, gcry_md_open failed: %s\n", algo, + crcalgo, gpg_strerror (err)); + return; + } + + err = gcry_md_open (&crc2, crcalgo, 0); + if (err) + { + fail ("algo %d, crcalgo: %d, gcry_md_open failed: %s\n", algo, + crcalgo, gpg_strerror (err)); + return; + } + + /* Extract large chucks, total 1000000 additional bytes. 
*/ + for (i = 0; i < 1000; i++) + { + err = gcry_md_extract(hd, algo, buf, 1000); + if (!err) + gcry_md_write(crc1, buf, 1000); + } + if (err) + { + fail ("algo %d, gcry_md_extract failed: %s\n", algo, + gpg_strerror (err)); + } + + /* Extract in odd size chunks, total 1000000 additional bytes. */ + left = 1000 * 1000; + startlen = 1; + piecelen = startlen; + + while (!err && left > 0) + { + if (piecelen > sizeof(buf)) + piecelen = sizeof(buf); + if (piecelen > left) + piecelen = left; + + err = gcry_md_extract (hd2, algo, buf, piecelen); + if (!err) + gcry_md_write(crc2, buf, piecelen); + if (err) + { + fail ("algo %d, gcry_md_extract failed: %s\n", algo, + gpg_strerror (err)); + } + + left -= piecelen; + + if (piecelen == sizeof(buf)) + piecelen = ++startlen; + else + piecelen = piecelen * 2 - ((piecelen != startlen) ? startlen : 0); + } + + p1 = gcry_md_read (crc1, crcalgo); + p2 = gcry_md_read (crc2, crcalgo); + + if (memcmp (p1, p2, crclen)) + { + printf ("computed: "); + for (i = 0; i < crclen; i++) + printf ("%02x ", p2[i] & 0xFF); + printf ("\nexpected: "); + for (i = 0; i < crclen; i++) + printf ("%02x ", p1[i] & 0xFF); + printf ("\n"); + + fail ("algo %d, large xof output mismatch\n", algo); + } + + gcry_md_close (crc1); + gcry_md_close (crc2); + } + + gcry_md_close (hd); } gcry_md_close (hd2); @@ -5358,6 +5516,9 @@ check_one_md_multi (int algo, const char *data, int len, const char *expect) mdlen = gcry_md_get_algo_dlen (algo); if (mdlen < 1 || mdlen > 64) { + if (mdlen == 0 && (algo == GCRY_MD_SHAKE128 || algo == GCRY_MD_SHAKE256)) + return; + fail ("check_one_md_multi: algo %d, gcry_md_get_algo_dlen failed: %d\n", algo, mdlen); return; @@ -5420,6 +5581,7 @@ check_digests (void) const char *data; const char *expect; int datalen; + int expectlen; } algos[] = { { GCRY_MD_MD2, "", @@ -5917,7 +6079,238 @@ check_digests (void) #include "./sha3-256.h" #include "./sha3-384.h" #include "./sha3-512.h" - { 0 } + { GCRY_MD_SHAKE128, + "", + 
"\x7F\x9C\x2B\xA4\xE8\x8F\x82\x7D\x61\x60\x45\x50\x76\x05\x85\x3E" + "\xD7\x3B\x80\x93\xF6\xEF\xBC\x88\xEB\x1A\x6E\xAC\xFA\x66\xEF\x26" + "\x3C\xB1\xEE\xA9\x88\x00\x4B\x93\x10\x3C\xFB\x0A\xEE\xFD\x2A\x68" + "\x6E\x01\xFA\x4A\x58\xE8\xA3\x63\x9C\xA8\xA1\xE3\xF9\xAE\x57\xE2" + "\x35\xB8\xCC\x87\x3C\x23\xDC\x62\xB8\xD2\x60\x16\x9A\xFA\x2F\x75" + "\xAB\x91\x6A\x58\xD9\x74\x91\x88\x35\xD2\x5E\x6A\x43\x50\x85\xB2" + "\xBA\xDF\xD6\xDF\xAA\xC3\x59\xA5\xEF\xBB\x7B\xCC\x4B\x59\xD5\x38" + "\xDF\x9A\x04\x30\x2E\x10\xC8\xBC\x1C\xBF\x1A\x0B\x3A\x51\x20\xEA" + "\x17\xCD\xA7\xCF\xAD\x76\x5F\x56\x23\x47\x4D\x36\x8C\xCC\xA8\xAF" + "\x00\x07\xCD\x9F\x5E\x4C\x84\x9F\x16\x7A\x58\x0B\x14\xAA\xBD\xEF" + "\xAE\xE7\xEE\xF4\x7C\xB0\xFC\xA9\x76\x7B\xE1\xFD\xA6\x94\x19\xDF" + "\xB9\x27\xE9\xDF\x07\x34\x8B\x19\x66\x91\xAB\xAE\xB5\x80\xB3\x2D" + "\xEF\x58\x53\x8B\x8D\x23\xF8\x77\x32\xEA\x63\xB0\x2B\x4F\xA0\xF4" + "\x87\x33\x60\xE2\x84\x19\x28\xCD\x60\xDD\x4C\xEE\x8C\xC0\xD4\xC9" + "\x22\xA9\x61\x88\xD0\x32\x67\x5C\x8A\xC8\x50\x93\x3C\x7A\xFF\x15" + "\x33\xB9\x4C\x83\x4A\xDB\xB6\x9C\x61\x15\xBA\xD4\x69\x2D\x86\x19" + "\xF9\x0B\x0C\xDF\x8A\x7B\x9C\x26\x40\x29\xAC\x18\x5B\x70\xB8\x3F" + "\x28\x01\xF2\xF4\xB3\xF7\x0C\x59\x3E\xA3\xAE\xEB\x61\x3A\x7F\x1B" + "\x1D\xE3\x3F\xD7\x50\x81\xF5\x92\x30\x5F\x2E\x45\x26\xED\xC0\x96" + "\x31\xB1\x09\x58\xF4\x64\xD8\x89\xF3\x1B\xA0\x10\x25\x0F\xDA\x7F" + "\x13\x68\xEC\x29\x67\xFC\x84\xEF\x2A\xE9\xAF\xF2\x68\xE0\xB1\x70" + "\x0A\xFF\xC6\x82\x0B\x52\x3A\x3D\x91\x71\x35\xF2\xDF\xF2\xEE\x06" + "\xBF\xE7\x2B\x31\x24\x72\x1D\x4A\x26\xC0\x4E\x53\xA7\x5E\x30\xE7" + "\x3A\x7A\x9C\x4A\x95\xD9\x1C\x55\xD4\x95\xE9\xF5\x1D\xD0\xB5\xE9" + "\xD8\x3C\x6D\x5E\x8C\xE8\x03\xAA\x62\xB8\xD6\x54\xDB\x53\xD0\x9B" + "\x8D\xCF\xF2\x73\xCD\xFE\xB5\x73\xFA\xD8\xBC\xD4\x55\x78\xBE\xC2" + "\xE7\x70\xD0\x1E\xFD\xE8\x6E\x72\x1A\x3F\x7C\x6C\xCE\x27\x5D\xAB" + "\xE6\xE2\x14\x3F\x1A\xF1\x8D\xA7\xEF\xDD\xC4\xC7\xB7\x0B\x5E\x34" + "\x5D\xB9\x3C\xC9\x36\xBE\xA3\x23\x49\x1C\xCB\x38\xA3\x88\xF5\x46" 
+ "\xA9\xFF\x00\xDD\x4E\x13\x00\xB9\xB2\x15\x3D\x20\x41\xD2\x05\xB4" + "\x43\xE4\x1B\x45\xA6\x53\xF2\xA5\xC4\x49\x2C\x1A\xDD\x54\x45\x12" + "\xDD\xA2\x52\x98\x33\x46\x2B\x71\xA4\x1A\x45\xBE\x97\x29\x0B\x6F", + 0, 512, }, + { GCRY_MD_SHAKE128, + "\x5A\xAB\x62\x75\x6D\x30\x7A\x66\x9D\x14\x6A\xBA\x98\x8D\x90\x74" + "\xC5\xA1\x59\xB3\xDE\x85\x15\x1A\x81\x9B\x11\x7C\xA1\xFF\x65\x97" + "\xF6\x15\x6E\x80\xFD\xD2\x8C\x9C\x31\x76\x83\x51\x64\xD3\x7D\xA7" + "\xDA\x11\xD9\x4E\x09\xAD\xD7\x70\xB6\x8A\x6E\x08\x1C\xD2\x2C\xA0" + "\xC0\x04\xBF\xE7\xCD\x28\x3B\xF4\x3A\x58\x8D\xA9\x1F\x50\x9B\x27" + "\xA6\x58\x4C\x47\x4A\x4A\x2F\x3E\xE0\xF1\xF5\x64\x47\x37\x92\x40" + "\xA5\xAB\x1F\xB7\x7F\xDC\xA4\x9B\x30\x5F\x07\xBA\x86\xB6\x27\x56" + "\xFB\x9E\xFB\x4F\xC2\x25\xC8\x68\x45\xF0\x26\xEA\x54\x20\x76\xB9" + "\x1A\x0B\xC2\xCD\xD1\x36\xE1\x22\xC6\x59\xBE\x25\x9D\x98\xE5\x84" + "\x1D\xF4\xC2\xF6\x03\x30\xD4\xD8\xCD\xEE\x7B\xF1\xA0\xA2\x44\x52" + "\x4E\xEC\xC6\x8F\xF2\xAE\xF5\xBF\x00\x69\xC9\xE8\x7A\x11\xC6\xE5" + "\x19\xDE\x1A\x40\x62\xA1\x0C\x83\x83\x73\x88\xF7\xEF\x58\x59\x8A" + "\x38\x46\xF4\x9D\x49\x96\x82\xB6\x83\xC4\xA0\x62\xB4\x21\x59\x4F" + "\xAF\xBC\x13\x83\xC9\x43\xBA\x83\xBD\xEF\x51\x5E\xFC\xF1\x0D", + "\xF0\x71\x5D\xE3\x56\x92\xFD\x70\x12\x3D\xC6\x83\x68\xD0\xFE\xEC" + "\x06\xA0\xC7\x4C\xF8\xAD\xB0\x5D\xDC\x25\x54\x87\xB1\xA8\xD4\xD1" + "\x21\x3E\x9E\xAB\xAF\x41\xF1\x16\x17\x19\xD0\x65\xD7\x94\xB7\x50" + "\xF8\x4B\xE3\x2A\x32\x34\xB4\xD5\x36\x46\x0D\x55\x20\x68\x8A\x5A" + "\x79\xA1\x7A\x4B\xA8\x98\x7F\xCB\x61\xBF\x7D\xAA\x8B\x54\x7B\xF5" + "\xC1\xCE\x36\xB5\x6A\x73\x25\x7D\xBB\xF1\xBA\xBB\x64\xF2\x49\xBD" + "\xCE\xB6\x7B\xA1\xC8\x88\x37\x0A\x96\x3D\xFD\x6B\x6A\x2A\xDE\x2C" + "\xEF\xD1\x4C\x32\x52\xCB\x37\x58\x52\x0F\x0C\x65\xF4\x52\x46\x82" + "\x77\x24\x99\x46\x3A\xE1\xA3\x41\x80\x01\x83\xAA\x60\xEF\xA0\x51" + "\x18\xA2\x82\x01\x74\x4F\x7B\xA0\xB0\xA3\x92\x8D\xD7\xC0\x26\x3F" + "\xD2\x64\xB7\xCD\x7B\x2E\x2E\x09\xB3\x22\xBF\xCE\xA8\xEE\xD0\x42" + 
"\x75\x79\x5B\xE7\xC0\xF0\x0E\x11\x38\x27\x37\x0D\x05\x1D\x50\x26" + "\x95\x80\x30\x00\x05\xAC\x12\x88\xFE\xA6\xCD\x9A\xE9\xF4\xF3\x7C" + "\xE0\xF8\xAC\xE8\xBF\x3E\xBE\x1D\x70\x56\x25\x59\x54\xC7\x61\x93" + "\x1D\x3C\x42\xED\x62\xF7\xF1\xCE\x1B\x94\x5C\xDE\xCC\x0A\x74\x32" + "\x2D\x7F\x64\xD6\x00\x4F\xF2\x16\x84\x14\x93\x07\x28\x8B\x44\x8E" + "\x45\x43\x34\x75\xB1\xEA\x13\x14\xB0\x0F\x1F\xC4\x50\x08\x9A\x9D" + "\x1F\x77\x10\xC6\xD7\x65\x2E\xCF\x65\x4F\x3B\x48\x7D\x02\x83\xD4" + "\xD8\xA2\x8E\xFB\x50\x66\xC4\x25\x0D\x5A\xD6\x98\xE1\x5D\xBA\x88" + "\xE9\x25\xE4\xDE\x99\xB6\x9B\xC3\x83\xAC\x80\x45\xB7\xF1\x02\x2A" + "\xDD\x39\xD4\x43\x54\x6A\xE0\x92\x4F\x13\xF4\x89\x60\x96\xDF\xDF" + "\x37\xCA\x72\x20\x79\x87\xC4\xA7\x70\x5A\x7A\xBE\x72\x4B\x7F\xA1" + "\x0C\x90\x9F\x39\x25\x44\x9F\x01\x0D\x61\xE2\x07\xAD\xD9\x52\x19" + "\x07\x1A\xCE\xED\xB9\xB9\xDC\xED\x32\xA9\xE1\x23\x56\x1D\x60\x82" + "\xD4\x6A\xEF\xAE\x07\xEE\x1B\xD1\x32\x76\x5E\x3E\x51\x3C\x66\x50" + "\x1B\x38\x7A\xB2\xEE\x09\xA0\x4A\xE6\x3E\x25\x80\x85\x17\xAF\xEA" + "\x3E\x05\x11\x69\xCF\xD2\xFF\xF8\xC5\x85\x8E\x2D\x96\x23\x89\x7C" + "\x9E\x85\x17\x5A\xC5\xA8\x63\x94\xCD\x0A\x32\xA0\xA6\x2A\x8F\x5D" + "\x6C\xCC\xBF\x49\x3D\xAA\x43\xF7\x83\x62\xBB\xCA\x40\xAD\xF7\x33" + "\xF8\x71\xE0\xC0\x09\x98\xD9\xBF\xD6\x88\x06\x56\x66\x6C\xD7\xBE" + "\x4F\xE9\x89\x2C\x61\xDC\xD5\xCD\x23\xA5\xE4\x27\x7E\xEE\x8B\x4A" + "\xFD\x29\xB6\x9B\xBA\x55\x66\x0A\x21\x71\x12\xFF\x6E\x34\x56\xB1", + 223, 512, }, + { GCRY_MD_SHAKE128, + "!", + "\x9d\x22\x2c\x79\xc4\xff\x9d\x09\x2c\xf6\xca\x86\x14\x3a\xa4\x11" + "\xe3\x69\x97\x38\x08\xef\x97\x09\x32\x55\x82\x6c\x55\x72\xef\x58" + "\x42\x4c\x4b\x5c\x28\x47\x5f\xfd\xcf\x98\x16\x63\x86\x7f\xec\x63" + "\x21\xc1\x26\x2e\x38\x7b\xcc\xf8\xca\x67\x68\x84\xc4\xa9\xd0\xc1" + "\x3b\xfa\x68\x69\x76\x3d\x5a\xe4\xbb\xc9\xb3\xcc\xd0\x9d\x1c\xa5" + "\xea\x74\x46\x53\x8d\x69\xb3\xfb\x98\xc7\x2b\x59\xa2\xb4\x81\x7d" + "\xb5\xea\xdd\x90\x11\xf9\x0f\xa7\x10\x91\x93\x1f\x81\x34\xf4\xf0" + 
"\x0b\x56\x2e\x2f\xe1\x05\x93\x72\x70\x36\x1c\x19\x09\x86\x2a\xd4" + "\x50\x46\xe3\x93\x2f\x5d\xd3\x11\xec\x72\xfe\xc5\xf8\xfb\x8f\x60" + "\xb4\x5a\x3b\xee\x3f\x85\xbb\xf7\xfc\xed\xc6\xa5\x55\x67\x76\x48" + "\xe0\x65\x4b\x38\x19\x41\xa8\x6b\xd3\xe5\x12\x65\x7b\x0d\x57\xa7" + "\x99\x1f\xc4\x54\x3f\x89\xd8\x29\x04\x92\x22\x2c\xe4\xa3\x3e\x17" + "\x60\x2b\x3b\x99\xc0\x09\xf7\x65\x5f\x87\x53\x5c\xda\xa3\x71\x6f" + "\x58\xc4\x7b\x8a\x15\x7a\xd1\x95\xf0\x28\x09\xf2\x75\x00\xb9\x25" + "\x49\x79\x31\x1c\x6b\xb4\x15\x96\x8c\xd1\x04\x31\x16\x9a\x27\xd5" + "\xa8\xd6\x1e\x13\xa6\xb8\xb7\x7a\xf1\xf8\xb6\xdd\x2e\xef\xde\xa0" + "\x40\x78\x96\x80\x49\x0b\x5e\xdc\xb1\xd3\xe5\x38\xa4\x66\xf7\x57" + "\xad\x71\x8f\xe1\xfd\x9f\xae\xef\xa4\x72\x46\xad\x5e\x36\x7f\x87" + "\xd3\xb4\x85\x0d\x44\x86\xeb\x21\x99\xe9\x4a\x79\x79\xe2\x09\x1a" + "\xbc\xdf\x3b\xc1\x33\x79\xc8\x96\xdc\xeb\x79\xa8\xfd\x08\xf1\x10" + "\x73\xf3\x3e\x3f\x99\x23\x22\xb3\x12\x02\xde\xe2\x34\x33\x0c\xf3" + "\x30\x4a\x58\x8f\x0d\x59\xda\xe4\xe6\x3b\xa2\xac\x3c\xe6\x82\xcc" + "\x19\xd4\xe3\x41\x67\x8c\xc3\xa6\x7a\x47\xc1\x13\xb4\xdb\x89\x0f" + "\x30\xa9\x2a\xa0\x8a\x1f\x6d\xc8\xfb\x64\x63\xf8\x03\x8c\x2b\x40" + "\xb2\x53\x00\x77\xb2\x36\xce\x88\xaf\xcc\xcd\xa0\x8a\xd6\xd7\x5e" + "\xee\x18\x99\xb1\x0c\xd8\x00\xc2\xce\x53\x72\xbf\xf2\x2e\xe3\xa3" + "\x39\xd4\xb9\xc1\xa2\xf5\xf4\xb8\x20\xf6\x87\xe5\x51\x9b\xd0\x5b" + "\x1f\xc5\xda\x0e\xb4\x53\x36\x81\x4f\x48\x13\x2c\x64\x0e\x66\xc3" + "\xa0\x2a\x22\xe6\x35\x98\xf9\x4f\x22\xf3\x51\x84\x11\x04\x46\xb6" + "\x48\xcf\x84\x74\xf3\x0c\x43\xea\xd5\x83\x09\xfb\x25\x90\x16\x09" + "\xe2\x41\x87\xe8\x01\xc8\x09\x56\x1a\x64\x80\x94\x50\xe6\x03\xc4" + "\xa8\x03\x95\x25\xc4\x76\xb5\x8e\x32\xce\x2c\x47\xb3\x7d\xa5\x91", + 0, 512, }, + { GCRY_MD_SHAKE256, + "", + "\x46\xB9\xDD\x2B\x0B\xA8\x8D\x13\x23\x3B\x3F\xEB\x74\x3E\xEB\x24" + "\x3F\xCD\x52\xEA\x62\xB8\x1B\x82\xB5\x0C\x27\x64\x6E\xD5\x76\x2F" + "\xD7\x5D\xC4\xDD\xD8\xC0\xF2\x00\xCB\x05\x01\x9D\x67\xB5\x92\xF6" + 
"\xFC\x82\x1C\x49\x47\x9A\xB4\x86\x40\x29\x2E\xAC\xB3\xB7\xC4\xBE" + "\x14\x1E\x96\x61\x6F\xB1\x39\x57\x69\x2C\xC7\xED\xD0\xB4\x5A\xE3" + "\xDC\x07\x22\x3C\x8E\x92\x93\x7B\xEF\x84\xBC\x0E\xAB\x86\x28\x53" + "\x34\x9E\xC7\x55\x46\xF5\x8F\xB7\xC2\x77\x5C\x38\x46\x2C\x50\x10" + "\xD8\x46\xC1\x85\xC1\x51\x11\xE5\x95\x52\x2A\x6B\xCD\x16\xCF\x86" + "\xF3\xD1\x22\x10\x9E\x3B\x1F\xDD\x94\x3B\x6A\xEC\x46\x8A\x2D\x62" + "\x1A\x7C\x06\xC6\xA9\x57\xC6\x2B\x54\xDA\xFC\x3B\xE8\x75\x67\xD6" + "\x77\x23\x13\x95\xF6\x14\x72\x93\xB6\x8C\xEA\xB7\xA9\xE0\xC5\x8D" + "\x86\x4E\x8E\xFD\xE4\xE1\xB9\xA4\x6C\xBE\x85\x47\x13\x67\x2F\x5C" + "\xAA\xAE\x31\x4E\xD9\x08\x3D\xAB\x4B\x09\x9F\x8E\x30\x0F\x01\xB8" + "\x65\x0F\x1F\x4B\x1D\x8F\xCF\x3F\x3C\xB5\x3F\xB8\xE9\xEB\x2E\xA2" + "\x03\xBD\xC9\x70\xF5\x0A\xE5\x54\x28\xA9\x1F\x7F\x53\xAC\x26\x6B" + "\x28\x41\x9C\x37\x78\xA1\x5F\xD2\x48\xD3\x39\xED\xE7\x85\xFB\x7F" + "\x5A\x1A\xAA\x96\xD3\x13\xEA\xCC\x89\x09\x36\xC1\x73\xCD\xCD\x0F" + "\xAB\x88\x2C\x45\x75\x5F\xEB\x3A\xED\x96\xD4\x77\xFF\x96\x39\x0B" + "\xF9\xA6\x6D\x13\x68\xB2\x08\xE2\x1F\x7C\x10\xD0\x4A\x3D\xBD\x4E" + "\x36\x06\x33\xE5\xDB\x4B\x60\x26\x01\xC1\x4C\xEA\x73\x7D\xB3\xDC" + "\xF7\x22\x63\x2C\xC7\x78\x51\xCB\xDD\xE2\xAA\xF0\xA3\x3A\x07\xB3" + "\x73\x44\x5D\xF4\x90\xCC\x8F\xC1\xE4\x16\x0F\xF1\x18\x37\x8F\x11" + "\xF0\x47\x7D\xE0\x55\xA8\x1A\x9E\xDA\x57\xA4\xA2\xCF\xB0\xC8\x39" + "\x29\xD3\x10\x91\x2F\x72\x9E\xC6\xCF\xA3\x6C\x6A\xC6\xA7\x58\x37" + "\x14\x30\x45\xD7\x91\xCC\x85\xEF\xF5\xB2\x19\x32\xF2\x38\x61\xBC" + "\xF2\x3A\x52\xB5\xDA\x67\xEA\xF7\xBA\xAE\x0F\x5F\xB1\x36\x9D\xB7" + "\x8F\x3A\xC4\x5F\x8C\x4A\xC5\x67\x1D\x85\x73\x5C\xDD\xDB\x09\xD2" + "\xB1\xE3\x4A\x1F\xC0\x66\xFF\x4A\x16\x2C\xB2\x63\xD6\x54\x12\x74" + "\xAE\x2F\xCC\x86\x5F\x61\x8A\xBE\x27\xC1\x24\xCD\x8B\x07\x4C\xCD" + "\x51\x63\x01\xB9\x18\x75\x82\x4D\x09\x95\x8F\x34\x1E\xF2\x74\xBD" + "\xAB\x0B\xAE\x31\x63\x39\x89\x43\x04\xE3\x58\x77\xB0\xC2\x8A\x9B" + "\x1F\xD1\x66\xC7\x96\xB9\xCC\x25\x8A\x06\x4A\x8F\x57\xE2\x7F\x2A", 
+ 0, 512, }, + { GCRY_MD_SHAKE256, + "\xB3\x2D\x95\xB0\xB9\xAA\xD2\xA8\x81\x6D\xE6\xD0\x6D\x1F\x86\x00" + "\x85\x05\xBD\x8C\x14\x12\x4F\x6E\x9A\x16\x3B\x5A\x2A\xDE\x55\xF8" + "\x35\xD0\xEC\x38\x80\xEF\x50\x70\x0D\x3B\x25\xE4\x2C\xC0\xAF\x05" + "\x0C\xCD\x1B\xE5\xE5\x55\xB2\x30\x87\xE0\x4D\x7B\xF9\x81\x36\x22" + "\x78\x0C\x73\x13\xA1\x95\x4F\x87\x40\xB6\xEE\x2D\x3F\x71\xF7\x68" + "\xDD\x41\x7F\x52\x04\x82\xBD\x3A\x08\xD4\xF2\x22\xB4\xEE\x9D\xBD" + "\x01\x54\x47\xB3\x35\x07\xDD\x50\xF3\xAB\x42\x47\xC5\xDE\x9A\x8A" + "\xBD\x62\xA8\xDE\xCE\xA0\x1E\x3B\x87\xC8\xB9\x27\xF5\xB0\x8B\xEB" + "\x37\x67\x4C\x6F\x8E\x38\x0C\x04", + "\xCC\x2E\xAA\x04\xEE\xF8\x47\x9C\xDA\xE8\x56\x6E\xB8\xFF\xA1\x10" + "\x0A\x40\x79\x95\xBF\x99\x9A\xE9\x7E\xDE\x52\x66\x81\xDC\x34\x90" + "\x61\x6F\x28\x44\x2D\x20\xDA\x92\x12\x4C\xE0\x81\x58\x8B\x81\x49" + "\x1A\xED\xF6\x5C\xAA\xF0\xD2\x7E\x82\xA4\xB0\xE1\xD1\xCA\xB2\x38" + "\x33\x32\x8F\x1B\x8D\xA4\x30\xC8\xA0\x87\x66\xA8\x63\x70\xFA\x84" + "\x8A\x79\xB5\x99\x8D\xB3\xCF\xFD\x05\x7B\x96\xE1\xE2\xEE\x0E\xF2" + "\x29\xEC\xA1\x33\xC1\x55\x48\xF9\x83\x99\x02\x04\x37\x30\xE4\x4B" + "\xC5\x2C\x39\xFA\xDC\x1D\xDE\xEA\xD9\x5F\x99\x39\xF2\x20\xCA\x30" + "\x06\x61\x54\x0D\xF7\xED\xD9\xAF\x37\x8A\x5D\x4A\x19\xB2\xB9\x3E" + "\x6C\x78\xF4\x9C\x35\x33\x43\xA0\xB5\xF1\x19\x13\x2B\x53\x12\xD0" + "\x04\x83\x1D\x01\x76\x9A\x31\x6D\x2F\x51\xBF\x64\xCC\xB2\x0A\x21" + "\xC2\xCF\x7A\xC8\xFB\x6F\x6E\x90\x70\x61\x26\xBD\xAE\x06\x11\xDD" + "\x13\x96\x2E\x8B\x53\xD6\xEA\xE2\x6C\x7B\x0D\x25\x51\xDA\xF6\x24" + "\x8E\x9D\x65\x81\x73\x82\xB0\x4D\x23\x39\x2D\x10\x8E\x4D\x34\x43" + "\xDE\x5A\xDC\x72\x73\xC7\x21\xA8\xF8\x32\x0E\xCF\xE8\x17\x7A\xC0" + "\x67\xCA\x8A\x50\x16\x9A\x6E\x73\x00\x0E\xBC\xDC\x1E\x4E\xE6\x33" + "\x9F\xC8\x67\xC3\xD7\xAE\xAB\x84\x14\x63\x98\xD7\xBA\xDE\x12\x1D" + "\x19\x89\xFA\x45\x73\x35\x56\x4E\x97\x57\x70\xA3\xA0\x02\x59\xCA" + "\x08\x70\x61\x08\x26\x1A\xA2\xD3\x4D\xE0\x0F\x8C\xAC\x7D\x45\xD3" + 
"\x5E\x5A\xA6\x3E\xA6\x9E\x1D\x1A\x2F\x7D\xAB\x39\x00\xD5\x1E\x0B" + "\xC6\x53\x48\xA2\x55\x54\x00\x70\x39\xA5\x2C\x3C\x30\x99\x80\xD1" + "\x7C\xAD\x20\xF1\x15\x63\x10\xA3\x9C\xD3\x93\x76\x0C\xFE\x58\xF6" + "\xF8\xAD\xE4\x21\x31\x28\x82\x80\xA3\x5E\x1D\xB8\x70\x81\x83\xB9" + "\x1C\xFA\xF5\x82\x7E\x96\xB0\xF7\x74\xC4\x50\x93\xB4\x17\xAF\xF9" + "\xDD\x64\x17\xE5\x99\x64\xA0\x1B\xD2\xA6\x12\xFF\xCF\xBA\x18\xA0" + "\xF1\x93\xDB\x29\x7B\x9A\x6C\xC1\xD2\x70\xD9\x7A\xAE\x8F\x8A\x3A" + "\x6B\x26\x69\x5A\xB6\x64\x31\xC2\x02\xE1\x39\xD6\x3D\xD3\xA2\x47" + "\x78\x67\x6C\xEF\xE3\xE2\x1B\x02\xEC\x4E\x8F\x5C\xFD\x66\x58\x7A" + "\x12\xB4\x40\x78\xFC\xD3\x9E\xEE\x44\xBB\xEF\x4A\x94\x9A\x63\xC0" + "\xDF\xD5\x8C\xF2\xFB\x2C\xD5\xF0\x02\xE2\xB0\x21\x92\x66\xCF\xC0" + "\x31\x81\x74\x86\xDE\x70\xB4\x28\x5A\x8A\x70\xF3\xD3\x8A\x61\xD3" + "\x15\x5D\x99\xAA\xF4\xC2\x53\x90\xD7\x36\x45\xAB\x3E\x8D\x80\xF0", + 136, 512, }, + { GCRY_MD_SHAKE256, + "!", + "\x35\x78\xa7\xa4\xca\x91\x37\x56\x9c\xdf\x76\xed\x61\x7d\x31\xbb" + "\x99\x4f\xca\x9c\x1b\xbf\x8b\x18\x40\x13\xde\x82\x34\xdf\xd1\x3a" + "\x3f\xd1\x24\xd4\xdf\x76\xc0\xa5\x39\xee\x7d\xd2\xf6\xe1\xec\x34" + "\x61\x24\xc8\x15\xd9\x41\x0e\x14\x5e\xb5\x61\xbc\xd9\x7b\x18\xab" + "\x6c\xe8\xd5\x55\x3e\x0e\xab\x3d\x1f\x7d\xfb\x8f\x9d\xee\xfe\x16" + "\x84\x7e\x21\x92\xf6\xf6\x1f\xb8\x2f\xb9\x0d\xde\x60\xb1\x90\x63" + "\xc5\x6a\x4c\x55\xcd\xd7\xb6\x72\xb7\x5b\xf5\x15\xad\xbf\xe2\x04" + "\x90\x3c\x8c\x00\x36\xde\x54\xa2\x99\x9a\x92\x0d\xe9\x0f\x66\xd7" + "\xff\x6e\xc8\xe4\xc9\x3d\x24\xae\x34\x6f\xdc\xb3\xa5\xa5\xbd\x57" + "\x39\xec\x15\xa6\xed\xdb\x5c\xe5\xb0\x2d\xa5\x30\x39\xfa\xc6\x3e" + "\x19\x55\x5f\xaa\x2e\xdd\xc6\x93\xb1\xf0\xc2\xa6\xfc\xbe\x7c\x0a" + "\x0a\x09\x1d\x0e\xe7\x00\xd7\x32\x2e\x4b\x0f\xf0\x95\x90\xde\x16" + "\x64\x22\xf9\xea\xd5\xda\x4c\x99\x3d\x60\x5f\xe4\xd9\xc6\x34\x84" + "\x3a\xa1\x78\xb1\x76\x72\xc6\x56\x8c\x8a\x2e\x62\xab\xeb\xea\x2c" + "\x21\xc3\x02\xbd\x36\x6a\xd6\x98\x95\x9e\x1f\x6e\x43\x4a\xf1\x55" + 
"\x56\x8b\x27\x34\xd8\x37\x9f\xcd\x3f\xfe\x64\x89\xba\xff\xa6\xd7" + "\x11\x09\x44\x2e\x1b\x34\x4f\x13\x8a\x09\xca\xe3\xe2\xd3\x94\x2e" + "\xee\x82\x8f\xc4\x7e\x64\xde\xb5\xe0\x0a\x02\x4a\xe1\xf2\xc0\x77" + "\xe6\xb7\xb1\x33\xf6\xc1\xde\x91\x30\x92\xd4\xe8\x29\xec\xd2\xb2" + "\xef\x28\xca\x80\x20\x82\x1e\x2b\x8b\xe5\x17\xd9\x3e\xd0\x88\x36" + "\xf6\xf0\x66\xcc\x3d\x03\xb6\x25\xd8\x49\x7f\x29\xdb\xc1\xc3\x9e" + "\x6f\xe4\x63\x22\x6f\x85\xc1\x28\xa2\xc2\x98\x88\x11\x2e\x06\xa9" + "\x9c\x5d\x17\xb2\x5e\x90\x0d\x20\x4f\x39\x72\x31\xcd\xf7\x9c\x31" + "\x34\x46\x53\x2d\xad\x07\xf4\xc0\xbd\x9f\xba\x1d\xd4\x13\xd8\xa7" + "\xe6\xcb\xc0\xa0\x86\x2c\xc7\x69\x23\x9a\x89\xf9\xdb\x08\x5b\x78" + "\xa0\x54\x59\x6a\xd7\x08\x0d\xdf\x96\x01\x9b\x73\x99\xb5\x03\x48" + "\x0e\x5a\x65\xa2\x20\x8d\x74\x72\x4c\x98\x7d\x32\x5e\x9b\x0e\x82" + "\xfe\xcd\x4f\x27\xf3\x13\x5b\x1d\x9e\x27\xb4\x8e\x69\xdd\x6f\x59" + "\x62\xb8\xa6\x3b\x48\x92\x1e\xc8\xee\x53\x86\x9f\x1a\xc1\xc8\x18" + "\x23\x87\xee\x0d\x6c\xfe\xf6\x53\xff\x8b\xf6\x05\xf1\x47\x04\xb7" + "\x1b\xeb\x65\x53\xf2\x81\xfa\x75\x69\x48\xc4\x38\x49\x4b\x19\xb4" + "\xee\x69\xa5\x43\x6b\x22\x2b\xc9\x88\xed\xa4\xac\x60\x00\x24\xc9", + 0, 512, }, + { 0 } }; gcry_error_t err; int i; @@ -5950,7 +6343,7 @@ check_digests (void) check_one_md (algos[i].md, algos[i].data, algos[i].datalen > 0 ? algos[i].datalen : strlen (algos[i].data), - algos[i].expect); + algos[i].expect, algos[i].expectlen); check_one_md_multi (algos[i].md, algos[i].data, algos[i].datalen > 0 ? algos[i].datalen : strlen (algos[i].data), diff --git a/tests/bench-slope.c b/tests/bench-slope.c index 2679556..3a2aa38 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -1651,6 +1651,12 @@ kdf_bench_one (int algo, int subalgo) return; } + if (gcry_md_get_algo_dlen (subalgo) == 0) + { + /* Skip XOFs */ + return; + } + *algo_name = 0; if (algo == GCRY_KDF_PBKDF2)