From cvs at cvs.gnupg.org Fri May 1 15:03:42 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Fri, 01 May 2015 15:03:42 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-188-g124dfce Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 124dfce7c5a2d9405fa2b2832e91ac1267943830 (commit) from f88266c0f868d7bf51a215d5531bb9f2b4dad19e (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 124dfce7c5a2d9405fa2b2832e91ac1267943830 Author: Jussi Kivilinna Date: Thu Apr 30 16:57:57 2015 +0300 Fix buggy RC4 AMD64 assembly and add test to notice similar issues * cipher/arcfour-amd64.S (_gcry_arcfour_amd64): Fix swapped store of 'x' and 'y'. * tests/basic.c (get_algo_mode_blklen): New. (check_one_cipher_core): Add new tests for split buffer input on encryption and decryption. -- Reported-by: Dima Kukulniak Signed-off-by: Jussi Kivilinna diff --git a/cipher/arcfour-amd64.S b/cipher/arcfour-amd64.S index c32cd6f..8b8031a 100644 --- a/cipher/arcfour-amd64.S +++ b/cipher/arcfour-amd64.S @@ -85,8 +85,8 @@ _gcry_arcfour_amd64: .Lfinished: dec %rcx # x-- - movb %dl, (4*256)(%rbp) # key->y = y - movb %cl, (4*256+4)(%rbp) # key->x = x + movb %cl, (4*256)(%rbp) # key->y = y + movb %dl, (4*256+4)(%rbp) # key->x = x pop %rbx pop %rbp ret diff --git a/tests/basic.c b/tests/basic.c index 1175b38..07fd4d0 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -4676,6 +4676,26 @@ check_bulk_cipher_modes (void) } +static unsigned int get_algo_mode_blklen(int algo, int mode) +{ + unsigned int blklen = gcry_cipher_get_algo_blklen(algo); + + /* Some modes override blklen. */ + switch (mode) + { + case GCRY_CIPHER_MODE_STREAM: + case GCRY_CIPHER_MODE_OFB: + case GCRY_CIPHER_MODE_CTR: + case GCRY_CIPHER_MODE_CCM: + case GCRY_CIPHER_MODE_GCM: + case GCRY_CIPHER_MODE_POLY1305: + return 1; + } + + return blklen; +} + + /* The core of the cipher check. In addition to the parameters passed to check_one_cipher it also receives the KEY and the plain data. PASS is printed with error messages. The function returns 0 on @@ -4688,14 +4708,27 @@ check_one_cipher_core (int algo, int mode, int flags, { gcry_cipher_hd_t hd; unsigned char in_buffer[1040+1], out_buffer[1040+1]; + unsigned char enc_result[1040]; unsigned char *in, *out; int keylen; gcry_error_t err = 0; + unsigned int blklen; + unsigned int piecelen; + unsigned int pos; + + blklen = get_algo_mode_blklen(algo, mode); assert (nkey == 32); assert (nplain == 1040); assert (sizeof(in_buffer) == nplain + 1); assert (sizeof(out_buffer) == sizeof(in_buffer)); + assert (blklen > 0); + + if (mode == GCRY_CIPHER_MODE_CBC && (flags & GCRY_CIPHER_CBC_CTS)) + { + /* TODO: examine why CBC with CTS fails. 
*/ + blklen = nplain; + } if (!bufshift) { @@ -4758,6 +4791,8 @@ check_one_cipher_core (int algo, int mode, int flags, return -1; } + memcpy (enc_result, out, nplain); + gcry_cipher_reset (hd); err = gcry_cipher_decrypt (hd, in, nplain, out, nplain); @@ -4787,6 +4822,10 @@ check_one_cipher_core (int algo, int mode, int flags, return -1; } + if (memcmp (enc_result, out, nplain)) + fail ("pass %d, algo %d, mode %d, in-place, encrypt mismatch\n", + pass, algo, mode); + gcry_cipher_reset (hd); err = gcry_cipher_decrypt (hd, out, nplain, NULL, 0); @@ -4803,6 +4842,119 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, in-place, encrypt-decrypt mismatch\n", pass, algo, mode); + /* Again, splitting encryption in multiple operations. */ + gcry_cipher_reset (hd); + + piecelen = blklen; + pos = 0; + while (pos < nplain) + { + if (piecelen > nplain - pos) + piecelen = nplain - pos; + + err = gcry_cipher_encrypt (hd, out + pos, piecelen, plain + pos, + piecelen); + if (err) + { + fail ("pass %d, algo %d, mode %d, split-buffer (pos: %d, " + "piecelen: %d), gcry_cipher_encrypt failed: %s\n", + pass, algo, mode, pos, piecelen, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + + pos += piecelen; + piecelen = piecelen * 2 - ((piecelen != blklen) ? blklen : 0); + } + + if (memcmp (enc_result, out, nplain)) + fail ("pass %d, algo %d, mode %d, split-buffer, encrypt mismatch\n", + pass, algo, mode); + + gcry_cipher_reset (hd); + + piecelen = blklen; + pos = 0; + while (pos < nplain) + { + if (piecelen > nplain - pos) + piecelen = nplain - pos; + + err = gcry_cipher_decrypt (hd, in + pos, piecelen, out + pos, piecelen); + if (err) + { + fail ("pass %d, algo %d, mode %d, split-buffer (pos: %d, " + "piecelen: %d), gcry_cipher_decrypt failed: %s\n", + pass, algo, mode, pos, piecelen, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + + pos += piecelen; + piecelen = piecelen * 2 - ((piecelen != blklen) ? blklen : 0); + } + + if (memcmp (plain, in, nplain)) + fail ("pass %d, algo %d, mode %d, split-buffer, encrypt-decrypt mismatch\n", + pass, algo, mode); + + /* Again, using in-place encryption and splitting encryption in multiple + * operations. */ + gcry_cipher_reset (hd); + + piecelen = blklen; + pos = 0; + while (pos < nplain) + { + if (piecelen > nplain - pos) + piecelen = nplain - pos; + + memcpy (out + pos, plain + pos, piecelen); + err = gcry_cipher_encrypt (hd, out + pos, piecelen, NULL, 0); + if (err) + { + fail ("pass %d, algo %d, mode %d, in-place split-buffer (pos: %d, " + "piecelen: %d), gcry_cipher_encrypt failed: %s\n", + pass, algo, mode, pos, piecelen, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + + pos += piecelen; + piecelen = piecelen * 2 - ((piecelen != blklen) ? blklen : 0); + } + + if (memcmp (enc_result, out, nplain)) + fail ("pass %d, algo %d, mode %d, in-place split-buffer, encrypt mismatch\n", + pass, algo, mode); + + gcry_cipher_reset (hd); + + piecelen = blklen; + pos = 0; + while (pos < nplain) + { + if (piecelen > nplain - pos) + piecelen = nplain - pos; + + err = gcry_cipher_decrypt (hd, out + pos, piecelen, NULL, 0); + if (err) + { + fail ("pass %d, algo %d, mode %d, in-place split-buffer (pos: %d, " + "piecelen: %d), gcry_cipher_decrypt failed: %s\n", + pass, algo, mode, pos, piecelen, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + + pos += piecelen; + piecelen = piecelen * 2 - ((piecelen != blklen) ? 
blklen : 0); + } + + if (memcmp (plain, out, nplain)) + fail ("pass %d, algo %d, mode %d, in-place split-buffer, encrypt-decrypt" + " mismatch\n", pass, algo, mode); + gcry_cipher_close (hd); ----------------------------------------------------------------------- Summary of changes: cipher/arcfour-amd64.S | 4 +- tests/basic.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 154 insertions(+), 2 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Fri May 1 15:07:57 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 16:07:57 +0300 Subject: [PATCH 1/4] Fix reseting cipher in OCB mode Message-ID: <20150501130757.19146.23786.stgit@localhost6.localdomain6> * cipher/cipher.c (cipher_reset): Setup default taglen for OCB after clearing state. -- Signed-off-by: Jussi Kivilinna --- cipher/cipher.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/cipher/cipher.c b/cipher/cipher.c index 6e1173f..d1550c0 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -744,6 +744,8 @@ cipher_reset (gcry_cipher_hd_t c) case GCRY_CIPHER_MODE_OCB: memset (&c->u_mode.ocb, 0, sizeof c->u_mode.ocb); + /* Setup default taglen. */ + c->u_mode.ocb.taglen = 16; break; default: From jussi.kivilinna at iki.fi Fri May 1 15:08:02 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 16:08:02 +0300 Subject: [PATCH 2/4] Enable more modes in basic ciphers test In-Reply-To: <20150501130757.19146.23786.stgit@localhost6.localdomain6> References: <20150501130757.19146.23786.stgit@localhost6.localdomain6> Message-ID: <20150501130802.19146.68538.stgit@localhost6.localdomain6> * src/gcrypt.h.in (GCRY_OCB_BLOCK_LEN): New. * tests/basic.c (check_one_cipher_core_reset): New. (check_one_cipher_core): Use check_one_cipher_core_reset inplace of gcry_cipher_reset. (check_ciphers): Add CCM and OCB modes for block cipher tests. -- Signed-off-by: Jussi Kivilinna --- src/gcrypt.h.in | 3 ++ tests/basic.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 68 insertions(+), 8 deletions(-) diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index cac2b49..0984d11 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -931,6 +931,9 @@ enum gcry_cipher_flags /* CCM works only with blocks of 128 bits. */ #define GCRY_CCM_BLOCK_LEN (128 / 8) +/* OCB works only with blocks of 128 bits. */ +#define GCRY_OCB_BLOCK_LEN (128 / 8) + /* Create a handle for algorithm ALGO to be used in MODE. FLAGS may be given as an bitwise OR of the gcry_cipher_flags values. 
*/ gcry_error_t gcry_cipher_open (gcry_cipher_hd_t *handle, diff --git a/tests/basic.c b/tests/basic.c index 07fd4d0..f3105de 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -4676,7 +4676,8 @@ check_bulk_cipher_modes (void) } -static unsigned int get_algo_mode_blklen(int algo, int mode) +static unsigned int +get_algo_mode_blklen (int algo, int mode) { unsigned int blklen = gcry_cipher_get_algo_blklen(algo); @@ -4696,6 +4697,48 @@ static unsigned int get_algo_mode_blklen(int algo, int mode) } +static int +check_one_cipher_core_reset (gcry_cipher_hd_t hd, int algo, int mode, int pass, + int nplain) +{ + static const unsigned char iv[8] = { 0, 1, 2, 3, 4, 5, 6, 7 }; + u64 ctl_params[3]; + int err; + + gcry_cipher_reset (hd); + + if (mode == GCRY_CIPHER_MODE_OCB || mode == GCRY_CIPHER_MODE_CCM) + { + err = gcry_cipher_setiv (hd, iv, sizeof(iv)); + if (err) + { + fail ("pass %d, algo %d, mode %d, gcry_cipher_setiv failed: %s\n", + pass, algo, mode, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + } + + if (mode == GCRY_CIPHER_MODE_CCM) + { + ctl_params[0] = nplain; /* encryptedlen */ + ctl_params[1] = 0; /* aadlen */ + ctl_params[2] = 16; /* authtaglen */ + err = gcry_cipher_ctl (hd, GCRYCTL_SET_CCM_LENGTHS, ctl_params, + sizeof(ctl_params)); + if (err) + { + fail ("pass %d, algo %d, mode %d, gcry_cipher_ctl " + "GCRYCTL_SET_CCM_LENGTHS failed: %s\n", + pass, algo, mode, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + } + + return 0; +} + /* The core of the cipher check. In addition to the parameters passed to check_one_cipher it also receives the KEY and the plain data. PASS is printed with error messages. The function returns 0 on @@ -4782,6 +4825,9 @@ check_one_cipher_core (int algo, int mode, int flags, return -1; } + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; + err = gcry_cipher_encrypt (hd, out, nplain, plain, nplain); if (err) { @@ -4793,7 +4839,8 @@ check_one_cipher_core (int algo, int mode, int flags, memcpy (enc_result, out, nplain); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; err = gcry_cipher_decrypt (hd, in, nplain, out, nplain); if (err) @@ -4809,7 +4856,8 @@ check_one_cipher_core (int algo, int mode, int flags, pass, algo, mode); /* Again, using in-place encryption. */ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; memcpy (out, plain, nplain); err = gcry_cipher_encrypt (hd, out, nplain, NULL, 0); @@ -4826,7 +4874,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, in-place, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; err = gcry_cipher_decrypt (hd, out, nplain, NULL, 0); if (err) @@ -4843,7 +4892,8 @@ check_one_cipher_core (int algo, int mode, int flags, pass, algo, mode); /* Again, splitting encryption in multiple operations. 
*/ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4871,7 +4921,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, split-buffer, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4900,7 +4951,8 @@ check_one_cipher_core (int algo, int mode, int flags, /* Again, using in-place encryption and splitting encryption in multiple * operations. */ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4928,7 +4980,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, in-place split-buffer, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -5096,8 +5149,12 @@ check_ciphers (void) check_one_cipher (algos[i], GCRY_CIPHER_MODE_CBC, 0); check_one_cipher (algos[i], GCRY_CIPHER_MODE_CBC, GCRY_CIPHER_CBC_CTS); check_one_cipher (algos[i], GCRY_CIPHER_MODE_CTR, 0); + if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_CCM_BLOCK_LEN) + check_one_cipher (algos[i], GCRY_CIPHER_MODE_CCM, 0); if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_GCM_BLOCK_LEN) check_one_cipher (algos[i], GCRY_CIPHER_MODE_GCM, 0); + if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_OCB_BLOCK_LEN) + check_one_cipher (algos[i], GCRY_CIPHER_MODE_OCB, 0); } for (i = 0; algos2[i]; i++) From jussi.kivilinna at iki.fi Fri May 1 15:08:12 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 16:08:12 +0300 Subject: [PATCH 4/4] Add --disable-hwf for basic tests In-Reply-To: <20150501130757.19146.23786.stgit@localhost6.localdomain6> References: <20150501130757.19146.23786.stgit@localhost6.localdomain6> Message-ID: <20150501130812.19146.87154.stgit@localhost6.localdomain6> * tests/basic.c (main): Add handling for '--disable-hwf'. -- Signed-off-by: Jussi Kivilinna --- tests/basic.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/tests/basic.c b/tests/basic.c index 8400f9e..2cf8dd0 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -8028,6 +8028,21 @@ main (int argc, char **argv) argc--; argv++; } } + else if (!strcmp (*argv, "--disable-hwf")) + { + argc--; + argv++; + if (argc) + { + if (gcry_control (GCRYCTL_DISABLE_HWF, *argv, NULL)) + fprintf (stderr, + PGM + ": unknown hardware feature `%s' - option ignored\n", + *argv); + argc--; + argv++; + } + } } gcry_control (GCRYCTL_SET_VERBOSITY, (int)verbose); From jussi.kivilinna at iki.fi Fri May 1 15:08:07 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 16:08:07 +0300 Subject: [PATCH 3/4] Use more odd chuck sizes for check_one_md In-Reply-To: <20150501130757.19146.23786.stgit@localhost6.localdomain6> References: <20150501130757.19146.23786.stgit@localhost6.localdomain6> Message-ID: <20150501130807.19146.56016.stgit@localhost6.localdomain6> * tests/basic.c (check_one_md): Make chuck size vary oddly, instead of using fixed length of 1000 bytes. 
-- Signed-off-by: Jussi Kivilinna --- tests/basic.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/tests/basic.c b/tests/basic.c index f3105de..8400f9e 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5231,11 +5231,29 @@ check_one_md (int algo, const char *data, int len, const char *expect) if (*data == '!' && !data[1]) { /* hash one million times a "a" */ char aaa[1000]; + size_t left = 1000 * 1000; + size_t startlen = 1; + size_t piecelen = startlen; - /* Write in odd size chunks so that we test the buffering. */ memset (aaa, 'a', 1000); - for (i = 0; i < 1000; i++) - gcry_md_write (hd, aaa, 1000); + + /* Write in odd size chunks so that we test the buffering. */ + while (left > 0) + { + if (piecelen > sizeof(aaa)) + piecelen = sizeof(aaa); + if (piecelen > left) + piecelen = left; + + gcry_md_write (hd, aaa, piecelen); + + left -= piecelen; + + if (piecelen == sizeof(aaa)) + piecelen = ++startlen; + else + piecelen = piecelen * 2 - ((piecelen != startlen) ? startlen : 0); + } } else gcry_md_write (hd, data, len); From jussi.kivilinna at iki.fi Fri May 1 17:51:42 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 18:51:42 +0300 Subject: [PATCH] Fix tail handling in buf_xor_1 Message-ID: <20150501155142.23983.44239.stgit@localhost6.localdomain6> * cipher/bufhelp.h (buf_xor_1): Increment source pointer at tail handling. -- Signed-off-by: Jussi Kivilinna --- cipher/bufhelp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index fb87939..c1aa52e 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -162,7 +162,7 @@ do_bytes: #endif /* Handle tail. */ for (; len; len--) - *dst++ ^= *src; + *dst++ ^= *src++; } From jussi.kivilinna at iki.fi Fri May 1 18:09:25 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 19:09:25 +0300 Subject: [PATCH] Fix packed attribute check for Windows targets Message-ID: <20150501160925.4692.80981.stgit@localhost6.localdomain6> * configure.ac (gcry_cv_gcc_attribute_packed): Move 'long b' to its own packed structure. -- Change packed attribute test so that it works with both MS ABI and SYSV ABI. Signed-off-by: Jussi Kivilinna --- configure.ac | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 16f6a21..555ad1e 100644 --- a/configure.ac +++ b/configure.ac @@ -964,7 +964,9 @@ AC_CACHE_CHECK([whether the GCC style packed attribute is supported], [gcry_cv_gcc_attribute_packed], [gcry_cv_gcc_attribute_packed=no AC_COMPILE_IFELSE([AC_LANG_SOURCE( - [[struct foo_s { char a; long b; } __attribute__ ((packed)); + [[struct foolong_s { long b; } __attribute__ ((packed)); + struct foo_s { char a; struct foolong_s b; } + __attribute__ ((packed)); enum bar { FOO = 1 / (sizeof(struct foo_s) == (sizeof(char) + sizeof(long))), };]])], From jussi.kivilinna at iki.fi Fri May 1 19:39:34 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:34 +0300 Subject: [PATCH 1/8] Disable building mpi assembly routines on WIN64 Message-ID: <20150501173934.5385.86854.stgit@localhost6.localdomain6> * mpi/config.links: Disable assembly for host 'x86_64-*mingw32*'. 
-- Signed-off-by: Jussi Kivilinna --- mpi/config.links | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mpi/config.links b/mpi/config.links index f44299d..d71918a 100644 --- a/mpi/config.links +++ b/mpi/config.links @@ -132,6 +132,11 @@ case "${host}" in path="amd64" mpi_cpu_arch="x86" ;; + x86_64-*mingw32*) + echo '/* No working assembler modules available */' >>./mpi/asm-syntax.h + path="" + mpi_cpu_arch="x86" + ;; x86_64-*-*) echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h From jussi.kivilinna at iki.fi Fri May 1 19:39:39 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:39 +0300 Subject: [PATCH 2/8] Disable GCM and AES-NI assembly implementations for WIN64 In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501173939.5385.17510.stgit@localhost6.localdomain6> * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when __WIN64__ defined. * cipher/rijndael-internal.h (USE_AESNI): Ditto. -- Signed-off-by: Jussi Kivilinna --- cipher/cipher-internal.h | 4 +++- cipher/rijndael-internal.h | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index e20ea56..693f218 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -67,7 +67,9 @@ #if defined(ENABLE_PCLMUL_SUPPORT) && defined(GCM_USE_TABLES) # if ((defined(__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# define GCM_USE_INTEL_PCLMUL 1 +# ifndef __WIN64__ +# define GCM_USE_INTEL_PCLMUL 1 +# endif # endif # endif #endif /* GCM_USE_INTEL_PCLMUL */ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 854980b..bd247a9 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -75,7 +75,9 @@ #ifdef ENABLE_AESNI_SUPPORT # if ((defined (__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# define USE_AESNI 1 +# ifndef __WIN64__ +# define USE_AESNI 1 +# endif # endif # endif #endif /* ENABLE_AESNI_SUPPORT */ From jussi.kivilinna at iki.fi Fri May 1 19:39:44 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:44 +0300 Subject: [PATCH 3/8] Prepare random/win32.c fast poll for 64-bit Windows In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501173944.5385.44329.stgit@localhost6.localdomain6> * random/win32.c (_gcry_rndw32_gather_random_fast) [ADD]: Rename to ADDINT. (_gcry_rndw32_gather_random_fast): Add ADDPTR. (_gcry_rndw32_gather_random_fast): Disable entropy gathering from GetQueueStatus(QS_ALLEVENTS). (_gcry_rndw32_gather_random_fast): Change minimumWorkingSetSize and maximumWorkingSetSize to SIZE_T from DWORD. (_gcry_rndw32_gather_random_fast): Only add lower 32-bits of minimumWorkingSetSize and maximumWorkingSetSize to random poll. (_gcry_rndw32_gather_random_fast) [__WIN64__]: Read TSC directly using intrinsic. -- Introduce entropy gatherer changes related to 64-bit Windows platform as done in cryptlib fast poll: - Change ADD macro to ADDPTR/ADDINT to handle pointer values. ADDPTR discards high 32-bits of 64-bit pointer values. - minimum/maximumWorkingSetSize changed to SIZE_T type to avoid stack corruption on 64-bit; only low 32-bits are used for entropy. 
- Use __rdtsc() intrinsic on 64-bit (as TSC is always available). Signed-off-by: Jussi Kivilinna --- random/rndw32.c | 83 ++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 52 insertions(+), 31 deletions(-) diff --git a/random/rndw32.c b/random/rndw32.c index c495131..4ab1bca 100644 --- a/random/rndw32.c +++ b/random/rndw32.c @@ -826,39 +826,47 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, cursor position for last message, 1 ms time for last message, handle of window with clipboard open, handle of process heap, handle of procs window station, types of events in input queue, - and milliseconds since Windows was started. */ + and milliseconds since Windows was started. On 64-bit platform + some of these return values are pointers and thus 64-bit wide. + We discard the upper 32-bit of those values. */ { byte buffer[20*sizeof(ulong)], *bufptr; bufptr = buffer; -#define ADD(f) do { ulong along = (ulong)(f); \ - memcpy (bufptr, &along, sizeof (along) ); \ - bufptr += sizeof (along); \ - } while (0) - - ADD ( GetActiveWindow ()); - ADD ( GetCapture ()); - ADD ( GetClipboardOwner ()); - ADD ( GetClipboardViewer ()); - ADD ( GetCurrentProcess ()); - ADD ( GetCurrentProcessId ()); - ADD ( GetCurrentThread ()); - ADD ( GetCurrentThreadId ()); - ADD ( GetDesktopWindow ()); - ADD ( GetFocus ()); - ADD ( GetInputState ()); - ADD ( GetMessagePos ()); - ADD ( GetMessageTime ()); - ADD ( GetOpenClipboardWindow ()); - ADD ( GetProcessHeap ()); - ADD ( GetProcessWindowStation ()); - ADD ( GetQueueStatus (QS_ALLEVENTS)); - ADD ( GetTickCount ()); +#define ADDINT(f) do { ulong along = (ulong)(f); \ + memcpy (bufptr, &along, sizeof (along) ); \ + bufptr += sizeof (along); \ + } while (0) +#define ADDPTR(f) do { void *aptr = (f); \ + ADDINT((SIZE_T)aptr); \ + } while (0) + + ADDPTR ( GetActiveWindow ()); + ADDPTR ( GetCapture ()); + ADDPTR ( GetClipboardOwner ()); + ADDPTR ( GetClipboardViewer ()); + ADDPTR ( GetCurrentProcess ()); + ADDINT ( GetCurrentProcessId ()); + ADDPTR ( GetCurrentThread ()); + ADDINT ( GetCurrentThreadId ()); + ADDPTR ( GetDesktopWindow ()); + ADDPTR ( GetFocus ()); + ADDINT ( GetInputState ()); + ADDINT ( GetMessagePos ()); + ADDINT ( GetMessageTime ()); + ADDPTR ( GetOpenClipboardWindow ()); + ADDPTR ( GetProcessHeap ()); + ADDPTR ( GetProcessWindowStation ()); + /* Following function in some cases stops returning events, and cannot + be used as an entropy source. */ + /*ADDINT ( GetQueueStatus (QS_ALLEVENTS));*/ + ADDINT ( GetTickCount ()); gcry_assert ( bufptr-buffer < sizeof (buffer) ); (*add) ( buffer, bufptr-buffer, origin ); -#undef ADD +#undef ADDINT +#undef ADDPTR } /* Get multiword system information: Current caret position, current @@ -888,7 +896,7 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, { HANDLE handle; FILETIME creationTime, exitTime, kernelTime, userTime; - DWORD minimumWorkingSetSize, maximumWorkingSetSize; + SIZE_T minimumWorkingSetSize, maximumWorkingSetSize; handle = GetCurrentThread (); GetThreadTimes (handle, &creationTime, &exitTime, @@ -910,10 +918,9 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, process. */ GetProcessWorkingSetSize (handle, &minimumWorkingSetSize, &maximumWorkingSetSize); - (*add) ( &minimumWorkingSetSize, - sizeof (minimumWorkingSetSize), origin ); - (*add) ( &maximumWorkingSetSize, - sizeof (maximumWorkingSetSize), origin ); + /* On 64-bit system, discard the high 32-bits. 
*/ + (*add) ( &minimumWorkingSetSize, sizeof (int), origin ); + (*add) ( &maximumWorkingSetSize, sizeof (int), origin ); } @@ -961,7 +968,20 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, To make things unambiguous, we detect a CPU new enough to call RDTSC directly by checking for CPUID capabilities, and fall back to QPC if - this isn't present. */ + this isn't present. + + On AMD64, TSC is always available and intrinsic is provided for accessing + it. */ +#ifdef __WIN64__ + { + unsigned __int64 aint64; + + /* Note: cryptlib does not discard upper 32 bits of TSC on WIN64, but does + * on WIN32. Is this correct? */ + aint64 = __rdtsc(); + (*add) (&aint64, sizeof(aint64), origin); + } +#else #ifdef __GNUC__ /* FIXME: We would need to implement the CPU feature tests first. */ /* if (cpu_has_feature_rdtsc) */ @@ -990,6 +1010,7 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, (*add) (&aword, sizeof (aword), origin ); } } +#endif /*__WIN64__*/ } From jussi.kivilinna at iki.fi Fri May 1 19:39:49 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:49 +0300 Subject: [PATCH 4/8] Fix rndhw for 64-bit Windows build In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501173949.5385.10722.stgit@localhost6.localdomain6> * configure.ac: Add sizeof check for 'void *'. * random/rndhw.c (poll_padlock): Check for SIZEOF_VOID_P == 8 instead of defined(__LP64__). (RDRAND_LONG): Check for SIZEOF_UNSIGNED_LONG == 8 instead of defined(__LP64__). -- __LP64__ is not predefined for 64-bit mingw64-gcc, which caused wrong assembly code selections. Do selection based on type sizes instead, to support x86_64, x32 and win64 properly. Signed-off-by: Jussi Kivilinna --- configure.ac | 1 + random/rndhw.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 555ad1e..594209f 100644 --- a/configure.ac +++ b/configure.ac @@ -344,6 +344,7 @@ AC_CHECK_SIZEOF(unsigned short, 2) AC_CHECK_SIZEOF(unsigned int, 4) AC_CHECK_SIZEOF(unsigned long, 4) AC_CHECK_SIZEOF(unsigned long long, 0) +AC_CHECK_SIZEOF(void *, 0) AC_TYPE_UINTPTR_T diff --git a/random/rndhw.c b/random/rndhw.c index e625512..8e50751 100644 --- a/random/rndhw.c +++ b/random/rndhw.c @@ -69,7 +69,7 @@ poll_padlock (void (*add)(const void*, size_t, enum random_origins), nbytes = 0; while (nbytes < 64) { -#if defined(__x86_64__) && defined(__LP64__) +#if defined(__x86_64__) && SIZEOF_VOID_P == 8 asm volatile ("movq %1, %%rdi\n\t" /* Set buffer. */ "xorq %%rdx, %%rdx\n\t" /* Request up to 8 bytes. */ @@ -123,7 +123,7 @@ poll_padlock (void (*add)(const void*, size_t, enum random_origins), #ifdef USE_DRNG # define RDRAND_RETRY_LOOPS 10 # define RDRAND_INT ".byte 0x0f,0xc7,0xf0" -# if defined(__x86_64__) && defined(__LP64__) +# if defined(__x86_64__) && SIZEOF_UNSIGNED_LONG == 8 # define RDRAND_LONG ".byte 0x48,0x0f,0xc7,0xf0" # else # define RDRAND_LONG RDRAND_INT From jussi.kivilinna at iki.fi Fri May 1 19:39:59 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:59 +0300 Subject: [PATCH 6/8] DES: Silence compiler warnings on Windows In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501173959.5385.34542.stgit@localhost6.localdomain6> * cipher/des.c (working_memcmp): Make pointer arguments 'const void *'. 
-- Following warning seen on Windows target build: des.c: In function 'is_weak_key': des.c:1019:40: warning: pointer targets in passing argument 1 of 'working_memcmp' differ in signedness [-Wpointer-sign] if ( !(cmp_result=working_memcmp(work, weak_keys[middle], 8)) ) ^ des.c:149:1: note: expected 'const char *' but argument is of type 'unsigned char *' working_memcmp( const char *a, const char *b, size_t n ) ^ des.c:1019:46: warning: pointer targets in passing argument 2 of 'working_memcmp' differ in signedness [-Wpointer-sign] if ( !(cmp_result=working_memcmp(work, weak_keys[middle], 8)) ) ^ des.c:149:1: note: expected 'const char *' but argument is of type 'unsigned char *' working_memcmp( const char *a, const char *b, size_t n ) ^ Signed-off-by: Jussi Kivilinna --- cipher/des.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/cipher/des.c b/cipher/des.c index bc2a474..d4863d1 100644 --- a/cipher/des.c +++ b/cipher/des.c @@ -146,8 +146,10 @@ * depending on whether characters are signed or not. */ static int -working_memcmp( const char *a, const char *b, size_t n ) +working_memcmp( const void *_a, const void *_b, size_t n ) { + const char *a = _a; + const char *b = _b; for( ; n; n--, a++, b++ ) if( *a != *b ) return (int)(*(byte*)a) - (int)(*(byte*)b); From jussi.kivilinna at iki.fi Fri May 1 19:40:04 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:40:04 +0300 Subject: [PATCH 7/8] Add W64 support for mpi amd64 assembly In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501174004.5385.36001.stgit@localhost6.localdomain6> acinclude.m4 (GNUPG_SYS_SYMBOL_UNDERSCORE): Set 'ac_cv_sys_symbol_underscore=no' on MingW-W64. mpi/amd64/func_abi.h: New. mpi/amd64/mpih-add1.S (_gcry_mpih_add_n): Add FUNC_ENTRY and FUNC_EXIT. mpi/amd64/mpih-lshift.S (_gcry_mpih_lshift): Ditto. mpi/amd64/mpih-mul1.S (_gcry_mpih_mul_1): Ditto. mpi/amd64/mpih-mul2.S (_gcry_mpih_addmul_1): Ditto. mpi/amd64/mpih-mul3.S (_gcry_mpih_submul_1): Ditto. mpi/amd64/mpih-rshift.S (_gcry_mpih_rshift): Ditto. mpi/amd64/mpih-sub1.S (_gcry_mpih_sub_n): Ditto. mpi/config.links [host=x86_64-*mingw*]: Enable assembly modules. [host=x86_64-*-*]: Append mpi/amd64/func_abi.h to mpi/asm-syntax.h. -- Signed-off-by: Jussi Kivilinna --- acinclude.m4 | 5 ++++- mpi/amd64/func_abi.h | 19 +++++++++++++++++++ mpi/amd64/mpih-add1.S | 2 ++ mpi/amd64/mpih-lshift.S | 2 ++ mpi/amd64/mpih-mul1.S | 2 ++ mpi/amd64/mpih-mul2.S | 2 ++ mpi/amd64/mpih-mul3.S | 3 ++- mpi/amd64/mpih-rshift.S | 2 ++ mpi/amd64/mpih-sub1.S | 2 ++ mpi/config.links | 13 +++++++++---- 10 files changed, 46 insertions(+), 6 deletions(-) create mode 100644 mpi/amd64/func_abi.h diff --git a/acinclude.m4 b/acinclude.m4 index 0791b84..764efd4 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -101,9 +101,12 @@ AC_DEFUN([GNUPG_CHECK_GNUMAKE], AC_DEFUN([GNUPG_SYS_SYMBOL_UNDERSCORE], [tmp_do_check="no" case "${host}" in - *-mingw32*) + i?86-*-mingw32*) ac_cv_sys_symbol_underscore=yes ;; + x86_64-*-mingw32*) + ac_cv_sys_symbol_underscore=no + ;; i386-emx-os2 | i[3456]86-pc-os2*emx | i386-pc-msdosdjgpp) ac_cv_sys_symbol_underscore=yes ;; diff --git a/mpi/amd64/func_abi.h b/mpi/amd64/func_abi.h new file mode 100644 index 0000000..ce44674 --- /dev/null +++ b/mpi/amd64/func_abi.h @@ -0,0 +1,19 @@ +#ifdef USE_MS_ABI + /* Store registers and move four first input arguments from MS ABI to + * SYSV ABI. 
*/ + #define FUNC_ENTRY() \ + pushq %rsi; \ + pushq %rdi; \ + movq %rdx, %rsi; \ + movq %rcx, %rdi; \ + movq %r8, %rdx; \ + movq %r9, %rcx; + + /* Restore registers. */ + #define FUNC_EXIT() \ + popq %rdi; \ + popq %rsi; +#else + #define FUNC_ENTRY() /**/ + #define FUNC_EXIT() /**/ +#endif diff --git a/mpi/amd64/mpih-add1.S b/mpi/amd64/mpih-add1.S index f0ec89c..6a90262 100644 --- a/mpi/amd64/mpih-add1.S +++ b/mpi/amd64/mpih-add1.S @@ -43,6 +43,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_add_n) C_SYMBOL_NAME(_gcry_mpih_add_n:) + FUNC_ENTRY() leaq (%rsi,%rcx,8), %rsi leaq (%rdi,%rcx,8), %rdi leaq (%rdx,%rcx,8), %rdx @@ -59,5 +60,6 @@ C_SYMBOL_NAME(_gcry_mpih_add_n:) movq %rcx, %rax /* zero %rax */ adcq %rax, %rax + FUNC_EXIT() ret \ No newline at end of file diff --git a/mpi/amd64/mpih-lshift.S b/mpi/amd64/mpih-lshift.S index e87dd1a..9e8979b 100644 --- a/mpi/amd64/mpih-lshift.S +++ b/mpi/amd64/mpih-lshift.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_lshift) C_SYMBOL_NAME(_gcry_mpih_lshift:) + FUNC_ENTRY() movq -8(%rsi,%rdx,8), %mm7 movd %ecx, %mm1 movl $64, %eax @@ -74,4 +75,5 @@ C_SYMBOL_NAME(_gcry_mpih_lshift:) .Lende: psllq %mm1, %mm2 movq %mm2, (%rdi) emms + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul1.S b/mpi/amd64/mpih-mul1.S index 54b0ab4..67ab47e 100644 --- a/mpi/amd64/mpih-mul1.S +++ b/mpi/amd64/mpih-mul1.S @@ -46,6 +46,7 @@ GLOBL C_SYMBOL_NAME(_gcry_mpih_mul_1) C_SYMBOL_NAME(_gcry_mpih_mul_1:) + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%rdx,8), %rsi leaq (%rdi,%rdx,8), %rdi @@ -62,4 +63,5 @@ C_SYMBOL_NAME(_gcry_mpih_mul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul2.S b/mpi/amd64/mpih-mul2.S index a332a1d..1aa4fa0 100644 --- a/mpi/amd64/mpih-mul2.S +++ b/mpi/amd64/mpih-mul2.S @@ -41,6 +41,7 @@ TEXT GLOBL C_SYMBOL_NAME(_gcry_mpih_addmul_1) C_SYMBOL_NAME(_gcry_mpih_addmul_1:) + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%rdx,8), %rsi leaq (%rdi,%rdx,8), %rdi @@ -61,4 +62,5 @@ C_SYMBOL_NAME(_gcry_mpih_addmul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul3.S b/mpi/amd64/mpih-mul3.S index 4d458a7..bc41c4e 100644 --- a/mpi/amd64/mpih-mul3.S +++ b/mpi/amd64/mpih-mul3.S @@ -42,7 +42,7 @@ TEXT GLOBL C_SYMBOL_NAME(_gcry_mpih_submul_1) C_SYMBOL_NAME(_gcry_mpih_submul_1:) - + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%r11,8), %rsi leaq (%rdi,%r11,8), %rdi @@ -63,4 +63,5 @@ C_SYMBOL_NAME(_gcry_mpih_submul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-rshift.S b/mpi/amd64/mpih-rshift.S index 4cfc8f6..311b85b 100644 --- a/mpi/amd64/mpih-rshift.S +++ b/mpi/amd64/mpih-rshift.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_rshift) C_SYMBOL_NAME(_gcry_mpih_rshift:) + FUNC_ENTRY() movq (%rsi), %mm7 movd %ecx, %mm1 movl $64, %eax @@ -77,4 +78,5 @@ C_SYMBOL_NAME(_gcry_mpih_rshift:) .Lende: psrlq %mm1, %mm2 movq %mm2, -8(%rdi) emms + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-sub1.S b/mpi/amd64/mpih-sub1.S index b3609b0..ccf6496 100644 --- a/mpi/amd64/mpih-sub1.S +++ b/mpi/amd64/mpih-sub1.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_sub_n) C_SYMBOL_NAME(_gcry_mpih_sub_n:) + FUNC_ENTRY() leaq (%rsi,%rcx,8), %rsi leaq (%rdi,%rcx,8), %rdi leaq (%rdx,%rcx,8), %rdx @@ -58,4 +59,5 @@ C_SYMBOL_NAME(_gcry_mpih_sub_n:) movq %rcx, %rax /* zero %rax */ adcq %rax, %rax + FUNC_EXIT() ret diff --git a/mpi/config.links b/mpi/config.links index d71918a..2fb5e8a 100644 --- a/mpi/config.links +++ b/mpi/config.links @@ -129,17 +129,22 @@ case "${host}" in x86_64-apple-darwin*) echo '#define 
BSD_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h path="amd64" mpi_cpu_arch="x86" ;; x86_64-*mingw32*) - echo '/* No working assembler modules available */' >>./mpi/asm-syntax.h - path="" - mpi_cpu_arch="x86" + echo '#define USE_MS_ABI' >>./mpi/asm-syntax.h + echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h + cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h + path="amd64" + mpi_cpu_arch="x86" ;; x86_64-*-*) echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h path="amd64" mpi_cpu_arch="x86" ;; @@ -314,7 +319,7 @@ case "${host}" in echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/powerpc32/syntax.h >>./mpi/asm-syntax.h path="powerpc32" - mpi_cpu_arch="ppc" + mpi_cpu_arch="ppc" ;; rs6000-*-aix[456789]* | \ From jussi.kivilinna at iki.fi Fri May 1 19:40:09 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:40:09 +0300 Subject: [PATCH 8/8] Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64 In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501174009.5385.36714.stgit@localhost6.localdomain6> * cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul) ( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector registers before use and restore after. * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency on !defined(__WIN64__). * cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare, aesni_prepare_2_6, aesni_cleanup) ( aesni_cleanup_2_6): New. [!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New. (_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc) (_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec) (_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use 'aesni_prepare_2_6'. * cipher/rijndael-internal.h (USE_SSSE3): Enable if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS. (USE_AESNI): Remove dependency on !defined(__WIN64__) * cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New. [!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New. (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use 'vpaes_ssse3_prepare'. (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use 'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to exclude '.type' and '.size' markers from assembly code, as they are not support on WIN64/COFF objects. * configure.ac (gcry_cv_gcc_attribute_ms_abi) (gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi) (gcry_cv_gcc_default_abi_is_sysv_abi) (gcry_cv_gcc_win64_platform_as_ok): New checks. 
-- Signed-off-by: Jussi Kivilinna --- cipher/cipher-gcm-intel-pclmul.c | 72 +++++++++++++++++++++++++ cipher/cipher-internal.h | 4 - cipher/rijndael-aesni.c | 73 +++++++++++++++++++++----- cipher/rijndael-internal.h | 9 +-- cipher/rijndael-ssse3-amd64.c | 94 ++++++++++++++++++++++++++------- configure.ac | 108 +++++++++++++++++++++++++++++++++++++- 6 files changed, 317 insertions(+), 43 deletions(-) diff --git a/cipher/cipher-gcm-intel-pclmul.c b/cipher/cipher-gcm-intel-pclmul.c index 79648ce..a327249 100644 --- a/cipher/cipher-gcm-intel-pclmul.c +++ b/cipher/cipher-gcm-intel-pclmul.c @@ -249,6 +249,17 @@ void _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) { u64 tmp[2]; +#if defined(__x86_64__) && defined(__WIN64__) + char win64tmp[3 * 16]; + + /* XMM6-XMM8 need to be restored after use. */ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" + "movdqu %%xmm7, 1*16(%0)\n\t" + "movdqu %%xmm8, 2*16(%0)\n\t" + : + : "r" (win64tmp) + : "memory"); +#endif /* Swap endianness of hsub. */ tmp[0] = buf_get_be64(c->u_mode.gcm.u_ghash_key.key + 8); @@ -285,6 +296,21 @@ _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) : [h_234] "r" (c->u_mode.gcm.gcm_table) : "memory"); +#ifdef __WIN64__ + /* Clear/restore used registers. */ + asm volatile( "pxor %%xmm0, %%xmm0\n\t" + "pxor %%xmm1, %%xmm1\n\t" + "pxor %%xmm2, %%xmm2\n\t" + "pxor %%xmm3, %%xmm3\n\t" + "pxor %%xmm4, %%xmm4\n\t" + "pxor %%xmm5, %%xmm5\n\t" + "movdqu 0*16(%0), %%xmm6\n\t" + "movdqu 1*16(%0), %%xmm7\n\t" + "movdqu 2*16(%0), %%xmm8\n\t" + : + : "r" (win64tmp) + : "memory"); +#else /* Clear used registers. */ asm volatile( "pxor %%xmm0, %%xmm0\n\t" "pxor %%xmm1, %%xmm1\n\t" @@ -297,6 +323,7 @@ _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) "pxor %%xmm8, %%xmm8\n\t" ::: "cc" ); #endif +#endif wipememory (tmp, sizeof(tmp)); } @@ -309,10 +336,30 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, static const unsigned char be_mask[16] __attribute__ ((aligned (16))) = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }; const unsigned int blocksize = GCRY_GCM_BLOCK_LEN; +#ifdef __WIN64__ + char win64tmp[10 * 16]; +#endif if (nblocks == 0) return 0; +#ifdef __WIN64__ + /* XMM8-XMM15 need to be restored after use. */ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" + "movdqu %%xmm7, 1*16(%0)\n\t" + "movdqu %%xmm8, 2*16(%0)\n\t" + "movdqu %%xmm9, 3*16(%0)\n\t" + "movdqu %%xmm10, 4*16(%0)\n\t" + "movdqu %%xmm11, 5*16(%0)\n\t" + "movdqu %%xmm12, 6*16(%0)\n\t" + "movdqu %%xmm13, 7*16(%0)\n\t" + "movdqu %%xmm14, 8*16(%0)\n\t" + "movdqu %%xmm15, 9*16(%0)\n\t" + : + : "r" (win64tmp) + : "memory" ); +#endif + /* Preload hash and H1. */ asm volatile ("movdqu %[hash], %%xmm1\n\t" "movdqa %[hsub], %%xmm0\n\t" @@ -353,6 +400,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, } while (nblocks >= 4); +#ifndef __WIN64__ /* Clear used x86-64/XMM registers. */ asm volatile( "pxor %%xmm8, %%xmm8\n\t" "pxor %%xmm9, %%xmm9\n\t" @@ -363,6 +411,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, "pxor %%xmm14, %%xmm14\n\t" "pxor %%xmm15, %%xmm15\n\t" ::: "cc" ); +#endif } #endif @@ -385,6 +434,28 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, : [hash] "=m" (*result) : [be_mask] "m" (*be_mask)); +#ifdef __WIN64__ + /* Clear/restore used registers. 
*/ + asm volatile( "pxor %%xmm0, %%xmm0\n\t" + "pxor %%xmm1, %%xmm1\n\t" + "pxor %%xmm2, %%xmm2\n\t" + "pxor %%xmm3, %%xmm3\n\t" + "pxor %%xmm4, %%xmm4\n\t" + "pxor %%xmm5, %%xmm5\n\t" + "movdqu 0*16(%0), %%xmm6\n\t" + "movdqu 1*16(%0), %%xmm7\n\t" + "movdqu 2*16(%0), %%xmm8\n\t" + "movdqu 3*16(%0), %%xmm9\n\t" + "movdqu 4*16(%0), %%xmm10\n\t" + "movdqu 5*16(%0), %%xmm11\n\t" + "movdqu 6*16(%0), %%xmm12\n\t" + "movdqu 7*16(%0), %%xmm13\n\t" + "movdqu 8*16(%0), %%xmm14\n\t" + "movdqu 9*16(%0), %%xmm15\n\t" + : + : "r" (win64tmp) + : "memory" ); +#else /* Clear used registers. */ asm volatile( "pxor %%xmm0, %%xmm0\n\t" "pxor %%xmm1, %%xmm1\n\t" @@ -395,6 +466,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, "pxor %%xmm6, %%xmm6\n\t" "pxor %%xmm7, %%xmm7\n\t" ::: "cc" ); +#endif return 0; } diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 693f218..e20ea56 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -67,9 +67,7 @@ #if defined(ENABLE_PCLMUL_SUPPORT) && defined(GCM_USE_TABLES) # if ((defined(__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# ifndef __WIN64__ -# define GCM_USE_INTEL_PCLMUL 1 -# endif +# define GCM_USE_INTEL_PCLMUL 1 # endif # endif #endif /* GCM_USE_INTEL_PCLMUL */ diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 147679f..910bc68 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -49,24 +49,54 @@ typedef struct u128_s { u32 a, b, c, d; } u128_t; the use of these macros. There purpose is to make sure that the SSE regsiters are cleared and won't reveal any information about the key or the data. */ -#define aesni_prepare() do { } while (0) -#define aesni_cleanup() \ - do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ - "pxor %%xmm1, %%xmm1\n" :: ); \ - } while (0) -#define aesni_cleanup_2_6() \ - do { asm volatile ("pxor %%xmm2, %%xmm2\n\t" \ - "pxor %%xmm3, %%xmm3\n" \ - "pxor %%xmm4, %%xmm4\n" \ - "pxor %%xmm5, %%xmm5\n" \ - "pxor %%xmm6, %%xmm6\n":: ); \ - } while (0) - +#ifdef __WIN64__ +/* XMM6-XMM15 are callee-saved registers on WIN64. 
*/ +# define aesni_prepare_2_6_variable char win64tmp[16] +# define aesni_prepare() do { } while (0) +# define aesni_prepare_2_6() \ + do { asm volatile ("movdqu %%xmm6, %0\n\t" \ + : "=m" (*win64tmp) \ + : \ + : "memory"); \ + } while (0) +# define aesni_cleanup() \ + do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ + "pxor %%xmm1, %%xmm1\n" :: ); \ + } while (0) +# define aesni_cleanup_2_6() \ + do { asm volatile ("movdqu %0, %%xmm6\n\t" \ + "pxor %%xmm2, %%xmm2\n" \ + "pxor %%xmm3, %%xmm3\n" \ + "pxor %%xmm4, %%xmm4\n" \ + "pxor %%xmm5, %%xmm5\n" \ + : \ + : "m" (*win64tmp) \ + : "memory"); \ + } while (0) +#else +# define aesni_prepare_2_6_variable +# define aesni_prepare() do { } while (0) +# define aesni_prepare_2_6() do { } while (0) +# define aesni_cleanup() \ + do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ + "pxor %%xmm1, %%xmm1\n" :: ); \ + } while (0) +# define aesni_cleanup_2_6() \ + do { asm volatile ("pxor %%xmm2, %%xmm2\n\t" \ + "pxor %%xmm3, %%xmm3\n" \ + "pxor %%xmm4, %%xmm4\n" \ + "pxor %%xmm5, %%xmm5\n" \ + "pxor %%xmm6, %%xmm6\n":: ); \ + } while (0) +#endif void _gcry_aes_aesni_do_setkey (RIJNDAEL_context *ctx, const byte *key) { + aesni_prepare_2_6_variable; + aesni_prepare(); + aesni_prepare_2_6(); if (ctx->rounds < 12) { @@ -999,7 +1029,10 @@ _gcry_aes_aesni_cbc_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks, int cbc_mac) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm5\n\t" : /* No output */ @@ -1044,8 +1077,10 @@ _gcry_aes_aesni_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, { static const unsigned char be_mask[16] __attribute__ ((aligned (16))) = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqa %[mask], %%xmm6\n\t" /* Preload mask */ "movdqa %[ctr], %%xmm5\n\t" /* Preload CTR */ @@ -1095,7 +1130,10 @@ _gcry_aes_aesni_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm6\n\t" : /* No output */ @@ -1177,7 +1215,10 @@ _gcry_aes_aesni_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm5\n\t" /* use xmm5 as fast IV storage */ @@ -1331,8 +1372,10 @@ aesni_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, unsigned char *outbuf = outbuf_arg; const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm5\n\t" @@ -1473,8 +1516,10 @@ aesni_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, unsigned char *outbuf = outbuf_arg; const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm5\n\t" @@ -1625,8 +1670,10 @@ _gcry_aes_aesni_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, RIJNDAEL_context *ctx = (void *)&c->context.c; const unsigned char *abuf = abuf_arg; u64 n = c->u_mode.ocb.aad_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Sum */ 
asm volatile ("movdqu %[iv], %%xmm5\n\t" diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index bd247a9..33ca53f 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -44,8 +44,9 @@ #endif /* USE_SSSE3 indicates whether to use SSSE3 code. */ -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif @@ -75,9 +76,7 @@ #ifdef ENABLE_AESNI_SUPPORT # if ((defined (__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# ifndef __WIN64__ -# define USE_AESNI 1 -# endif +# define USE_AESNI 1 # endif # endif #endif /* ENABLE_AESNI_SUPPORT */ diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c index 3f1b352..21438dc 100644 --- a/cipher/rijndael-ssse3-amd64.c +++ b/cipher/rijndael-ssse3-amd64.c @@ -61,7 +61,60 @@ the use of these macros. There purpose is to make sure that the SSE registers are cleared and won't reveal any information about the key or the data. */ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +/* XMM6-XMM15 are callee-saved registers on WIN64. */ +# define vpaes_ssse3_prepare() \ + char win64tmp[16 * 10]; \ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" \ + "movdqu %%xmm7, 1*16(%0)\n\t" \ + "movdqu %%xmm8, 2*16(%0)\n\t" \ + "movdqu %%xmm9, 3*16(%0)\n\t" \ + "movdqu %%xmm10, 4*16(%0)\n\t" \ + "movdqu %%xmm11, 5*16(%0)\n\t" \ + "movdqu %%xmm12, 6*16(%0)\n\t" \ + "movdqu %%xmm13, 7*16(%0)\n\t" \ + "movdqu %%xmm14, 8*16(%0)\n\t" \ + "movdqu %%xmm15, 9*16(%0)\n\t" \ + : \ + : "r" (win64tmp) \ + : "memory" ) +# define vpaes_ssse3_cleanup() \ + asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ + "pxor %%xmm1, %%xmm1 \n\t" \ + "pxor %%xmm2, %%xmm2 \n\t" \ + "pxor %%xmm3, %%xmm3 \n\t" \ + "pxor %%xmm4, %%xmm4 \n\t" \ + "pxor %%xmm5, %%xmm5 \n\t" \ + "movdqu 0*16(%0), %%xmm6 \n\t" \ + "movdqu 1*16(%0), %%xmm7 \n\t" \ + "movdqu 2*16(%0), %%xmm8 \n\t" \ + "movdqu 3*16(%0), %%xmm9 \n\t" \ + "movdqu 4*16(%0), %%xmm10 \n\t" \ + "movdqu 5*16(%0), %%xmm11 \n\t" \ + "movdqu 6*16(%0), %%xmm12 \n\t" \ + "movdqu 7*16(%0), %%xmm13 \n\t" \ + "movdqu 8*16(%0), %%xmm14 \n\t" \ + "movdqu 9*16(%0), %%xmm15 \n\t" \ + : \ + : "r" (win64tmp) \ + : "memory" ) +#else +# define vpaes_ssse3_prepare() /*_*/ +# define vpaes_ssse3_cleanup() \ + asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ + "pxor %%xmm1, %%xmm1 \n\t" \ + "pxor %%xmm2, %%xmm2 \n\t" \ + "pxor %%xmm3, %%xmm3 \n\t" \ + "pxor %%xmm4, %%xmm4 \n\t" \ + "pxor %%xmm5, %%xmm5 \n\t" \ + "pxor %%xmm6, %%xmm6 \n\t" \ + "pxor %%xmm7, %%xmm7 \n\t" \ + "pxor %%xmm8, %%xmm8 \n\t" \ + ::: "memory" ) +#endif + #define vpaes_ssse3_prepare_enc(const_ptr) \ + vpaes_ssse3_prepare(); \ asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ "movdqa (%q0), %%xmm9 # 0F \n\t" \ "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ @@ -75,6 +128,7 @@ : "memory" ) #define vpaes_ssse3_prepare_dec(const_ptr) \ + vpaes_ssse3_prepare(); \ asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ "movdqa (%q0), %%xmm9 # 0F \n\t" \ "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ @@ -88,17 +142,6 @@ : \ : "memory" ) -#define vpaes_ssse3_cleanup() \ - asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ - "pxor %%xmm1, %%xmm1 \n\t" \ - "pxor %%xmm2, %%xmm2 \n\t" \ - "pxor %%xmm3, %%xmm3 \n\t" \ - "pxor %%xmm4, %%xmm4 \n\t" \ - "pxor %%xmm5, %%xmm5 \n\t" \ - "pxor %%xmm6, %%xmm6 \n\t" 
\ - "pxor %%xmm7, %%xmm7 \n\t" \ - "pxor %%xmm8, %%xmm8 \n\t" \ - ::: "memory" ) void @@ -106,6 +149,8 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) { unsigned int keybits = (ctx->rounds - 10) * 32 + 128; + vpaes_ssse3_prepare(); + asm volatile ("leaq %q[key], %%rdi" "\n\t" "movl %[bits], %%esi" "\n\t" "leaq %[buf], %%rdx" "\n\t" @@ -121,6 +166,8 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) : "r8", "r9", "r10", "r11", "rax", "rcx", "rdx", "rdi", "rsi", "cc", "memory"); + vpaes_ssse3_cleanup(); + /* Save key for setting up decryption. */ memcpy(&ctx->keyschdec32[0][0], key, keybits / 8); } @@ -132,6 +179,8 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) { unsigned int keybits = (ctx->rounds - 10) * 32 + 128; + vpaes_ssse3_prepare(); + asm volatile ("leaq %q[key], %%rdi" "\n\t" "movl %[bits], %%esi" "\n\t" "leaq %[buf], %%rdx" "\n\t" @@ -146,6 +195,8 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) [rotoffs] "g" ((keybits == 192) ? 0 : 32) : "r8", "r9", "r10", "r11", "rax", "rcx", "rdx", "rdi", "rsi", "cc", "memory"); + + vpaes_ssse3_cleanup(); } @@ -465,6 +516,11 @@ _gcry_aes_ssse3_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, } +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define X(...) +#else +# define X(...) __VA_ARGS__ +#endif asm ( "\n\t" "##" @@ -494,7 +550,7 @@ asm ( "\n\t" "##" "\n\t" "##" "\n\t" ".align 16" - "\n\t" ".type _aes_encrypt_core, at function" +X("\n\t" ".type _aes_encrypt_core, at function") "\n\t" "_aes_encrypt_core:" "\n\t" " leaq .Lk_mc_backward(%rcx), %rdi" "\n\t" " mov $16, %rsi" @@ -570,7 +626,7 @@ asm ( "\n\t" " pxor %xmm4, %xmm0 # 0 = A" "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" "\n\t" " ret" - "\n\t" ".size _aes_encrypt_core,.-_aes_encrypt_core" +X("\n\t" ".size _aes_encrypt_core,.-_aes_encrypt_core") "\n\t" "##" "\n\t" "## Decryption core" @@ -578,7 +634,7 @@ asm ( "\n\t" "## Same API as encryption core." 
"\n\t" "##" "\n\t" ".align 16" - "\n\t" ".type _aes_decrypt_core, at function" +X("\n\t" ".type _aes_decrypt_core, at function") "\n\t" "_aes_decrypt_core:" "\n\t" " movl %eax, %esi" "\n\t" " shll $4, %esi" @@ -670,7 +726,7 @@ asm ( "\n\t" " pxor %xmm4, %xmm0 # 0 = A" "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" "\n\t" " ret" - "\n\t" ".size _aes_decrypt_core,.-_aes_decrypt_core" +X("\n\t" ".size _aes_decrypt_core,.-_aes_decrypt_core") "\n\t" "########################################################" "\n\t" "## ##" @@ -679,7 +735,7 @@ asm ( "\n\t" "########################################################" "\n\t" ".align 16" - "\n\t" ".type _aes_schedule_core, at function" +X("\n\t" ".type _aes_schedule_core, at function") "\n\t" "_aes_schedule_core:" "\n\t" " # rdi = key" "\n\t" " # rsi = size in bits" @@ -1039,7 +1095,7 @@ asm ( "\n\t" " pxor %xmm7, %xmm7" "\n\t" " pxor %xmm8, %xmm8" "\n\t" " ret" - "\n\t" ".size _aes_schedule_core,.-_aes_schedule_core" +X("\n\t" ".size _aes_schedule_core,.-_aes_schedule_core") "\n\t" "########################################################" "\n\t" "## ##" @@ -1048,7 +1104,7 @@ asm ( "\n\t" "########################################################" "\n\t" ".align 16" - "\n\t" ".type _aes_consts, at object" +X("\n\t" ".type _aes_consts, at object") "\n\t" ".Laes_consts:" "\n\t" "_aes_consts:" "\n\t" " # s0F" @@ -1226,7 +1282,7 @@ asm ( "\n\t" " .quad 0xC7AA6DB9D4943E2D" "\n\t" " .quad 0x12D7560F93441D00" "\n\t" " .quad 0xCA4B8159D8C58E9C" - "\n\t" ".size _aes_consts,.-_aes_consts" +X("\n\t" ".size _aes_consts,.-_aes_consts") ); #endif /* USE_SSSE3 */ diff --git a/configure.ac b/configure.ac index 594209f..0f16175 100644 --- a/configure.ac +++ b/configure.ac @@ -1127,6 +1127,93 @@ fi #### #### ############################################# + +# Following tests depend on warnings to cause compile to fail, so set -Werror +# temporarily. +_gcc_cflags_save=$CFLAGS +CFLAGS="$CFLAGS -Werror" + + +# +# Check whether compiler supports 'ms_abi' function attribute. +# +AC_CACHE_CHECK([whether compiler supports 'ms_abi' function attribute], + [gcry_cv_gcc_attribute_ms_abi], + [gcry_cv_gcc_attribute_ms_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[int __attribute__ ((ms_abi)) proto(int);]])], + [gcry_cv_gcc_attribute_ms_abi=yes])]) +if test "$gcry_cv_gcc_attribute_ms_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_MS_ABI,1, + [Defined if compiler supports "__attribute__ ((ms_abi))" function attribute]) +fi + + +# +# Check whether compiler supports 'sysv_abi' function attribute. +# +AC_CACHE_CHECK([whether compiler supports 'sysv_abi' function attribute], + [gcry_cv_gcc_attribute_sysv_abi], + [gcry_cv_gcc_attribute_sysv_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[int __attribute__ ((sysv_abi)) proto(int);]])], + [gcry_cv_gcc_attribute_sysv_abi=yes])]) +if test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_SYSV_ABI,1, + [Defined if compiler supports "__attribute__ ((sysv_abi))" function attribute]) +fi + + +# +# Check whether default calling convention is 'ms_abi'. 
+# +if test "$gcry_cv_gcc_attribute_ms_abi" = "yes" ; then + AC_CACHE_CHECK([whether default calling convention is 'ms_abi'], + [gcry_cv_gcc_default_abi_is_ms_abi], + [gcry_cv_gcc_default_abi_is_ms_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[void *test(void) { + void *(*def_func)(void) = test; + void *__attribute__((ms_abi))(*msabi_func)(void); + /* warning on SysV abi targets, passes on Windows based targets */ + msabi_func = def_func; + return msabi_func; + }]])], + [gcry_cv_gcc_default_abi_is_ms_abi=yes])]) + if test "$gcry_cv_gcc_default_abi_is_ms_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_DEFAULT_ABI_IS_MS_ABI,1, + [Defined if default calling convention is 'ms_abi']) + fi +fi + + +# +# Check whether default calling convention is 'sysv_abi'. +# +if test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" ; then + AC_CACHE_CHECK([whether default calling convention is 'sysv_abi'], + [gcry_cv_gcc_default_abi_is_sysv_abi], + [gcry_cv_gcc_default_abi_is_sysv_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[void *test(void) { + void *(*def_func)(void) = test; + void *__attribute__((sysv_abi))(*sysvabi_func)(void); + /* warning on MS ABI targets, passes on SysV ABI targets */ + sysvabi_func = def_func; + return sysvabi_func; + }]])], + [gcry_cv_gcc_default_abi_is_sysv_abi=yes])]) + if test "$gcry_cv_gcc_default_abi_is_sysv_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_DEFAULT_ABI_IS_SYSV_ABI,1, + [Defined if default calling convention is 'sysv_abi']) + fi +fi + + +# Restore flags. +CFLAGS=$_gcc_cflags_save; + + # # Check whether GCC inline assembler supports SSSE3 instructions # This is required for the AES-NI instructions. @@ -1281,9 +1368,6 @@ if test $amd64_as_feature_detection = yes; then [[__asm__( /* Test if '.type' and '.size' are supported. */ /* These work only on ELF targets. */ - /* TODO: add COFF (mingw64, cygwin64) support to assembly - * implementations. Mingw64/cygwin64 also require additional - * work because they use different calling convention. 
*/ "asmfunc:\n\t" ".size asmfunc,.-asmfunc;\n\t" ".type asmfunc, at function;\n\t" @@ -1299,6 +1383,24 @@ if test $amd64_as_feature_detection = yes; then AC_DEFINE(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS,1, [Defined if underlying assembler is compatible with amd64 assembly implementations]) fi + if test "$gcry_cv_gcc_amd64_platform_as_ok" = "no" && + test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" && + test "$gcry_cv_gcc_default_abi_is_ms_abi" = "yes"; then + AC_CACHE_CHECK([whether GCC assembler is compatible for WIN64 assembly implementations], + [gcry_cv_gcc_win64_platform_as_ok], + [gcry_cv_gcc_win64_platform_as_ok=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".globl asmfunc\n\t" + "asmfunc:\n\t" + "xorq \$(1234), %rbp;\n\t" + );]])], + [gcry_cv_gcc_win64_platform_as_ok=yes])]) + if test "$gcry_cv_gcc_win64_platform_as_ok" = "yes" ; then + AC_DEFINE(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS,1, + [Defined if underlying assembler is compatible with WIN64 assembly implementations]) + fi + fi fi From jussi.kivilinna at iki.fi Fri May 1 19:39:54 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 01 May 2015 20:39:54 +0300 Subject: [PATCH 5/8] Cast pointers to integers using uintptr_t instead of long In-Reply-To: <20150501173934.5385.86854.stgit@localhost6.localdomain6> References: <20150501173934.5385.86854.stgit@localhost6.localdomain6> Message-ID: <20150501173954.5385.42069.stgit@localhost6.localdomain6> --- cipher/cipher.c | 4 ++-- cipher/md.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/cipher/cipher.c b/cipher/cipher.c index d1550c0..7a29824 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -481,11 +481,11 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, size_t off = 0; #ifdef NEED_16BYTE_ALIGNED_CONTEXT - if ( ((unsigned long)h & 0x0f) ) + if ( ((uintptr_t)h & 0x0f) ) { /* The malloced block is not aligned on a 16 byte boundary. Correct for this. */ - off = 16 - ((unsigned long)h & 0x0f); + off = 16 - ((uintptr_t)h & 0x0f); h = (void*)((char*)h + off); } #endif /*NEED_16BYTE_ALIGNED_CONTEXT*/ diff --git a/cipher/md.c b/cipher/md.c index 9fef555..3ab46ef 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -1148,7 +1148,7 @@ md_stop_debug( gcry_md_hd_t md ) #ifdef HAVE_U64_TYPEDEF { /* a kludge to pull in the __muldi3 for Solaris */ - volatile u32 a = (u32)(ulong)md; + volatile u32 a = (u32)(uintptr_t)md; volatile u64 b = 42; volatile u64 c; c = a * b; From jussi.kivilinna at iki.fi Sat May 2 15:11:43 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 02 May 2015 16:11:43 +0300 Subject: [PATCH 1/5] Enable AMD64 SHA1 implementations for WIN64 Message-ID: <20150502131143.24338.51327.stgit@localhost6.localdomain6> * cipher/sha1-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha1-avx-bmi2-amd64.S: Ditto. * cipher/sha1-ssse3-amd64.S: Ditto. * cipher/sha1.c (USE_SSSE3, USE_AVX, USE_BMI2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 ||?USE_AVX ||?USE_BMI2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha1_transform_amd64_ssse3, _gcry_sha1_transform_amd64_avx) (_gcry_sha1_transform_amd64_avx_bmi2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. 
-- Signed-off-by: Jussi Kivilinna --- cipher/sha1-avx-amd64.S | 12 ++++++++-- cipher/sha1-avx-bmi2-amd64.S | 12 ++++++++-- cipher/sha1-ssse3-amd64.S | 12 ++++++++-- cipher/sha1.c | 51 ++++++++++++++++++++++++++++++++---------- 4 files changed, 69 insertions(+), 18 deletions(-) diff --git a/cipher/sha1-avx-amd64.S b/cipher/sha1-avx-amd64.S index 6bec389..062a45b 100644 --- a/cipher/sha1-avx-amd64.S +++ b/cipher/sha1-avx-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(USE_SHA1) @@ -40,6 +41,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -209,7 +217,7 @@ */ .text .globl _gcry_sha1_transform_amd64_avx -.type _gcry_sha1_transform_amd64_avx, at function +ELF(.type _gcry_sha1_transform_amd64_avx, at function) .align 16 _gcry_sha1_transform_amd64_avx: /* input: diff --git a/cipher/sha1-avx-bmi2-amd64.S b/cipher/sha1-avx-bmi2-amd64.S index cd5af5b..22bcbb3 100644 --- a/cipher/sha1-avx-bmi2-amd64.S +++ b/cipher/sha1-avx-bmi2-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1) @@ -40,6 +41,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -206,7 +214,7 @@ */ .text .globl _gcry_sha1_transform_amd64_avx_bmi2 -.type _gcry_sha1_transform_amd64_avx_bmi2, at function +ELF(.type _gcry_sha1_transform_amd64_avx_bmi2, at function) .align 16 _gcry_sha1_transform_amd64_avx_bmi2: /* input: diff --git a/cipher/sha1-ssse3-amd64.S b/cipher/sha1-ssse3-amd64.S index 226988d..98a19e6 100644 --- a/cipher/sha1-ssse3-amd64.S +++ b/cipher/sha1-ssse3-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA1) #ifdef __PIC__ @@ -39,6 +40,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -220,7 +228,7 @@ */ .text .globl _gcry_sha1_transform_amd64_ssse3 -.type _gcry_sha1_transform_amd64_ssse3, at function +ELF(.type _gcry_sha1_transform_amd64_ssse3, at function) .align 16 _gcry_sha1_transform_amd64_ssse3: /* input: diff --git a/cipher/sha1.c b/cipher/sha1.c index 6ccf0e8..eb42883 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -45,22 +45,26 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. 
*/ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_BMI2 indicates whether to compile with Intel AVX/BMI2 code. */ #undef USE_BMI2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && defined(HAVE_GCC_INLINE_ASM_BMI2) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_BMI2 1 #endif @@ -287,22 +291,37 @@ transform_blk (void *ctx, const unsigned char *data) } +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_BMI2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_SSSE3 unsigned int _gcry_sha1_transform_amd64_ssse3 (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha1_transform_amd64_avx (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif #ifdef USE_BMI2 unsigned int _gcry_sha1_transform_amd64_avx_bmi2 (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif @@ -315,17 +334,17 @@ transform (void *ctx, const unsigned char *data, size_t nblks) #ifdef USE_BMI2 if (hd->use_bmi2) return _gcry_sha1_transform_amd64_avx_bmi2 (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (hd->use_avx) return _gcry_sha1_transform_amd64_avx (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (hd->use_ssse3) return _gcry_sha1_transform_amd64_ssse3 (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_NEON if (hd->use_neon) @@ -340,6 +359,14 @@ transform (void *ctx, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } From jussi.kivilinna at iki.fi Sat May 2 15:11:48 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 02 May 2015 16:11:48 +0300 Subject: [PATCH 2/5] Enable AMD64 SHA256 implementations for WIN64 In-Reply-To: <20150502131143.24338.51327.stgit@localhost6.localdomain6> References: <20150502131143.24338.51327.stgit@localhost6.localdomain6> Message-ID: <20150502131148.24338.23658.stgit@localhost6.localdomain6> * cipher/sha256-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha256-avx2-bmi2-amd64.S: Ditto. * cipher/sha256-ssse3-amd64.S: Ditto. * cipher/sha256.c (USE_SSSE3, USE_AVX, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. 
[USE_SSSE3 ||?USE_AVX ||?USE_AVX2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha256_transform_amd64_ssse3, _gcry_sha256_transform_amd64_avx) (_gcry_sha256_transform_amd64_avx2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna --- cipher/sha256-avx-amd64.S | 11 ++++++- cipher/sha256-avx2-bmi2-amd64.S | 11 ++++++- cipher/sha256-ssse3-amd64.S | 11 ++++++- cipher/sha256.c | 60 +++++++++++++++++++++++++++++---------- 4 files changed, 72 insertions(+), 21 deletions(-) diff --git a/cipher/sha256-avx-amd64.S b/cipher/sha256-avx-amd64.S index 3912db7..8bf26bd 100644 --- a/cipher/sha256-avx-amd64.S +++ b/cipher/sha256-avx-amd64.S @@ -54,7 +54,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA256) @@ -64,6 +65,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix #define VMOVDQ vmovdqu /* assume buffers not aligned */ @@ -370,7 +377,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_avx -.type _gcry_sha256_transform_amd64_avx, at function; +ELF(.type _gcry_sha256_transform_amd64_avx, at function;) .align 16 _gcry_sha256_transform_amd64_avx: vzeroupper diff --git a/cipher/sha256-avx2-bmi2-amd64.S b/cipher/sha256-avx2-bmi2-amd64.S index 09df711..74b6063 100644 --- a/cipher/sha256-avx2-bmi2-amd64.S +++ b/cipher/sha256-avx2-bmi2-amd64.S @@ -54,7 +54,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(USE_SHA256) @@ -65,6 +66,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix #define VMOVDQ vmovdqu /* ; assume buffers not aligned */ @@ -555,7 +562,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_avx2 -.type _gcry_sha256_transform_amd64_avx2, at function +ELF(.type _gcry_sha256_transform_amd64_avx2, at function) .align 32 _gcry_sha256_transform_amd64_avx2: push rbx diff --git a/cipher/sha256-ssse3-amd64.S b/cipher/sha256-ssse3-amd64.S index 80b1cec..9ec87e4 100644 --- a/cipher/sha256-ssse3-amd64.S +++ b/cipher/sha256-ssse3-amd64.S @@ -55,7 +55,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA256) @@ -65,6 +66,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .intel_syntax noprefix #define MOVDQ movdqu /* assume buffers not aligned */ @@ -376,7 +383,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_ssse3 -.type _gcry_sha256_transform_amd64_ssse3, at function; +ELF(.type _gcry_sha256_transform_amd64_ssse3, at function;) .align 16 _gcry_sha256_transform_amd64_ssse3: push rbx diff --git a/cipher/sha256.c b/cipher/sha256.c index d3af172..59ffa43 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -49,25 +49,29 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. */ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2/BMI2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX2) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX2 1 #endif @@ -322,19 +326,37 @@ transform_blk (void *ctx, const unsigned char *data) #undef R +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. 
*/ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_SSSE3 unsigned int _gcry_sha256_transform_amd64_ssse3(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha256_transform_amd64_avx(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 unsigned int _gcry_sha256_transform_amd64_avx2(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif @@ -347,19 +369,19 @@ transform (void *ctx, const unsigned char *data, size_t nblks) #ifdef USE_AVX2 if (hd->use_avx2) return _gcry_sha256_transform_amd64_avx2 (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (hd->use_avx) return _gcry_sha256_transform_amd64_avx (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (hd->use_ssse3) return _gcry_sha256_transform_amd64_ssse3 (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif do @@ -369,6 +391,14 @@ transform (void *ctx, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } From jussi.kivilinna at iki.fi Sat May 2 15:11:53 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 02 May 2015 16:11:53 +0300 Subject: [PATCH 3/5] Enable AMD64 SHA512 implementations for WIN64 In-Reply-To: <20150502131143.24338.51327.stgit@localhost6.localdomain6> References: <20150502131143.24338.51327.stgit@localhost6.localdomain6> Message-ID: <20150502131153.24338.4959.stgit@localhost6.localdomain6> * cipher/sha512-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha512-avx-bmi2-amd64.S: Ditto. * cipher/sha512-ssse3-amd64.S: Ditto. * cipher/sha512.c (USE_SSSE3, USE_AVX, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 ||?USE_AVX ||?USE_AVX2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha512_transform_amd64_ssse3, _gcry_sha512_transform_amd64_avx) (_gcry_sha512_transform_amd64_avx_bmi2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. 
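The ASM_EXTRA_STACK value added here is worth spelling out: XMM6-XMM15 are non-volatile in the Microsoft x64 ABI but may be clobbered by the sysv_abi assembly, so GCC stores all ten registers in the caller's prologue, and those 10 * 16 = 160 bytes should be counted in the stack area that later gets wiped. A minimal sketch of the bookkeeping, using the macro names from the patch; the helper function name is hypothetical:

  /* Minimal sketch; 'example_burn_estimate' is a hypothetical name. */
  #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS
  # define ASM_EXTRA_STACK (10 * 16)  /* XMM6..XMM15, 16 bytes each = 160 */
  #else
  # define ASM_EXTRA_STACK 0          /* plain SysV build: nothing extra */
  #endif

  static unsigned int
  example_burn_estimate (unsigned int asm_burn)
  {
    /* asm_burn is what the assembly itself reports; 'transform' adds the
     * four saved pointers plus the XMM spill area created by the
     * ms_abi -> sysv_abi transition (32 + 160 bytes on WIN64). */
    return asm_burn + 4 * sizeof (void *) + ASM_EXTRA_STACK;
  }
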
-- Signed-off-by: Jussi Kivilinna --- cipher/sha512-avx-amd64.S | 11 ++++++- cipher/sha512-avx2-bmi2-amd64.S | 11 ++++++- cipher/sha512-ssse3-amd64.S | 11 ++++++- cipher/sha512.c | 60 +++++++++++++++++++++++++++++---------- 4 files changed, 72 insertions(+), 21 deletions(-) diff --git a/cipher/sha512-avx-amd64.S b/cipher/sha512-avx-amd64.S index 3449b87..699c271 100644 --- a/cipher/sha512-avx-amd64.S +++ b/cipher/sha512-avx-amd64.S @@ -41,7 +41,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA512) @@ -51,6 +52,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix .text @@ -259,7 +266,7 @@ frame_size = ((frame_GPRSAVE) + (frame_GPRSAVE_size)) ; L is the message length in SHA512 blocks */ .globl _gcry_sha512_transform_amd64_avx -.type _gcry_sha512_transform_amd64_avx, at function; +ELF(.type _gcry_sha512_transform_amd64_avx, at function;) .align 16 _gcry_sha512_transform_amd64_avx: xor eax, eax diff --git a/cipher/sha512-avx2-bmi2-amd64.S b/cipher/sha512-avx2-bmi2-amd64.S index d6301f3..02f95af 100644 --- a/cipher/sha512-avx2-bmi2-amd64.S +++ b/cipher/sha512-avx2-bmi2-amd64.S @@ -43,7 +43,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(USE_SHA512) @@ -54,6 +55,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix .text @@ -596,7 +603,7 @@ rotate_Ys ; L is the message length in SHA512 blocks */ .globl _gcry_sha512_transform_amd64_avx2 -.type _gcry_sha512_transform_amd64_avx2, at function; +ELF(.type _gcry_sha512_transform_amd64_avx2, at function;) .align 16 _gcry_sha512_transform_amd64_avx2: xor eax, eax diff --git a/cipher/sha512-ssse3-amd64.S b/cipher/sha512-ssse3-amd64.S index 4c80baa..c721bcf 100644 --- a/cipher/sha512-ssse3-amd64.S +++ b/cipher/sha512-ssse3-amd64.S @@ -44,7 +44,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA512) @@ -54,6 +55,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix .text @@ -261,7 +268,7 @@ frame_size = ((frame_GPRSAVE) + (frame_GPRSAVE_size)) ; L is the message length in SHA512 blocks. */ .globl _gcry_sha512_transform_amd64_ssse3 -.type _gcry_sha512_transform_amd64_ssse3, at function; +ELF(.type _gcry_sha512_transform_amd64_ssse3, at function;) .align 16 _gcry_sha512_transform_amd64_ssse3: xor eax, eax diff --git a/cipher/sha512.c b/cipher/sha512.c index 5a6af80..029f8f0 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -68,27 +68,31 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. 
*/ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. */ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2/rorx code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX2) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX2 1 #endif @@ -543,6 +547,21 @@ transform_blk (SHA512_STATE *hd, const unsigned char *data) } +/* AMD64 assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_ARM_NEON_ASM void _gcry_sha512_transform_armv7_neon (SHA512_STATE *hd, const unsigned char *data, @@ -551,17 +570,20 @@ void _gcry_sha512_transform_armv7_neon (SHA512_STATE *hd, #ifdef USE_SSSE3 unsigned int _gcry_sha512_transform_amd64_ssse3(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha512_transform_amd64_avx(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 unsigned int _gcry_sha512_transform_amd64_avx2(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif @@ -574,19 +596,19 @@ transform (void *context, const unsigned char *data, size_t nblks) #ifdef USE_AVX2 if (ctx->use_avx2) return _gcry_sha512_transform_amd64_avx2 (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (ctx->use_avx) return _gcry_sha512_transform_amd64_avx (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (ctx->use_ssse3) return _gcry_sha512_transform_amd64_ssse3 (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_ARM_NEON_ASM @@ -607,6 +629,14 @@ transform (void *context, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and 
XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } From jussi.kivilinna at iki.fi Sat May 2 15:11:58 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 02 May 2015 16:11:58 +0300 Subject: [PATCH 4/5] Enable AMD64 Whirlpool implementation for WIN64 In-Reply-To: <20150502131143.24338.51327.stgit@localhost6.localdomain6> References: <20150502131143.24338.51327.stgit@localhost6.localdomain6> Message-ID: <20150502131158.24338.19942.stgit@localhost6.localdomain6> * cipher/whirlpool-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/whirlpool.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AMD64_ASM] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64): Add ASM_FUNC_ABI to prototype. [USE_AMD64_ASM] (whirlpool_transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna --- cipher/whirlpool-sse2-amd64.S | 13 ++++++++++--- cipher/whirlpool.c | 15 ++++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/cipher/whirlpool-sse2-amd64.S b/cipher/whirlpool-sse2-amd64.S index d0bcf2d..e98b831 100644 --- a/cipher/whirlpool-sse2-amd64.S +++ b/cipher/whirlpool-sse2-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_WHIRLPOOL) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_WHIRLPOOL) #ifdef __PIC__ # define RIP %rip @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .text /* look-up table offsets on RTAB */ @@ -157,7 +164,7 @@ .align 8 .globl _gcry_whirlpool_transform_amd64 -.type _gcry_whirlpool_transform_amd64, at function; +ELF(.type _gcry_whirlpool_transform_amd64, at function;) _gcry_whirlpool_transform_amd64: /* input: @@ -329,7 +336,7 @@ _gcry_whirlpool_transform_amd64: .Lskip: movl $(STACK_MAX + 8), %eax; ret; -.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64; +ELF(.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64;) #endif #endif diff --git a/cipher/whirlpool.c b/cipher/whirlpool.c index 2732f63..5f224a1 100644 --- a/cipher/whirlpool.c +++ b/cipher/whirlpool.c @@ -42,7 +42,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -1192,9 +1193,17 @@ whirlpool_init (void *ctx, unsigned int flags) #ifdef USE_AMD64_ASM +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + extern unsigned int _gcry_whirlpool_transform_amd64(u64 *state, const unsigned char *data, - size_t nblks, const struct whirlpool_tables_s *tables); + size_t nblks, const struct whirlpool_tables_s *tables) ASM_FUNC_ABI; static unsigned int whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) @@ -1202,7 +1211,7 @@ whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) whirlpool_context_t *context = ctx; return _gcry_whirlpool_transform_amd64( - context->hash_state, data, nblks, &tab); + context->hash_state, data, nblks, &tab) + ASM_EXTRA_STACK; } #else /* USE_AMD64_ASM */ From jussi.kivilinna at iki.fi Sat May 2 15:12:03 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 02 May 2015 16:12:03 +0300 Subject: [PATCH 5/5] Enable AMD64 AES implementation for WIN64 In-Reply-To: <20150502131143.24338.51327.stgit@localhost6.localdomain6> References: <20150502131143.24338.51327.stgit@localhost6.localdomain6> Message-ID: <20150502131203.24338.81842.stgit@localhost6.localdomain6> * cipher/rijndael-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/rijndael-internal.h (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (do_encrypt, do_decrypt) [USE_AMD64_ASM && !HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Use assembly block to call AMD64 assembly encrypt/decrypt function. -- Signed-off-by: Jussi Kivilinna --- cipher/rijndael-amd64.S | 17 ++++++++++++----- cipher/rijndael-internal.h | 3 ++- cipher/rijndael.c | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 48 insertions(+), 6 deletions(-) diff --git a/cipher/rijndael-amd64.S b/cipher/rijndael-amd64.S index 24c555a..b149e94 100644 --- a/cipher/rijndael-amd64.S +++ b/cipher/rijndael-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_AES) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_AES) #ifdef __PIC__ # define RIP (%rip) @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text /* table macros */ @@ -205,7 +212,7 @@ .align 8 .globl _gcry_aes_amd64_encrypt_block -.type _gcry_aes_amd64_encrypt_block, at function; +ELF(.type _gcry_aes_amd64_encrypt_block, at function;) _gcry_aes_amd64_encrypt_block: /* input: @@ -279,7 +286,7 @@ _gcry_aes_amd64_encrypt_block: lastencround(11); jmp .Lenc_done; -.size _gcry_aes_amd64_encrypt_block,.-_gcry_aes_amd64_encrypt_block; +ELF(.size _gcry_aes_amd64_encrypt_block,.-_gcry_aes_amd64_encrypt_block;) #define do_decround(next_r) \ do16bit_shr(16, mov, RA, Dsize, D0, RNA, D0, RNB, RT0, RT1); \ @@ -365,7 +372,7 @@ _gcry_aes_amd64_encrypt_block: .align 8 .globl _gcry_aes_amd64_decrypt_block -.type _gcry_aes_amd64_decrypt_block, at function; +ELF(.type _gcry_aes_amd64_decrypt_block, at function;) _gcry_aes_amd64_decrypt_block: /* input: @@ -440,7 +447,7 @@ _gcry_aes_amd64_decrypt_block: decround(9); jmp .Ldec_tail; -.size _gcry_aes_amd64_decrypt_block,.-_gcry_aes_amd64_decrypt_block; +ELF(.size _gcry_aes_amd64_decrypt_block,.-_gcry_aes_amd64_decrypt_block;) #endif /*USE_AES*/ #endif /*__x86_64*/ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 33ca53f..6641728 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -39,7 +39,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif diff --git a/cipher/rijndael.c b/cipher/rijndael.c index ade41c9..7ebf329 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -665,8 +665,25 @@ do_encrypt (const RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax) { #ifdef USE_AMD64_ASM +# ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS return _gcry_aes_amd64_encrypt_block(ctx->keyschenc, bx, ax, ctx->rounds, encT); +# else + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + uintptr_t ret; + asm ("movq %[encT], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret) + : "0" (_gcry_aes_amd64_encrypt_block), + "D" (ctx->keyschenc), + "S" (bx), + "d" (ax), + "c" (ctx->rounds), + [encT] "r" (encT) + : "cc", "memory", "r8", "r9", "r10", "r11"); + return ret; +# endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) return _gcry_aes_arm_encrypt_block(ctx->keyschenc, bx, ax, ctx->rounds, encT); #else @@ -1008,8 +1025,25 @@ do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax) { #ifdef USE_AMD64_ASM +# ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS return _gcry_aes_amd64_decrypt_block(ctx->keyschdec, bx, ax, ctx->rounds, &dec_tables); +# else + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. 
*/ + uintptr_t ret; + asm ("movq %[dectabs], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret) + : "0" (_gcry_aes_amd64_decrypt_block), + "D" (ctx->keyschdec), + "S" (bx), + "d" (ax), + "c" (ctx->rounds), + [dectabs] "r" (&dec_tables) + : "cc", "memory", "r8", "r9", "r10", "r11"); + return ret; +# endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) return _gcry_aes_arm_decrypt_block(ctx->keyschdec, bx, ax, ctx->rounds, &dec_tables); From jussi.kivilinna at iki.fi Sun May 3 00:26:58 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 May 2015 01:26:58 +0300 Subject: [PATCH] Add '1 million a characters' test vectors Message-ID: <20150502222658.31924.34681.stgit@localhost6.localdomain6> * tests/basic.c (check_digests): Add "!" test vectors for MD5, SHA-384, SHA-512, RIPEMD160 and CRC32. -- Signed-off-by: Jussi Kivilinna --- tests/basic.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/tests/basic.c b/tests/basic.c index bb07394..2c664c0 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5391,6 +5391,8 @@ check_digests (void) "TY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser Gene" "ral Public License for more details.", "\xc4\x1a\x5c\x0b\x44\x5f\xba\x1a\xda\xbc\xc0\x38\x0e\x0c\x9e\x33" }, + { GCRY_MD_MD5, "!", + "\x77\x07\xd6\xae\x4e\x02\x7c\x70\xee\xa2\xa9\x35\xc2\x29\x6f\x21" }, { GCRY_MD_SHA1, "abc", "\xA9\x99\x3E\x36\x47\x06\x81\x6A\xBA\x3E" "\x25\x71\x78\x50\xC2\x6C\x9C\xD0\xD8\x9D" }, @@ -5471,6 +5473,10 @@ check_digests (void) "\xe4\x6d\xb4\x28\x33\x77\x99\x49\x94\x0f\xcf\x87\xc2\x2f\x30\xd6" "\x06\x24\x82\x9d\x80\x64\x8a\x07\xa1\x20\x8f\x5f\xf3\x85\xb3\xaa" "\x39\xb8\x61\x00\xfc\x7f\x18\xc6\x82\x23\x4b\x45\xfa\xf1\xbc\x69" }, + { GCRY_MD_SHA384, "!", + "\x9d\x0e\x18\x09\x71\x64\x74\xcb\x08\x6e\x83\x4e\x31\x0a\x4a\x1c" + "\xed\x14\x9e\x9c\x00\xf2\x48\x52\x79\x72\xce\xc5\x70\x4c\x2a\x5b" + "\x07\xb8\xb3\xdc\x38\xec\xc4\xeb\xae\x97\xdd\xd8\x7f\x3d\x89\x85" }, { GCRY_MD_SHA512, "abc", "\xDD\xAF\x35\xA1\x93\x61\x7A\xBA\xCC\x41\x73\x49\xAE\x20\x41\x31" "\x12\xE6\xFA\x4E\x89\xA9\x7E\xA2\x0A\x9E\xEE\xE6\x4B\x55\xD3\x9A" @@ -5489,6 +5495,11 @@ check_digests (void) "\xdd\xec\x62\x0f\xf7\x1a\x1e\x10\x32\x05\x02\xa6\xb0\x1f\x70\x37" "\xbc\xd7\x15\xed\x71\x6c\x78\x20\xc8\x54\x87\xd0\x66\x6a\x17\x83" "\x05\x61\x92\xbe\xcc\x8f\x3b\xbf\x11\x72\x22\x69\x23\x5b\x48\x5c" }, + { GCRY_MD_SHA512, "!", + "\xe7\x18\x48\x3d\x0c\xe7\x69\x64\x4e\x2e\x42\xc7\xbc\x15\xb4\x63" + "\x8e\x1f\x98\xb1\x3b\x20\x44\x28\x56\x32\xa8\x03\xaf\xa9\x73\xeb" + "\xde\x0f\xf2\x44\x87\x7e\xa6\x0a\x4c\xb0\x43\x2c\xe5\x77\xc3\x1b" + "\xeb\x00\x9c\x5c\x2c\x49\xaa\x2e\x4e\xad\xb2\x17\xad\x8c\xc0\x9b" }, { GCRY_MD_RMD160, "", "\x9c\x11\x85\xa5\xc5\xe9\xfc\x54\x61\x28" "\x08\x97\x7e\xe8\xf5\x48\xb2\x25\x8d\x31" }, @@ -5512,6 +5523,9 @@ check_digests (void) "ral Public License for more details.", "\x06\x6d\x3c\x4e\xc9\xba\x89\x75\x16\x90\x96\x4e\xfd\x43\x07\xde" "\x04\xca\x69\x6b" }, + { GCRY_MD_RMD160, "!", + "\x52\x78\x32\x43\xc1\x69\x7b\xdb\xe1\x6d\x37\xf9\x7f\x68\xf0\x83" + "\x25\xdc\x15\x28" }, { GCRY_MD_CRC32, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32, "foo", "\x8c\x73\x65\x21" }, { GCRY_MD_CRC32, @@ -5525,6 +5539,7 @@ check_digests (void) "ral Public License for more details.", "\x4A\x53\x7D\x67" }, { GCRY_MD_CRC32, "123456789", "\xcb\xf4\x39\x26" }, + { GCRY_MD_CRC32, "!", "\xdc\x25\xbf\xbc" }, { GCRY_MD_CRC32_RFC1510, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32_RFC1510, "foo", "\x73\x32\xbc\x33" }, { GCRY_MD_CRC32_RFC1510, "test0123456789", "\xb8\x3e\x88\xd6" 
}, From jussi.kivilinna at iki.fi Sat May 2 23:54:02 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 May 2015 00:54:02 +0300 Subject: [PATCH] More optimized CRC implementations Message-ID: <20150502215402.12454.62823.stgit@localhost6.localdomain6> * cipher/crc.c (crc32_table, crc24_table): Replace with new table contents. (update_crc32, CRC24_INIT, CRC24_POLY): Remove. (crc32_next, crc32_next4, crc24_init, crc24_next, crc24_next4) (crc24_final): New. (crc24rfc2440_init): Use crc24_init. (crc32_write): Rewrite to use crc32_next & crc32_next4. (crc24_write): Rewrite to use crc24_next & crc24_next4. (crc32_final, crc32rfc1510_final): Use buf_put_be32. (crc24rfc2440_final): Use crc24_final & buf_put_le32. * tests/basic.c (check_digests): Add CRC "123456789" tests. -- Patch adds more optimized CRC implementations generated with universal_crc tool by Danjel McGougan: http://www.mcgougan.se/universal_crc/ Benchmark on Intel Haswell (no-turbo, 3200 Mhz): Before: CRC32 | 2.52 ns/B 378.3 MiB/s 8.07 c/B CRC32RFC1510 | 2.52 ns/B 378.1 MiB/s 8.07 c/B CRC24RFC2440 | 46.62 ns/B 20.46 MiB/s 149.2 c/B After: CRC32 | 0.918 ns/B 1039.3 MiB/s 2.94 c/B CRC32RFC1510 | 0.918 ns/B 1039.0 MiB/s 2.94 c/B CRC24RFC2440 | 0.918 ns/B 1039.4 MiB/s 2.94 c/B Signed-off-by: Jussi Kivilinna --- cipher/crc.c | 817 ++++++++++++++++++++++++++++++++++++++++++++++----------- tests/basic.c | 3 2 files changed, 660 insertions(+), 160 deletions(-) diff --git a/cipher/crc.c b/cipher/crc.c index 1322f0d..9105dfe 100644 --- a/cipher/crc.c +++ b/cipher/crc.c @@ -28,125 +28,311 @@ #include "cipher.h" #include "bithelp.h" +#include "bufhelp.h" + + +typedef struct +{ + u32 CRC; + byte buf[4]; +} +CRC_CONTEXT; -/* Table of CRCs of all 8-bit messages. Generated by running code - from RFC 1952 modified to print out the table. 
*/ -static u32 crc32_table[256] = { - 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, - 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, - 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2, - 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, - 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, - 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, - 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c, - 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, - 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, - 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, - 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106, - 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, - 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, - 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, - 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, - 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, - 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, - 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, - 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, - 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, - 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, - 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, - 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84, - 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, - 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, - 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, - 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e, - 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, - 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, - 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, - 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28, - 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, - 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, - 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, - 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, - 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, - 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, - 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, - 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, - 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, - 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, - 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, - 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d -}; /* - * The following function was extracted from RFC 1952 by Simon - * Josefsson, for the Shishi project, and modified to be compatible - * with the modified CRC-32 used by RFC 1510, and subsequently - * modified for GNU Libgcrypt to allow it to be used for calculating - * both unmodified CRC-32 and modified CRC-32 values. 
Original - * copyright and notice from the document follows: + * Code generated by universal_crc by Danjel McGougan * - * Copyright (c) 1996 L. Peter Deutsch - * - * Permission is granted to copy and distribute this document for - * any purpose and without charge, including translations into - * other languages and incorporation into compilations, provided - * that the copyright notice and this notice are preserved, and - * that any substantive changes or deletions from the original are - * clearly marked. - * - * The copyright on RFCs, and consequently the function below, are - * supposedly also retroactively claimed by the Internet Society - * (according to rfc-editor at rfc-editor.org), with the following - * copyright notice: - * - * Copyright (C) The Internet Society. All Rights Reserved. - * - * This document and translations of it may be copied and furnished - * to others, and derivative works that comment on or otherwise - * explain it or assist in its implementation may be prepared, - * copied, published and distributed, in whole or in part, without - * restriction of any kind, provided that the above copyright - * notice and this paragraph are included on all such copies and - * derivative works. However, this document itself may not be - * modified in any way, such as by removing the copyright notice or - * references to the Internet Society or other Internet - * organizations, except as needed for the purpose of developing - * Internet standards in which case the procedures for copyrights - * defined in the Internet Standards process must be followed, or - * as required to translate it into languages other than English. - * - * The limited permissions granted above are perpetual and will not be - * revoked by the Internet Society or its successors or assigns. - * - * This document and the information contained herein is provided - * on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET - * ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE - * OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY - * IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A - * PARTICULAR PURPOSE. 
+ * CRC parameters used: + * bits: 32 + * poly: 0x04c11db7 + * init: 0xffffffff + * xor: 0xffffffff + * reverse: true + * non-direct: false * + * CRC of the string "123456789" is 0xcbf43926 */ -static u32 -update_crc32 (u32 crc, const void *buf_arg, size_t len) -{ - const char *buf = buf_arg; - size_t n; - for (n = 0; n < len; n++) - crc = crc32_table[(crc ^ buf[n]) & 0xff] ^ (crc >> 8); +static const u32 crc32_table[1024] = { + 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, + 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, + 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, + 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, + 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, + 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, + 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, + 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5, + 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, + 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, + 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, + 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, + 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, + 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f, + 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, + 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, + 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, + 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, + 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, + 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01, + 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, + 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, + 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, + 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, + 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, + 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, + 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, + 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, + 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, + 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, + 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, + 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, + 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, + 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, + 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, + 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, + 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, + 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7, + 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, + 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, + 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, + 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, + 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, + 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79, + 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, + 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, + 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, + 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, + 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, + 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713, + 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, + 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, + 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, + 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, + 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, + 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45, + 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, + 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, + 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, + 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, + 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, + 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf, + 0xb3667a2e, 
From jussi.kivilinna at iki.fi Sun May 3 00:23:21 2015
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Sun, 03 May 2015 01:23:21 +0300
Subject: [PATCH] More optimized CRC implementations
Message-ID: <20150502222321.18868.25379.stgit@localhost6.localdomain6>

* cipher/crc.c (crc32_table, crc24_table): Replace with new table
  contents.
(update_crc32, CRC24_INIT, CRC24_POLY): Remove.
(crc32_next, crc32_next4, crc24_init, crc24_next, crc24_next4)
(crc24_final): New.
(crc24rfc2440_init): Use crc24_init.
(crc32_write): Rewrite to use crc32_next & crc32_next4.
(crc24_write): Rewrite to use crc24_next & crc24_next4.
(crc32_final, crc32rfc1510_final): Use buf_put_be32.
(crc24rfc2440_final): Use crc24_final & buf_put_le32.
* tests/basic.c (check_digests): Add CRC "123456789" tests.
--

Patch adds more optimized CRC implementations generated with
universal_crc tool by Danjel McGougan:
  http://www.mcgougan.se/universal_crc/

Benchmark on Intel Haswell (no-turbo, 3200 Mhz):

Before:
 CRC32          |  2.52 ns/B   378.3 MiB/s   8.07 c/B
 CRC32RFC1510   |  2.52 ns/B   378.1 MiB/s   8.07 c/B
 CRC24RFC2440   | 46.62 ns/B   20.46 MiB/s  149.2 c/B

After:
 CRC32          | 0.918 ns/B  1039.3 MiB/s   2.94 c/B
 CRC32RFC1510   | 0.918 ns/B  1039.0 MiB/s   2.94 c/B
 CRC24RFC2440   | 0.918 ns/B  1039.4 MiB/s   2.94 c/B

Signed-off-by: Jussi Kivilinna
---
 cipher/crc.c  |  817 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 tests/basic.c |    3
 2 files changed, 660 insertions(+), 160 deletions(-)

diff --git a/cipher/crc.c b/cipher/crc.c
index 1322f0d..9105dfe 100644
--- a/cipher/crc.c
+++ b/cipher/crc.c
@@ -28,125 +28,311 @@
 #include "cipher.h"
 #include "bithelp.h"
+#include "bufhelp.h"
+
+
+typedef struct
+{
+  u32 CRC;
+  byte buf[4];
+}
+CRC_CONTEXT;

-/* Table of CRCs of all 8-bit messages.  Generated by running code
-   from RFC 1952 modified to print out the table.
*/ -static u32 crc32_table[256] = { - 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, - 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, - 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2, - 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, - 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, - 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, - 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c, - 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, - 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, - 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, - 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106, - 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, - 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, - 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, - 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, - 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, - 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, - 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, - 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, - 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, - 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, - 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, - 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84, - 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, - 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, - 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, - 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e, - 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, - 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, - 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, - 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28, - 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, - 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, - 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, - 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, - 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, - 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, - 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, - 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, - 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, - 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, - 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, - 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d -}; /* - * The following function was extracted from RFC 1952 by Simon - * Josefsson, for the Shishi project, and modified to be compatible - * with the modified CRC-32 used by RFC 1510, and subsequently - * modified for GNU Libgcrypt to allow it to be used for calculating - * both unmodified CRC-32 and modified CRC-32 values. 
Original - * copyright and notice from the document follows: + * Code generated by universal_crc by Danjel McGougan * - * Copyright (c) 1996 L. Peter Deutsch - * - * Permission is granted to copy and distribute this document for - * any purpose and without charge, including translations into - * other languages and incorporation into compilations, provided - * that the copyright notice and this notice are preserved, and - * that any substantive changes or deletions from the original are - * clearly marked. - * - * The copyright on RFCs, and consequently the function below, are - * supposedly also retroactively claimed by the Internet Society - * (according to rfc-editor at rfc-editor.org), with the following - * copyright notice: - * - * Copyright (C) The Internet Society. All Rights Reserved. - * - * This document and translations of it may be copied and furnished - * to others, and derivative works that comment on or otherwise - * explain it or assist in its implementation may be prepared, - * copied, published and distributed, in whole or in part, without - * restriction of any kind, provided that the above copyright - * notice and this paragraph are included on all such copies and - * derivative works. However, this document itself may not be - * modified in any way, such as by removing the copyright notice or - * references to the Internet Society or other Internet - * organizations, except as needed for the purpose of developing - * Internet standards in which case the procedures for copyrights - * defined in the Internet Standards process must be followed, or - * as required to translate it into languages other than English. - * - * The limited permissions granted above are perpetual and will not be - * revoked by the Internet Society or its successors or assigns. - * - * This document and the information contained herein is provided - * on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET - * ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE - * OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY - * IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A - * PARTICULAR PURPOSE. 
+ * CRC parameters used: + * bits: 32 + * poly: 0x04c11db7 + * init: 0xffffffff + * xor: 0xffffffff + * reverse: true + * non-direct: false * + * CRC of the string "123456789" is 0xcbf43926 */ -static u32 -update_crc32 (u32 crc, const void *buf_arg, size_t len) -{ - const char *buf = buf_arg; - size_t n; - for (n = 0; n < len; n++) - crc = crc32_table[(crc ^ buf[n]) & 0xff] ^ (crc >> 8); +static const u32 crc32_table[1024] = { + 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, + 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, + 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, + 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, + 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, + 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, + 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, + 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5, + 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, + 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, + 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, + 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, + 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, + 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f, + 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, + 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, + 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, + 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, + 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, + 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01, + 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, + 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, + 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, + 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, + 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, + 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, + 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, + 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, + 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, + 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, + 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, + 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, + 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, + 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, + 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, + 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, + 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, + 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7, + 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, + 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, + 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, + 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, + 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, + 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79, + 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, + 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, + 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, + 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, + 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, + 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713, + 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, + 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, + 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, + 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, + 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, + 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45, + 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, + 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, + 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, + 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, + 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, + 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf, + 0xb3667a2e, 
0xc4614ab8, 0x5d681b02, 0x2a6f2b94, + 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d, + 0x00000000, 0x191b3141, 0x32366282, 0x2b2d53c3, + 0x646cc504, 0x7d77f445, 0x565aa786, 0x4f4196c7, + 0xc8d98a08, 0xd1c2bb49, 0xfaefe88a, 0xe3f4d9cb, + 0xacb54f0c, 0xb5ae7e4d, 0x9e832d8e, 0x87981ccf, + 0x4ac21251, 0x53d92310, 0x78f470d3, 0x61ef4192, + 0x2eaed755, 0x37b5e614, 0x1c98b5d7, 0x05838496, + 0x821b9859, 0x9b00a918, 0xb02dfadb, 0xa936cb9a, + 0xe6775d5d, 0xff6c6c1c, 0xd4413fdf, 0xcd5a0e9e, + 0x958424a2, 0x8c9f15e3, 0xa7b24620, 0xbea97761, + 0xf1e8e1a6, 0xe8f3d0e7, 0xc3de8324, 0xdac5b265, + 0x5d5daeaa, 0x44469feb, 0x6f6bcc28, 0x7670fd69, + 0x39316bae, 0x202a5aef, 0x0b07092c, 0x121c386d, + 0xdf4636f3, 0xc65d07b2, 0xed705471, 0xf46b6530, + 0xbb2af3f7, 0xa231c2b6, 0x891c9175, 0x9007a034, + 0x179fbcfb, 0x0e848dba, 0x25a9de79, 0x3cb2ef38, + 0x73f379ff, 0x6ae848be, 0x41c51b7d, 0x58de2a3c, + 0xf0794f05, 0xe9627e44, 0xc24f2d87, 0xdb541cc6, + 0x94158a01, 0x8d0ebb40, 0xa623e883, 0xbf38d9c2, + 0x38a0c50d, 0x21bbf44c, 0x0a96a78f, 0x138d96ce, + 0x5ccc0009, 0x45d73148, 0x6efa628b, 0x77e153ca, + 0xbabb5d54, 0xa3a06c15, 0x888d3fd6, 0x91960e97, + 0xded79850, 0xc7cca911, 0xece1fad2, 0xf5facb93, + 0x7262d75c, 0x6b79e61d, 0x4054b5de, 0x594f849f, + 0x160e1258, 0x0f152319, 0x243870da, 0x3d23419b, + 0x65fd6ba7, 0x7ce65ae6, 0x57cb0925, 0x4ed03864, + 0x0191aea3, 0x188a9fe2, 0x33a7cc21, 0x2abcfd60, + 0xad24e1af, 0xb43fd0ee, 0x9f12832d, 0x8609b26c, + 0xc94824ab, 0xd05315ea, 0xfb7e4629, 0xe2657768, + 0x2f3f79f6, 0x362448b7, 0x1d091b74, 0x04122a35, + 0x4b53bcf2, 0x52488db3, 0x7965de70, 0x607eef31, + 0xe7e6f3fe, 0xfefdc2bf, 0xd5d0917c, 0xcccba03d, + 0x838a36fa, 0x9a9107bb, 0xb1bc5478, 0xa8a76539, + 0x3b83984b, 0x2298a90a, 0x09b5fac9, 0x10aecb88, + 0x5fef5d4f, 0x46f46c0e, 0x6dd93fcd, 0x74c20e8c, + 0xf35a1243, 0xea412302, 0xc16c70c1, 0xd8774180, + 0x9736d747, 0x8e2de606, 0xa500b5c5, 0xbc1b8484, + 0x71418a1a, 0x685abb5b, 0x4377e898, 0x5a6cd9d9, + 0x152d4f1e, 0x0c367e5f, 0x271b2d9c, 0x3e001cdd, + 0xb9980012, 0xa0833153, 0x8bae6290, 0x92b553d1, + 0xddf4c516, 0xc4eff457, 0xefc2a794, 0xf6d996d5, + 0xae07bce9, 0xb71c8da8, 0x9c31de6b, 0x852aef2a, + 0xca6b79ed, 0xd37048ac, 0xf85d1b6f, 0xe1462a2e, + 0x66de36e1, 0x7fc507a0, 0x54e85463, 0x4df36522, + 0x02b2f3e5, 0x1ba9c2a4, 0x30849167, 0x299fa026, + 0xe4c5aeb8, 0xfdde9ff9, 0xd6f3cc3a, 0xcfe8fd7b, + 0x80a96bbc, 0x99b25afd, 0xb29f093e, 0xab84387f, + 0x2c1c24b0, 0x350715f1, 0x1e2a4632, 0x07317773, + 0x4870e1b4, 0x516bd0f5, 0x7a468336, 0x635db277, + 0xcbfad74e, 0xd2e1e60f, 0xf9ccb5cc, 0xe0d7848d, + 0xaf96124a, 0xb68d230b, 0x9da070c8, 0x84bb4189, + 0x03235d46, 0x1a386c07, 0x31153fc4, 0x280e0e85, + 0x674f9842, 0x7e54a903, 0x5579fac0, 0x4c62cb81, + 0x8138c51f, 0x9823f45e, 0xb30ea79d, 0xaa1596dc, + 0xe554001b, 0xfc4f315a, 0xd7626299, 0xce7953d8, + 0x49e14f17, 0x50fa7e56, 0x7bd72d95, 0x62cc1cd4, + 0x2d8d8a13, 0x3496bb52, 0x1fbbe891, 0x06a0d9d0, + 0x5e7ef3ec, 0x4765c2ad, 0x6c48916e, 0x7553a02f, + 0x3a1236e8, 0x230907a9, 0x0824546a, 0x113f652b, + 0x96a779e4, 0x8fbc48a5, 0xa4911b66, 0xbd8a2a27, + 0xf2cbbce0, 0xebd08da1, 0xc0fdde62, 0xd9e6ef23, + 0x14bce1bd, 0x0da7d0fc, 0x268a833f, 0x3f91b27e, + 0x70d024b9, 0x69cb15f8, 0x42e6463b, 0x5bfd777a, + 0xdc656bb5, 0xc57e5af4, 0xee530937, 0xf7483876, + 0xb809aeb1, 0xa1129ff0, 0x8a3fcc33, 0x9324fd72, + 0x00000000, 0x01c26a37, 0x0384d46e, 0x0246be59, + 0x0709a8dc, 0x06cbc2eb, 0x048d7cb2, 0x054f1685, + 0x0e1351b8, 0x0fd13b8f, 0x0d9785d6, 0x0c55efe1, + 0x091af964, 0x08d89353, 0x0a9e2d0a, 0x0b5c473d, + 0x1c26a370, 0x1de4c947, 0x1fa2771e, 0x1e601d29, + 0x1b2f0bac, 
0x1aed619b, 0x18abdfc2, 0x1969b5f5, + 0x1235f2c8, 0x13f798ff, 0x11b126a6, 0x10734c91, + 0x153c5a14, 0x14fe3023, 0x16b88e7a, 0x177ae44d, + 0x384d46e0, 0x398f2cd7, 0x3bc9928e, 0x3a0bf8b9, + 0x3f44ee3c, 0x3e86840b, 0x3cc03a52, 0x3d025065, + 0x365e1758, 0x379c7d6f, 0x35dac336, 0x3418a901, + 0x3157bf84, 0x3095d5b3, 0x32d36bea, 0x331101dd, + 0x246be590, 0x25a98fa7, 0x27ef31fe, 0x262d5bc9, + 0x23624d4c, 0x22a0277b, 0x20e69922, 0x2124f315, + 0x2a78b428, 0x2bbade1f, 0x29fc6046, 0x283e0a71, + 0x2d711cf4, 0x2cb376c3, 0x2ef5c89a, 0x2f37a2ad, + 0x709a8dc0, 0x7158e7f7, 0x731e59ae, 0x72dc3399, + 0x7793251c, 0x76514f2b, 0x7417f172, 0x75d59b45, + 0x7e89dc78, 0x7f4bb64f, 0x7d0d0816, 0x7ccf6221, + 0x798074a4, 0x78421e93, 0x7a04a0ca, 0x7bc6cafd, + 0x6cbc2eb0, 0x6d7e4487, 0x6f38fade, 0x6efa90e9, + 0x6bb5866c, 0x6a77ec5b, 0x68315202, 0x69f33835, + 0x62af7f08, 0x636d153f, 0x612bab66, 0x60e9c151, + 0x65a6d7d4, 0x6464bde3, 0x662203ba, 0x67e0698d, + 0x48d7cb20, 0x4915a117, 0x4b531f4e, 0x4a917579, + 0x4fde63fc, 0x4e1c09cb, 0x4c5ab792, 0x4d98dda5, + 0x46c49a98, 0x4706f0af, 0x45404ef6, 0x448224c1, + 0x41cd3244, 0x400f5873, 0x4249e62a, 0x438b8c1d, + 0x54f16850, 0x55330267, 0x5775bc3e, 0x56b7d609, + 0x53f8c08c, 0x523aaabb, 0x507c14e2, 0x51be7ed5, + 0x5ae239e8, 0x5b2053df, 0x5966ed86, 0x58a487b1, + 0x5deb9134, 0x5c29fb03, 0x5e6f455a, 0x5fad2f6d, + 0xe1351b80, 0xe0f771b7, 0xe2b1cfee, 0xe373a5d9, + 0xe63cb35c, 0xe7fed96b, 0xe5b86732, 0xe47a0d05, + 0xef264a38, 0xeee4200f, 0xeca29e56, 0xed60f461, + 0xe82fe2e4, 0xe9ed88d3, 0xebab368a, 0xea695cbd, + 0xfd13b8f0, 0xfcd1d2c7, 0xfe976c9e, 0xff5506a9, + 0xfa1a102c, 0xfbd87a1b, 0xf99ec442, 0xf85cae75, + 0xf300e948, 0xf2c2837f, 0xf0843d26, 0xf1465711, + 0xf4094194, 0xf5cb2ba3, 0xf78d95fa, 0xf64fffcd, + 0xd9785d60, 0xd8ba3757, 0xdafc890e, 0xdb3ee339, + 0xde71f5bc, 0xdfb39f8b, 0xddf521d2, 0xdc374be5, + 0xd76b0cd8, 0xd6a966ef, 0xd4efd8b6, 0xd52db281, + 0xd062a404, 0xd1a0ce33, 0xd3e6706a, 0xd2241a5d, + 0xc55efe10, 0xc49c9427, 0xc6da2a7e, 0xc7184049, + 0xc25756cc, 0xc3953cfb, 0xc1d382a2, 0xc011e895, + 0xcb4dafa8, 0xca8fc59f, 0xc8c97bc6, 0xc90b11f1, + 0xcc440774, 0xcd866d43, 0xcfc0d31a, 0xce02b92d, + 0x91af9640, 0x906dfc77, 0x922b422e, 0x93e92819, + 0x96a63e9c, 0x976454ab, 0x9522eaf2, 0x94e080c5, + 0x9fbcc7f8, 0x9e7eadcf, 0x9c381396, 0x9dfa79a1, + 0x98b56f24, 0x99770513, 0x9b31bb4a, 0x9af3d17d, + 0x8d893530, 0x8c4b5f07, 0x8e0de15e, 0x8fcf8b69, + 0x8a809dec, 0x8b42f7db, 0x89044982, 0x88c623b5, + 0x839a6488, 0x82580ebf, 0x801eb0e6, 0x81dcdad1, + 0x8493cc54, 0x8551a663, 0x8717183a, 0x86d5720d, + 0xa9e2d0a0, 0xa820ba97, 0xaa6604ce, 0xaba46ef9, + 0xaeeb787c, 0xaf29124b, 0xad6fac12, 0xacadc625, + 0xa7f18118, 0xa633eb2f, 0xa4755576, 0xa5b73f41, + 0xa0f829c4, 0xa13a43f3, 0xa37cfdaa, 0xa2be979d, + 0xb5c473d0, 0xb40619e7, 0xb640a7be, 0xb782cd89, + 0xb2cddb0c, 0xb30fb13b, 0xb1490f62, 0xb08b6555, + 0xbbd72268, 0xba15485f, 0xb853f606, 0xb9919c31, + 0xbcde8ab4, 0xbd1ce083, 0xbf5a5eda, 0xbe9834ed, + 0x00000000, 0xb8bc6765, 0xaa09c88b, 0x12b5afee, + 0x8f629757, 0x37def032, 0x256b5fdc, 0x9dd738b9, + 0xc5b428ef, 0x7d084f8a, 0x6fbde064, 0xd7018701, + 0x4ad6bfb8, 0xf26ad8dd, 0xe0df7733, 0x58631056, + 0x5019579f, 0xe8a530fa, 0xfa109f14, 0x42acf871, + 0xdf7bc0c8, 0x67c7a7ad, 0x75720843, 0xcdce6f26, + 0x95ad7f70, 0x2d111815, 0x3fa4b7fb, 0x8718d09e, + 0x1acfe827, 0xa2738f42, 0xb0c620ac, 0x087a47c9, + 0xa032af3e, 0x188ec85b, 0x0a3b67b5, 0xb28700d0, + 0x2f503869, 0x97ec5f0c, 0x8559f0e2, 0x3de59787, + 0x658687d1, 0xdd3ae0b4, 0xcf8f4f5a, 0x7733283f, + 0xeae41086, 0x525877e3, 0x40edd80d, 0xf851bf68, + 0xf02bf8a1, 
0x48979fc4, 0x5a22302a, 0xe29e574f, + 0x7f496ff6, 0xc7f50893, 0xd540a77d, 0x6dfcc018, + 0x359fd04e, 0x8d23b72b, 0x9f9618c5, 0x272a7fa0, + 0xbafd4719, 0x0241207c, 0x10f48f92, 0xa848e8f7, + 0x9b14583d, 0x23a83f58, 0x311d90b6, 0x89a1f7d3, + 0x1476cf6a, 0xaccaa80f, 0xbe7f07e1, 0x06c36084, + 0x5ea070d2, 0xe61c17b7, 0xf4a9b859, 0x4c15df3c, + 0xd1c2e785, 0x697e80e0, 0x7bcb2f0e, 0xc377486b, + 0xcb0d0fa2, 0x73b168c7, 0x6104c729, 0xd9b8a04c, + 0x446f98f5, 0xfcd3ff90, 0xee66507e, 0x56da371b, + 0x0eb9274d, 0xb6054028, 0xa4b0efc6, 0x1c0c88a3, + 0x81dbb01a, 0x3967d77f, 0x2bd27891, 0x936e1ff4, + 0x3b26f703, 0x839a9066, 0x912f3f88, 0x299358ed, + 0xb4446054, 0x0cf80731, 0x1e4da8df, 0xa6f1cfba, + 0xfe92dfec, 0x462eb889, 0x549b1767, 0xec277002, + 0x71f048bb, 0xc94c2fde, 0xdbf98030, 0x6345e755, + 0x6b3fa09c, 0xd383c7f9, 0xc1366817, 0x798a0f72, + 0xe45d37cb, 0x5ce150ae, 0x4e54ff40, 0xf6e89825, + 0xae8b8873, 0x1637ef16, 0x048240f8, 0xbc3e279d, + 0x21e91f24, 0x99557841, 0x8be0d7af, 0x335cb0ca, + 0xed59b63b, 0x55e5d15e, 0x47507eb0, 0xffec19d5, + 0x623b216c, 0xda874609, 0xc832e9e7, 0x708e8e82, + 0x28ed9ed4, 0x9051f9b1, 0x82e4565f, 0x3a58313a, + 0xa78f0983, 0x1f336ee6, 0x0d86c108, 0xb53aa66d, + 0xbd40e1a4, 0x05fc86c1, 0x1749292f, 0xaff54e4a, + 0x322276f3, 0x8a9e1196, 0x982bbe78, 0x2097d91d, + 0x78f4c94b, 0xc048ae2e, 0xd2fd01c0, 0x6a4166a5, + 0xf7965e1c, 0x4f2a3979, 0x5d9f9697, 0xe523f1f2, + 0x4d6b1905, 0xf5d77e60, 0xe762d18e, 0x5fdeb6eb, + 0xc2098e52, 0x7ab5e937, 0x680046d9, 0xd0bc21bc, + 0x88df31ea, 0x3063568f, 0x22d6f961, 0x9a6a9e04, + 0x07bda6bd, 0xbf01c1d8, 0xadb46e36, 0x15080953, + 0x1d724e9a, 0xa5ce29ff, 0xb77b8611, 0x0fc7e174, + 0x9210d9cd, 0x2aacbea8, 0x38191146, 0x80a57623, + 0xd8c66675, 0x607a0110, 0x72cfaefe, 0xca73c99b, + 0x57a4f122, 0xef189647, 0xfdad39a9, 0x45115ecc, + 0x764dee06, 0xcef18963, 0xdc44268d, 0x64f841e8, + 0xf92f7951, 0x41931e34, 0x5326b1da, 0xeb9ad6bf, + 0xb3f9c6e9, 0x0b45a18c, 0x19f00e62, 0xa14c6907, + 0x3c9b51be, 0x842736db, 0x96929935, 0x2e2efe50, + 0x2654b999, 0x9ee8defc, 0x8c5d7112, 0x34e11677, + 0xa9362ece, 0x118a49ab, 0x033fe645, 0xbb838120, + 0xe3e09176, 0x5b5cf613, 0x49e959fd, 0xf1553e98, + 0x6c820621, 0xd43e6144, 0xc68bceaa, 0x7e37a9cf, + 0xd67f4138, 0x6ec3265d, 0x7c7689b3, 0xc4caeed6, + 0x591dd66f, 0xe1a1b10a, 0xf3141ee4, 0x4ba87981, + 0x13cb69d7, 0xab770eb2, 0xb9c2a15c, 0x017ec639, + 0x9ca9fe80, 0x241599e5, 0x36a0360b, 0x8e1c516e, + 0x866616a7, 0x3eda71c2, 0x2c6fde2c, 0x94d3b949, + 0x090481f0, 0xb1b8e695, 0xa30d497b, 0x1bb12e1e, + 0x43d23e48, 0xfb6e592d, 0xe9dbf6c3, 0x516791a6, + 0xccb0a91f, 0x740cce7a, 0x66b96194, 0xde0506f1 +}; - return crc; -} +/* CRC32 */ -typedef struct +static inline u32 +crc32_next (u32 crc, byte data) { - u32 CRC; - byte buf[4]; + return (crc >> 8) ^ crc32_table[(crc & 0xff) ^ data]; } -CRC_CONTEXT; -/* CRC32 */ +/* + * Process 4 bytes in one go + */ +static inline u32 +crc32_next4 (u32 crc, u32 data) +{ + crc ^= data; + crc = crc32_table[(crc & 0xff) + 0x300] ^ + crc32_table[((crc >> 8) & 0xff) + 0x200] ^ + crc32_table[((crc >> 16) & 0xff) + 0x100] ^ + crc32_table[(crc >> 24) & 0xff]; + return crc; +} static void crc32_init (void *context, unsigned int flags) @@ -159,12 +345,40 @@ crc32_init (void *context, unsigned int flags) } static void -crc32_write (void *context, const void *inbuf, size_t inlen) +crc32_write (void *context, const void *inbuf_arg, size_t inlen) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - if (!inbuf) + const byte *inbuf = inbuf_arg; + u32 crc; + + if (!inbuf || !inlen) return; - ctx->CRC = update_crc32 (ctx->CRC, inbuf, 
inlen); + + crc = ctx->CRC; + + while (inlen >= 16) + { + inlen -= 16; + crc = crc32_next4(crc, buf_get_le32(&inbuf[0])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[4])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[8])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[12])); + inbuf += 16; + } + + while (inlen >= 4) + { + inlen -= 4; + crc = crc32_next4(crc, buf_get_le32(inbuf)); + inbuf += 4; + } + + while (inlen--) + { + crc = crc32_next(crc, *inbuf++); + } + + ctx->CRC = crc; } static byte * @@ -179,13 +393,12 @@ crc32_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; ctx->CRC ^= 0xffffffffL; - ctx->buf[0] = (ctx->CRC >> 24) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[2] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[3] = (ctx->CRC ) & 0xFF; + buf_put_be32 (ctx->buf, ctx->CRC); } /* CRC32 a'la RFC 1510 */ +/* CRC of the string "123456789" is 0x2dfd2d88 */ + static void crc32rfc1510_init (void *context, unsigned int flags) { @@ -200,47 +413,315 @@ static void crc32rfc1510_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - ctx->buf[0] = (ctx->CRC >> 24) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[2] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[3] = (ctx->CRC ) & 0xFF; + buf_put_be32(ctx->buf, ctx->CRC); } /* CRC24 a'la RFC 2440 */ /* - * The following CRC 24 routines are adapted from RFC 2440, which has - * the following copyright notice: - * - * Copyright (C) The Internet Society (1998). All Rights Reserved. + * Code generated by universal_crc by Danjel McGougan * - * This document and translations of it may be copied and furnished - * to others, and derivative works that comment on or otherwise - * explain it or assist in its implementation may be prepared, - * copied, published and distributed, in whole or in part, without - * restriction of any kind, provided that the above copyright notice - * and this paragraph are included on all such copies and derivative - * works. However, this document itself may not be modified in any - * way, such as by removing the copyright notice or references to - * the Internet Society or other Internet organizations, except as - * needed for the purpose of developing Internet standards in which - * case the procedures for copyrights defined in the Internet - * Standards process must be followed, or as required to translate - * it into languages other than English. + * CRC parameters used: + * bits: 24 + * poly: 0x864cfb + * init: 0xb704ce + * xor: 0x000000 + * reverse: false + * non-direct: false * - * The limited permissions granted above are perpetual and will not be - * revoked by the Internet Society or its successors or assigns. - * - * This document and the information contained herein is provided on - * an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET - * ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE - * OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY - * IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR - * PURPOSE. 
+ * CRC of the string "123456789" is 0x21cf02 + */ + +static const u32 crc24_table[1024] = +{ + 0x00000000, 0x00fb4c86, 0x000dd58a, 0x00f6990c, + 0x00e1e693, 0x001aaa15, 0x00ec3319, 0x00177f9f, + 0x003981a1, 0x00c2cd27, 0x0034542b, 0x00cf18ad, + 0x00d86732, 0x00232bb4, 0x00d5b2b8, 0x002efe3e, + 0x00894ec5, 0x00720243, 0x00849b4f, 0x007fd7c9, + 0x0068a856, 0x0093e4d0, 0x00657ddc, 0x009e315a, + 0x00b0cf64, 0x004b83e2, 0x00bd1aee, 0x00465668, + 0x005129f7, 0x00aa6571, 0x005cfc7d, 0x00a7b0fb, + 0x00e9d10c, 0x00129d8a, 0x00e40486, 0x001f4800, + 0x0008379f, 0x00f37b19, 0x0005e215, 0x00feae93, + 0x00d050ad, 0x002b1c2b, 0x00dd8527, 0x0026c9a1, + 0x0031b63e, 0x00cafab8, 0x003c63b4, 0x00c72f32, + 0x00609fc9, 0x009bd34f, 0x006d4a43, 0x009606c5, + 0x0081795a, 0x007a35dc, 0x008cacd0, 0x0077e056, + 0x00591e68, 0x00a252ee, 0x0054cbe2, 0x00af8764, + 0x00b8f8fb, 0x0043b47d, 0x00b52d71, 0x004e61f7, + 0x00d2a319, 0x0029ef9f, 0x00df7693, 0x00243a15, + 0x0033458a, 0x00c8090c, 0x003e9000, 0x00c5dc86, + 0x00eb22b8, 0x00106e3e, 0x00e6f732, 0x001dbbb4, + 0x000ac42b, 0x00f188ad, 0x000711a1, 0x00fc5d27, + 0x005beddc, 0x00a0a15a, 0x00563856, 0x00ad74d0, + 0x00ba0b4f, 0x004147c9, 0x00b7dec5, 0x004c9243, + 0x00626c7d, 0x009920fb, 0x006fb9f7, 0x0094f571, + 0x00838aee, 0x0078c668, 0x008e5f64, 0x007513e2, + 0x003b7215, 0x00c03e93, 0x0036a79f, 0x00cdeb19, + 0x00da9486, 0x0021d800, 0x00d7410c, 0x002c0d8a, + 0x0002f3b4, 0x00f9bf32, 0x000f263e, 0x00f46ab8, + 0x00e31527, 0x001859a1, 0x00eec0ad, 0x00158c2b, + 0x00b23cd0, 0x00497056, 0x00bfe95a, 0x0044a5dc, + 0x0053da43, 0x00a896c5, 0x005e0fc9, 0x00a5434f, + 0x008bbd71, 0x0070f1f7, 0x008668fb, 0x007d247d, + 0x006a5be2, 0x00911764, 0x00678e68, 0x009cc2ee, + 0x00a44733, 0x005f0bb5, 0x00a992b9, 0x0052de3f, + 0x0045a1a0, 0x00beed26, 0x0048742a, 0x00b338ac, + 0x009dc692, 0x00668a14, 0x00901318, 0x006b5f9e, + 0x007c2001, 0x00876c87, 0x0071f58b, 0x008ab90d, + 0x002d09f6, 0x00d64570, 0x0020dc7c, 0x00db90fa, + 0x00ccef65, 0x0037a3e3, 0x00c13aef, 0x003a7669, + 0x00148857, 0x00efc4d1, 0x00195ddd, 0x00e2115b, + 0x00f56ec4, 0x000e2242, 0x00f8bb4e, 0x0003f7c8, + 0x004d963f, 0x00b6dab9, 0x004043b5, 0x00bb0f33, + 0x00ac70ac, 0x00573c2a, 0x00a1a526, 0x005ae9a0, + 0x0074179e, 0x008f5b18, 0x0079c214, 0x00828e92, + 0x0095f10d, 0x006ebd8b, 0x00982487, 0x00636801, + 0x00c4d8fa, 0x003f947c, 0x00c90d70, 0x003241f6, + 0x00253e69, 0x00de72ef, 0x0028ebe3, 0x00d3a765, + 0x00fd595b, 0x000615dd, 0x00f08cd1, 0x000bc057, + 0x001cbfc8, 0x00e7f34e, 0x00116a42, 0x00ea26c4, + 0x0076e42a, 0x008da8ac, 0x007b31a0, 0x00807d26, + 0x009702b9, 0x006c4e3f, 0x009ad733, 0x00619bb5, + 0x004f658b, 0x00b4290d, 0x0042b001, 0x00b9fc87, + 0x00ae8318, 0x0055cf9e, 0x00a35692, 0x00581a14, + 0x00ffaaef, 0x0004e669, 0x00f27f65, 0x000933e3, + 0x001e4c7c, 0x00e500fa, 0x001399f6, 0x00e8d570, + 0x00c62b4e, 0x003d67c8, 0x00cbfec4, 0x0030b242, + 0x0027cddd, 0x00dc815b, 0x002a1857, 0x00d154d1, + 0x009f3526, 0x006479a0, 0x0092e0ac, 0x0069ac2a, + 0x007ed3b5, 0x00859f33, 0x0073063f, 0x00884ab9, + 0x00a6b487, 0x005df801, 0x00ab610d, 0x00502d8b, + 0x00475214, 0x00bc1e92, 0x004a879e, 0x00b1cb18, + 0x00167be3, 0x00ed3765, 0x001bae69, 0x00e0e2ef, + 0x00f79d70, 0x000cd1f6, 0x00fa48fa, 0x0001047c, + 0x002ffa42, 0x00d4b6c4, 0x00222fc8, 0x00d9634e, + 0x00ce1cd1, 0x00355057, 0x00c3c95b, 0x003885dd, + 0x00000000, 0x00488f66, 0x00901ecd, 0x00d891ab, + 0x00db711c, 0x0093fe7a, 0x004b6fd1, 0x0003e0b7, + 0x00b6e338, 0x00fe6c5e, 0x0026fdf5, 0x006e7293, + 0x006d9224, 0x00251d42, 0x00fd8ce9, 0x00b5038f, + 0x006cc771, 0x00244817, 0x00fcd9bc, 0x00b456da, + 
0x00b7b66d, 0x00ff390b, 0x0027a8a0, 0x006f27c6, + 0x00da2449, 0x0092ab2f, 0x004a3a84, 0x0002b5e2, + 0x00015555, 0x0049da33, 0x00914b98, 0x00d9c4fe, + 0x00d88ee3, 0x00900185, 0x0048902e, 0x00001f48, + 0x0003ffff, 0x004b7099, 0x0093e132, 0x00db6e54, + 0x006e6ddb, 0x0026e2bd, 0x00fe7316, 0x00b6fc70, + 0x00b51cc7, 0x00fd93a1, 0x0025020a, 0x006d8d6c, + 0x00b44992, 0x00fcc6f4, 0x0024575f, 0x006cd839, + 0x006f388e, 0x0027b7e8, 0x00ff2643, 0x00b7a925, + 0x0002aaaa, 0x004a25cc, 0x0092b467, 0x00da3b01, + 0x00d9dbb6, 0x009154d0, 0x0049c57b, 0x00014a1d, + 0x004b5141, 0x0003de27, 0x00db4f8c, 0x0093c0ea, + 0x0090205d, 0x00d8af3b, 0x00003e90, 0x0048b1f6, + 0x00fdb279, 0x00b53d1f, 0x006dacb4, 0x002523d2, + 0x0026c365, 0x006e4c03, 0x00b6dda8, 0x00fe52ce, + 0x00279630, 0x006f1956, 0x00b788fd, 0x00ff079b, + 0x00fce72c, 0x00b4684a, 0x006cf9e1, 0x00247687, + 0x00917508, 0x00d9fa6e, 0x00016bc5, 0x0049e4a3, + 0x004a0414, 0x00028b72, 0x00da1ad9, 0x009295bf, + 0x0093dfa2, 0x00db50c4, 0x0003c16f, 0x004b4e09, + 0x0048aebe, 0x000021d8, 0x00d8b073, 0x00903f15, + 0x00253c9a, 0x006db3fc, 0x00b52257, 0x00fdad31, + 0x00fe4d86, 0x00b6c2e0, 0x006e534b, 0x0026dc2d, + 0x00ff18d3, 0x00b797b5, 0x006f061e, 0x00278978, + 0x002469cf, 0x006ce6a9, 0x00b47702, 0x00fcf864, + 0x0049fbeb, 0x0001748d, 0x00d9e526, 0x00916a40, + 0x00928af7, 0x00da0591, 0x0002943a, 0x004a1b5c, + 0x0096a282, 0x00de2de4, 0x0006bc4f, 0x004e3329, + 0x004dd39e, 0x00055cf8, 0x00ddcd53, 0x00954235, + 0x002041ba, 0x0068cedc, 0x00b05f77, 0x00f8d011, + 0x00fb30a6, 0x00b3bfc0, 0x006b2e6b, 0x0023a10d, + 0x00fa65f3, 0x00b2ea95, 0x006a7b3e, 0x0022f458, + 0x002114ef, 0x00699b89, 0x00b10a22, 0x00f98544, + 0x004c86cb, 0x000409ad, 0x00dc9806, 0x00941760, + 0x0097f7d7, 0x00df78b1, 0x0007e91a, 0x004f667c, + 0x004e2c61, 0x0006a307, 0x00de32ac, 0x0096bdca, + 0x00955d7d, 0x00ddd21b, 0x000543b0, 0x004dccd6, + 0x00f8cf59, 0x00b0403f, 0x0068d194, 0x00205ef2, + 0x0023be45, 0x006b3123, 0x00b3a088, 0x00fb2fee, + 0x0022eb10, 0x006a6476, 0x00b2f5dd, 0x00fa7abb, + 0x00f99a0c, 0x00b1156a, 0x006984c1, 0x00210ba7, + 0x00940828, 0x00dc874e, 0x000416e5, 0x004c9983, + 0x004f7934, 0x0007f652, 0x00df67f9, 0x0097e89f, + 0x00ddf3c3, 0x00957ca5, 0x004ded0e, 0x00056268, + 0x000682df, 0x004e0db9, 0x00969c12, 0x00de1374, + 0x006b10fb, 0x00239f9d, 0x00fb0e36, 0x00b38150, + 0x00b061e7, 0x00f8ee81, 0x00207f2a, 0x0068f04c, + 0x00b134b2, 0x00f9bbd4, 0x00212a7f, 0x0069a519, + 0x006a45ae, 0x0022cac8, 0x00fa5b63, 0x00b2d405, + 0x0007d78a, 0x004f58ec, 0x0097c947, 0x00df4621, + 0x00dca696, 0x009429f0, 0x004cb85b, 0x0004373d, + 0x00057d20, 0x004df246, 0x009563ed, 0x00ddec8b, + 0x00de0c3c, 0x0096835a, 0x004e12f1, 0x00069d97, + 0x00b39e18, 0x00fb117e, 0x002380d5, 0x006b0fb3, + 0x0068ef04, 0x00206062, 0x00f8f1c9, 0x00b07eaf, + 0x0069ba51, 0x00213537, 0x00f9a49c, 0x00b12bfa, + 0x00b2cb4d, 0x00fa442b, 0x0022d580, 0x006a5ae6, + 0x00df5969, 0x0097d60f, 0x004f47a4, 0x0007c8c2, + 0x00042875, 0x004ca713, 0x009436b8, 0x00dcb9de, + 0x00000000, 0x00d70983, 0x00555f80, 0x00825603, + 0x0051f286, 0x0086fb05, 0x0004ad06, 0x00d3a485, + 0x0059a88b, 0x008ea108, 0x000cf70b, 0x00dbfe88, + 0x00085a0d, 0x00df538e, 0x005d058d, 0x008a0c0e, + 0x00491c91, 0x009e1512, 0x001c4311, 0x00cb4a92, + 0x0018ee17, 0x00cfe794, 0x004db197, 0x009ab814, + 0x0010b41a, 0x00c7bd99, 0x0045eb9a, 0x0092e219, + 0x0041469c, 0x00964f1f, 0x0014191c, 0x00c3109f, + 0x006974a4, 0x00be7d27, 0x003c2b24, 0x00eb22a7, + 0x00388622, 0x00ef8fa1, 0x006dd9a2, 0x00bad021, + 0x0030dc2f, 0x00e7d5ac, 0x006583af, 0x00b28a2c, + 0x00612ea9, 0x00b6272a, 0x00347129, 0x00e378aa, + 
0x00206835, 0x00f761b6, 0x007537b5, 0x00a23e36, + 0x00719ab3, 0x00a69330, 0x0024c533, 0x00f3ccb0, + 0x0079c0be, 0x00aec93d, 0x002c9f3e, 0x00fb96bd, + 0x00283238, 0x00ff3bbb, 0x007d6db8, 0x00aa643b, + 0x0029a4ce, 0x00fead4d, 0x007cfb4e, 0x00abf2cd, + 0x00785648, 0x00af5fcb, 0x002d09c8, 0x00fa004b, + 0x00700c45, 0x00a705c6, 0x002553c5, 0x00f25a46, + 0x0021fec3, 0x00f6f740, 0x0074a143, 0x00a3a8c0, + 0x0060b85f, 0x00b7b1dc, 0x0035e7df, 0x00e2ee5c, + 0x00314ad9, 0x00e6435a, 0x00641559, 0x00b31cda, + 0x003910d4, 0x00ee1957, 0x006c4f54, 0x00bb46d7, + 0x0068e252, 0x00bfebd1, 0x003dbdd2, 0x00eab451, + 0x0040d06a, 0x0097d9e9, 0x00158fea, 0x00c28669, + 0x001122ec, 0x00c62b6f, 0x00447d6c, 0x009374ef, + 0x001978e1, 0x00ce7162, 0x004c2761, 0x009b2ee2, + 0x00488a67, 0x009f83e4, 0x001dd5e7, 0x00cadc64, + 0x0009ccfb, 0x00dec578, 0x005c937b, 0x008b9af8, + 0x00583e7d, 0x008f37fe, 0x000d61fd, 0x00da687e, + 0x00506470, 0x00876df3, 0x00053bf0, 0x00d23273, + 0x000196f6, 0x00d69f75, 0x0054c976, 0x0083c0f5, + 0x00a9041b, 0x007e0d98, 0x00fc5b9b, 0x002b5218, + 0x00f8f69d, 0x002fff1e, 0x00ada91d, 0x007aa09e, + 0x00f0ac90, 0x0027a513, 0x00a5f310, 0x0072fa93, + 0x00a15e16, 0x00765795, 0x00f40196, 0x00230815, + 0x00e0188a, 0x00371109, 0x00b5470a, 0x00624e89, + 0x00b1ea0c, 0x0066e38f, 0x00e4b58c, 0x0033bc0f, + 0x00b9b001, 0x006eb982, 0x00ecef81, 0x003be602, + 0x00e84287, 0x003f4b04, 0x00bd1d07, 0x006a1484, + 0x00c070bf, 0x0017793c, 0x00952f3f, 0x004226bc, + 0x00918239, 0x00468bba, 0x00c4ddb9, 0x0013d43a, + 0x0099d834, 0x004ed1b7, 0x00cc87b4, 0x001b8e37, + 0x00c82ab2, 0x001f2331, 0x009d7532, 0x004a7cb1, + 0x00896c2e, 0x005e65ad, 0x00dc33ae, 0x000b3a2d, + 0x00d89ea8, 0x000f972b, 0x008dc128, 0x005ac8ab, + 0x00d0c4a5, 0x0007cd26, 0x00859b25, 0x005292a6, + 0x00813623, 0x00563fa0, 0x00d469a3, 0x00036020, + 0x0080a0d5, 0x0057a956, 0x00d5ff55, 0x0002f6d6, + 0x00d15253, 0x00065bd0, 0x00840dd3, 0x00530450, + 0x00d9085e, 0x000e01dd, 0x008c57de, 0x005b5e5d, + 0x0088fad8, 0x005ff35b, 0x00dda558, 0x000aacdb, + 0x00c9bc44, 0x001eb5c7, 0x009ce3c4, 0x004bea47, + 0x00984ec2, 0x004f4741, 0x00cd1142, 0x001a18c1, + 0x009014cf, 0x00471d4c, 0x00c54b4f, 0x001242cc, + 0x00c1e649, 0x0016efca, 0x0094b9c9, 0x0043b04a, + 0x00e9d471, 0x003eddf2, 0x00bc8bf1, 0x006b8272, + 0x00b826f7, 0x006f2f74, 0x00ed7977, 0x003a70f4, + 0x00b07cfa, 0x00677579, 0x00e5237a, 0x00322af9, + 0x00e18e7c, 0x003687ff, 0x00b4d1fc, 0x0063d87f, + 0x00a0c8e0, 0x0077c163, 0x00f59760, 0x00229ee3, + 0x00f13a66, 0x002633e5, 0x00a465e6, 0x00736c65, + 0x00f9606b, 0x002e69e8, 0x00ac3feb, 0x007b3668, + 0x00a892ed, 0x007f9b6e, 0x00fdcd6d, 0x002ac4ee, + 0x00000000, 0x00520936, 0x00a4126c, 0x00f61b5a, + 0x004825d8, 0x001a2cee, 0x00ec37b4, 0x00be3e82, + 0x006b0636, 0x00390f00, 0x00cf145a, 0x009d1d6c, + 0x002323ee, 0x00712ad8, 0x00873182, 0x00d538b4, + 0x00d60c6c, 0x0084055a, 0x00721e00, 0x00201736, + 0x009e29b4, 0x00cc2082, 0x003a3bd8, 0x006832ee, + 0x00bd0a5a, 0x00ef036c, 0x00191836, 0x004b1100, + 0x00f52f82, 0x00a726b4, 0x00513dee, 0x000334d8, + 0x00ac19d8, 0x00fe10ee, 0x00080bb4, 0x005a0282, + 0x00e43c00, 0x00b63536, 0x00402e6c, 0x0012275a, + 0x00c71fee, 0x009516d8, 0x00630d82, 0x003104b4, + 0x008f3a36, 0x00dd3300, 0x002b285a, 0x0079216c, + 0x007a15b4, 0x00281c82, 0x00de07d8, 0x008c0eee, + 0x0032306c, 0x0060395a, 0x00962200, 0x00c42b36, + 0x00111382, 0x00431ab4, 0x00b501ee, 0x00e708d8, + 0x0059365a, 0x000b3f6c, 0x00fd2436, 0x00af2d00, + 0x00a37f36, 0x00f17600, 0x00076d5a, 0x0055646c, + 0x00eb5aee, 0x00b953d8, 0x004f4882, 0x001d41b4, + 0x00c87900, 0x009a7036, 0x006c6b6c, 0x003e625a, + 
0x00805cd8, 0x00d255ee, 0x00244eb4, 0x00764782, + 0x0075735a, 0x00277a6c, 0x00d16136, 0x00836800, + 0x003d5682, 0x006f5fb4, 0x009944ee, 0x00cb4dd8, + 0x001e756c, 0x004c7c5a, 0x00ba6700, 0x00e86e36, + 0x005650b4, 0x00045982, 0x00f242d8, 0x00a04bee, + 0x000f66ee, 0x005d6fd8, 0x00ab7482, 0x00f97db4, + 0x00474336, 0x00154a00, 0x00e3515a, 0x00b1586c, + 0x006460d8, 0x003669ee, 0x00c072b4, 0x00927b82, + 0x002c4500, 0x007e4c36, 0x0088576c, 0x00da5e5a, + 0x00d96a82, 0x008b63b4, 0x007d78ee, 0x002f71d8, + 0x00914f5a, 0x00c3466c, 0x00355d36, 0x00675400, + 0x00b26cb4, 0x00e06582, 0x00167ed8, 0x004477ee, + 0x00fa496c, 0x00a8405a, 0x005e5b00, 0x000c5236, + 0x0046ff6c, 0x0014f65a, 0x00e2ed00, 0x00b0e436, + 0x000edab4, 0x005cd382, 0x00aac8d8, 0x00f8c1ee, + 0x002df95a, 0x007ff06c, 0x0089eb36, 0x00dbe200, + 0x0065dc82, 0x0037d5b4, 0x00c1ceee, 0x0093c7d8, + 0x0090f300, 0x00c2fa36, 0x0034e16c, 0x0066e85a, + 0x00d8d6d8, 0x008adfee, 0x007cc4b4, 0x002ecd82, + 0x00fbf536, 0x00a9fc00, 0x005fe75a, 0x000dee6c, + 0x00b3d0ee, 0x00e1d9d8, 0x0017c282, 0x0045cbb4, + 0x00eae6b4, 0x00b8ef82, 0x004ef4d8, 0x001cfdee, + 0x00a2c36c, 0x00f0ca5a, 0x0006d100, 0x0054d836, + 0x0081e082, 0x00d3e9b4, 0x0025f2ee, 0x0077fbd8, + 0x00c9c55a, 0x009bcc6c, 0x006dd736, 0x003fde00, + 0x003cead8, 0x006ee3ee, 0x0098f8b4, 0x00caf182, + 0x0074cf00, 0x0026c636, 0x00d0dd6c, 0x0082d45a, + 0x0057ecee, 0x0005e5d8, 0x00f3fe82, 0x00a1f7b4, + 0x001fc936, 0x004dc000, 0x00bbdb5a, 0x00e9d26c, + 0x00e5805a, 0x00b7896c, 0x00419236, 0x00139b00, + 0x00ada582, 0x00ffacb4, 0x0009b7ee, 0x005bbed8, + 0x008e866c, 0x00dc8f5a, 0x002a9400, 0x00789d36, + 0x00c6a3b4, 0x0094aa82, 0x0062b1d8, 0x0030b8ee, + 0x00338c36, 0x00618500, 0x00979e5a, 0x00c5976c, + 0x007ba9ee, 0x0029a0d8, 0x00dfbb82, 0x008db2b4, + 0x00588a00, 0x000a8336, 0x00fc986c, 0x00ae915a, + 0x0010afd8, 0x0042a6ee, 0x00b4bdb4, 0x00e6b482, + 0x00499982, 0x001b90b4, 0x00ed8bee, 0x00bf82d8, + 0x0001bc5a, 0x0053b56c, 0x00a5ae36, 0x00f7a700, + 0x00229fb4, 0x00709682, 0x00868dd8, 0x00d484ee, + 0x006aba6c, 0x0038b35a, 0x00cea800, 0x009ca136, + 0x009f95ee, 0x00cd9cd8, 0x003b8782, 0x00698eb4, + 0x00d7b036, 0x0085b900, 0x0073a25a, 0x0021ab6c, + 0x00f493d8, 0x00a69aee, 0x005081b4, 0x00028882, + 0x00bcb600, 0x00eebf36, 0x0018a46c, 0x004aad5a +}; + +static inline +u32 crc24_init (void) +{ + return 0xce04b7; +} + +static inline +u32 crc24_next (u32 crc, byte data) +{ + return (crc >> 8) ^ crc24_table[(crc & 0xff) ^ data]; +} + +/* + * Process 4 bytes in one go */ +static inline +u32 crc24_next4 (u32 crc, u32 data) +{ + crc ^= data; + crc = crc24_table[(crc & 0xff) + 0x300] ^ + crc24_table[((crc >> 8) & 0xff) + 0x200] ^ + crc24_table[((crc >> 16) & 0xff) + 0x100] ^ + crc24_table[(crc >> 24) & 0xff]; + return crc; +} -#define CRC24_INIT 0xb704ceL -#define CRC24_POLY 0x1864cfbL +static inline +u32 crc24_final (u32 crc) +{ + return crc & 0xffffff; +} static void crc24rfc2440_init (void *context, unsigned int flags) @@ -249,36 +730,52 @@ crc24rfc2440_init (void *context, unsigned int flags) (void)flags; - ctx->CRC = CRC24_INIT; + ctx->CRC = crc24_init(); } static void crc24rfc2440_write (void *context, const void *inbuf_arg, size_t inlen) { const unsigned char *inbuf = inbuf_arg; - int i; CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; + u32 crc; - if (!inbuf) + if (!inbuf || !inlen) return; - while (inlen--) { - ctx->CRC ^= (*inbuf++) << 16; - for (i = 0; i < 8; i++) { - ctx->CRC <<= 1; - if (ctx->CRC & 0x1000000) - ctx->CRC ^= CRC24_POLY; + crc = ctx->CRC; + + while (inlen >= 16) + { + inlen -= 16; + crc = crc24_next4(crc, 
buf_get_le32(&inbuf[0])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[4])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[8])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[12])); + inbuf += 16; + } + + while (inlen >= 4) + { + inlen -= 4; + crc = crc24_next4(crc, buf_get_le32(inbuf)); + inbuf += 4; + } + + while (inlen--) + { + crc = crc24_next(crc, *inbuf++); } - } + + ctx->CRC = crc; } static void crc24rfc2440_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - ctx->buf[0] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[2] = (ctx->CRC ) & 0xFF; + ctx->CRC = crc24_final(ctx->CRC); + buf_put_le32 (ctx->buf, ctx->CRC); } /* We allow the CRC algorithms even in FIPS mode because they are diff --git a/tests/basic.c b/tests/basic.c index 2cf8dd0..bb07394 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5524,6 +5524,7 @@ check_digests (void) "TY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser Gene" "ral Public License for more details.", "\x4A\x53\x7D\x67" }, + { GCRY_MD_CRC32, "123456789", "\xcb\xf4\x39\x26" }, { GCRY_MD_CRC32_RFC1510, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32_RFC1510, "foo", "\x73\x32\xbc\x33" }, { GCRY_MD_CRC32_RFC1510, "test0123456789", "\xb8\x3e\x88\xd6" }, @@ -5539,8 +5540,10 @@ check_digests (void) { GCRY_MD_CRC32_RFC1510, "\x80\x00\x00\x00", "\xed\x59\xb6\x3b" }, { GCRY_MD_CRC32_RFC1510, "\x00\x00\x00\x01", "\x77\x07\x30\x96" }, #endif + { GCRY_MD_CRC32_RFC1510, "123456789", "\x2d\xfd\x2d\x88" }, { GCRY_MD_CRC24_RFC2440, "", "\xb7\x04\xce" }, { GCRY_MD_CRC24_RFC2440, "foo", "\x4f\xc2\x55" }, + { GCRY_MD_CRC24_RFC2440, "123456789", "\x21\xcf\x02" }, { GCRY_MD_TIGER, "", "\x24\xF0\x13\x0C\x63\xAC\x93\x32\x16\x16\x6E\x76" From cvs at cvs.gnupg.org Sun May 3 09:51:52 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 03 May 2015 09:51:52 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-194-ge886e4f Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via e886e4f5e73fe6a9f9191f5155852ce5d8bb88fe (commit) via c2dba93e639639bdac139b3a3a456d10ddc61f79 (commit) via 839a3bbe2bb045139223b32753d656cc6c3d4669 (commit) via 9f086ffa43f2507b9d17522a0a2e394cb273baf8 (commit) via e40eff94f9f8654c3d29e03bbb7e5ee6a43c1435 (commit) via 88842cbc68beb4f73c87fdbcb74182cba818f789 (commit) from 124dfce7c5a2d9405fa2b2832e91ac1267943830 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit e886e4f5e73fe6a9f9191f5155852ce5d8bb88fe Author: Jussi Kivilinna Date: Fri May 1 19:07:07 2015 +0300 Fix packed attribute check for Windows targets * configure.ac (gcry_cv_gcc_attribute_packed): Move 'long b' to its own packed structure. -- Change packed attribute test so that it works with both MS ABI and SYSV ABI. 
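For a quick local sanity check, the reworked probe from the configure.ac hunk below can be compiled on its own. The following standalone rendering is only a sketch: the two struct definitions and the enum line come from the patch, while the include and the main() wrapper are added here so it builds as an ordinary translation unit. The probe compiles only when the packed layout really is sizeof(char) + sizeof(long); otherwise the enum initializer divides by zero, which is a hard compile error. Moving the 'long' into its own packed structure is what keeps that size identity true under the MS ABI as well as the SYSV ABI.

/* Standalone rendering of the gcry_cv_gcc_attribute_packed probe.
 * The struct and enum lines mirror the patched AC_COMPILE_IFELSE body;
 * main() is only here so the file builds as a normal program. */
#include <stdio.h>

struct foolong_s { char a; } __attribute__ ((packed));

struct foolong_packed_s { long b; } __attribute__ ((packed));
struct foo_s { char a; struct foolong_packed_s b; } __attribute__ ((packed));

/* 1/0 in an enum initializer is a compile error, so this line only
 * passes when sizeof(struct foo_s) == sizeof(char) + sizeof(long). */
enum bar { FOO = 1 / (sizeof(struct foo_s) == (sizeof(char) + sizeof(long))) };

int
main (void)
{
  printf ("sizeof(struct foo_s) = %u\n", (unsigned int) sizeof (struct foo_s));
  return 0;
}
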
Signed-off-by: Jussi Kivilinna diff --git a/configure.ac b/configure.ac index 16f6a21..555ad1e 100644 --- a/configure.ac +++ b/configure.ac @@ -964,7 +964,9 @@ AC_CACHE_CHECK([whether the GCC style packed attribute is supported], [gcry_cv_gcc_attribute_packed], [gcry_cv_gcc_attribute_packed=no AC_COMPILE_IFELSE([AC_LANG_SOURCE( - [[struct foo_s { char a; long b; } __attribute__ ((packed)); + [[struct foolong_s { long b; } __attribute__ ((packed)); + struct foo_s { char a; struct foolong_s b; } + __attribute__ ((packed)); enum bar { FOO = 1 / (sizeof(struct foo_s) == (sizeof(char) + sizeof(long))), };]])], commit c2dba93e639639bdac139b3a3a456d10ddc61f79 Author: Jussi Kivilinna Date: Fri May 1 18:50:34 2015 +0300 Fix tail handling in buf_xor_1 * cipher/bufhelp.h (buf_xor_1): Increment source pointer at tail handling. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index fb87939..c1aa52e 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -162,7 +162,7 @@ do_bytes: #endif /* Handle tail. */ for (; len; len--) - *dst++ ^= *src; + *dst++ ^= *src++; } commit 839a3bbe2bb045139223b32753d656cc6c3d4669 Author: Jussi Kivilinna Date: Fri May 1 15:03:38 2015 +0300 Add --disable-hwf for basic tests * tests/basic.c (main): Add handling for '--disable-hwf'. -- Signed-off-by: Jussi Kivilinna diff --git a/tests/basic.c b/tests/basic.c index 8400f9e..2cf8dd0 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -8028,6 +8028,21 @@ main (int argc, char **argv) argc--; argv++; } } + else if (!strcmp (*argv, "--disable-hwf")) + { + argc--; + argv++; + if (argc) + { + if (gcry_control (GCRYCTL_DISABLE_HWF, *argv, NULL)) + fprintf (stderr, + PGM + ": unknown hardware feature `%s' - option ignored\n", + *argv); + argc--; + argv++; + } + } } gcry_control (GCRYCTL_SET_VERBOSITY, (int)verbose); commit 9f086ffa43f2507b9d17522a0a2e394cb273baf8 Author: Jussi Kivilinna Date: Fri May 1 14:55:58 2015 +0300 Use more odd chuck sizes for check_one_md * tests/basic.c (check_one_md): Make chuck size vary oddly, instead of using fixed length of 1000 bytes. -- Signed-off-by: Jussi Kivilinna diff --git a/tests/basic.c b/tests/basic.c index f3105de..8400f9e 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5231,11 +5231,29 @@ check_one_md (int algo, const char *data, int len, const char *expect) if (*data == '!' && !data[1]) { /* hash one million times a "a" */ char aaa[1000]; + size_t left = 1000 * 1000; + size_t startlen = 1; + size_t piecelen = startlen; - /* Write in odd size chunks so that we test the buffering. */ memset (aaa, 'a', 1000); - for (i = 0; i < 1000; i++) - gcry_md_write (hd, aaa, 1000); + + /* Write in odd size chunks so that we test the buffering. */ + while (left > 0) + { + if (piecelen > sizeof(aaa)) + piecelen = sizeof(aaa); + if (piecelen > left) + piecelen = left; + + gcry_md_write (hd, aaa, piecelen); + + left -= piecelen; + + if (piecelen == sizeof(aaa)) + piecelen = ++startlen; + else + piecelen = piecelen * 2 - ((piecelen != startlen) ? startlen : 0); + } } else gcry_md_write (hd, data, len); commit e40eff94f9f8654c3d29e03bbb7e5ee6a43c1435 Author: Jussi Kivilinna Date: Fri May 1 14:33:29 2015 +0300 Enable more modes in basic ciphers test * src/gcrypt.h.in (GCRY_OCB_BLOCK_LEN): New. * tests/basic.c (check_one_cipher_core_reset): New. (check_one_cipher_core): Use check_one_cipher_core_reset inplace of gcry_cipher_reset. (check_ciphers): Add CCM and OCB modes for block cipher tests. 
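The check_one_cipher_core_reset() helper named above is defined in the tests/basic.c hunk further down: after every reset, CCM and OCB handles need a fresh nonce via gcry_cipher_setiv(), and CCM additionally has to be told the message, AAD and tag lengths through GCRYCTL_SET_CCM_LENGTHS before any data is processed. As a rough caller-side sketch of that same per-message setup (the function name ccm_encrypt_once, the AES-128 key and the fixed 8-byte nonce are illustrative choices, not taken from the patch, and library initialisation is assumed to have happened elsewhere):

#include <stddef.h>
#include <stdint.h>
#include <gcrypt.h>

/* Encrypt BUF of BUFLEN bytes in place with AES-128-CCM and write a
 * 16-byte tag to TAG.  Mirrors the setup order used by the new test
 * helper: setkey, setiv, then GCRYCTL_SET_CCM_LENGTHS. */
int
ccm_encrypt_once (const unsigned char key[16],
                  unsigned char *buf, size_t buflen,
                  unsigned char tag[16])
{
  static const unsigned char nonce[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  uint64_t params[3];
  gcry_cipher_hd_t hd;
  gcry_error_t err;

  err = gcry_cipher_open (&hd, GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_CCM, 0);
  if (err)
    return -1;

  err = gcry_cipher_setkey (hd, key, 16);
  if (!err)
    err = gcry_cipher_setiv (hd, nonce, sizeof nonce);

  params[0] = buflen;  /* encryptedlen */
  params[1] = 0;       /* aadlen */
  params[2] = 16;      /* authtaglen */
  if (!err)
    err = gcry_cipher_ctl (hd, GCRYCTL_SET_CCM_LENGTHS, params, sizeof params);

  if (!err)
    err = gcry_cipher_encrypt (hd, buf, buflen, NULL, 0); /* in-place */
  if (!err)
    err = gcry_cipher_gettag (hd, tag, 16);

  gcry_cipher_close (hd);
  return err ? -1 : 0;
}

CCM needs the totals up front because the message and AAD lengths are bound into the first MAC block; that is why a plain gcry_cipher_reset() is no longer enough for these modes in the test loop.
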
-- Signed-off-by: Jussi Kivilinna diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index cac2b49..0984d11 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -931,6 +931,9 @@ enum gcry_cipher_flags /* CCM works only with blocks of 128 bits. */ #define GCRY_CCM_BLOCK_LEN (128 / 8) +/* OCB works only with blocks of 128 bits. */ +#define GCRY_OCB_BLOCK_LEN (128 / 8) + /* Create a handle for algorithm ALGO to be used in MODE. FLAGS may be given as an bitwise OR of the gcry_cipher_flags values. */ gcry_error_t gcry_cipher_open (gcry_cipher_hd_t *handle, diff --git a/tests/basic.c b/tests/basic.c index 07fd4d0..f3105de 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -4676,7 +4676,8 @@ check_bulk_cipher_modes (void) } -static unsigned int get_algo_mode_blklen(int algo, int mode) +static unsigned int +get_algo_mode_blklen (int algo, int mode) { unsigned int blklen = gcry_cipher_get_algo_blklen(algo); @@ -4696,6 +4697,48 @@ static unsigned int get_algo_mode_blklen(int algo, int mode) } +static int +check_one_cipher_core_reset (gcry_cipher_hd_t hd, int algo, int mode, int pass, + int nplain) +{ + static const unsigned char iv[8] = { 0, 1, 2, 3, 4, 5, 6, 7 }; + u64 ctl_params[3]; + int err; + + gcry_cipher_reset (hd); + + if (mode == GCRY_CIPHER_MODE_OCB || mode == GCRY_CIPHER_MODE_CCM) + { + err = gcry_cipher_setiv (hd, iv, sizeof(iv)); + if (err) + { + fail ("pass %d, algo %d, mode %d, gcry_cipher_setiv failed: %s\n", + pass, algo, mode, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + } + + if (mode == GCRY_CIPHER_MODE_CCM) + { + ctl_params[0] = nplain; /* encryptedlen */ + ctl_params[1] = 0; /* aadlen */ + ctl_params[2] = 16; /* authtaglen */ + err = gcry_cipher_ctl (hd, GCRYCTL_SET_CCM_LENGTHS, ctl_params, + sizeof(ctl_params)); + if (err) + { + fail ("pass %d, algo %d, mode %d, gcry_cipher_ctl " + "GCRYCTL_SET_CCM_LENGTHS failed: %s\n", + pass, algo, mode, gpg_strerror (err)); + gcry_cipher_close (hd); + return -1; + } + } + + return 0; +} + /* The core of the cipher check. In addition to the parameters passed to check_one_cipher it also receives the KEY and the plain data. PASS is printed with error messages. The function returns 0 on @@ -4782,6 +4825,9 @@ check_one_cipher_core (int algo, int mode, int flags, return -1; } + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; + err = gcry_cipher_encrypt (hd, out, nplain, plain, nplain); if (err) { @@ -4793,7 +4839,8 @@ check_one_cipher_core (int algo, int mode, int flags, memcpy (enc_result, out, nplain); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; err = gcry_cipher_decrypt (hd, in, nplain, out, nplain); if (err) @@ -4809,7 +4856,8 @@ check_one_cipher_core (int algo, int mode, int flags, pass, algo, mode); /* Again, using in-place encryption. */ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; memcpy (out, plain, nplain); err = gcry_cipher_encrypt (hd, out, nplain, NULL, 0); @@ -4826,7 +4874,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, in-place, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; err = gcry_cipher_decrypt (hd, out, nplain, NULL, 0); if (err) @@ -4843,7 +4892,8 @@ check_one_cipher_core (int algo, int mode, int flags, pass, algo, mode); /* Again, splitting encryption in multiple operations. 
*/ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4871,7 +4921,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, split-buffer, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4900,7 +4951,8 @@ check_one_cipher_core (int algo, int mode, int flags, /* Again, using in-place encryption and splitting encryption in multiple * operations. */ - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -4928,7 +4980,8 @@ check_one_cipher_core (int algo, int mode, int flags, fail ("pass %d, algo %d, mode %d, in-place split-buffer, encrypt mismatch\n", pass, algo, mode); - gcry_cipher_reset (hd); + if (check_one_cipher_core_reset (hd, algo, mode, pass, nplain) < 0) + return -1; piecelen = blklen; pos = 0; @@ -5096,8 +5149,12 @@ check_ciphers (void) check_one_cipher (algos[i], GCRY_CIPHER_MODE_CBC, 0); check_one_cipher (algos[i], GCRY_CIPHER_MODE_CBC, GCRY_CIPHER_CBC_CTS); check_one_cipher (algos[i], GCRY_CIPHER_MODE_CTR, 0); + if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_CCM_BLOCK_LEN) + check_one_cipher (algos[i], GCRY_CIPHER_MODE_CCM, 0); if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_GCM_BLOCK_LEN) check_one_cipher (algos[i], GCRY_CIPHER_MODE_GCM, 0); + if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_OCB_BLOCK_LEN) + check_one_cipher (algos[i], GCRY_CIPHER_MODE_OCB, 0); } for (i = 0; algos2[i]; i++) commit 88842cbc68beb4f73c87fdbcb74182cba818f789 Author: Jussi Kivilinna Date: Fri May 1 14:32:36 2015 +0300 Fix reseting cipher in OCB mode * cipher/cipher.c (cipher_reset): Setup default taglen for OCB after clearing state. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher.c b/cipher/cipher.c index 6e1173f..d1550c0 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -744,6 +744,8 @@ cipher_reset (gcry_cipher_hd_t c) case GCRY_CIPHER_MODE_OCB: memset (&c->u_mode.ocb, 0, sizeof c->u_mode.ocb); + /* Setup default taglen. */ + c->u_mode.ocb.taglen = 16; break; default: ----------------------------------------------------------------------- Summary of changes: cipher/bufhelp.h | 2 +- cipher/cipher.c | 2 + configure.ac | 4 +- src/gcrypt.h.in | 3 ++ tests/basic.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++------ 5 files changed, 110 insertions(+), 13 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From cvs at cvs.gnupg.org Sun May 3 10:41:00 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 03 May 2015 10:41:00 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-207-g66129b3 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". 
The branch, master has been updated via 66129b3334a5aa54ff8a97981507e4704f759571 (commit) via 8422d5d699265b960bd1ca837044ee052fc5b614 (commit) via 1089a13073c26a9a456e43ec38d937e6ee7f4077 (commit) via 022959099644f64df5f2a83ade21159864f64837 (commit) via e433676a899fa0d274d40547166b03c7c8bd8e78 (commit) via 4e09aaa36d151c3312019724a77fc09aa345b82f (commit) via 460355f23e770637d29e3af7b998a957a2b5bc88 (commit) via 6c21cf5fed1ad430fa41445eac2350802bc8aaed (commit) via 9cf224322007d90193d4910f0da6e0e29ce01d70 (commit) via d5a7e00b6b222566a5650639ef29684b047c1909 (commit) via 0cdd24456b33defc7f8176fa82ab694fbc284385 (commit) via f701954555340a503f6e52cc18d58b0c515427b7 (commit) via e78560a4b717f7154f910a8ce4128de152f586da (commit) from e886e4f5e73fe6a9f9191f5155852ce5d8bb88fe (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 66129b3334a5aa54ff8a97981507e4704f759571 Author: Jussi Kivilinna Date: Sat May 2 13:27:06 2015 +0300 Enable AMD64 AES implementation for WIN64 * cipher/rijndael-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/rijndael-internal.h (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (do_encrypt, do_decrypt) [USE_AMD64_ASM && !HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Use assembly block to call AMD64 assembly encrypt/decrypt function. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-amd64.S b/cipher/rijndael-amd64.S index 24c555a..b149e94 100644 --- a/cipher/rijndael-amd64.S +++ b/cipher/rijndael-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_AES) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_AES) #ifdef __PIC__ # define RIP (%rip) @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .text /* table macros */ @@ -205,7 +212,7 @@ .align 8 .globl _gcry_aes_amd64_encrypt_block -.type _gcry_aes_amd64_encrypt_block, at function; +ELF(.type _gcry_aes_amd64_encrypt_block, at function;) _gcry_aes_amd64_encrypt_block: /* input: @@ -279,7 +286,7 @@ _gcry_aes_amd64_encrypt_block: lastencround(11); jmp .Lenc_done; -.size _gcry_aes_amd64_encrypt_block,.-_gcry_aes_amd64_encrypt_block; +ELF(.size _gcry_aes_amd64_encrypt_block,.-_gcry_aes_amd64_encrypt_block;) #define do_decround(next_r) \ do16bit_shr(16, mov, RA, Dsize, D0, RNA, D0, RNB, RT0, RT1); \ @@ -365,7 +372,7 @@ _gcry_aes_amd64_encrypt_block: .align 8 .globl _gcry_aes_amd64_decrypt_block -.type _gcry_aes_amd64_decrypt_block, at function; +ELF(.type _gcry_aes_amd64_decrypt_block, at function;) _gcry_aes_amd64_decrypt_block: /* input: @@ -440,7 +447,7 @@ _gcry_aes_amd64_decrypt_block: decround(9); jmp .Ldec_tail; -.size _gcry_aes_amd64_decrypt_block,.-_gcry_aes_amd64_decrypt_block; +ELF(.size _gcry_aes_amd64_decrypt_block,.-_gcry_aes_amd64_decrypt_block;) #endif /*USE_AES*/ #endif /*__x86_64*/ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 33ca53f..6641728 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -39,7 +39,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif diff --git a/cipher/rijndael.c b/cipher/rijndael.c index ade41c9..7ebf329 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -665,8 +665,25 @@ do_encrypt (const RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax) { #ifdef USE_AMD64_ASM +# ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS return _gcry_aes_amd64_encrypt_block(ctx->keyschenc, bx, ax, ctx->rounds, encT); +# else + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + uintptr_t ret; + asm ("movq %[encT], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret) + : "0" (_gcry_aes_amd64_encrypt_block), + "D" (ctx->keyschenc), + "S" (bx), + "d" (ax), + "c" (ctx->rounds), + [encT] "r" (encT) + : "cc", "memory", "r8", "r9", "r10", "r11"); + return ret; +# endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) return _gcry_aes_arm_encrypt_block(ctx->keyschenc, bx, ax, ctx->rounds, encT); #else @@ -1008,8 +1025,25 @@ do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, const unsigned char *ax) { #ifdef USE_AMD64_ASM +# ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS return _gcry_aes_amd64_decrypt_block(ctx->keyschdec, bx, ax, ctx->rounds, &dec_tables); +# else + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + uintptr_t ret; + asm ("movq %[dectabs], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret) + : "0" (_gcry_aes_amd64_decrypt_block), + "D" (ctx->keyschdec), + "S" (bx), + "d" (ax), + "c" (ctx->rounds), + [dectabs] "r" (&dec_tables) + : "cc", "memory", "r8", "r9", "r10", "r11"); + return ret; +# endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) return _gcry_aes_arm_decrypt_block(ctx->keyschdec, bx, ax, ctx->rounds, &dec_tables); commit 8422d5d699265b960bd1ca837044ee052fc5b614 Author: Jussi Kivilinna Date: Sat May 2 13:26:46 2015 +0300 Enable AMD64 Whirlpool implementation for WIN64 * cipher/whirlpool-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/whirlpool.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AMD64_ASM] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64): Add ASM_FUNC_ABI to prototype. [USE_AMD64_ASM] (whirlpool_transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/whirlpool-sse2-amd64.S b/cipher/whirlpool-sse2-amd64.S index d0bcf2d..e98b831 100644 --- a/cipher/whirlpool-sse2-amd64.S +++ b/cipher/whirlpool-sse2-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_WHIRLPOOL) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_WHIRLPOOL) #ifdef __PIC__ # define RIP %rip @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text /* look-up table offsets on RTAB */ @@ -157,7 +164,7 @@ .align 8 .globl _gcry_whirlpool_transform_amd64 -.type _gcry_whirlpool_transform_amd64, at function; +ELF(.type _gcry_whirlpool_transform_amd64, at function;) _gcry_whirlpool_transform_amd64: /* input: @@ -329,7 +336,7 @@ _gcry_whirlpool_transform_amd64: .Lskip: movl $(STACK_MAX + 8), %eax; ret; -.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64; +ELF(.size _gcry_whirlpool_transform_amd64,.-_gcry_whirlpool_transform_amd64;) #endif #endif diff --git a/cipher/whirlpool.c b/cipher/whirlpool.c index 2732f63..5f224a1 100644 --- a/cipher/whirlpool.c +++ b/cipher/whirlpool.c @@ -42,7 +42,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -1192,9 +1193,17 @@ whirlpool_init (void *ctx, unsigned int flags) #ifdef USE_AMD64_ASM +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + extern unsigned int _gcry_whirlpool_transform_amd64(u64 *state, const unsigned char *data, - size_t nblks, const struct whirlpool_tables_s *tables); + size_t nblks, const struct whirlpool_tables_s *tables) ASM_FUNC_ABI; static unsigned int whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) @@ -1202,7 +1211,7 @@ whirlpool_transform (void *ctx, const unsigned char *data, size_t nblks) whirlpool_context_t *context = ctx; return _gcry_whirlpool_transform_amd64( - context->hash_state, data, nblks, &tab); + context->hash_state, data, nblks, &tab) + ASM_EXTRA_STACK; } #else /* USE_AMD64_ASM */ commit 1089a13073c26a9a456e43ec38d937e6ee7f4077 Author: Jussi Kivilinna Date: Sat May 2 13:05:12 2015 +0300 Enable AMD64 SHA512 implementations for WIN64 * cipher/sha512-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha512-avx-bmi2-amd64.S: Ditto. * cipher/sha512-ssse3-amd64.S: Ditto. * cipher/sha512.c (USE_SSSE3, USE_AVX, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 ||?USE_AVX ||?USE_AVX2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha512_transform_amd64_ssse3, _gcry_sha512_transform_amd64_avx) (_gcry_sha512_transform_amd64_avx_bmi2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha512-avx-amd64.S b/cipher/sha512-avx-amd64.S index 3449b87..699c271 100644 --- a/cipher/sha512-avx-amd64.S +++ b/cipher/sha512-avx-amd64.S @@ -41,7 +41,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA512) @@ -51,6 +52,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .intel_syntax noprefix .text @@ -259,7 +266,7 @@ frame_size = ((frame_GPRSAVE) + (frame_GPRSAVE_size)) ; L is the message length in SHA512 blocks */ .globl _gcry_sha512_transform_amd64_avx -.type _gcry_sha512_transform_amd64_avx, at function; +ELF(.type _gcry_sha512_transform_amd64_avx, at function;) .align 16 _gcry_sha512_transform_amd64_avx: xor eax, eax diff --git a/cipher/sha512-avx2-bmi2-amd64.S b/cipher/sha512-avx2-bmi2-amd64.S index d6301f3..02f95af 100644 --- a/cipher/sha512-avx2-bmi2-amd64.S +++ b/cipher/sha512-avx2-bmi2-amd64.S @@ -43,7 +43,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(USE_SHA512) @@ -54,6 +55,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix .text @@ -596,7 +603,7 @@ rotate_Ys ; L is the message length in SHA512 blocks */ .globl _gcry_sha512_transform_amd64_avx2 -.type _gcry_sha512_transform_amd64_avx2, at function; +ELF(.type _gcry_sha512_transform_amd64_avx2, at function;) .align 16 _gcry_sha512_transform_amd64_avx2: xor eax, eax diff --git a/cipher/sha512-ssse3-amd64.S b/cipher/sha512-ssse3-amd64.S index 4c80baa..c721bcf 100644 --- a/cipher/sha512-ssse3-amd64.S +++ b/cipher/sha512-ssse3-amd64.S @@ -44,7 +44,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA512) @@ -54,6 +55,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix .text @@ -261,7 +268,7 @@ frame_size = ((frame_GPRSAVE) + (frame_GPRSAVE_size)) ; L is the message length in SHA512 blocks. */ .globl _gcry_sha512_transform_amd64_ssse3 -.type _gcry_sha512_transform_amd64_ssse3, at function; +ELF(.type _gcry_sha512_transform_amd64_ssse3, at function;) .align 16 _gcry_sha512_transform_amd64_ssse3: xor eax, eax diff --git a/cipher/sha512.c b/cipher/sha512.c index 5a6af80..029f8f0 100644 --- a/cipher/sha512.c +++ b/cipher/sha512.c @@ -68,27 +68,31 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. 
*/ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2/rorx code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX2) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX2 1 #endif @@ -543,6 +547,21 @@ transform_blk (SHA512_STATE *hd, const unsigned char *data) } +/* AMD64 assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_ARM_NEON_ASM void _gcry_sha512_transform_armv7_neon (SHA512_STATE *hd, const unsigned char *data, @@ -551,17 +570,20 @@ void _gcry_sha512_transform_armv7_neon (SHA512_STATE *hd, #ifdef USE_SSSE3 unsigned int _gcry_sha512_transform_amd64_ssse3(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha512_transform_amd64_avx(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 unsigned int _gcry_sha512_transform_amd64_avx2(const void *input_data, - void *state, size_t num_blks); + void *state, + size_t num_blks) ASM_FUNC_ABI; #endif @@ -574,19 +596,19 @@ transform (void *context, const unsigned char *data, size_t nblks) #ifdef USE_AVX2 if (ctx->use_avx2) return _gcry_sha512_transform_amd64_avx2 (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (ctx->use_avx) return _gcry_sha512_transform_amd64_avx (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (ctx->use_ssse3) return _gcry_sha512_transform_amd64_ssse3 (data, &ctx->state, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_ARM_NEON_ASM @@ -607,6 +629,14 @@ transform (void *context, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } commit 022959099644f64df5f2a83ade21159864f64837 Author: Jussi Kivilinna Date: Sat May 2 13:05:02 2015 +0300 Enable AMD64 SHA256 implementations for WIN64 * cipher/sha256-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. 
* cipher/sha256-avx2-bmi2-amd64.S: Ditto. * cipher/sha256-ssse3-amd64.S: Ditto. * cipher/sha256.c (USE_SSSE3, USE_AVX, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 ||?USE_AVX ||?USE_AVX2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha256_transform_amd64_ssse3, _gcry_sha256_transform_amd64_avx) (_gcry_sha256_transform_amd64_avx2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha256-avx-amd64.S b/cipher/sha256-avx-amd64.S index 3912db7..8bf26bd 100644 --- a/cipher/sha256-avx-amd64.S +++ b/cipher/sha256-avx-amd64.S @@ -54,7 +54,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA256) @@ -64,6 +65,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix #define VMOVDQ vmovdqu /* assume buffers not aligned */ @@ -370,7 +377,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_avx -.type _gcry_sha256_transform_amd64_avx, at function; +ELF(.type _gcry_sha256_transform_amd64_avx, at function;) .align 16 _gcry_sha256_transform_amd64_avx: vzeroupper diff --git a/cipher/sha256-avx2-bmi2-amd64.S b/cipher/sha256-avx2-bmi2-amd64.S index 09df711..74b6063 100644 --- a/cipher/sha256-avx2-bmi2-amd64.S +++ b/cipher/sha256-avx2-bmi2-amd64.S @@ -54,7 +54,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(USE_SHA256) @@ -65,6 +66,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .intel_syntax noprefix #define VMOVDQ vmovdqu /* ; assume buffers not aligned */ @@ -555,7 +562,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_avx2 -.type _gcry_sha256_transform_amd64_avx2, at function +ELF(.type _gcry_sha256_transform_amd64_avx2, at function) .align 32 _gcry_sha256_transform_amd64_avx2: push rbx diff --git a/cipher/sha256-ssse3-amd64.S b/cipher/sha256-ssse3-amd64.S index 80b1cec..9ec87e4 100644 --- a/cipher/sha256-ssse3-amd64.S +++ b/cipher/sha256-ssse3-amd64.S @@ -55,7 +55,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA256) @@ -65,6 +66,12 @@ # define ADD_RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .intel_syntax noprefix #define MOVDQ movdqu /* assume buffers not aligned */ @@ -376,7 +383,7 @@ rotate_Xs */ .text .globl _gcry_sha256_transform_amd64_ssse3 -.type _gcry_sha256_transform_amd64_ssse3, at function; +ELF(.type _gcry_sha256_transform_amd64_ssse3, at function;) .align 16 _gcry_sha256_transform_amd64_ssse3: push rbx diff --git a/cipher/sha256.c b/cipher/sha256.c index d3af172..59ffa43 100644 --- a/cipher/sha256.c +++ b/cipher/sha256.c @@ -49,25 +49,29 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. */ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2/BMI2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(HAVE_GCC_INLINE_ASM_BMI2) && \ - defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX2) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + defined(HAVE_INTEL_SYNTAX_PLATFORM_AS) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX2 1 #endif @@ -322,19 +326,37 @@ transform_blk (void *ctx, const unsigned char *data) #undef R +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. 
*/ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_SSSE3 unsigned int _gcry_sha256_transform_amd64_ssse3(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha256_transform_amd64_avx(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 unsigned int _gcry_sha256_transform_amd64_avx2(const void *input_data, - u32 state[8], size_t num_blks); + u32 state[8], + size_t num_blks) ASM_FUNC_ABI; #endif @@ -347,19 +369,19 @@ transform (void *ctx, const unsigned char *data, size_t nblks) #ifdef USE_AVX2 if (hd->use_avx2) return _gcry_sha256_transform_amd64_avx2 (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (hd->use_avx) return _gcry_sha256_transform_amd64_avx (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (hd->use_ssse3) return _gcry_sha256_transform_amd64_ssse3 (data, &hd->h0, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif do @@ -369,6 +391,14 @@ transform (void *ctx, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } commit e433676a899fa0d274d40547166b03c7c8bd8e78 Author: Jussi Kivilinna Date: Sat May 2 12:57:07 2015 +0300 Enable AMD64 SHA1 implementations for WIN64 * cipher/sha1-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha1-avx-bmi2-amd64.S: Ditto. * cipher/sha1-ssse3-amd64.S: Ditto. * cipher/sha1.c (USE_SSSE3, USE_AVX, USE_BMI2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 ||?USE_AVX ||?USE_BMI2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha1_transform_amd64_ssse3, _gcry_sha1_transform_amd64_avx) (_gcry_sha1_transform_amd64_avx_bmi2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/sha1-avx-amd64.S b/cipher/sha1-avx-amd64.S index 6bec389..062a45b 100644 --- a/cipher/sha1-avx-amd64.S +++ b/cipher/sha1-avx-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(HAVE_GCC_INLINE_ASM_AVX2) && defined(USE_SHA1) @@ -40,6 +41,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -209,7 +217,7 @@ */ .text .globl _gcry_sha1_transform_amd64_avx -.type _gcry_sha1_transform_amd64_avx, at function +ELF(.type _gcry_sha1_transform_amd64_avx, at function) .align 16 _gcry_sha1_transform_amd64_avx: /* input: diff --git a/cipher/sha1-avx-bmi2-amd64.S b/cipher/sha1-avx-bmi2-amd64.S index cd5af5b..22bcbb3 100644 --- a/cipher/sha1-avx-bmi2-amd64.S +++ b/cipher/sha1-avx-bmi2-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_BMI2) && \ defined(HAVE_GCC_INLINE_ASM_AVX) && defined(USE_SHA1) @@ -40,6 +41,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -206,7 +214,7 @@ */ .text .globl _gcry_sha1_transform_amd64_avx_bmi2 -.type _gcry_sha1_transform_amd64_avx_bmi2, at function +ELF(.type _gcry_sha1_transform_amd64_avx_bmi2, at function) .align 16 _gcry_sha1_transform_amd64_avx_bmi2: /* input: diff --git a/cipher/sha1-ssse3-amd64.S b/cipher/sha1-ssse3-amd64.S index 226988d..98a19e6 100644 --- a/cipher/sha1-ssse3-amd64.S +++ b/cipher/sha1-ssse3-amd64.S @@ -29,7 +29,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && defined(USE_SHA1) #ifdef __PIC__ @@ -39,6 +40,13 @@ #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + /* Context structure */ #define state_h0 0 @@ -220,7 +228,7 @@ */ .text .globl _gcry_sha1_transform_amd64_ssse3 -.type _gcry_sha1_transform_amd64_ssse3, at function +ELF(.type _gcry_sha1_transform_amd64_ssse3, at function) .align 16 _gcry_sha1_transform_amd64_ssse3: /* input: diff --git a/cipher/sha1.c b/cipher/sha1.c index 6ccf0e8..eb42883 100644 --- a/cipher/sha1.c +++ b/cipher/sha1.c @@ -45,22 +45,26 @@ /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif /* USE_AVX indicates whether to compile with Intel AVX code. */ #undef USE_AVX -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AVX 1 #endif /* USE_BMI2 indicates whether to compile with Intel AVX/BMI2 code. 
*/ #undef USE_BMI2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_AVX) && defined(HAVE_GCC_INLINE_ASM_BMI2) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_AVX) && \ + defined(HAVE_GCC_INLINE_ASM_BMI2) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_BMI2 1 #endif @@ -287,22 +291,37 @@ transform_blk (void *ctx, const unsigned char *data) } +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_SSSE3) || defined(USE_AVX) || defined(USE_BMI2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + + #ifdef USE_SSSE3 unsigned int _gcry_sha1_transform_amd64_ssse3 (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif #ifdef USE_AVX unsigned int _gcry_sha1_transform_amd64_avx (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif #ifdef USE_BMI2 unsigned int _gcry_sha1_transform_amd64_avx_bmi2 (void *state, const unsigned char *data, - size_t nblks); + size_t nblks) ASM_FUNC_ABI; #endif @@ -315,17 +334,17 @@ transform (void *ctx, const unsigned char *data, size_t nblks) #ifdef USE_BMI2 if (hd->use_bmi2) return _gcry_sha1_transform_amd64_avx_bmi2 (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_AVX if (hd->use_avx) return _gcry_sha1_transform_amd64_avx (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_SSSE3 if (hd->use_ssse3) return _gcry_sha1_transform_amd64_ssse3 (&hd->h0, data, nblks) - + 4 * sizeof(void*); + + 4 * sizeof(void*) + ASM_EXTRA_STACK; #endif #ifdef USE_NEON if (hd->use_neon) @@ -340,6 +359,14 @@ transform (void *ctx, const unsigned char *data, size_t nblks) } while (--nblks); +#ifdef ASM_EXTRA_STACK + /* 'transform_blk' is typically inlined and XMM6-XMM15 are stored at + * the prologue of this function. Therefore need to add ASM_EXTRA_STACK to + * here too. + */ + burn += ASM_EXTRA_STACK; +#endif + return burn; } commit 4e09aaa36d151c3312019724a77fc09aa345b82f Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64 * cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul) ( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector registers before use and restore after. * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency on !defined(__WIN64__). * cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare, aesni_prepare_2_6, aesni_cleanup) ( aesni_cleanup_2_6): New. [!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New. (_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc) (_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec) (_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use 'aesni_prepare_2_6'. * cipher/rijndael-internal.h (USE_SSSE3): Enable if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS. (USE_AESNI): Remove dependency on !defined(__WIN64__) * cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New. 
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New. (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use 'vpaes_ssse3_prepare'. (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use 'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to exclude '.type' and '.size' markers from assembly code, as they are not support on WIN64/COFF objects. * configure.ac (gcry_cv_gcc_attribute_ms_abi) (gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi) (gcry_cv_gcc_default_abi_is_sysv_abi) (gcry_cv_gcc_win64_platform_as_ok): New checks. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-gcm-intel-pclmul.c b/cipher/cipher-gcm-intel-pclmul.c index 79648ce..a327249 100644 --- a/cipher/cipher-gcm-intel-pclmul.c +++ b/cipher/cipher-gcm-intel-pclmul.c @@ -249,6 +249,17 @@ void _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) { u64 tmp[2]; +#if defined(__x86_64__) && defined(__WIN64__) + char win64tmp[3 * 16]; + + /* XMM6-XMM8 need to be restored after use. */ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" + "movdqu %%xmm7, 1*16(%0)\n\t" + "movdqu %%xmm8, 2*16(%0)\n\t" + : + : "r" (win64tmp) + : "memory"); +#endif /* Swap endianness of hsub. */ tmp[0] = buf_get_be64(c->u_mode.gcm.u_ghash_key.key + 8); @@ -285,6 +296,21 @@ _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) : [h_234] "r" (c->u_mode.gcm.gcm_table) : "memory"); +#ifdef __WIN64__ + /* Clear/restore used registers. */ + asm volatile( "pxor %%xmm0, %%xmm0\n\t" + "pxor %%xmm1, %%xmm1\n\t" + "pxor %%xmm2, %%xmm2\n\t" + "pxor %%xmm3, %%xmm3\n\t" + "pxor %%xmm4, %%xmm4\n\t" + "pxor %%xmm5, %%xmm5\n\t" + "movdqu 0*16(%0), %%xmm6\n\t" + "movdqu 1*16(%0), %%xmm7\n\t" + "movdqu 2*16(%0), %%xmm8\n\t" + : + : "r" (win64tmp) + : "memory"); +#else /* Clear used registers. */ asm volatile( "pxor %%xmm0, %%xmm0\n\t" "pxor %%xmm1, %%xmm1\n\t" @@ -297,6 +323,7 @@ _gcry_ghash_setup_intel_pclmul (gcry_cipher_hd_t c) "pxor %%xmm8, %%xmm8\n\t" ::: "cc" ); #endif +#endif wipememory (tmp, sizeof(tmp)); } @@ -309,10 +336,30 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, static const unsigned char be_mask[16] __attribute__ ((aligned (16))) = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }; const unsigned int blocksize = GCRY_GCM_BLOCK_LEN; +#ifdef __WIN64__ + char win64tmp[10 * 16]; +#endif if (nblocks == 0) return 0; +#ifdef __WIN64__ + /* XMM8-XMM15 need to be restored after use. */ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" + "movdqu %%xmm7, 1*16(%0)\n\t" + "movdqu %%xmm8, 2*16(%0)\n\t" + "movdqu %%xmm9, 3*16(%0)\n\t" + "movdqu %%xmm10, 4*16(%0)\n\t" + "movdqu %%xmm11, 5*16(%0)\n\t" + "movdqu %%xmm12, 6*16(%0)\n\t" + "movdqu %%xmm13, 7*16(%0)\n\t" + "movdqu %%xmm14, 8*16(%0)\n\t" + "movdqu %%xmm15, 9*16(%0)\n\t" + : + : "r" (win64tmp) + : "memory" ); +#endif + /* Preload hash and H1. */ asm volatile ("movdqu %[hash], %%xmm1\n\t" "movdqa %[hsub], %%xmm0\n\t" @@ -353,6 +400,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, } while (nblocks >= 4); +#ifndef __WIN64__ /* Clear used x86-64/XMM registers. 
*/ asm volatile( "pxor %%xmm8, %%xmm8\n\t" "pxor %%xmm9, %%xmm9\n\t" @@ -363,6 +411,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, "pxor %%xmm14, %%xmm14\n\t" "pxor %%xmm15, %%xmm15\n\t" ::: "cc" ); +#endif } #endif @@ -385,6 +434,28 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, : [hash] "=m" (*result) : [be_mask] "m" (*be_mask)); +#ifdef __WIN64__ + /* Clear/restore used registers. */ + asm volatile( "pxor %%xmm0, %%xmm0\n\t" + "pxor %%xmm1, %%xmm1\n\t" + "pxor %%xmm2, %%xmm2\n\t" + "pxor %%xmm3, %%xmm3\n\t" + "pxor %%xmm4, %%xmm4\n\t" + "pxor %%xmm5, %%xmm5\n\t" + "movdqu 0*16(%0), %%xmm6\n\t" + "movdqu 1*16(%0), %%xmm7\n\t" + "movdqu 2*16(%0), %%xmm8\n\t" + "movdqu 3*16(%0), %%xmm9\n\t" + "movdqu 4*16(%0), %%xmm10\n\t" + "movdqu 5*16(%0), %%xmm11\n\t" + "movdqu 6*16(%0), %%xmm12\n\t" + "movdqu 7*16(%0), %%xmm13\n\t" + "movdqu 8*16(%0), %%xmm14\n\t" + "movdqu 9*16(%0), %%xmm15\n\t" + : + : "r" (win64tmp) + : "memory" ); +#else /* Clear used registers. */ asm volatile( "pxor %%xmm0, %%xmm0\n\t" "pxor %%xmm1, %%xmm1\n\t" @@ -395,6 +466,7 @@ _gcry_ghash_intel_pclmul (gcry_cipher_hd_t c, byte *result, const byte *buf, "pxor %%xmm6, %%xmm6\n\t" "pxor %%xmm7, %%xmm7\n\t" ::: "cc" ); +#endif return 0; } diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 693f218..e20ea56 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -67,9 +67,7 @@ #if defined(ENABLE_PCLMUL_SUPPORT) && defined(GCM_USE_TABLES) # if ((defined(__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# ifndef __WIN64__ -# define GCM_USE_INTEL_PCLMUL 1 -# endif +# define GCM_USE_INTEL_PCLMUL 1 # endif # endif #endif /* GCM_USE_INTEL_PCLMUL */ diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 147679f..910bc68 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -49,24 +49,54 @@ typedef struct u128_s { u32 a, b, c, d; } u128_t; the use of these macros. There purpose is to make sure that the SSE regsiters are cleared and won't reveal any information about the key or the data. */ -#define aesni_prepare() do { } while (0) -#define aesni_cleanup() \ - do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ - "pxor %%xmm1, %%xmm1\n" :: ); \ - } while (0) -#define aesni_cleanup_2_6() \ - do { asm volatile ("pxor %%xmm2, %%xmm2\n\t" \ - "pxor %%xmm3, %%xmm3\n" \ - "pxor %%xmm4, %%xmm4\n" \ - "pxor %%xmm5, %%xmm5\n" \ - "pxor %%xmm6, %%xmm6\n":: ); \ - } while (0) - +#ifdef __WIN64__ +/* XMM6-XMM15 are callee-saved registers on WIN64. 
*/ +# define aesni_prepare_2_6_variable char win64tmp[16] +# define aesni_prepare() do { } while (0) +# define aesni_prepare_2_6() \ + do { asm volatile ("movdqu %%xmm6, %0\n\t" \ + : "=m" (*win64tmp) \ + : \ + : "memory"); \ + } while (0) +# define aesni_cleanup() \ + do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ + "pxor %%xmm1, %%xmm1\n" :: ); \ + } while (0) +# define aesni_cleanup_2_6() \ + do { asm volatile ("movdqu %0, %%xmm6\n\t" \ + "pxor %%xmm2, %%xmm2\n" \ + "pxor %%xmm3, %%xmm3\n" \ + "pxor %%xmm4, %%xmm4\n" \ + "pxor %%xmm5, %%xmm5\n" \ + : \ + : "m" (*win64tmp) \ + : "memory"); \ + } while (0) +#else +# define aesni_prepare_2_6_variable +# define aesni_prepare() do { } while (0) +# define aesni_prepare_2_6() do { } while (0) +# define aesni_cleanup() \ + do { asm volatile ("pxor %%xmm0, %%xmm0\n\t" \ + "pxor %%xmm1, %%xmm1\n" :: ); \ + } while (0) +# define aesni_cleanup_2_6() \ + do { asm volatile ("pxor %%xmm2, %%xmm2\n\t" \ + "pxor %%xmm3, %%xmm3\n" \ + "pxor %%xmm4, %%xmm4\n" \ + "pxor %%xmm5, %%xmm5\n" \ + "pxor %%xmm6, %%xmm6\n":: ); \ + } while (0) +#endif void _gcry_aes_aesni_do_setkey (RIJNDAEL_context *ctx, const byte *key) { + aesni_prepare_2_6_variable; + aesni_prepare(); + aesni_prepare_2_6(); if (ctx->rounds < 12) { @@ -999,7 +1029,10 @@ _gcry_aes_aesni_cbc_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks, int cbc_mac) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm5\n\t" : /* No output */ @@ -1044,8 +1077,10 @@ _gcry_aes_aesni_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, { static const unsigned char be_mask[16] __attribute__ ((aligned (16))) = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqa %[mask], %%xmm6\n\t" /* Preload mask */ "movdqa %[ctr], %%xmm5\n\t" /* Preload CTR */ @@ -1095,7 +1130,10 @@ _gcry_aes_aesni_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm6\n\t" : /* No output */ @@ -1177,7 +1215,10 @@ _gcry_aes_aesni_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, const unsigned char *inbuf, unsigned char *iv, size_t nblocks) { + aesni_prepare_2_6_variable; + aesni_prepare (); + aesni_prepare_2_6(); asm volatile ("movdqu %[iv], %%xmm5\n\t" /* use xmm5 as fast IV storage */ @@ -1331,8 +1372,10 @@ aesni_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, unsigned char *outbuf = outbuf_arg; const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm5\n\t" @@ -1473,8 +1516,10 @@ aesni_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, unsigned char *outbuf = outbuf_arg; const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm5\n\t" @@ -1625,8 +1670,10 @@ _gcry_aes_aesni_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, RIJNDAEL_context *ctx = (void *)&c->context.c; const unsigned char *abuf = abuf_arg; u64 n = c->u_mode.ocb.aad_nblocks; + aesni_prepare_2_6_variable; aesni_prepare (); + aesni_prepare_2_6 (); /* Preload Offset and Sum */ 
asm volatile ("movdqu %[iv], %%xmm5\n\t" diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index bd247a9..33ca53f 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -44,8 +44,9 @@ #endif /* USE_SSSE3 indicates whether to use SSSE3 code. */ -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ - defined(HAVE_GCC_INLINE_ASM_SSSE3) +#if defined(__x86_64__) && defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSSE3 1 #endif @@ -75,9 +76,7 @@ #ifdef ENABLE_AESNI_SUPPORT # if ((defined (__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# ifndef __WIN64__ -# define USE_AESNI 1 -# endif +# define USE_AESNI 1 # endif # endif #endif /* ENABLE_AESNI_SUPPORT */ diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c index 3f1b352..21438dc 100644 --- a/cipher/rijndael-ssse3-amd64.c +++ b/cipher/rijndael-ssse3-amd64.c @@ -61,7 +61,60 @@ the use of these macros. There purpose is to make sure that the SSE registers are cleared and won't reveal any information about the key or the data. */ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +/* XMM6-XMM15 are callee-saved registers on WIN64. */ +# define vpaes_ssse3_prepare() \ + char win64tmp[16 * 10]; \ + asm volatile ("movdqu %%xmm6, 0*16(%0)\n\t" \ + "movdqu %%xmm7, 1*16(%0)\n\t" \ + "movdqu %%xmm8, 2*16(%0)\n\t" \ + "movdqu %%xmm9, 3*16(%0)\n\t" \ + "movdqu %%xmm10, 4*16(%0)\n\t" \ + "movdqu %%xmm11, 5*16(%0)\n\t" \ + "movdqu %%xmm12, 6*16(%0)\n\t" \ + "movdqu %%xmm13, 7*16(%0)\n\t" \ + "movdqu %%xmm14, 8*16(%0)\n\t" \ + "movdqu %%xmm15, 9*16(%0)\n\t" \ + : \ + : "r" (win64tmp) \ + : "memory" ) +# define vpaes_ssse3_cleanup() \ + asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ + "pxor %%xmm1, %%xmm1 \n\t" \ + "pxor %%xmm2, %%xmm2 \n\t" \ + "pxor %%xmm3, %%xmm3 \n\t" \ + "pxor %%xmm4, %%xmm4 \n\t" \ + "pxor %%xmm5, %%xmm5 \n\t" \ + "movdqu 0*16(%0), %%xmm6 \n\t" \ + "movdqu 1*16(%0), %%xmm7 \n\t" \ + "movdqu 2*16(%0), %%xmm8 \n\t" \ + "movdqu 3*16(%0), %%xmm9 \n\t" \ + "movdqu 4*16(%0), %%xmm10 \n\t" \ + "movdqu 5*16(%0), %%xmm11 \n\t" \ + "movdqu 6*16(%0), %%xmm12 \n\t" \ + "movdqu 7*16(%0), %%xmm13 \n\t" \ + "movdqu 8*16(%0), %%xmm14 \n\t" \ + "movdqu 9*16(%0), %%xmm15 \n\t" \ + : \ + : "r" (win64tmp) \ + : "memory" ) +#else +# define vpaes_ssse3_prepare() /*_*/ +# define vpaes_ssse3_cleanup() \ + asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ + "pxor %%xmm1, %%xmm1 \n\t" \ + "pxor %%xmm2, %%xmm2 \n\t" \ + "pxor %%xmm3, %%xmm3 \n\t" \ + "pxor %%xmm4, %%xmm4 \n\t" \ + "pxor %%xmm5, %%xmm5 \n\t" \ + "pxor %%xmm6, %%xmm6 \n\t" \ + "pxor %%xmm7, %%xmm7 \n\t" \ + "pxor %%xmm8, %%xmm8 \n\t" \ + ::: "memory" ) +#endif + #define vpaes_ssse3_prepare_enc(const_ptr) \ + vpaes_ssse3_prepare(); \ asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ "movdqa (%q0), %%xmm9 # 0F \n\t" \ "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ @@ -75,6 +128,7 @@ : "memory" ) #define vpaes_ssse3_prepare_dec(const_ptr) \ + vpaes_ssse3_prepare(); \ asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ "movdqa (%q0), %%xmm9 # 0F \n\t" \ "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ @@ -88,17 +142,6 @@ : \ : "memory" ) -#define vpaes_ssse3_cleanup() \ - asm volatile ("pxor %%xmm0, %%xmm0 \n\t" \ - "pxor %%xmm1, %%xmm1 \n\t" \ - "pxor %%xmm2, %%xmm2 \n\t" \ - "pxor %%xmm3, %%xmm3 \n\t" \ - "pxor %%xmm4, %%xmm4 \n\t" \ - "pxor %%xmm5, %%xmm5 \n\t" \ - "pxor %%xmm6, %%xmm6 \n\t" 
\ - "pxor %%xmm7, %%xmm7 \n\t" \ - "pxor %%xmm8, %%xmm8 \n\t" \ - ::: "memory" ) void @@ -106,6 +149,8 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) { unsigned int keybits = (ctx->rounds - 10) * 32 + 128; + vpaes_ssse3_prepare(); + asm volatile ("leaq %q[key], %%rdi" "\n\t" "movl %[bits], %%esi" "\n\t" "leaq %[buf], %%rdx" "\n\t" @@ -121,6 +166,8 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) : "r8", "r9", "r10", "r11", "rax", "rcx", "rdx", "rdi", "rsi", "cc", "memory"); + vpaes_ssse3_cleanup(); + /* Save key for setting up decryption. */ memcpy(&ctx->keyschdec32[0][0], key, keybits / 8); } @@ -132,6 +179,8 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) { unsigned int keybits = (ctx->rounds - 10) * 32 + 128; + vpaes_ssse3_prepare(); + asm volatile ("leaq %q[key], %%rdi" "\n\t" "movl %[bits], %%esi" "\n\t" "leaq %[buf], %%rdx" "\n\t" @@ -146,6 +195,8 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) [rotoffs] "g" ((keybits == 192) ? 0 : 32) : "r8", "r9", "r10", "r11", "rax", "rcx", "rdx", "rdi", "rsi", "cc", "memory"); + + vpaes_ssse3_cleanup(); } @@ -465,6 +516,11 @@ _gcry_aes_ssse3_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, } +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define X(...) +#else +# define X(...) __VA_ARGS__ +#endif asm ( "\n\t" "##" @@ -494,7 +550,7 @@ asm ( "\n\t" "##" "\n\t" "##" "\n\t" ".align 16" - "\n\t" ".type _aes_encrypt_core, at function" +X("\n\t" ".type _aes_encrypt_core, at function") "\n\t" "_aes_encrypt_core:" "\n\t" " leaq .Lk_mc_backward(%rcx), %rdi" "\n\t" " mov $16, %rsi" @@ -570,7 +626,7 @@ asm ( "\n\t" " pxor %xmm4, %xmm0 # 0 = A" "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" "\n\t" " ret" - "\n\t" ".size _aes_encrypt_core,.-_aes_encrypt_core" +X("\n\t" ".size _aes_encrypt_core,.-_aes_encrypt_core") "\n\t" "##" "\n\t" "## Decryption core" @@ -578,7 +634,7 @@ asm ( "\n\t" "## Same API as encryption core." 
"\n\t" "##" "\n\t" ".align 16" - "\n\t" ".type _aes_decrypt_core, at function" +X("\n\t" ".type _aes_decrypt_core, at function") "\n\t" "_aes_decrypt_core:" "\n\t" " movl %eax, %esi" "\n\t" " shll $4, %esi" @@ -670,7 +726,7 @@ asm ( "\n\t" " pxor %xmm4, %xmm0 # 0 = A" "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" "\n\t" " ret" - "\n\t" ".size _aes_decrypt_core,.-_aes_decrypt_core" +X("\n\t" ".size _aes_decrypt_core,.-_aes_decrypt_core") "\n\t" "########################################################" "\n\t" "## ##" @@ -679,7 +735,7 @@ asm ( "\n\t" "########################################################" "\n\t" ".align 16" - "\n\t" ".type _aes_schedule_core, at function" +X("\n\t" ".type _aes_schedule_core, at function") "\n\t" "_aes_schedule_core:" "\n\t" " # rdi = key" "\n\t" " # rsi = size in bits" @@ -1039,7 +1095,7 @@ asm ( "\n\t" " pxor %xmm7, %xmm7" "\n\t" " pxor %xmm8, %xmm8" "\n\t" " ret" - "\n\t" ".size _aes_schedule_core,.-_aes_schedule_core" +X("\n\t" ".size _aes_schedule_core,.-_aes_schedule_core") "\n\t" "########################################################" "\n\t" "## ##" @@ -1048,7 +1104,7 @@ asm ( "\n\t" "########################################################" "\n\t" ".align 16" - "\n\t" ".type _aes_consts, at object" +X("\n\t" ".type _aes_consts, at object") "\n\t" ".Laes_consts:" "\n\t" "_aes_consts:" "\n\t" " # s0F" @@ -1226,7 +1282,7 @@ asm ( "\n\t" " .quad 0xC7AA6DB9D4943E2D" "\n\t" " .quad 0x12D7560F93441D00" "\n\t" " .quad 0xCA4B8159D8C58E9C" - "\n\t" ".size _aes_consts,.-_aes_consts" +X("\n\t" ".size _aes_consts,.-_aes_consts") ); #endif /* USE_SSSE3 */ diff --git a/configure.ac b/configure.ac index 594209f..0f16175 100644 --- a/configure.ac +++ b/configure.ac @@ -1127,6 +1127,93 @@ fi #### #### ############################################# + +# Following tests depend on warnings to cause compile to fail, so set -Werror +# temporarily. +_gcc_cflags_save=$CFLAGS +CFLAGS="$CFLAGS -Werror" + + +# +# Check whether compiler supports 'ms_abi' function attribute. +# +AC_CACHE_CHECK([whether compiler supports 'ms_abi' function attribute], + [gcry_cv_gcc_attribute_ms_abi], + [gcry_cv_gcc_attribute_ms_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[int __attribute__ ((ms_abi)) proto(int);]])], + [gcry_cv_gcc_attribute_ms_abi=yes])]) +if test "$gcry_cv_gcc_attribute_ms_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_MS_ABI,1, + [Defined if compiler supports "__attribute__ ((ms_abi))" function attribute]) +fi + + +# +# Check whether compiler supports 'sysv_abi' function attribute. +# +AC_CACHE_CHECK([whether compiler supports 'sysv_abi' function attribute], + [gcry_cv_gcc_attribute_sysv_abi], + [gcry_cv_gcc_attribute_sysv_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[int __attribute__ ((sysv_abi)) proto(int);]])], + [gcry_cv_gcc_attribute_sysv_abi=yes])]) +if test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_SYSV_ABI,1, + [Defined if compiler supports "__attribute__ ((sysv_abi))" function attribute]) +fi + + +# +# Check whether default calling convention is 'ms_abi'. 
+# +if test "$gcry_cv_gcc_attribute_ms_abi" = "yes" ; then + AC_CACHE_CHECK([whether default calling convention is 'ms_abi'], + [gcry_cv_gcc_default_abi_is_ms_abi], + [gcry_cv_gcc_default_abi_is_ms_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[void *test(void) { + void *(*def_func)(void) = test; + void *__attribute__((ms_abi))(*msabi_func)(void); + /* warning on SysV abi targets, passes on Windows based targets */ + msabi_func = def_func; + return msabi_func; + }]])], + [gcry_cv_gcc_default_abi_is_ms_abi=yes])]) + if test "$gcry_cv_gcc_default_abi_is_ms_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_DEFAULT_ABI_IS_MS_ABI,1, + [Defined if default calling convention is 'ms_abi']) + fi +fi + + +# +# Check whether default calling convention is 'sysv_abi'. +# +if test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" ; then + AC_CACHE_CHECK([whether default calling convention is 'sysv_abi'], + [gcry_cv_gcc_default_abi_is_sysv_abi], + [gcry_cv_gcc_default_abi_is_sysv_abi=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[void *test(void) { + void *(*def_func)(void) = test; + void *__attribute__((sysv_abi))(*sysvabi_func)(void); + /* warning on MS ABI targets, passes on SysV ABI targets */ + sysvabi_func = def_func; + return sysvabi_func; + }]])], + [gcry_cv_gcc_default_abi_is_sysv_abi=yes])]) + if test "$gcry_cv_gcc_default_abi_is_sysv_abi" = "yes" ; then + AC_DEFINE(HAVE_GCC_DEFAULT_ABI_IS_SYSV_ABI,1, + [Defined if default calling convention is 'sysv_abi']) + fi +fi + + +# Restore flags. +CFLAGS=$_gcc_cflags_save; + + # # Check whether GCC inline assembler supports SSSE3 instructions # This is required for the AES-NI instructions. @@ -1281,9 +1368,6 @@ if test $amd64_as_feature_detection = yes; then [[__asm__( /* Test if '.type' and '.size' are supported. */ /* These work only on ELF targets. */ - /* TODO: add COFF (mingw64, cygwin64) support to assembly - * implementations. Mingw64/cygwin64 also require additional - * work because they use different calling convention. */ "asmfunc:\n\t" ".size asmfunc,.-asmfunc;\n\t" ".type asmfunc, at function;\n\t" @@ -1299,6 +1383,24 @@ if test $amd64_as_feature_detection = yes; then AC_DEFINE(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS,1, [Defined if underlying assembler is compatible with amd64 assembly implementations]) fi + if test "$gcry_cv_gcc_amd64_platform_as_ok" = "no" && + test "$gcry_cv_gcc_attribute_sysv_abi" = "yes" && + test "$gcry_cv_gcc_default_abi_is_ms_abi" = "yes"; then + AC_CACHE_CHECK([whether GCC assembler is compatible for WIN64 assembly implementations], + [gcry_cv_gcc_win64_platform_as_ok], + [gcry_cv_gcc_win64_platform_as_ok=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[__asm__( + ".globl asmfunc\n\t" + "asmfunc:\n\t" + "xorq \$(1234), %rbp;\n\t" + );]])], + [gcry_cv_gcc_win64_platform_as_ok=yes])]) + if test "$gcry_cv_gcc_win64_platform_as_ok" = "yes" ; then + AC_DEFINE(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS,1, + [Defined if underlying assembler is compatible with WIN64 assembly implementations]) + fi + fi fi commit 460355f23e770637d29e3af7b998a957a2b5bc88 Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Add W64 support for mpi amd64 assembly acinclude.m4 (GNUPG_SYS_SYMBOL_UNDERSCORE): Set 'ac_cv_sys_symbol_underscore=no' on MingW-W64. mpi/amd64/func_abi.h: New. mpi/amd64/mpih-add1.S (_gcry_mpih_add_n): Add FUNC_ENTRY and FUNC_EXIT. mpi/amd64/mpih-lshift.S (_gcry_mpih_lshift): Ditto. mpi/amd64/mpih-mul1.S (_gcry_mpih_mul_1): Ditto. mpi/amd64/mpih-mul2.S (_gcry_mpih_addmul_1): Ditto. mpi/amd64/mpih-mul3.S (_gcry_mpih_submul_1): Ditto. 
mpi/amd64/mpih-rshift.S (_gcry_mpih_rshift): Ditto. mpi/amd64/mpih-sub1.S (_gcry_mpih_sub_n): Ditto. mpi/config.links [host=x86_64-*mingw*]: Enable assembly modules. [host=x86_64-*-*]: Append mpi/amd64/func_abi.h to mpi/asm-syntax.h. -- Signed-off-by: Jussi Kivilinna diff --git a/acinclude.m4 b/acinclude.m4 index 0791b84..764efd4 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -101,9 +101,12 @@ AC_DEFUN([GNUPG_CHECK_GNUMAKE], AC_DEFUN([GNUPG_SYS_SYMBOL_UNDERSCORE], [tmp_do_check="no" case "${host}" in - *-mingw32*) + i?86-*-mingw32*) ac_cv_sys_symbol_underscore=yes ;; + x86_64-*-mingw32*) + ac_cv_sys_symbol_underscore=no + ;; i386-emx-os2 | i[3456]86-pc-os2*emx | i386-pc-msdosdjgpp) ac_cv_sys_symbol_underscore=yes ;; diff --git a/mpi/amd64/func_abi.h b/mpi/amd64/func_abi.h new file mode 100644 index 0000000..ce44674 --- /dev/null +++ b/mpi/amd64/func_abi.h @@ -0,0 +1,19 @@ +#ifdef USE_MS_ABI + /* Store registers and move four first input arguments from MS ABI to + * SYSV ABI. */ + #define FUNC_ENTRY() \ + pushq %rsi; \ + pushq %rdi; \ + movq %rdx, %rsi; \ + movq %rcx, %rdi; \ + movq %r8, %rdx; \ + movq %r9, %rcx; + + /* Restore registers. */ + #define FUNC_EXIT() \ + popq %rdi; \ + popq %rsi; +#else + #define FUNC_ENTRY() /**/ + #define FUNC_EXIT() /**/ +#endif diff --git a/mpi/amd64/mpih-add1.S b/mpi/amd64/mpih-add1.S index f0ec89c..6a90262 100644 --- a/mpi/amd64/mpih-add1.S +++ b/mpi/amd64/mpih-add1.S @@ -43,6 +43,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_add_n) C_SYMBOL_NAME(_gcry_mpih_add_n:) + FUNC_ENTRY() leaq (%rsi,%rcx,8), %rsi leaq (%rdi,%rcx,8), %rdi leaq (%rdx,%rcx,8), %rdx @@ -59,5 +60,6 @@ C_SYMBOL_NAME(_gcry_mpih_add_n:) movq %rcx, %rax /* zero %rax */ adcq %rax, %rax + FUNC_EXIT() ret \ No newline at end of file diff --git a/mpi/amd64/mpih-lshift.S b/mpi/amd64/mpih-lshift.S index e87dd1a..9e8979b 100644 --- a/mpi/amd64/mpih-lshift.S +++ b/mpi/amd64/mpih-lshift.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_lshift) C_SYMBOL_NAME(_gcry_mpih_lshift:) + FUNC_ENTRY() movq -8(%rsi,%rdx,8), %mm7 movd %ecx, %mm1 movl $64, %eax @@ -74,4 +75,5 @@ C_SYMBOL_NAME(_gcry_mpih_lshift:) .Lende: psllq %mm1, %mm2 movq %mm2, (%rdi) emms + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul1.S b/mpi/amd64/mpih-mul1.S index 54b0ab4..67ab47e 100644 --- a/mpi/amd64/mpih-mul1.S +++ b/mpi/amd64/mpih-mul1.S @@ -46,6 +46,7 @@ GLOBL C_SYMBOL_NAME(_gcry_mpih_mul_1) C_SYMBOL_NAME(_gcry_mpih_mul_1:) + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%rdx,8), %rsi leaq (%rdi,%rdx,8), %rdi @@ -62,4 +63,5 @@ C_SYMBOL_NAME(_gcry_mpih_mul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul2.S b/mpi/amd64/mpih-mul2.S index a332a1d..1aa4fa0 100644 --- a/mpi/amd64/mpih-mul2.S +++ b/mpi/amd64/mpih-mul2.S @@ -41,6 +41,7 @@ TEXT GLOBL C_SYMBOL_NAME(_gcry_mpih_addmul_1) C_SYMBOL_NAME(_gcry_mpih_addmul_1:) + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%rdx,8), %rsi leaq (%rdi,%rdx,8), %rdi @@ -61,4 +62,5 @@ C_SYMBOL_NAME(_gcry_mpih_addmul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-mul3.S b/mpi/amd64/mpih-mul3.S index 4d458a7..bc41c4e 100644 --- a/mpi/amd64/mpih-mul3.S +++ b/mpi/amd64/mpih-mul3.S @@ -42,7 +42,7 @@ TEXT GLOBL C_SYMBOL_NAME(_gcry_mpih_submul_1) C_SYMBOL_NAME(_gcry_mpih_submul_1:) - + FUNC_ENTRY() movq %rdx, %r11 leaq (%rsi,%r11,8), %rsi leaq (%rdi,%r11,8), %rdi @@ -63,4 +63,5 @@ C_SYMBOL_NAME(_gcry_mpih_submul_1:) jne .Loop movq %r8, %rax + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-rshift.S b/mpi/amd64/mpih-rshift.S index 4cfc8f6..311b85b 100644 --- 
a/mpi/amd64/mpih-rshift.S +++ b/mpi/amd64/mpih-rshift.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_rshift) C_SYMBOL_NAME(_gcry_mpih_rshift:) + FUNC_ENTRY() movq (%rsi), %mm7 movd %ecx, %mm1 movl $64, %eax @@ -77,4 +78,5 @@ C_SYMBOL_NAME(_gcry_mpih_rshift:) .Lende: psrlq %mm1, %mm2 movq %mm2, -8(%rdi) emms + FUNC_EXIT() ret diff --git a/mpi/amd64/mpih-sub1.S b/mpi/amd64/mpih-sub1.S index b3609b0..ccf6496 100644 --- a/mpi/amd64/mpih-sub1.S +++ b/mpi/amd64/mpih-sub1.S @@ -42,6 +42,7 @@ .text .globl C_SYMBOL_NAME(_gcry_mpih_sub_n) C_SYMBOL_NAME(_gcry_mpih_sub_n:) + FUNC_ENTRY() leaq (%rsi,%rcx,8), %rsi leaq (%rdi,%rcx,8), %rdi leaq (%rdx,%rcx,8), %rdx @@ -58,4 +59,5 @@ C_SYMBOL_NAME(_gcry_mpih_sub_n:) movq %rcx, %rax /* zero %rax */ adcq %rax, %rax + FUNC_EXIT() ret diff --git a/mpi/config.links b/mpi/config.links index d71918a..2fb5e8a 100644 --- a/mpi/config.links +++ b/mpi/config.links @@ -129,17 +129,22 @@ case "${host}" in x86_64-apple-darwin*) echo '#define BSD_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h path="amd64" mpi_cpu_arch="x86" ;; x86_64-*mingw32*) - echo '/* No working assembler modules available */' >>./mpi/asm-syntax.h - path="" - mpi_cpu_arch="x86" + echo '#define USE_MS_ABI' >>./mpi/asm-syntax.h + echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h + cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h + path="amd64" + mpi_cpu_arch="x86" ;; x86_64-*-*) echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h + cat $srcdir/mpi/amd64/func_abi.h >>./mpi/asm-syntax.h path="amd64" mpi_cpu_arch="x86" ;; @@ -314,7 +319,7 @@ case "${host}" in echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/powerpc32/syntax.h >>./mpi/asm-syntax.h path="powerpc32" - mpi_cpu_arch="ppc" + mpi_cpu_arch="ppc" ;; rs6000-*-aix[456789]* | \ commit 6c21cf5fed1ad430fa41445eac2350802bc8aaed Author: Jussi Kivilinna Date: Fri May 1 19:15:34 2015 +0300 DES: Silence compiler warnings on Windows * cipher/des.c (working_memcmp): Make pointer arguments 'const void *'. -- Following warning seen on Windows target build: des.c: In function 'is_weak_key': des.c:1019:40: warning: pointer targets in passing argument 1 of 'working_memcmp' differ in signedness [-Wpointer-sign] if ( !(cmp_result=working_memcmp(work, weak_keys[middle], 8)) ) ^ des.c:149:1: note: expected 'const char *' but argument is of type 'unsigned char *' working_memcmp( const char *a, const char *b, size_t n ) ^ des.c:1019:46: warning: pointer targets in passing argument 2 of 'working_memcmp' differ in signedness [-Wpointer-sign] if ( !(cmp_result=working_memcmp(work, weak_keys[middle], 8)) ) ^ des.c:149:1: note: expected 'const char *' but argument is of type 'unsigned char *' working_memcmp( const char *a, const char *b, size_t n ) ^ Signed-off-by: Jussi Kivilinna diff --git a/cipher/des.c b/cipher/des.c index bc2a474..d4863d1 100644 --- a/cipher/des.c +++ b/cipher/des.c @@ -146,8 +146,10 @@ * depending on whether characters are signed or not. 
*/ static int -working_memcmp( const char *a, const char *b, size_t n ) +working_memcmp( const void *_a, const void *_b, size_t n ) { + const char *a = _a; + const char *b = _b; for( ; n; n--, a++, b++ ) if( *a != *b ) return (int)(*(byte*)a) - (int)(*(byte*)b); commit 9cf224322007d90193d4910f0da6e0e29ce01d70 Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Cast pointers to integers using uintptr_t instead of long diff --git a/cipher/cipher.c b/cipher/cipher.c index d1550c0..7a29824 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -481,11 +481,11 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, size_t off = 0; #ifdef NEED_16BYTE_ALIGNED_CONTEXT - if ( ((unsigned long)h & 0x0f) ) + if ( ((uintptr_t)h & 0x0f) ) { /* The malloced block is not aligned on a 16 byte boundary. Correct for this. */ - off = 16 - ((unsigned long)h & 0x0f); + off = 16 - ((uintptr_t)h & 0x0f); h = (void*)((char*)h + off); } #endif /*NEED_16BYTE_ALIGNED_CONTEXT*/ diff --git a/cipher/md.c b/cipher/md.c index 9fef555..3ab46ef 100644 --- a/cipher/md.c +++ b/cipher/md.c @@ -1148,7 +1148,7 @@ md_stop_debug( gcry_md_hd_t md ) #ifdef HAVE_U64_TYPEDEF { /* a kludge to pull in the __muldi3 for Solaris */ - volatile u32 a = (u32)(ulong)md; + volatile u32 a = (u32)(uintptr_t)md; volatile u64 b = 42; volatile u64 c; c = a * b; commit d5a7e00b6b222566a5650639ef29684b047c1909 Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Fix rndhw for 64-bit Windows build * configure.ac: Add sizeof check for 'void *'. * random/rndhw.c (poll_padlock): Check for SIZEOF_VOID_P == 8 instead of defined(__LP64__). (RDRAND_LONG): Check for SIZEOF_UNSIGNED_LONG == 8 instead of defined(__LP64__). -- __LP64__ is not predefined for 64-bit mingw64-gcc, which caused wrong assembly code selections. Do selection based on type sizes instead, to support x86_64, x32 and win64 properly. Signed-off-by: Jussi Kivilinna diff --git a/configure.ac b/configure.ac index 555ad1e..594209f 100644 --- a/configure.ac +++ b/configure.ac @@ -344,6 +344,7 @@ AC_CHECK_SIZEOF(unsigned short, 2) AC_CHECK_SIZEOF(unsigned int, 4) AC_CHECK_SIZEOF(unsigned long, 4) AC_CHECK_SIZEOF(unsigned long long, 0) +AC_CHECK_SIZEOF(void *, 0) AC_TYPE_UINTPTR_T diff --git a/random/rndhw.c b/random/rndhw.c index e625512..8e50751 100644 --- a/random/rndhw.c +++ b/random/rndhw.c @@ -69,7 +69,7 @@ poll_padlock (void (*add)(const void*, size_t, enum random_origins), nbytes = 0; while (nbytes < 64) { -#if defined(__x86_64__) && defined(__LP64__) +#if defined(__x86_64__) && SIZEOF_VOID_P == 8 asm volatile ("movq %1, %%rdi\n\t" /* Set buffer. */ "xorq %%rdx, %%rdx\n\t" /* Request up to 8 bytes. */ @@ -123,7 +123,7 @@ poll_padlock (void (*add)(const void*, size_t, enum random_origins), #ifdef USE_DRNG # define RDRAND_RETRY_LOOPS 10 # define RDRAND_INT ".byte 0x0f,0xc7,0xf0" -# if defined(__x86_64__) && defined(__LP64__) +# if defined(__x86_64__) && SIZEOF_UNSIGNED_LONG == 8 # define RDRAND_LONG ".byte 0x48,0x0f,0xc7,0xf0" # else # define RDRAND_LONG RDRAND_INT commit 0cdd24456b33defc7f8176fa82ab694fbc284385 Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Prepare random/win32.c fast poll for 64-bit Windows * random/win32.c (_gcry_rndw32_gather_random_fast) [ADD]: Rename to ADDINT. (_gcry_rndw32_gather_random_fast): Add ADDPTR. (_gcry_rndw32_gather_random_fast): Disable entropy gathering from GetQueueStatus(QS_ALLEVENTS). (_gcry_rndw32_gather_random_fast): Change minimumWorkingSetSize and maximumWorkingSetSize to SIZE_T from DWORD. 
(_gcry_rndw32_gather_random_fast): Only add lower 32-bits of minimumWorkingSetSize and maximumWorkingSetSize to random poll. (_gcry_rndw32_gather_random_fast) [__WIN64__]: Read TSC directly using intrinsic. -- Introduce entropy gatherer changes related to 64-bit Windows platform as done in cryptlib fast poll: - Change ADD macro to ADDPTR/ADDINT to handle pointer values. ADDPTR discards high 32-bits of 64-bit pointer values. - minimum/maximumWorkingSetSize changed to SIZE_T type to avoid stack corruption on 64-bit; only low 32-bits are used for entropy. - Use __rdtsc() intrinsic on 64-bit (as TSC is always available). Signed-off-by: Jussi Kivilinna diff --git a/random/rndw32.c b/random/rndw32.c index c495131..4ab1bca 100644 --- a/random/rndw32.c +++ b/random/rndw32.c @@ -826,39 +826,47 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, cursor position for last message, 1 ms time for last message, handle of window with clipboard open, handle of process heap, handle of procs window station, types of events in input queue, - and milliseconds since Windows was started. */ + and milliseconds since Windows was started. On 64-bit platform + some of these return values are pointers and thus 64-bit wide. + We discard the upper 32-bit of those values. */ { byte buffer[20*sizeof(ulong)], *bufptr; bufptr = buffer; -#define ADD(f) do { ulong along = (ulong)(f); \ - memcpy (bufptr, &along, sizeof (along) ); \ - bufptr += sizeof (along); \ - } while (0) - - ADD ( GetActiveWindow ()); - ADD ( GetCapture ()); - ADD ( GetClipboardOwner ()); - ADD ( GetClipboardViewer ()); - ADD ( GetCurrentProcess ()); - ADD ( GetCurrentProcessId ()); - ADD ( GetCurrentThread ()); - ADD ( GetCurrentThreadId ()); - ADD ( GetDesktopWindow ()); - ADD ( GetFocus ()); - ADD ( GetInputState ()); - ADD ( GetMessagePos ()); - ADD ( GetMessageTime ()); - ADD ( GetOpenClipboardWindow ()); - ADD ( GetProcessHeap ()); - ADD ( GetProcessWindowStation ()); - ADD ( GetQueueStatus (QS_ALLEVENTS)); - ADD ( GetTickCount ()); +#define ADDINT(f) do { ulong along = (ulong)(f); \ + memcpy (bufptr, &along, sizeof (along) ); \ + bufptr += sizeof (along); \ + } while (0) +#define ADDPTR(f) do { void *aptr = (f); \ + ADDINT((SIZE_T)aptr); \ + } while (0) + + ADDPTR ( GetActiveWindow ()); + ADDPTR ( GetCapture ()); + ADDPTR ( GetClipboardOwner ()); + ADDPTR ( GetClipboardViewer ()); + ADDPTR ( GetCurrentProcess ()); + ADDINT ( GetCurrentProcessId ()); + ADDPTR ( GetCurrentThread ()); + ADDINT ( GetCurrentThreadId ()); + ADDPTR ( GetDesktopWindow ()); + ADDPTR ( GetFocus ()); + ADDINT ( GetInputState ()); + ADDINT ( GetMessagePos ()); + ADDINT ( GetMessageTime ()); + ADDPTR ( GetOpenClipboardWindow ()); + ADDPTR ( GetProcessHeap ()); + ADDPTR ( GetProcessWindowStation ()); + /* Following function in some cases stops returning events, and cannot + be used as an entropy source. 
*/ + /*ADDINT ( GetQueueStatus (QS_ALLEVENTS));*/ + ADDINT ( GetTickCount ()); gcry_assert ( bufptr-buffer < sizeof (buffer) ); (*add) ( buffer, bufptr-buffer, origin ); -#undef ADD +#undef ADDINT +#undef ADDPTR } /* Get multiword system information: Current caret position, current @@ -888,7 +896,7 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, { HANDLE handle; FILETIME creationTime, exitTime, kernelTime, userTime; - DWORD minimumWorkingSetSize, maximumWorkingSetSize; + SIZE_T minimumWorkingSetSize, maximumWorkingSetSize; handle = GetCurrentThread (); GetThreadTimes (handle, &creationTime, &exitTime, @@ -910,10 +918,9 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, process. */ GetProcessWorkingSetSize (handle, &minimumWorkingSetSize, &maximumWorkingSetSize); - (*add) ( &minimumWorkingSetSize, - sizeof (minimumWorkingSetSize), origin ); - (*add) ( &maximumWorkingSetSize, - sizeof (maximumWorkingSetSize), origin ); + /* On 64-bit system, discard the high 32-bits. */ + (*add) ( &minimumWorkingSetSize, sizeof (int), origin ); + (*add) ( &maximumWorkingSetSize, sizeof (int), origin ); } @@ -961,7 +968,20 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, To make things unambiguous, we detect a CPU new enough to call RDTSC directly by checking for CPUID capabilities, and fall back to QPC if - this isn't present. */ + this isn't present. + + On AMD64, TSC is always available and intrinsic is provided for accessing + it. */ +#ifdef __WIN64__ + { + unsigned __int64 aint64; + + /* Note: cryptlib does not discard upper 32 bits of TSC on WIN64, but does + * on WIN32. Is this correct? */ + aint64 = __rdtsc(); + (*add) (&aint64, sizeof(aint64), origin); + } +#else #ifdef __GNUC__ /* FIXME: We would need to implement the CPU feature tests first. */ /* if (cpu_has_feature_rdtsc) */ @@ -990,6 +1010,7 @@ _gcry_rndw32_gather_random_fast (void (*add)(const void*, size_t, (*add) (&aword, sizeof (aword), origin ); } } +#endif /*__WIN64__*/ } commit f701954555340a503f6e52cc18d58b0c515427b7 Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Disable GCM and AES-NI assembly implementations for WIN64 * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when __WIN64__ defined. * cipher/rijndael-internal.h (USE_AESNI): Ditto. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index e20ea56..693f218 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -67,7 +67,9 @@ #if defined(ENABLE_PCLMUL_SUPPORT) && defined(GCM_USE_TABLES) # if ((defined(__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# define GCM_USE_INTEL_PCLMUL 1 +# ifndef __WIN64__ +# define GCM_USE_INTEL_PCLMUL 1 +# endif # endif # endif #endif /* GCM_USE_INTEL_PCLMUL */ diff --git a/cipher/rijndael-internal.h b/cipher/rijndael-internal.h index 854980b..bd247a9 100644 --- a/cipher/rijndael-internal.h +++ b/cipher/rijndael-internal.h @@ -75,7 +75,9 @@ #ifdef ENABLE_AESNI_SUPPORT # if ((defined (__i386__) && SIZEOF_UNSIGNED_LONG == 4) || defined(__x86_64__)) # if __GNUC__ >= 4 -# define USE_AESNI 1 +# ifndef __WIN64__ +# define USE_AESNI 1 +# endif # endif # endif #endif /* ENABLE_AESNI_SUPPORT */ commit e78560a4b717f7154f910a8ce4128de152f586da Author: Jussi Kivilinna Date: Wed Apr 29 18:18:07 2015 +0300 Disable building mpi assembly routines on WIN64 * mpi/config.links: Disable assembly for host 'x86_64-*mingw32*'. 
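The reason the SysV amd64 modules cannot be used unchanged on this host is the calling convention: the SysV AMD64 ABI passes the first integer arguments in %rdi, %rsi, %rdx and %rcx, while the Microsoft x64 ABI uses %rcx, %rdx, %r8 and %r9 and treats XMM6-XMM15 as callee-saved. Commit 460355f above, which is applied on top of this one, re-enables the modules by wrapping each entry point in the FUNC_ENTRY()/FUNC_EXIT() thunks from mpi/amd64/func_abi.h. The following is only an illustrative sketch, not part of any patch; it relies on the same 'ms_abi'/'sysv_abi' attributes that the new configure.ac checks probe, and the function names are made up:

/* Sketch only: the register shuffle that FUNC_ENTRY() performs by hand
 * in the assembly modules, shown at the C level.  Needs gcc on x86_64;
 * add_sysv is a hypothetical stand-in for a SysV-only routine such as
 * _gcry_mpih_add_n. */
#include <stdio.h>

__attribute__((sysv_abi)) static long
add_sysv (long a, long b)       /* receives a in %rdi, b in %rsi */
{
  return a + b;
}

__attribute__((ms_abi)) static long
add_ms (long a, long b)         /* receives a in %rcx, b in %rdx */
{
  /* gcc emits the rcx/rdx to rdi/rsi moves for this call; hand-written
   * assembly has to do the same itself, which is what FUNC_ENTRY() and
   * FUNC_EXIT() provide. */
  return add_sysv (a, b);
}

int
main (void)
{
  printf ("%ld\n", add_ms (2, 3));
  return 0;
}
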
-- Signed-off-by: Jussi Kivilinna diff --git a/mpi/config.links b/mpi/config.links index f44299d..d71918a 100644 --- a/mpi/config.links +++ b/mpi/config.links @@ -132,6 +132,11 @@ case "${host}" in path="amd64" mpi_cpu_arch="x86" ;; + x86_64-*mingw32*) + echo '/* No working assembler modules available */' >>./mpi/asm-syntax.h + path="" + mpi_cpu_arch="x86" + ;; x86_64-*-*) echo '#define ELF_SYNTAX' >>./mpi/asm-syntax.h cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h ----------------------------------------------------------------------- Summary of changes: acinclude.m4 | 5 +- cipher/cipher-gcm-intel-pclmul.c | 72 ++++++++++++++++++++++++++ cipher/cipher.c | 4 +- cipher/des.c | 4 +- cipher/md.c | 2 +- cipher/rijndael-aesni.c | 73 +++++++++++++++++++++----- cipher/rijndael-amd64.S | 17 ++++-- cipher/rijndael-internal.h | 8 +-- cipher/rijndael-ssse3-amd64.c | 94 ++++++++++++++++++++++++++------- cipher/rijndael.c | 34 ++++++++++++ cipher/sha1-avx-amd64.S | 12 ++++- cipher/sha1-avx-bmi2-amd64.S | 12 ++++- cipher/sha1-ssse3-amd64.S | 12 ++++- cipher/sha1.c | 51 +++++++++++++----- cipher/sha256-avx-amd64.S | 11 +++- cipher/sha256-avx2-bmi2-amd64.S | 11 +++- cipher/sha256-ssse3-amd64.S | 11 +++- cipher/sha256.c | 60 +++++++++++++++------ cipher/sha512-avx-amd64.S | 11 +++- cipher/sha512-avx2-bmi2-amd64.S | 11 +++- cipher/sha512-ssse3-amd64.S | 11 +++- cipher/sha512.c | 60 +++++++++++++++------ cipher/whirlpool-sse2-amd64.S | 13 +++-- cipher/whirlpool.c | 15 ++++-- configure.ac | 109 +++++++++++++++++++++++++++++++++++++-- mpi/amd64/func_abi.h | 19 +++++++ mpi/amd64/mpih-add1.S | 2 + mpi/amd64/mpih-lshift.S | 2 + mpi/amd64/mpih-mul1.S | 2 + mpi/amd64/mpih-mul2.S | 2 + mpi/amd64/mpih-mul3.S | 3 +- mpi/amd64/mpih-rshift.S | 2 + mpi/amd64/mpih-sub1.S | 2 + mpi/config.links | 12 ++++- random/rndhw.c | 4 +- random/rndw32.c | 83 ++++++++++++++++++----------- 36 files changed, 707 insertions(+), 149 deletions(-) create mode 100644 mpi/amd64/func_abi.h hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sun May 3 16:40:10 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sun, 03 May 2015 17:40:10 +0300 Subject: [PATCH] Fix WIN64 assembly glue for AES Message-ID: <20150503144010.680.15679.stgit@localhost6.localdomain6> * cipher/rinjdael.c (do_encrypt, do_decrypt) [!HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Change input operands to input+output to mark volatile nature of the used registers. -- Function arguments cannot be passed to assembly block as input operands as target function modifies those input registers. Signed-off-by: Jussi Kivilinna --- cipher/rijndael.c | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/cipher/rijndael.c b/cipher/rijndael.c index 7ebf329..4f063c4 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -671,17 +671,19 @@ do_encrypt (const RIJNDAEL_context *ctx, # else /* Call SystemV ABI function without storing non-volatile XMM registers, * as target function does not use vector instruction sets. 
*/ + const void *key = ctx->keyschenc; + uintptr_t rounds = ctx->rounds; uintptr_t ret; - asm ("movq %[encT], %%r8\n\t" - "callq *%[ret]\n\t" - : [ret] "=a" (ret) - : "0" (_gcry_aes_amd64_encrypt_block), - "D" (ctx->keyschenc), - "S" (bx), - "d" (ax), - "c" (ctx->rounds), - [encT] "r" (encT) - : "cc", "memory", "r8", "r9", "r10", "r11"); + asm volatile ("movq %[encT], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret), + "+D" (key), + "+S" (bx), + "+d" (ax), + "+c" (rounds) + : "0" (_gcry_aes_amd64_encrypt_block), + [encT] "g" (encT) + : "cc", "memory", "r8", "r9", "r10", "r11"); return ret; # endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) @@ -1031,17 +1033,19 @@ do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, # else /* Call SystemV ABI function without storing non-volatile XMM registers, * as target function does not use vector instruction sets. */ + const void *key = ctx->keyschdec; + uintptr_t rounds = ctx->rounds; uintptr_t ret; - asm ("movq %[dectabs], %%r8\n\t" - "callq *%[ret]\n\t" - : [ret] "=a" (ret) - : "0" (_gcry_aes_amd64_decrypt_block), - "D" (ctx->keyschdec), - "S" (bx), - "d" (ax), - "c" (ctx->rounds), - [dectabs] "r" (&dec_tables) - : "cc", "memory", "r8", "r9", "r10", "r11"); + asm volatile ("movq %[dectabs], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret), + "+D" (key), + "+S" (bx), + "+d" (ax), + "+c" (rounds) + : "0" (_gcry_aes_amd64_decrypt_block), + [dectabs] "g" (&dec_tables) + : "cc", "memory", "r8", "r9", "r10", "r11"); return ret; # endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) From cvs at cvs.gnupg.org Sun May 3 20:54:16 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 03 May 2015 20:54:16 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-210-g24a769a Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 24a769a7c7601dbb85332e550f6fbd121b56df5f (commit) via 2f4fefdbc62857b6e2da26ce111ee140a068c471 (commit) via 06e122baa3321483a47bbf82fd2a4540becfa0c9 (commit) from 66129b3334a5aa54ff8a97981507e4704f759571 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 24a769a7c7601dbb85332e550f6fbd121b56df5f Author: Jussi Kivilinna Date: Sun May 3 17:16:08 2015 +0300 Fix WIN64 assembly glue for AES * cipher/rinjdael.c (do_encrypt, do_decrypt) [!HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Change input operands to input+output to mark volatile nature of the used registers. -- Function arguments cannot be passed to assembly block as input operands as target function modifies those input registers. Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael.c b/cipher/rijndael.c index 7ebf329..4f063c4 100644 --- a/cipher/rijndael.c +++ b/cipher/rijndael.c @@ -671,17 +671,19 @@ do_encrypt (const RIJNDAEL_context *ctx, # else /* Call SystemV ABI function without storing non-volatile XMM registers, * as target function does not use vector instruction sets. 
*/ + const void *key = ctx->keyschenc; + uintptr_t rounds = ctx->rounds; uintptr_t ret; - asm ("movq %[encT], %%r8\n\t" - "callq *%[ret]\n\t" - : [ret] "=a" (ret) - : "0" (_gcry_aes_amd64_encrypt_block), - "D" (ctx->keyschenc), - "S" (bx), - "d" (ax), - "c" (ctx->rounds), - [encT] "r" (encT) - : "cc", "memory", "r8", "r9", "r10", "r11"); + asm volatile ("movq %[encT], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret), + "+D" (key), + "+S" (bx), + "+d" (ax), + "+c" (rounds) + : "0" (_gcry_aes_amd64_encrypt_block), + [encT] "g" (encT) + : "cc", "memory", "r8", "r9", "r10", "r11"); return ret; # endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) @@ -1031,17 +1033,19 @@ do_decrypt (const RIJNDAEL_context *ctx, unsigned char *bx, # else /* Call SystemV ABI function without storing non-volatile XMM registers, * as target function does not use vector instruction sets. */ + const void *key = ctx->keyschdec; + uintptr_t rounds = ctx->rounds; uintptr_t ret; - asm ("movq %[dectabs], %%r8\n\t" - "callq *%[ret]\n\t" - : [ret] "=a" (ret) - : "0" (_gcry_aes_amd64_decrypt_block), - "D" (ctx->keyschdec), - "S" (bx), - "d" (ax), - "c" (ctx->rounds), - [dectabs] "r" (&dec_tables) - : "cc", "memory", "r8", "r9", "r10", "r11"); + asm volatile ("movq %[dectabs], %%r8\n\t" + "callq *%[ret]\n\t" + : [ret] "=a" (ret), + "+D" (key), + "+S" (bx), + "+d" (ax), + "+c" (rounds) + : "0" (_gcry_aes_amd64_decrypt_block), + [dectabs] "g" (&dec_tables) + : "cc", "memory", "r8", "r9", "r10", "r11"); return ret; # endif /* HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS */ #elif defined(USE_ARM_ASM) commit 2f4fefdbc62857b6e2da26ce111ee140a068c471 Author: Jussi Kivilinna Date: Sun May 3 01:24:50 2015 +0300 Add '1 million a characters' test vectors * tests/basic.c (check_digests): Add "!" test vectors for MD5, SHA-384, SHA-512, RIPEMD160 and CRC32. -- Signed-off-by: Jussi Kivilinna diff --git a/tests/basic.c b/tests/basic.c index bb07394..2c664c0 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5391,6 +5391,8 @@ check_digests (void) "TY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU Lesser Gene" "ral Public License for more details.", "\xc4\x1a\x5c\x0b\x44\x5f\xba\x1a\xda\xbc\xc0\x38\x0e\x0c\x9e\x33" }, + { GCRY_MD_MD5, "!", + "\x77\x07\xd6\xae\x4e\x02\x7c\x70\xee\xa2\xa9\x35\xc2\x29\x6f\x21" }, { GCRY_MD_SHA1, "abc", "\xA9\x99\x3E\x36\x47\x06\x81\x6A\xBA\x3E" "\x25\x71\x78\x50\xC2\x6C\x9C\xD0\xD8\x9D" }, @@ -5471,6 +5473,10 @@ check_digests (void) "\xe4\x6d\xb4\x28\x33\x77\x99\x49\x94\x0f\xcf\x87\xc2\x2f\x30\xd6" "\x06\x24\x82\x9d\x80\x64\x8a\x07\xa1\x20\x8f\x5f\xf3\x85\xb3\xaa" "\x39\xb8\x61\x00\xfc\x7f\x18\xc6\x82\x23\x4b\x45\xfa\xf1\xbc\x69" }, + { GCRY_MD_SHA384, "!", + "\x9d\x0e\x18\x09\x71\x64\x74\xcb\x08\x6e\x83\x4e\x31\x0a\x4a\x1c" + "\xed\x14\x9e\x9c\x00\xf2\x48\x52\x79\x72\xce\xc5\x70\x4c\x2a\x5b" + "\x07\xb8\xb3\xdc\x38\xec\xc4\xeb\xae\x97\xdd\xd8\x7f\x3d\x89\x85" }, { GCRY_MD_SHA512, "abc", "\xDD\xAF\x35\xA1\x93\x61\x7A\xBA\xCC\x41\x73\x49\xAE\x20\x41\x31" "\x12\xE6\xFA\x4E\x89\xA9\x7E\xA2\x0A\x9E\xEE\xE6\x4B\x55\xD3\x9A" @@ -5489,6 +5495,11 @@ check_digests (void) "\xdd\xec\x62\x0f\xf7\x1a\x1e\x10\x32\x05\x02\xa6\xb0\x1f\x70\x37" "\xbc\xd7\x15\xed\x71\x6c\x78\x20\xc8\x54\x87\xd0\x66\x6a\x17\x83" "\x05\x61\x92\xbe\xcc\x8f\x3b\xbf\x11\x72\x22\x69\x23\x5b\x48\x5c" }, + { GCRY_MD_SHA512, "!", + "\xe7\x18\x48\x3d\x0c\xe7\x69\x64\x4e\x2e\x42\xc7\xbc\x15\xb4\x63" + "\x8e\x1f\x98\xb1\x3b\x20\x44\x28\x56\x32\xa8\x03\xaf\xa9\x73\xeb" + "\xde\x0f\xf2\x44\x87\x7e\xa6\x0a\x4c\xb0\x43\x2c\xe5\x77\xc3\x1b" + "\xeb\x00\x9c\x5c\x2c\x49\xaa\x2e\x4e\xad\xb2\x17\xad\x8c\xc0\x9b" }, { GCRY_MD_RMD160, "", "\x9c\x11\x85\xa5\xc5\xe9\xfc\x54\x61\x28" "\x08\x97\x7e\xe8\xf5\x48\xb2\x25\x8d\x31" }, @@ -5512,6 +5523,9 @@ check_digests (void) "ral Public License for more details.", "\x06\x6d\x3c\x4e\xc9\xba\x89\x75\x16\x90\x96\x4e\xfd\x43\x07\xde" "\x04\xca\x69\x6b" }, + { GCRY_MD_RMD160, "!", + "\x52\x78\x32\x43\xc1\x69\x7b\xdb\xe1\x6d\x37\xf9\x7f\x68\xf0\x83" + "\x25\xdc\x15\x28" }, { GCRY_MD_CRC32, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32, "foo", "\x8c\x73\x65\x21" }, { GCRY_MD_CRC32, @@ -5525,6 +5539,7 @@ check_digests (void) "ral Public License for more details.", "\x4A\x53\x7D\x67" }, { GCRY_MD_CRC32, "123456789", "\xcb\xf4\x39\x26" }, + { GCRY_MD_CRC32, "!", "\xdc\x25\xbf\xbc" }, { GCRY_MD_CRC32_RFC1510, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32_RFC1510, "foo", "\x73\x32\xbc\x33" }, { GCRY_MD_CRC32_RFC1510, "test0123456789", "\xb8\x3e\x88\xd6" }, commit 06e122baa3321483a47bbf82fd2a4540becfa0c9 Author: Jussi Kivilinna Date: Sun May 3 00:34:34 2015 +0300 More optimized CRC implementations * cipher/crc.c (crc32_table, crc24_table): Replace with new table contents. (update_crc32, CRC24_INIT, CRC24_POLY): Remove. (crc32_next, crc32_next4, crc24_init, crc24_next, crc24_next4) (crc24_final): New. (crc24rfc2440_init): Use crc24_init. (crc32_write): Rewrite to use crc32_next & crc32_next4. (crc24_write): Rewrite to use crc24_next & crc24_next4. (crc32_final, crc32rfc1510_final): Use buf_put_be32. (crc24rfc2440_final): Use crc24_final & buf_put_le32. * tests/basic.c (check_digests): Add CRC "123456789" tests. 
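The "123456789" inputs pin the rewritten code to the well-known CRC check values (0xcbf43926 for plain CRC-32, as the generated table comments below also note). As a standalone sketch that is not part of the patch, the same check can be reproduced through the public gcry_md API; the expected byte string matches the big-endian output written by crc32_final():

/* Sketch only: verify the CRC-32 check value of "123456789" via the
 * gcry_md interface. */
#include <stdio.h>
#include <string.h>
#include <gcrypt.h>

int
main (void)
{
  gcry_md_hd_t hd;
  static const unsigned char expect[4] = { 0xcb, 0xf4, 0x39, 0x26 };

  if (!gcry_check_version (NULL))
    return 1;
  if (gcry_md_open (&hd, GCRY_MD_CRC32, 0))
    return 1;
  gcry_md_write (hd, "123456789", 9);
  if (memcmp (gcry_md_read (hd, GCRY_MD_CRC32), expect, 4))
    fprintf (stderr, "CRC32 check value mismatch\n");
  gcry_md_close (hd);
  return 0;
}
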
-- Patch adds more optimized CRC implementations generated with universal_crc tool by Danjel McGougan: http://www.mcgougan.se/universal_crc/ Benchmark on Intel Haswell (no-turbo, 3200 Mhz): Before: CRC32 | 2.52 ns/B 378.3 MiB/s 8.07 c/B CRC32RFC1510 | 2.52 ns/B 378.1 MiB/s 8.07 c/B CRC24RFC2440 | 46.62 ns/B 20.46 MiB/s 149.2 c/B After: CRC32 | 0.918 ns/B 1039.3 MiB/s 2.94 c/B CRC32RFC1510 | 0.918 ns/B 1039.0 MiB/s 2.94 c/B CRC24RFC2440 | 0.918 ns/B 1039.4 MiB/s 2.94 c/B Signed-off-by: Jussi Kivilinna diff --git a/cipher/crc.c b/cipher/crc.c index 1322f0d..9105dfe 100644 --- a/cipher/crc.c +++ b/cipher/crc.c @@ -28,125 +28,311 @@ #include "cipher.h" #include "bithelp.h" +#include "bufhelp.h" + + +typedef struct +{ + u32 CRC; + byte buf[4]; +} +CRC_CONTEXT; -/* Table of CRCs of all 8-bit messages. Generated by running code - from RFC 1952 modified to print out the table. */ -static u32 crc32_table[256] = { - 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, - 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, - 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2, - 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, - 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, - 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, - 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c, - 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, - 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, - 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, - 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106, - 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, - 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, - 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, - 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, - 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, - 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, - 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, - 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, - 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, - 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, - 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, - 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84, - 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, - 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, - 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, - 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e, - 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, - 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, - 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, - 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28, - 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, - 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, - 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, - 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, - 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 
0xf6b9265b, 0x6fb077e1, 0x18b74777, - 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, - 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, - 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, - 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, - 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, - 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, - 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d -}; /* - * The following function was extracted from RFC 1952 by Simon - * Josefsson, for the Shishi project, and modified to be compatible - * with the modified CRC-32 used by RFC 1510, and subsequently - * modified for GNU Libgcrypt to allow it to be used for calculating - * both unmodified CRC-32 and modified CRC-32 values. Original - * copyright and notice from the document follows: + * Code generated by universal_crc by Danjel McGougan * - * Copyright (c) 1996 L. Peter Deutsch - * - * Permission is granted to copy and distribute this document for - * any purpose and without charge, including translations into - * other languages and incorporation into compilations, provided - * that the copyright notice and this notice are preserved, and - * that any substantive changes or deletions from the original are - * clearly marked. - * - * The copyright on RFCs, and consequently the function below, are - * supposedly also retroactively claimed by the Internet Society - * (according to rfc-editor at rfc-editor.org), with the following - * copyright notice: - * - * Copyright (C) The Internet Society. All Rights Reserved. - * - * This document and translations of it may be copied and furnished - * to others, and derivative works that comment on or otherwise - * explain it or assist in its implementation may be prepared, - * copied, published and distributed, in whole or in part, without - * restriction of any kind, provided that the above copyright - * notice and this paragraph are included on all such copies and - * derivative works. However, this document itself may not be - * modified in any way, such as by removing the copyright notice or - * references to the Internet Society or other Internet - * organizations, except as needed for the purpose of developing - * Internet standards in which case the procedures for copyrights - * defined in the Internet Standards process must be followed, or - * as required to translate it into languages other than English. - * - * The limited permissions granted above are perpetual and will not be - * revoked by the Internet Society or its successors or assigns. - * - * This document and the information contained herein is provided - * on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET - * ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE - * OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY - * IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A - * PARTICULAR PURPOSE. 
+ * CRC parameters used: + * bits: 32 + * poly: 0x04c11db7 + * init: 0xffffffff + * xor: 0xffffffff + * reverse: true + * non-direct: false * + * CRC of the string "123456789" is 0xcbf43926 */ -static u32 -update_crc32 (u32 crc, const void *buf_arg, size_t len) -{ - const char *buf = buf_arg; - size_t n; - for (n = 0; n < len; n++) - crc = crc32_table[(crc ^ buf[n]) & 0xff] ^ (crc >> 8); +static const u32 crc32_table[1024] = { + 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, + 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, + 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, + 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, + 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, + 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, + 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, + 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5, + 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, + 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, + 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, + 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, + 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, + 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f, + 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, + 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, + 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, + 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, + 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, + 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01, + 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, + 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, + 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, + 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, + 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, + 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, + 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, + 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, + 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, + 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, + 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, + 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, + 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, + 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, + 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, + 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, + 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, + 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7, + 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, + 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, + 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, + 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, + 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, + 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79, + 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, + 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, + 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, + 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, + 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, + 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713, + 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, + 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, + 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, + 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, + 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, + 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45, + 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, + 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, + 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, + 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, + 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, + 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf, + 0xb3667a2e, 
0xc4614ab8, 0x5d681b02, 0x2a6f2b94, + 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d, + 0x00000000, 0x191b3141, 0x32366282, 0x2b2d53c3, + 0x646cc504, 0x7d77f445, 0x565aa786, 0x4f4196c7, + 0xc8d98a08, 0xd1c2bb49, 0xfaefe88a, 0xe3f4d9cb, + 0xacb54f0c, 0xb5ae7e4d, 0x9e832d8e, 0x87981ccf, + 0x4ac21251, 0x53d92310, 0x78f470d3, 0x61ef4192, + 0x2eaed755, 0x37b5e614, 0x1c98b5d7, 0x05838496, + 0x821b9859, 0x9b00a918, 0xb02dfadb, 0xa936cb9a, + 0xe6775d5d, 0xff6c6c1c, 0xd4413fdf, 0xcd5a0e9e, + 0x958424a2, 0x8c9f15e3, 0xa7b24620, 0xbea97761, + 0xf1e8e1a6, 0xe8f3d0e7, 0xc3de8324, 0xdac5b265, + 0x5d5daeaa, 0x44469feb, 0x6f6bcc28, 0x7670fd69, + 0x39316bae, 0x202a5aef, 0x0b07092c, 0x121c386d, + 0xdf4636f3, 0xc65d07b2, 0xed705471, 0xf46b6530, + 0xbb2af3f7, 0xa231c2b6, 0x891c9175, 0x9007a034, + 0x179fbcfb, 0x0e848dba, 0x25a9de79, 0x3cb2ef38, + 0x73f379ff, 0x6ae848be, 0x41c51b7d, 0x58de2a3c, + 0xf0794f05, 0xe9627e44, 0xc24f2d87, 0xdb541cc6, + 0x94158a01, 0x8d0ebb40, 0xa623e883, 0xbf38d9c2, + 0x38a0c50d, 0x21bbf44c, 0x0a96a78f, 0x138d96ce, + 0x5ccc0009, 0x45d73148, 0x6efa628b, 0x77e153ca, + 0xbabb5d54, 0xa3a06c15, 0x888d3fd6, 0x91960e97, + 0xded79850, 0xc7cca911, 0xece1fad2, 0xf5facb93, + 0x7262d75c, 0x6b79e61d, 0x4054b5de, 0x594f849f, + 0x160e1258, 0x0f152319, 0x243870da, 0x3d23419b, + 0x65fd6ba7, 0x7ce65ae6, 0x57cb0925, 0x4ed03864, + 0x0191aea3, 0x188a9fe2, 0x33a7cc21, 0x2abcfd60, + 0xad24e1af, 0xb43fd0ee, 0x9f12832d, 0x8609b26c, + 0xc94824ab, 0xd05315ea, 0xfb7e4629, 0xe2657768, + 0x2f3f79f6, 0x362448b7, 0x1d091b74, 0x04122a35, + 0x4b53bcf2, 0x52488db3, 0x7965de70, 0x607eef31, + 0xe7e6f3fe, 0xfefdc2bf, 0xd5d0917c, 0xcccba03d, + 0x838a36fa, 0x9a9107bb, 0xb1bc5478, 0xa8a76539, + 0x3b83984b, 0x2298a90a, 0x09b5fac9, 0x10aecb88, + 0x5fef5d4f, 0x46f46c0e, 0x6dd93fcd, 0x74c20e8c, + 0xf35a1243, 0xea412302, 0xc16c70c1, 0xd8774180, + 0x9736d747, 0x8e2de606, 0xa500b5c5, 0xbc1b8484, + 0x71418a1a, 0x685abb5b, 0x4377e898, 0x5a6cd9d9, + 0x152d4f1e, 0x0c367e5f, 0x271b2d9c, 0x3e001cdd, + 0xb9980012, 0xa0833153, 0x8bae6290, 0x92b553d1, + 0xddf4c516, 0xc4eff457, 0xefc2a794, 0xf6d996d5, + 0xae07bce9, 0xb71c8da8, 0x9c31de6b, 0x852aef2a, + 0xca6b79ed, 0xd37048ac, 0xf85d1b6f, 0xe1462a2e, + 0x66de36e1, 0x7fc507a0, 0x54e85463, 0x4df36522, + 0x02b2f3e5, 0x1ba9c2a4, 0x30849167, 0x299fa026, + 0xe4c5aeb8, 0xfdde9ff9, 0xd6f3cc3a, 0xcfe8fd7b, + 0x80a96bbc, 0x99b25afd, 0xb29f093e, 0xab84387f, + 0x2c1c24b0, 0x350715f1, 0x1e2a4632, 0x07317773, + 0x4870e1b4, 0x516bd0f5, 0x7a468336, 0x635db277, + 0xcbfad74e, 0xd2e1e60f, 0xf9ccb5cc, 0xe0d7848d, + 0xaf96124a, 0xb68d230b, 0x9da070c8, 0x84bb4189, + 0x03235d46, 0x1a386c07, 0x31153fc4, 0x280e0e85, + 0x674f9842, 0x7e54a903, 0x5579fac0, 0x4c62cb81, + 0x8138c51f, 0x9823f45e, 0xb30ea79d, 0xaa1596dc, + 0xe554001b, 0xfc4f315a, 0xd7626299, 0xce7953d8, + 0x49e14f17, 0x50fa7e56, 0x7bd72d95, 0x62cc1cd4, + 0x2d8d8a13, 0x3496bb52, 0x1fbbe891, 0x06a0d9d0, + 0x5e7ef3ec, 0x4765c2ad, 0x6c48916e, 0x7553a02f, + 0x3a1236e8, 0x230907a9, 0x0824546a, 0x113f652b, + 0x96a779e4, 0x8fbc48a5, 0xa4911b66, 0xbd8a2a27, + 0xf2cbbce0, 0xebd08da1, 0xc0fdde62, 0xd9e6ef23, + 0x14bce1bd, 0x0da7d0fc, 0x268a833f, 0x3f91b27e, + 0x70d024b9, 0x69cb15f8, 0x42e6463b, 0x5bfd777a, + 0xdc656bb5, 0xc57e5af4, 0xee530937, 0xf7483876, + 0xb809aeb1, 0xa1129ff0, 0x8a3fcc33, 0x9324fd72, + 0x00000000, 0x01c26a37, 0x0384d46e, 0x0246be59, + 0x0709a8dc, 0x06cbc2eb, 0x048d7cb2, 0x054f1685, + 0x0e1351b8, 0x0fd13b8f, 0x0d9785d6, 0x0c55efe1, + 0x091af964, 0x08d89353, 0x0a9e2d0a, 0x0b5c473d, + 0x1c26a370, 0x1de4c947, 0x1fa2771e, 0x1e601d29, + 0x1b2f0bac, 
0x1aed619b, 0x18abdfc2, 0x1969b5f5, + 0x1235f2c8, 0x13f798ff, 0x11b126a6, 0x10734c91, + 0x153c5a14, 0x14fe3023, 0x16b88e7a, 0x177ae44d, + 0x384d46e0, 0x398f2cd7, 0x3bc9928e, 0x3a0bf8b9, + 0x3f44ee3c, 0x3e86840b, 0x3cc03a52, 0x3d025065, + 0x365e1758, 0x379c7d6f, 0x35dac336, 0x3418a901, + 0x3157bf84, 0x3095d5b3, 0x32d36bea, 0x331101dd, + 0x246be590, 0x25a98fa7, 0x27ef31fe, 0x262d5bc9, + 0x23624d4c, 0x22a0277b, 0x20e69922, 0x2124f315, + 0x2a78b428, 0x2bbade1f, 0x29fc6046, 0x283e0a71, + 0x2d711cf4, 0x2cb376c3, 0x2ef5c89a, 0x2f37a2ad, + 0x709a8dc0, 0x7158e7f7, 0x731e59ae, 0x72dc3399, + 0x7793251c, 0x76514f2b, 0x7417f172, 0x75d59b45, + 0x7e89dc78, 0x7f4bb64f, 0x7d0d0816, 0x7ccf6221, + 0x798074a4, 0x78421e93, 0x7a04a0ca, 0x7bc6cafd, + 0x6cbc2eb0, 0x6d7e4487, 0x6f38fade, 0x6efa90e9, + 0x6bb5866c, 0x6a77ec5b, 0x68315202, 0x69f33835, + 0x62af7f08, 0x636d153f, 0x612bab66, 0x60e9c151, + 0x65a6d7d4, 0x6464bde3, 0x662203ba, 0x67e0698d, + 0x48d7cb20, 0x4915a117, 0x4b531f4e, 0x4a917579, + 0x4fde63fc, 0x4e1c09cb, 0x4c5ab792, 0x4d98dda5, + 0x46c49a98, 0x4706f0af, 0x45404ef6, 0x448224c1, + 0x41cd3244, 0x400f5873, 0x4249e62a, 0x438b8c1d, + 0x54f16850, 0x55330267, 0x5775bc3e, 0x56b7d609, + 0x53f8c08c, 0x523aaabb, 0x507c14e2, 0x51be7ed5, + 0x5ae239e8, 0x5b2053df, 0x5966ed86, 0x58a487b1, + 0x5deb9134, 0x5c29fb03, 0x5e6f455a, 0x5fad2f6d, + 0xe1351b80, 0xe0f771b7, 0xe2b1cfee, 0xe373a5d9, + 0xe63cb35c, 0xe7fed96b, 0xe5b86732, 0xe47a0d05, + 0xef264a38, 0xeee4200f, 0xeca29e56, 0xed60f461, + 0xe82fe2e4, 0xe9ed88d3, 0xebab368a, 0xea695cbd, + 0xfd13b8f0, 0xfcd1d2c7, 0xfe976c9e, 0xff5506a9, + 0xfa1a102c, 0xfbd87a1b, 0xf99ec442, 0xf85cae75, + 0xf300e948, 0xf2c2837f, 0xf0843d26, 0xf1465711, + 0xf4094194, 0xf5cb2ba3, 0xf78d95fa, 0xf64fffcd, + 0xd9785d60, 0xd8ba3757, 0xdafc890e, 0xdb3ee339, + 0xde71f5bc, 0xdfb39f8b, 0xddf521d2, 0xdc374be5, + 0xd76b0cd8, 0xd6a966ef, 0xd4efd8b6, 0xd52db281, + 0xd062a404, 0xd1a0ce33, 0xd3e6706a, 0xd2241a5d, + 0xc55efe10, 0xc49c9427, 0xc6da2a7e, 0xc7184049, + 0xc25756cc, 0xc3953cfb, 0xc1d382a2, 0xc011e895, + 0xcb4dafa8, 0xca8fc59f, 0xc8c97bc6, 0xc90b11f1, + 0xcc440774, 0xcd866d43, 0xcfc0d31a, 0xce02b92d, + 0x91af9640, 0x906dfc77, 0x922b422e, 0x93e92819, + 0x96a63e9c, 0x976454ab, 0x9522eaf2, 0x94e080c5, + 0x9fbcc7f8, 0x9e7eadcf, 0x9c381396, 0x9dfa79a1, + 0x98b56f24, 0x99770513, 0x9b31bb4a, 0x9af3d17d, + 0x8d893530, 0x8c4b5f07, 0x8e0de15e, 0x8fcf8b69, + 0x8a809dec, 0x8b42f7db, 0x89044982, 0x88c623b5, + 0x839a6488, 0x82580ebf, 0x801eb0e6, 0x81dcdad1, + 0x8493cc54, 0x8551a663, 0x8717183a, 0x86d5720d, + 0xa9e2d0a0, 0xa820ba97, 0xaa6604ce, 0xaba46ef9, + 0xaeeb787c, 0xaf29124b, 0xad6fac12, 0xacadc625, + 0xa7f18118, 0xa633eb2f, 0xa4755576, 0xa5b73f41, + 0xa0f829c4, 0xa13a43f3, 0xa37cfdaa, 0xa2be979d, + 0xb5c473d0, 0xb40619e7, 0xb640a7be, 0xb782cd89, + 0xb2cddb0c, 0xb30fb13b, 0xb1490f62, 0xb08b6555, + 0xbbd72268, 0xba15485f, 0xb853f606, 0xb9919c31, + 0xbcde8ab4, 0xbd1ce083, 0xbf5a5eda, 0xbe9834ed, + 0x00000000, 0xb8bc6765, 0xaa09c88b, 0x12b5afee, + 0x8f629757, 0x37def032, 0x256b5fdc, 0x9dd738b9, + 0xc5b428ef, 0x7d084f8a, 0x6fbde064, 0xd7018701, + 0x4ad6bfb8, 0xf26ad8dd, 0xe0df7733, 0x58631056, + 0x5019579f, 0xe8a530fa, 0xfa109f14, 0x42acf871, + 0xdf7bc0c8, 0x67c7a7ad, 0x75720843, 0xcdce6f26, + 0x95ad7f70, 0x2d111815, 0x3fa4b7fb, 0x8718d09e, + 0x1acfe827, 0xa2738f42, 0xb0c620ac, 0x087a47c9, + 0xa032af3e, 0x188ec85b, 0x0a3b67b5, 0xb28700d0, + 0x2f503869, 0x97ec5f0c, 0x8559f0e2, 0x3de59787, + 0x658687d1, 0xdd3ae0b4, 0xcf8f4f5a, 0x7733283f, + 0xeae41086, 0x525877e3, 0x40edd80d, 0xf851bf68, + 0xf02bf8a1, 
0x48979fc4, 0x5a22302a, 0xe29e574f, + 0x7f496ff6, 0xc7f50893, 0xd540a77d, 0x6dfcc018, + 0x359fd04e, 0x8d23b72b, 0x9f9618c5, 0x272a7fa0, + 0xbafd4719, 0x0241207c, 0x10f48f92, 0xa848e8f7, + 0x9b14583d, 0x23a83f58, 0x311d90b6, 0x89a1f7d3, + 0x1476cf6a, 0xaccaa80f, 0xbe7f07e1, 0x06c36084, + 0x5ea070d2, 0xe61c17b7, 0xf4a9b859, 0x4c15df3c, + 0xd1c2e785, 0x697e80e0, 0x7bcb2f0e, 0xc377486b, + 0xcb0d0fa2, 0x73b168c7, 0x6104c729, 0xd9b8a04c, + 0x446f98f5, 0xfcd3ff90, 0xee66507e, 0x56da371b, + 0x0eb9274d, 0xb6054028, 0xa4b0efc6, 0x1c0c88a3, + 0x81dbb01a, 0x3967d77f, 0x2bd27891, 0x936e1ff4, + 0x3b26f703, 0x839a9066, 0x912f3f88, 0x299358ed, + 0xb4446054, 0x0cf80731, 0x1e4da8df, 0xa6f1cfba, + 0xfe92dfec, 0x462eb889, 0x549b1767, 0xec277002, + 0x71f048bb, 0xc94c2fde, 0xdbf98030, 0x6345e755, + 0x6b3fa09c, 0xd383c7f9, 0xc1366817, 0x798a0f72, + 0xe45d37cb, 0x5ce150ae, 0x4e54ff40, 0xf6e89825, + 0xae8b8873, 0x1637ef16, 0x048240f8, 0xbc3e279d, + 0x21e91f24, 0x99557841, 0x8be0d7af, 0x335cb0ca, + 0xed59b63b, 0x55e5d15e, 0x47507eb0, 0xffec19d5, + 0x623b216c, 0xda874609, 0xc832e9e7, 0x708e8e82, + 0x28ed9ed4, 0x9051f9b1, 0x82e4565f, 0x3a58313a, + 0xa78f0983, 0x1f336ee6, 0x0d86c108, 0xb53aa66d, + 0xbd40e1a4, 0x05fc86c1, 0x1749292f, 0xaff54e4a, + 0x322276f3, 0x8a9e1196, 0x982bbe78, 0x2097d91d, + 0x78f4c94b, 0xc048ae2e, 0xd2fd01c0, 0x6a4166a5, + 0xf7965e1c, 0x4f2a3979, 0x5d9f9697, 0xe523f1f2, + 0x4d6b1905, 0xf5d77e60, 0xe762d18e, 0x5fdeb6eb, + 0xc2098e52, 0x7ab5e937, 0x680046d9, 0xd0bc21bc, + 0x88df31ea, 0x3063568f, 0x22d6f961, 0x9a6a9e04, + 0x07bda6bd, 0xbf01c1d8, 0xadb46e36, 0x15080953, + 0x1d724e9a, 0xa5ce29ff, 0xb77b8611, 0x0fc7e174, + 0x9210d9cd, 0x2aacbea8, 0x38191146, 0x80a57623, + 0xd8c66675, 0x607a0110, 0x72cfaefe, 0xca73c99b, + 0x57a4f122, 0xef189647, 0xfdad39a9, 0x45115ecc, + 0x764dee06, 0xcef18963, 0xdc44268d, 0x64f841e8, + 0xf92f7951, 0x41931e34, 0x5326b1da, 0xeb9ad6bf, + 0xb3f9c6e9, 0x0b45a18c, 0x19f00e62, 0xa14c6907, + 0x3c9b51be, 0x842736db, 0x96929935, 0x2e2efe50, + 0x2654b999, 0x9ee8defc, 0x8c5d7112, 0x34e11677, + 0xa9362ece, 0x118a49ab, 0x033fe645, 0xbb838120, + 0xe3e09176, 0x5b5cf613, 0x49e959fd, 0xf1553e98, + 0x6c820621, 0xd43e6144, 0xc68bceaa, 0x7e37a9cf, + 0xd67f4138, 0x6ec3265d, 0x7c7689b3, 0xc4caeed6, + 0x591dd66f, 0xe1a1b10a, 0xf3141ee4, 0x4ba87981, + 0x13cb69d7, 0xab770eb2, 0xb9c2a15c, 0x017ec639, + 0x9ca9fe80, 0x241599e5, 0x36a0360b, 0x8e1c516e, + 0x866616a7, 0x3eda71c2, 0x2c6fde2c, 0x94d3b949, + 0x090481f0, 0xb1b8e695, 0xa30d497b, 0x1bb12e1e, + 0x43d23e48, 0xfb6e592d, 0xe9dbf6c3, 0x516791a6, + 0xccb0a91f, 0x740cce7a, 0x66b96194, 0xde0506f1 +}; - return crc; -} +/* CRC32 */ -typedef struct +static inline u32 +crc32_next (u32 crc, byte data) { - u32 CRC; - byte buf[4]; + return (crc >> 8) ^ crc32_table[(crc & 0xff) ^ data]; } -CRC_CONTEXT; -/* CRC32 */ +/* + * Process 4 bytes in one go + */ +static inline u32 +crc32_next4 (u32 crc, u32 data) +{ + crc ^= data; + crc = crc32_table[(crc & 0xff) + 0x300] ^ + crc32_table[((crc >> 8) & 0xff) + 0x200] ^ + crc32_table[((crc >> 16) & 0xff) + 0x100] ^ + crc32_table[(crc >> 24) & 0xff]; + return crc; +} static void crc32_init (void *context, unsigned int flags) @@ -159,12 +345,40 @@ crc32_init (void *context, unsigned int flags) } static void -crc32_write (void *context, const void *inbuf, size_t inlen) +crc32_write (void *context, const void *inbuf_arg, size_t inlen) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - if (!inbuf) + const byte *inbuf = inbuf_arg; + u32 crc; + + if (!inbuf || !inlen) return; - ctx->CRC = update_crc32 (ctx->CRC, inbuf, 
inlen); + + crc = ctx->CRC; + + while (inlen >= 16) + { + inlen -= 16; + crc = crc32_next4(crc, buf_get_le32(&inbuf[0])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[4])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[8])); + crc = crc32_next4(crc, buf_get_le32(&inbuf[12])); + inbuf += 16; + } + + while (inlen >= 4) + { + inlen -= 4; + crc = crc32_next4(crc, buf_get_le32(inbuf)); + inbuf += 4; + } + + while (inlen--) + { + crc = crc32_next(crc, *inbuf++); + } + + ctx->CRC = crc; } static byte * @@ -179,13 +393,12 @@ crc32_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; ctx->CRC ^= 0xffffffffL; - ctx->buf[0] = (ctx->CRC >> 24) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[2] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[3] = (ctx->CRC ) & 0xFF; + buf_put_be32 (ctx->buf, ctx->CRC); } /* CRC32 a'la RFC 1510 */ +/* CRC of the string "123456789" is 0x2dfd2d88 */ + static void crc32rfc1510_init (void *context, unsigned int flags) { @@ -200,47 +413,315 @@ static void crc32rfc1510_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - ctx->buf[0] = (ctx->CRC >> 24) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[2] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[3] = (ctx->CRC ) & 0xFF; + buf_put_be32(ctx->buf, ctx->CRC); } /* CRC24 a'la RFC 2440 */ /* - * The following CRC 24 routines are adapted from RFC 2440, which has - * the following copyright notice: - * - * Copyright (C) The Internet Society (1998). All Rights Reserved. + * Code generated by universal_crc by Danjel McGougan * - * This document and translations of it may be copied and furnished - * to others, and derivative works that comment on or otherwise - * explain it or assist in its implementation may be prepared, - * copied, published and distributed, in whole or in part, without - * restriction of any kind, provided that the above copyright notice - * and this paragraph are included on all such copies and derivative - * works. However, this document itself may not be modified in any - * way, such as by removing the copyright notice or references to - * the Internet Society or other Internet organizations, except as - * needed for the purpose of developing Internet standards in which - * case the procedures for copyrights defined in the Internet - * Standards process must be followed, or as required to translate - * it into languages other than English. + * CRC parameters used: + * bits: 24 + * poly: 0x864cfb + * init: 0xb704ce + * xor: 0x000000 + * reverse: false + * non-direct: false * - * The limited permissions granted above are perpetual and will not be - * revoked by the Internet Society or its successors or assigns. - * - * This document and the information contained herein is provided on - * an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET - * ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE - * OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY - * IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR - * PURPOSE. 
+ * CRC of the string "123456789" is 0x21cf02 + */ + +static const u32 crc24_table[1024] = +{ + 0x00000000, 0x00fb4c86, 0x000dd58a, 0x00f6990c, + 0x00e1e693, 0x001aaa15, 0x00ec3319, 0x00177f9f, + 0x003981a1, 0x00c2cd27, 0x0034542b, 0x00cf18ad, + 0x00d86732, 0x00232bb4, 0x00d5b2b8, 0x002efe3e, + 0x00894ec5, 0x00720243, 0x00849b4f, 0x007fd7c9, + 0x0068a856, 0x0093e4d0, 0x00657ddc, 0x009e315a, + 0x00b0cf64, 0x004b83e2, 0x00bd1aee, 0x00465668, + 0x005129f7, 0x00aa6571, 0x005cfc7d, 0x00a7b0fb, + 0x00e9d10c, 0x00129d8a, 0x00e40486, 0x001f4800, + 0x0008379f, 0x00f37b19, 0x0005e215, 0x00feae93, + 0x00d050ad, 0x002b1c2b, 0x00dd8527, 0x0026c9a1, + 0x0031b63e, 0x00cafab8, 0x003c63b4, 0x00c72f32, + 0x00609fc9, 0x009bd34f, 0x006d4a43, 0x009606c5, + 0x0081795a, 0x007a35dc, 0x008cacd0, 0x0077e056, + 0x00591e68, 0x00a252ee, 0x0054cbe2, 0x00af8764, + 0x00b8f8fb, 0x0043b47d, 0x00b52d71, 0x004e61f7, + 0x00d2a319, 0x0029ef9f, 0x00df7693, 0x00243a15, + 0x0033458a, 0x00c8090c, 0x003e9000, 0x00c5dc86, + 0x00eb22b8, 0x00106e3e, 0x00e6f732, 0x001dbbb4, + 0x000ac42b, 0x00f188ad, 0x000711a1, 0x00fc5d27, + 0x005beddc, 0x00a0a15a, 0x00563856, 0x00ad74d0, + 0x00ba0b4f, 0x004147c9, 0x00b7dec5, 0x004c9243, + 0x00626c7d, 0x009920fb, 0x006fb9f7, 0x0094f571, + 0x00838aee, 0x0078c668, 0x008e5f64, 0x007513e2, + 0x003b7215, 0x00c03e93, 0x0036a79f, 0x00cdeb19, + 0x00da9486, 0x0021d800, 0x00d7410c, 0x002c0d8a, + 0x0002f3b4, 0x00f9bf32, 0x000f263e, 0x00f46ab8, + 0x00e31527, 0x001859a1, 0x00eec0ad, 0x00158c2b, + 0x00b23cd0, 0x00497056, 0x00bfe95a, 0x0044a5dc, + 0x0053da43, 0x00a896c5, 0x005e0fc9, 0x00a5434f, + 0x008bbd71, 0x0070f1f7, 0x008668fb, 0x007d247d, + 0x006a5be2, 0x00911764, 0x00678e68, 0x009cc2ee, + 0x00a44733, 0x005f0bb5, 0x00a992b9, 0x0052de3f, + 0x0045a1a0, 0x00beed26, 0x0048742a, 0x00b338ac, + 0x009dc692, 0x00668a14, 0x00901318, 0x006b5f9e, + 0x007c2001, 0x00876c87, 0x0071f58b, 0x008ab90d, + 0x002d09f6, 0x00d64570, 0x0020dc7c, 0x00db90fa, + 0x00ccef65, 0x0037a3e3, 0x00c13aef, 0x003a7669, + 0x00148857, 0x00efc4d1, 0x00195ddd, 0x00e2115b, + 0x00f56ec4, 0x000e2242, 0x00f8bb4e, 0x0003f7c8, + 0x004d963f, 0x00b6dab9, 0x004043b5, 0x00bb0f33, + 0x00ac70ac, 0x00573c2a, 0x00a1a526, 0x005ae9a0, + 0x0074179e, 0x008f5b18, 0x0079c214, 0x00828e92, + 0x0095f10d, 0x006ebd8b, 0x00982487, 0x00636801, + 0x00c4d8fa, 0x003f947c, 0x00c90d70, 0x003241f6, + 0x00253e69, 0x00de72ef, 0x0028ebe3, 0x00d3a765, + 0x00fd595b, 0x000615dd, 0x00f08cd1, 0x000bc057, + 0x001cbfc8, 0x00e7f34e, 0x00116a42, 0x00ea26c4, + 0x0076e42a, 0x008da8ac, 0x007b31a0, 0x00807d26, + 0x009702b9, 0x006c4e3f, 0x009ad733, 0x00619bb5, + 0x004f658b, 0x00b4290d, 0x0042b001, 0x00b9fc87, + 0x00ae8318, 0x0055cf9e, 0x00a35692, 0x00581a14, + 0x00ffaaef, 0x0004e669, 0x00f27f65, 0x000933e3, + 0x001e4c7c, 0x00e500fa, 0x001399f6, 0x00e8d570, + 0x00c62b4e, 0x003d67c8, 0x00cbfec4, 0x0030b242, + 0x0027cddd, 0x00dc815b, 0x002a1857, 0x00d154d1, + 0x009f3526, 0x006479a0, 0x0092e0ac, 0x0069ac2a, + 0x007ed3b5, 0x00859f33, 0x0073063f, 0x00884ab9, + 0x00a6b487, 0x005df801, 0x00ab610d, 0x00502d8b, + 0x00475214, 0x00bc1e92, 0x004a879e, 0x00b1cb18, + 0x00167be3, 0x00ed3765, 0x001bae69, 0x00e0e2ef, + 0x00f79d70, 0x000cd1f6, 0x00fa48fa, 0x0001047c, + 0x002ffa42, 0x00d4b6c4, 0x00222fc8, 0x00d9634e, + 0x00ce1cd1, 0x00355057, 0x00c3c95b, 0x003885dd, + 0x00000000, 0x00488f66, 0x00901ecd, 0x00d891ab, + 0x00db711c, 0x0093fe7a, 0x004b6fd1, 0x0003e0b7, + 0x00b6e338, 0x00fe6c5e, 0x0026fdf5, 0x006e7293, + 0x006d9224, 0x00251d42, 0x00fd8ce9, 0x00b5038f, + 0x006cc771, 0x00244817, 0x00fcd9bc, 0x00b456da, + 
0x00b7b66d, 0x00ff390b, 0x0027a8a0, 0x006f27c6, + 0x00da2449, 0x0092ab2f, 0x004a3a84, 0x0002b5e2, + 0x00015555, 0x0049da33, 0x00914b98, 0x00d9c4fe, + 0x00d88ee3, 0x00900185, 0x0048902e, 0x00001f48, + 0x0003ffff, 0x004b7099, 0x0093e132, 0x00db6e54, + 0x006e6ddb, 0x0026e2bd, 0x00fe7316, 0x00b6fc70, + 0x00b51cc7, 0x00fd93a1, 0x0025020a, 0x006d8d6c, + 0x00b44992, 0x00fcc6f4, 0x0024575f, 0x006cd839, + 0x006f388e, 0x0027b7e8, 0x00ff2643, 0x00b7a925, + 0x0002aaaa, 0x004a25cc, 0x0092b467, 0x00da3b01, + 0x00d9dbb6, 0x009154d0, 0x0049c57b, 0x00014a1d, + 0x004b5141, 0x0003de27, 0x00db4f8c, 0x0093c0ea, + 0x0090205d, 0x00d8af3b, 0x00003e90, 0x0048b1f6, + 0x00fdb279, 0x00b53d1f, 0x006dacb4, 0x002523d2, + 0x0026c365, 0x006e4c03, 0x00b6dda8, 0x00fe52ce, + 0x00279630, 0x006f1956, 0x00b788fd, 0x00ff079b, + 0x00fce72c, 0x00b4684a, 0x006cf9e1, 0x00247687, + 0x00917508, 0x00d9fa6e, 0x00016bc5, 0x0049e4a3, + 0x004a0414, 0x00028b72, 0x00da1ad9, 0x009295bf, + 0x0093dfa2, 0x00db50c4, 0x0003c16f, 0x004b4e09, + 0x0048aebe, 0x000021d8, 0x00d8b073, 0x00903f15, + 0x00253c9a, 0x006db3fc, 0x00b52257, 0x00fdad31, + 0x00fe4d86, 0x00b6c2e0, 0x006e534b, 0x0026dc2d, + 0x00ff18d3, 0x00b797b5, 0x006f061e, 0x00278978, + 0x002469cf, 0x006ce6a9, 0x00b47702, 0x00fcf864, + 0x0049fbeb, 0x0001748d, 0x00d9e526, 0x00916a40, + 0x00928af7, 0x00da0591, 0x0002943a, 0x004a1b5c, + 0x0096a282, 0x00de2de4, 0x0006bc4f, 0x004e3329, + 0x004dd39e, 0x00055cf8, 0x00ddcd53, 0x00954235, + 0x002041ba, 0x0068cedc, 0x00b05f77, 0x00f8d011, + 0x00fb30a6, 0x00b3bfc0, 0x006b2e6b, 0x0023a10d, + 0x00fa65f3, 0x00b2ea95, 0x006a7b3e, 0x0022f458, + 0x002114ef, 0x00699b89, 0x00b10a22, 0x00f98544, + 0x004c86cb, 0x000409ad, 0x00dc9806, 0x00941760, + 0x0097f7d7, 0x00df78b1, 0x0007e91a, 0x004f667c, + 0x004e2c61, 0x0006a307, 0x00de32ac, 0x0096bdca, + 0x00955d7d, 0x00ddd21b, 0x000543b0, 0x004dccd6, + 0x00f8cf59, 0x00b0403f, 0x0068d194, 0x00205ef2, + 0x0023be45, 0x006b3123, 0x00b3a088, 0x00fb2fee, + 0x0022eb10, 0x006a6476, 0x00b2f5dd, 0x00fa7abb, + 0x00f99a0c, 0x00b1156a, 0x006984c1, 0x00210ba7, + 0x00940828, 0x00dc874e, 0x000416e5, 0x004c9983, + 0x004f7934, 0x0007f652, 0x00df67f9, 0x0097e89f, + 0x00ddf3c3, 0x00957ca5, 0x004ded0e, 0x00056268, + 0x000682df, 0x004e0db9, 0x00969c12, 0x00de1374, + 0x006b10fb, 0x00239f9d, 0x00fb0e36, 0x00b38150, + 0x00b061e7, 0x00f8ee81, 0x00207f2a, 0x0068f04c, + 0x00b134b2, 0x00f9bbd4, 0x00212a7f, 0x0069a519, + 0x006a45ae, 0x0022cac8, 0x00fa5b63, 0x00b2d405, + 0x0007d78a, 0x004f58ec, 0x0097c947, 0x00df4621, + 0x00dca696, 0x009429f0, 0x004cb85b, 0x0004373d, + 0x00057d20, 0x004df246, 0x009563ed, 0x00ddec8b, + 0x00de0c3c, 0x0096835a, 0x004e12f1, 0x00069d97, + 0x00b39e18, 0x00fb117e, 0x002380d5, 0x006b0fb3, + 0x0068ef04, 0x00206062, 0x00f8f1c9, 0x00b07eaf, + 0x0069ba51, 0x00213537, 0x00f9a49c, 0x00b12bfa, + 0x00b2cb4d, 0x00fa442b, 0x0022d580, 0x006a5ae6, + 0x00df5969, 0x0097d60f, 0x004f47a4, 0x0007c8c2, + 0x00042875, 0x004ca713, 0x009436b8, 0x00dcb9de, + 0x00000000, 0x00d70983, 0x00555f80, 0x00825603, + 0x0051f286, 0x0086fb05, 0x0004ad06, 0x00d3a485, + 0x0059a88b, 0x008ea108, 0x000cf70b, 0x00dbfe88, + 0x00085a0d, 0x00df538e, 0x005d058d, 0x008a0c0e, + 0x00491c91, 0x009e1512, 0x001c4311, 0x00cb4a92, + 0x0018ee17, 0x00cfe794, 0x004db197, 0x009ab814, + 0x0010b41a, 0x00c7bd99, 0x0045eb9a, 0x0092e219, + 0x0041469c, 0x00964f1f, 0x0014191c, 0x00c3109f, + 0x006974a4, 0x00be7d27, 0x003c2b24, 0x00eb22a7, + 0x00388622, 0x00ef8fa1, 0x006dd9a2, 0x00bad021, + 0x0030dc2f, 0x00e7d5ac, 0x006583af, 0x00b28a2c, + 0x00612ea9, 0x00b6272a, 0x00347129, 0x00e378aa, + 
0x00206835, 0x00f761b6, 0x007537b5, 0x00a23e36, + 0x00719ab3, 0x00a69330, 0x0024c533, 0x00f3ccb0, + 0x0079c0be, 0x00aec93d, 0x002c9f3e, 0x00fb96bd, + 0x00283238, 0x00ff3bbb, 0x007d6db8, 0x00aa643b, + 0x0029a4ce, 0x00fead4d, 0x007cfb4e, 0x00abf2cd, + 0x00785648, 0x00af5fcb, 0x002d09c8, 0x00fa004b, + 0x00700c45, 0x00a705c6, 0x002553c5, 0x00f25a46, + 0x0021fec3, 0x00f6f740, 0x0074a143, 0x00a3a8c0, + 0x0060b85f, 0x00b7b1dc, 0x0035e7df, 0x00e2ee5c, + 0x00314ad9, 0x00e6435a, 0x00641559, 0x00b31cda, + 0x003910d4, 0x00ee1957, 0x006c4f54, 0x00bb46d7, + 0x0068e252, 0x00bfebd1, 0x003dbdd2, 0x00eab451, + 0x0040d06a, 0x0097d9e9, 0x00158fea, 0x00c28669, + 0x001122ec, 0x00c62b6f, 0x00447d6c, 0x009374ef, + 0x001978e1, 0x00ce7162, 0x004c2761, 0x009b2ee2, + 0x00488a67, 0x009f83e4, 0x001dd5e7, 0x00cadc64, + 0x0009ccfb, 0x00dec578, 0x005c937b, 0x008b9af8, + 0x00583e7d, 0x008f37fe, 0x000d61fd, 0x00da687e, + 0x00506470, 0x00876df3, 0x00053bf0, 0x00d23273, + 0x000196f6, 0x00d69f75, 0x0054c976, 0x0083c0f5, + 0x00a9041b, 0x007e0d98, 0x00fc5b9b, 0x002b5218, + 0x00f8f69d, 0x002fff1e, 0x00ada91d, 0x007aa09e, + 0x00f0ac90, 0x0027a513, 0x00a5f310, 0x0072fa93, + 0x00a15e16, 0x00765795, 0x00f40196, 0x00230815, + 0x00e0188a, 0x00371109, 0x00b5470a, 0x00624e89, + 0x00b1ea0c, 0x0066e38f, 0x00e4b58c, 0x0033bc0f, + 0x00b9b001, 0x006eb982, 0x00ecef81, 0x003be602, + 0x00e84287, 0x003f4b04, 0x00bd1d07, 0x006a1484, + 0x00c070bf, 0x0017793c, 0x00952f3f, 0x004226bc, + 0x00918239, 0x00468bba, 0x00c4ddb9, 0x0013d43a, + 0x0099d834, 0x004ed1b7, 0x00cc87b4, 0x001b8e37, + 0x00c82ab2, 0x001f2331, 0x009d7532, 0x004a7cb1, + 0x00896c2e, 0x005e65ad, 0x00dc33ae, 0x000b3a2d, + 0x00d89ea8, 0x000f972b, 0x008dc128, 0x005ac8ab, + 0x00d0c4a5, 0x0007cd26, 0x00859b25, 0x005292a6, + 0x00813623, 0x00563fa0, 0x00d469a3, 0x00036020, + 0x0080a0d5, 0x0057a956, 0x00d5ff55, 0x0002f6d6, + 0x00d15253, 0x00065bd0, 0x00840dd3, 0x00530450, + 0x00d9085e, 0x000e01dd, 0x008c57de, 0x005b5e5d, + 0x0088fad8, 0x005ff35b, 0x00dda558, 0x000aacdb, + 0x00c9bc44, 0x001eb5c7, 0x009ce3c4, 0x004bea47, + 0x00984ec2, 0x004f4741, 0x00cd1142, 0x001a18c1, + 0x009014cf, 0x00471d4c, 0x00c54b4f, 0x001242cc, + 0x00c1e649, 0x0016efca, 0x0094b9c9, 0x0043b04a, + 0x00e9d471, 0x003eddf2, 0x00bc8bf1, 0x006b8272, + 0x00b826f7, 0x006f2f74, 0x00ed7977, 0x003a70f4, + 0x00b07cfa, 0x00677579, 0x00e5237a, 0x00322af9, + 0x00e18e7c, 0x003687ff, 0x00b4d1fc, 0x0063d87f, + 0x00a0c8e0, 0x0077c163, 0x00f59760, 0x00229ee3, + 0x00f13a66, 0x002633e5, 0x00a465e6, 0x00736c65, + 0x00f9606b, 0x002e69e8, 0x00ac3feb, 0x007b3668, + 0x00a892ed, 0x007f9b6e, 0x00fdcd6d, 0x002ac4ee, + 0x00000000, 0x00520936, 0x00a4126c, 0x00f61b5a, + 0x004825d8, 0x001a2cee, 0x00ec37b4, 0x00be3e82, + 0x006b0636, 0x00390f00, 0x00cf145a, 0x009d1d6c, + 0x002323ee, 0x00712ad8, 0x00873182, 0x00d538b4, + 0x00d60c6c, 0x0084055a, 0x00721e00, 0x00201736, + 0x009e29b4, 0x00cc2082, 0x003a3bd8, 0x006832ee, + 0x00bd0a5a, 0x00ef036c, 0x00191836, 0x004b1100, + 0x00f52f82, 0x00a726b4, 0x00513dee, 0x000334d8, + 0x00ac19d8, 0x00fe10ee, 0x00080bb4, 0x005a0282, + 0x00e43c00, 0x00b63536, 0x00402e6c, 0x0012275a, + 0x00c71fee, 0x009516d8, 0x00630d82, 0x003104b4, + 0x008f3a36, 0x00dd3300, 0x002b285a, 0x0079216c, + 0x007a15b4, 0x00281c82, 0x00de07d8, 0x008c0eee, + 0x0032306c, 0x0060395a, 0x00962200, 0x00c42b36, + 0x00111382, 0x00431ab4, 0x00b501ee, 0x00e708d8, + 0x0059365a, 0x000b3f6c, 0x00fd2436, 0x00af2d00, + 0x00a37f36, 0x00f17600, 0x00076d5a, 0x0055646c, + 0x00eb5aee, 0x00b953d8, 0x004f4882, 0x001d41b4, + 0x00c87900, 0x009a7036, 0x006c6b6c, 0x003e625a, + 
0x00805cd8, 0x00d255ee, 0x00244eb4, 0x00764782, + 0x0075735a, 0x00277a6c, 0x00d16136, 0x00836800, + 0x003d5682, 0x006f5fb4, 0x009944ee, 0x00cb4dd8, + 0x001e756c, 0x004c7c5a, 0x00ba6700, 0x00e86e36, + 0x005650b4, 0x00045982, 0x00f242d8, 0x00a04bee, + 0x000f66ee, 0x005d6fd8, 0x00ab7482, 0x00f97db4, + 0x00474336, 0x00154a00, 0x00e3515a, 0x00b1586c, + 0x006460d8, 0x003669ee, 0x00c072b4, 0x00927b82, + 0x002c4500, 0x007e4c36, 0x0088576c, 0x00da5e5a, + 0x00d96a82, 0x008b63b4, 0x007d78ee, 0x002f71d8, + 0x00914f5a, 0x00c3466c, 0x00355d36, 0x00675400, + 0x00b26cb4, 0x00e06582, 0x00167ed8, 0x004477ee, + 0x00fa496c, 0x00a8405a, 0x005e5b00, 0x000c5236, + 0x0046ff6c, 0x0014f65a, 0x00e2ed00, 0x00b0e436, + 0x000edab4, 0x005cd382, 0x00aac8d8, 0x00f8c1ee, + 0x002df95a, 0x007ff06c, 0x0089eb36, 0x00dbe200, + 0x0065dc82, 0x0037d5b4, 0x00c1ceee, 0x0093c7d8, + 0x0090f300, 0x00c2fa36, 0x0034e16c, 0x0066e85a, + 0x00d8d6d8, 0x008adfee, 0x007cc4b4, 0x002ecd82, + 0x00fbf536, 0x00a9fc00, 0x005fe75a, 0x000dee6c, + 0x00b3d0ee, 0x00e1d9d8, 0x0017c282, 0x0045cbb4, + 0x00eae6b4, 0x00b8ef82, 0x004ef4d8, 0x001cfdee, + 0x00a2c36c, 0x00f0ca5a, 0x0006d100, 0x0054d836, + 0x0081e082, 0x00d3e9b4, 0x0025f2ee, 0x0077fbd8, + 0x00c9c55a, 0x009bcc6c, 0x006dd736, 0x003fde00, + 0x003cead8, 0x006ee3ee, 0x0098f8b4, 0x00caf182, + 0x0074cf00, 0x0026c636, 0x00d0dd6c, 0x0082d45a, + 0x0057ecee, 0x0005e5d8, 0x00f3fe82, 0x00a1f7b4, + 0x001fc936, 0x004dc000, 0x00bbdb5a, 0x00e9d26c, + 0x00e5805a, 0x00b7896c, 0x00419236, 0x00139b00, + 0x00ada582, 0x00ffacb4, 0x0009b7ee, 0x005bbed8, + 0x008e866c, 0x00dc8f5a, 0x002a9400, 0x00789d36, + 0x00c6a3b4, 0x0094aa82, 0x0062b1d8, 0x0030b8ee, + 0x00338c36, 0x00618500, 0x00979e5a, 0x00c5976c, + 0x007ba9ee, 0x0029a0d8, 0x00dfbb82, 0x008db2b4, + 0x00588a00, 0x000a8336, 0x00fc986c, 0x00ae915a, + 0x0010afd8, 0x0042a6ee, 0x00b4bdb4, 0x00e6b482, + 0x00499982, 0x001b90b4, 0x00ed8bee, 0x00bf82d8, + 0x0001bc5a, 0x0053b56c, 0x00a5ae36, 0x00f7a700, + 0x00229fb4, 0x00709682, 0x00868dd8, 0x00d484ee, + 0x006aba6c, 0x0038b35a, 0x00cea800, 0x009ca136, + 0x009f95ee, 0x00cd9cd8, 0x003b8782, 0x00698eb4, + 0x00d7b036, 0x0085b900, 0x0073a25a, 0x0021ab6c, + 0x00f493d8, 0x00a69aee, 0x005081b4, 0x00028882, + 0x00bcb600, 0x00eebf36, 0x0018a46c, 0x004aad5a +}; + +static inline +u32 crc24_init (void) +{ + return 0xce04b7; +} + +static inline +u32 crc24_next (u32 crc, byte data) +{ + return (crc >> 8) ^ crc24_table[(crc & 0xff) ^ data]; +} + +/* + * Process 4 bytes in one go */ +static inline +u32 crc24_next4 (u32 crc, u32 data) +{ + crc ^= data; + crc = crc24_table[(crc & 0xff) + 0x300] ^ + crc24_table[((crc >> 8) & 0xff) + 0x200] ^ + crc24_table[((crc >> 16) & 0xff) + 0x100] ^ + crc24_table[(crc >> 24) & 0xff]; + return crc; +} -#define CRC24_INIT 0xb704ceL -#define CRC24_POLY 0x1864cfbL +static inline +u32 crc24_final (u32 crc) +{ + return crc & 0xffffff; +} static void crc24rfc2440_init (void *context, unsigned int flags) @@ -249,36 +730,52 @@ crc24rfc2440_init (void *context, unsigned int flags) (void)flags; - ctx->CRC = CRC24_INIT; + ctx->CRC = crc24_init(); } static void crc24rfc2440_write (void *context, const void *inbuf_arg, size_t inlen) { const unsigned char *inbuf = inbuf_arg; - int i; CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; + u32 crc; - if (!inbuf) + if (!inbuf || !inlen) return; - while (inlen--) { - ctx->CRC ^= (*inbuf++) << 16; - for (i = 0; i < 8; i++) { - ctx->CRC <<= 1; - if (ctx->CRC & 0x1000000) - ctx->CRC ^= CRC24_POLY; + crc = ctx->CRC; + + while (inlen >= 16) + { + inlen -= 16; + crc = crc24_next4(crc, 
buf_get_le32(&inbuf[0])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[4])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[8])); + crc = crc24_next4(crc, buf_get_le32(&inbuf[12])); + inbuf += 16; + } + + while (inlen >= 4) + { + inlen -= 4; + crc = crc24_next4(crc, buf_get_le32(inbuf)); + inbuf += 4; + } + + while (inlen--) + { + crc = crc24_next(crc, *inbuf++); } - } + + ctx->CRC = crc; } static void crc24rfc2440_final (void *context) { CRC_CONTEXT *ctx = (CRC_CONTEXT *) context; - ctx->buf[0] = (ctx->CRC >> 16) & 0xFF; - ctx->buf[1] = (ctx->CRC >> 8) & 0xFF; - ctx->buf[2] = (ctx->CRC ) & 0xFF; + ctx->CRC = crc24_final(ctx->CRC); + buf_put_le32 (ctx->buf, ctx->CRC); } /* We allow the CRC algorithms even in FIPS mode because they are diff --git a/tests/basic.c b/tests/basic.c index 2cf8dd0..bb07394 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -5524,6 +5524,7 @@ check_digests (void) "TY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser Gene" "ral Public License for more details.", "\x4A\x53\x7D\x67" }, + { GCRY_MD_CRC32, "123456789", "\xcb\xf4\x39\x26" }, { GCRY_MD_CRC32_RFC1510, "", "\x00\x00\x00\x00" }, { GCRY_MD_CRC32_RFC1510, "foo", "\x73\x32\xbc\x33" }, { GCRY_MD_CRC32_RFC1510, "test0123456789", "\xb8\x3e\x88\xd6" }, @@ -5539,8 +5540,10 @@ check_digests (void) { GCRY_MD_CRC32_RFC1510, "\x80\x00\x00\x00", "\xed\x59\xb6\x3b" }, { GCRY_MD_CRC32_RFC1510, "\x00\x00\x00\x01", "\x77\x07\x30\x96" }, #endif + { GCRY_MD_CRC32_RFC1510, "123456789", "\x2d\xfd\x2d\x88" }, { GCRY_MD_CRC24_RFC2440, "", "\xb7\x04\xce" }, { GCRY_MD_CRC24_RFC2440, "foo", "\x4f\xc2\x55" }, + { GCRY_MD_CRC24_RFC2440, "123456789", "\x21\xcf\x02" }, { GCRY_MD_TIGER, "", "\x24\xF0\x13\x0C\x63\xAC\x93\x32\x16\x16\x6E\x76" ----------------------------------------------------------------------- Summary of changes: cipher/crc.c | 817 +++++++++++++++++++++++++++++++++++++++++++----------- cipher/rijndael.c | 44 +-- tests/basic.c | 18 ++ 3 files changed, 699 insertions(+), 180 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jan.svensson at hush.com Tue May 5 11:07:18 2015 From: jan.svensson at hush.com (Jan Svensson) Date: Tue, 05 May 2015 11:07:18 +0200 Subject: Libgcrypt license Message-ID: <20150505090718.550AEC071D@smtp.hushmail.com> Hi, I'm planning to develop some software that is using Libgcrypt and am thinking about the license. On https://www.gnupg.org/documentation/manuals/gcrypt/Library-Copying.html#Library-Copying it says LGPL v2.1 and on https://www.gnupg.org/documentation/manuals/gcrypt/Copying.html#Copying it says GPL v2. Is it a mistake that those two web pages are pointing at LGPL v2.1 and GPL v2, i.e. should it be updated to LGPL v3 and GPL v3? Best regards, Jan From jussi.kivilinna at iki.fi Tue May 5 18:49:49 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 05 May 2015 19:49:49 +0300 Subject: [PATCH] hwf-x86: add EDX as output register for xgetbv asm block Message-ID: <20150505164949.30436.56771.stgit@localhost6.localdomain6> * src/hwf-x86.c (get_xgetbv): Add EDX as output. -- XGETBV instruction modifies EAX:EDX register pair, so we need to mark EDX as output to let compiler know that contents in this register are lost. 
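For illustration, here is a minimal stand-alone sketch (not part of this patch; the function name read_xcr0 is made up for the example) of the corrected pattern, reading XCR0 with both halves of the EDX:EAX result declared as asm outputs:

/* Hypothetical example, not libgcrypt code: read XCR0 with XGETBV.
 * Assumes an x86 CPU and OS with OSXSAVE enabled, otherwise the
 * instruction raises #UD.  ECX selects the extended control register. */
#include <stdio.h>

static unsigned long long
read_xcr0 (void)
{
  unsigned int eax, edx;

  asm volatile ("xgetbv"
                : "=a" (eax), "=d" (edx)   /* EDX:EAX are both written */
                : "c" (0));                /* 0 selects XCR0 */

  return ((unsigned long long)edx << 32) | eax;
}

int
main (void)
{
  printf ("XCR0 = 0x%llx\n", read_xcr0 ());
  return 0;
}

Compiled on its own with gcc on an x86 machine, this prints the XCR0 mask without the instruction clobbering a register behind the compiler's back.
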
Signed-off-by: Jussi Kivilinna --- src/hwf-x86.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/hwf-x86.c b/src/hwf-x86.c index 7ee246d..baef2df 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -100,11 +100,11 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, static unsigned int get_xgetbv(void) { - unsigned int t_eax; + unsigned int t_eax, t_edx; asm volatile ("xgetbv\n\t" - : "=a" (t_eax) + : "=a" (t_eax), "=d" (t_edx) : "c" (0) ); @@ -151,11 +151,11 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, static unsigned int get_xgetbv(void) { - unsigned int t_eax; + unsigned int t_eax, t_edx; asm volatile ("xgetbv\n\t" - : "=a" (t_eax) + : "=a" (t_eax), "=d" (t_edx) : "c" (0) ); From hi at filippo.io Tue May 5 22:21:45 2015 From: hi at filippo.io (Filippo Valsorda) Date: Tue, 05 May 2015 21:21:45 +0100 Subject: [PATCH] Fix OOB read in do_vsexp_sscan when newline+EOF is found in escape sequence Message-ID: <1430857305.1305636.263122257.54EE3263@webmail.messagingengine.com> The check for the available bytes performed before doing the look-ahead was off by one, causing an out-of-bounds read. Example input: 22 5c 0a |"\.| Found with afl-fuzz and ASAN. Signed-off-by: Filippo Valsorda --- src/sexp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/sexp.c b/src/sexp.c index 9bc13ca..3d8fcf3 100644 --- a/src/sexp.c +++ b/src/sexp.c @@ -1239,7 +1239,7 @@ do_vsexp_sscan (gcry_sexp_t *retsexp, size_t *erroff, case '\r': /* ignore CR[,LF] */ - if (n && (p[1] == '\n')) + if ((n > 1) && (p[1] == '\n')) { p++; n--; @@ -1249,7 +1249,7 @@ do_vsexp_sscan (gcry_sexp_t *retsexp, size_t *erroff, case '\n': /* ignore LF[,CR] */ - if (n && (p[1] == '\r')) + if ((n > 1) && (p[1] == '\r')) { p++; n--; -- 2.3.6 From wk at gnupg.org Wed May 6 09:06:44 2015 From: wk at gnupg.org (Werner Koch) Date: Wed, 06 May 2015 09:06:44 +0200 Subject: Libgcrypt license In-Reply-To: <20150505090718.550AEC071D@smtp.hushmail.com> (Jan Svensson's message of "Tue, 05 May 2015 11:07:18 +0200") References: <20150505090718.550AEC071D@smtp.hushmail.com> Message-ID: <87k2wmuquz.fsf@vigenere.g10code.de> On Tue, 5 May 2015 11:07, jan.svensson at hush.com said: > Is it a mistake that those two web pages are pointing at LGPL v2.1 and GPL v2, i.e. should it be updated to LGPL v3 and GPL v3? See AUTHORS: License (library): LGPLv2.1+ License (manual and tools): GPLv2+ No, the library won't be changed to LGPLv3+ because that would disallow the use with GPLv2-only software. Manual and tools may be changed to GPLv3+ but I see no immediate reasons for this. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From jussi.kivilinna at iki.fi Fri May 8 17:12:13 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 08 May 2015 18:12:13 +0300 Subject: [PATCH] hwf-x86: use edi for passing value to ebx for i386 cpuid Message-ID: <20150508151213.21032.53903.stgit@localhost6.localdomain6> * src/hwf-x86.c [__i386__] (get_cpuid): Use '=D' for regs[1] instead of '=r'. -- On Win32, %ebx can be assigned to '=r' (regs[1]). This results in invalid assembly: pushl %ebx movl %ebx, %ebx cpuid movl %ebx, %ebx popl %ebx So use '=D' (%edi) for regs[1] instead.
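As an illustration (a stand-alone sketch, not part of this patch, and only meaningful for a 32-bit x86 build), the same rule can be seen in a toy program that prints the CPUID vendor string: because %ebx is saved and restored by hand, the operand that receives the EBX value has to be pinned to a register that can never be %ebx, here '=D' (%edi):

/* Hypothetical example, not libgcrypt code: CPUID leaf 0 on 32-bit x86,
 * preserving %ebx (the PIC/GOT register) by hand and routing the EBX
 * result through %edi via the "=D" constraint. */
#include <stdio.h>
#include <string.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;
  char vendor[13];

  asm volatile ("pushl %%ebx\n\t"
                "cpuid\n\t"
                "movl %%ebx, %1\n\t"   /* copy EBX result into %edi */
                "popl %%ebx\n\t"       /* restore the saved %ebx */
                : "=a" (eax), "=D" (ebx), "=c" (ecx), "=d" (edx)
                : "0" (0), "2" (0)
                : "cc");

  (void)eax;  /* highest supported leaf, unused here */
  memcpy (vendor + 0, &ebx, 4);
  memcpy (vendor + 4, &edx, 4);
  memcpy (vendor + 8, &ecx, 4);
  vendor[12] = '\0';
  printf ("%s\n", vendor);
  return 0;
}

With a generic '=r' the compiler is free to pick %ebx for that operand, which is exactly the miscompilation described above.
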
Signed-off-by: Jussi Kivilinna --- src/hwf-x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/hwf-x86.c b/src/hwf-x86.c index baef2df..399952c 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -81,7 +81,7 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, "cpuid\n\t" "movl %%ebx, %1\n\t" "popl %%ebx\n\t" /* Restore GOT register. */ - : "=a" (regs[0]), "=r" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) + : "=a" (regs[0]), "=D" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "0" (in), "1" (0), "2" (0), "3" (0) : "cc" ); From eugene.zelenko at gmail.com Mon May 11 19:52:02 2015 From: eugene.zelenko at gmail.com (Eugene Zelenko) Date: Mon, 11 May 2015 10:52:02 -0700 Subject: Problem with building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler Message-ID: Hi! I got next errors when tried to build building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler: "../src/mpi.h", line 295.16: 1506-343 (S) Redeclaration of _gcry_mpi_ec_set_mpi differs from previous declaration on line 423 of "../src/gcrypt-int.h". "../src/mpi.h", line 297.16: 1506-343 (S) Redeclaration of _gcry_mpi_ec_set_point differs from previous declaration on line 425 of "../src/gcrypt-int.h". "../src/mpi.h", line 302.16: 1506-343 (S) Redeclaration of _gcry_mpi_ec_new differs from previous declaration on line 418 of "../src/gcrypt-int.h". It seems that functions in question in mpi.h should return pgp_error_t as do their implementation in visibility.c. With best regards, Eugene. From eugene.zelenko at gmail.com Tue May 12 01:20:13 2015 From: eugene.zelenko at gmail.com (Eugene Zelenko) Date: Mon, 11 May 2015 16:20:13 -0700 Subject: Problem with building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler In-Reply-To: References: Message-ID: Hi! By the word, many declarations are duplicated in different headers, so compiling with GCC -Wredundant-decls seems reasonable exercise. Eugene. On Mon, May 11, 2015 at 10:52 AM, Eugene Zelenko wrote: > Hi! > > I got next errors when tried to build building libgcrypt 1.6.3 on AIX > 6.1 with IBM compiler: > > "../src/mpi.h", line 295.16: 1506-343 (S) Redeclaration of > _gcry_mpi_ec_set_mpi differs from previous declaration on line 423 of > "../src/gcrypt-int.h". > "../src/mpi.h", line 297.16: 1506-343 (S) Redeclaration of > _gcry_mpi_ec_set_point differs from previous declaration on line 425 > of "../src/gcrypt-int.h". > "../src/mpi.h", line 302.16: 1506-343 (S) Redeclaration of > _gcry_mpi_ec_new differs from previous declaration on line 418 of > "../src/gcrypt-int.h". > > It seems that functions in question in mpi.h should return pgp_error_t > as do their implementation in visibility.c. > > With best regards, > Eugene. From cvs at cvs.gnupg.org Thu May 14 09:02:32 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Thu, 14 May 2015 09:02:32 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-215-gbac42c6 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via bac42c68b069f17abcca810a21439c7233815747 (commit) via e15beb584a5ebdfc363e1ff15f87102508652d71 (commit) from 5a7d55eed3316f40ca61acbee032bfc285e28803 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. 
- Log ----------------------------------------------------------------- commit bac42c68b069f17abcca810a21439c7233815747 Author: Jussi Kivilinna Date: Fri May 8 18:07:51 2015 +0300 hwf-x86: use edi for passing value to ebx for i386 cpuid * src/hwf-x86.c [__i386__] (get_cpuid): Use '=D' for regs[1] instead of '=r'. -- On Win32, %ebx can be assigned for '=r' (regs[1]). This results invalid assembly: pushl %ebx movl %ebx, %ebx cpuid movl %ebx, %ebx popl %ebx So use '=D' (%esi) for regs[1] instead. Signed-off-by: Jussi Kivilinna diff --git a/src/hwf-x86.c b/src/hwf-x86.c index baef2df..399952c 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -81,7 +81,7 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, "cpuid\n\t" "movl %%ebx, %1\n\t" "popl %%ebx\n\t" /* Restore GOT register. */ - : "=a" (regs[0]), "=r" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) + : "=a" (regs[0]), "=D" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "0" (in), "1" (0), "2" (0), "3" (0) : "cc" ); commit e15beb584a5ebdfc363e1ff15f87102508652d71 Author: Jussi Kivilinna Date: Mon May 4 20:09:51 2015 +0300 hwf-x86: add EDX as output register for xgetbv asm block * src/hwf-x86.c (get_xgetbv): Add EDX as output. -- XGETBV instruction modifies EAX:EDX register pair, so we need to mark EDX as output to let compiler know that contents in this register are lost. Signed-off-by: Jussi Kivilinna diff --git a/src/hwf-x86.c b/src/hwf-x86.c index 7ee246d..baef2df 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -100,11 +100,11 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, static unsigned int get_xgetbv(void) { - unsigned int t_eax; + unsigned int t_eax, t_edx; asm volatile ("xgetbv\n\t" - : "=a" (t_eax) + : "=a" (t_eax), "=d" (t_edx) : "c" (0) ); @@ -151,11 +151,11 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx, static unsigned int get_xgetbv(void) { - unsigned int t_eax; + unsigned int t_eax, t_edx; asm volatile ("xgetbv\n\t" - : "=a" (t_eax) + : "=a" (t_eax), "=d" (t_edx) : "c" (0) ); ----------------------------------------------------------------------- Summary of changes: src/hwf-x86.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Thu May 14 09:18:03 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 10:18:03 +0300 Subject: [PATCH] Update documentation for Poly1305-ChaCha20 AEAD, RFC-7539 Message-ID: <20150514071803.28092.29978.stgit@localhost6.localdomain6> * cipher/cipher-poly1305.c: Add RFC-7539 to header. * doc/gcrypt.texi: Update Poly1305 AEAD documentation with mention of RFC-7539; Drop Salsa from supported stream ciphers for Poly1305 AEAD. -- Signed-off-by: Jussi Kivilinna --- cipher/cipher-poly1305.c | 2 +- doc/gcrypt.texi | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/cipher/cipher-poly1305.c b/cipher/cipher-poly1305.c index f283333..965a7b6 100644 --- a/cipher/cipher-poly1305.c +++ b/cipher/cipher-poly1305.c @@ -1,4 +1,4 @@ -/* cipher-pol1305.c - Poly1305 based AEAD cipher mode +/* cipher-poly1305.c - Poly1305 based AEAD cipher mode, RFC-7539 * Copyright (C) 2014 Jussi Kivilinna * * This file is part of Libgcrypt. 
diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 8683ca8..ab4f685 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -1643,9 +1643,10 @@ Associated Data (AEAD) block cipher mode, which is specified in 'NIST Special Publication 800-38D'. @item GCRY_CIPHER_MODE_POLY1305 - at cindex Poly1305 based AEAD mode -Poly1305 is an Authenticated Encryption with Associated Data (AEAD) -mode, which can be used with ChaCha20 and Salsa20 stream ciphers. + at cindex Poly1305 based AEAD mode with ChaCha20 +This mode implements the Poly1305 Authenticated Encryption with Associated +Data (AEAD) mode according to RFC-7539. This mode can be used with ChaCha20 +stream cipher. @item GCRY_CIPHER_MODE_OCB @cindex OCB, OCB3 @@ -1687,7 +1688,7 @@ and the according constants. Note that some modes are incompatible with some algorithms - in particular, stream mode (@code{GCRY_CIPHER_MODE_STREAM}) only works with stream ciphers. Poly1305 AEAD mode (@code{GCRY_CIPHER_MODE_POLY1305}) only works with -ChaCha and Salsa stream ciphers. The block cipher modes +ChaCha20 stream cipher. The block cipher modes (@code{GCRY_CIPHER_MODE_ECB}, @code{GCRY_CIPHER_MODE_CBC}, @code{GCRY_CIPHER_MODE_CFB}, @code{GCRY_CIPHER_MODE_OFB} and @code{GCRY_CIPHER_MODE_CTR}) will work with any block cipher From jussi.kivilinna at iki.fi Thu May 14 13:11:08 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:08 +0300 Subject: [PATCH 02/10] Enable AMD64 Blowfish implementation on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111108.29891.2731.stgit@localhost6.localdomain6> * cipher/blowfish-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/blowfish.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (do_encrypt, do_encrypt_block, do_decrypt_block) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (blowfish_amd64_ctr_enc, blowfish_amd64_cbc_dec) (blowfish_amd64_cfb_dec): New wrapper functions for bulk assembly functions. .. Signed-off-by: Jussi Kivilinna --- cipher/blowfish-amd64.S | 46 +++++++++++++++++------------ cipher/blowfish.c | 74 ++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 97 insertions(+), 23 deletions(-) diff --git a/cipher/blowfish-amd64.S b/cipher/blowfish-amd64.S index 87b676f..21b63fc 100644 --- a/cipher/blowfish-amd64.S +++ b/cipher/blowfish-amd64.S @@ -20,7 +20,15 @@ #ifdef __x86_64 #include -#if defined(USE_BLOWFISH) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_BLOWFISH) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text @@ -120,7 +128,7 @@ movq RX0, (RIO); .align 8 -.type __blowfish_enc_blk1, at function; +ELF(.type __blowfish_enc_blk1, at function;) __blowfish_enc_blk1: /* input: @@ -145,11 +153,11 @@ __blowfish_enc_blk1: movq %r11, %rbp; ret; -.size __blowfish_enc_blk1,.-__blowfish_enc_blk1; +ELF(.size __blowfish_enc_blk1,.-__blowfish_enc_blk1;) .align 8 .globl _gcry_blowfish_amd64_do_encrypt -.type _gcry_blowfish_amd64_do_encrypt, at function; +ELF(.type _gcry_blowfish_amd64_do_encrypt, at function;) _gcry_blowfish_amd64_do_encrypt: /* input: @@ -171,11 +179,11 @@ _gcry_blowfish_amd64_do_encrypt: movl RX0d, (RX2); ret; -.size _gcry_blowfish_amd64_do_encrypt,.-_gcry_blowfish_amd64_do_encrypt; +ELF(.size _gcry_blowfish_amd64_do_encrypt,.-_gcry_blowfish_amd64_do_encrypt;) .align 8 .globl _gcry_blowfish_amd64_encrypt_block -.type _gcry_blowfish_amd64_encrypt_block, at function; +ELF(.type _gcry_blowfish_amd64_encrypt_block, at function;) _gcry_blowfish_amd64_encrypt_block: /* input: @@ -195,11 +203,11 @@ _gcry_blowfish_amd64_encrypt_block: write_block(); ret; -.size _gcry_blowfish_amd64_encrypt_block,.-_gcry_blowfish_amd64_encrypt_block; +ELF(.size _gcry_blowfish_amd64_encrypt_block,.-_gcry_blowfish_amd64_encrypt_block;) .align 8 .globl _gcry_blowfish_amd64_decrypt_block -.type _gcry_blowfish_amd64_decrypt_block, at function; +ELF(.type _gcry_blowfish_amd64_decrypt_block, at function;) _gcry_blowfish_amd64_decrypt_block: /* input: @@ -231,7 +239,7 @@ _gcry_blowfish_amd64_decrypt_block: movq %r11, %rbp; ret; -.size _gcry_blowfish_amd64_decrypt_block,.-_gcry_blowfish_amd64_decrypt_block; +ELF(.size _gcry_blowfish_amd64_decrypt_block,.-_gcry_blowfish_amd64_decrypt_block;) /********************************************************************** 4-way blowfish, four blocks parallel @@ -319,7 +327,7 @@ _gcry_blowfish_amd64_decrypt_block: bswapq RX3; .align 8 -.type __blowfish_enc_blk4, at function; +ELF(.type __blowfish_enc_blk4, at function;) __blowfish_enc_blk4: /* input: @@ -343,10 +351,10 @@ __blowfish_enc_blk4: outbswap_block4(); ret; -.size __blowfish_enc_blk4,.-__blowfish_enc_blk4; +ELF(.size __blowfish_enc_blk4,.-__blowfish_enc_blk4;) .align 8 -.type __blowfish_dec_blk4, at function; +ELF(.type __blowfish_dec_blk4, at function;) __blowfish_dec_blk4: /* input: @@ -372,11 +380,11 @@ __blowfish_dec_blk4: outbswap_block4(); ret; -.size __blowfish_dec_blk4,.-__blowfish_dec_blk4; +ELF(.size __blowfish_dec_blk4,.-__blowfish_dec_blk4;) .align 8 .globl _gcry_blowfish_amd64_ctr_enc -.type _gcry_blowfish_amd64_ctr_enc, at function; +ELF(.type _gcry_blowfish_amd64_ctr_enc, at function;) _gcry_blowfish_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -429,11 +437,11 @@ _gcry_blowfish_amd64_ctr_enc: popq %rbp; ret; -.size _gcry_blowfish_amd64_ctr_enc,.-_gcry_blowfish_amd64_ctr_enc; +ELF(.size _gcry_blowfish_amd64_ctr_enc,.-_gcry_blowfish_amd64_ctr_enc;) .align 8 .globl _gcry_blowfish_amd64_cbc_dec -.type _gcry_blowfish_amd64_cbc_dec, at function; +ELF(.type _gcry_blowfish_amd64_cbc_dec, at function;) _gcry_blowfish_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -477,11 +485,11 @@ _gcry_blowfish_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_blowfish_amd64_cbc_dec,.-_gcry_blowfish_amd64_cbc_dec; +ELF(.size _gcry_blowfish_amd64_cbc_dec,.-_gcry_blowfish_amd64_cbc_dec;) .align 8 .globl _gcry_blowfish_amd64_cfb_dec -.type _gcry_blowfish_amd64_cfb_dec, at function; +ELF(.type _gcry_blowfish_amd64_cfb_dec, at function;) _gcry_blowfish_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -527,7 +535,7 @@ 
_gcry_blowfish_amd64_cfb_dec: popq %rbx; popq %rbp; ret; -.size _gcry_blowfish_amd64_cfb_dec,.-_gcry_blowfish_amd64_cfb_dec; +ELF(.size _gcry_blowfish_amd64_cfb_dec,.-_gcry_blowfish_amd64_cfb_dec;) #endif /*defined(USE_BLOWFISH)*/ #endif /*__x86_64*/ diff --git a/cipher/blowfish.c b/cipher/blowfish.c index ae470d8..a3fc26c 100644 --- a/cipher/blowfish.c +++ b/cipher/blowfish.c @@ -45,7 +45,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ (BLOWFISH_ROUNDS == 16) # define USE_AMD64_ASM 1 #endif @@ -280,22 +281,87 @@ extern void _gcry_blowfish_amd64_cbc_dec(BLOWFISH_context *ctx, byte *out, extern void _gcry_blowfish_amd64_cfb_dec(BLOWFISH_context *ctx, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + static void do_encrypt ( BLOWFISH_context *bc, u32 *ret_xl, u32 *ret_xr ) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_do_encrypt, bc, ret_xl, ret_xr, NULL); +#else _gcry_blowfish_amd64_do_encrypt (bc, ret_xl, ret_xr); +#endif } static void do_encrypt_block (BLOWFISH_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_encrypt_block, context, outbuf, inbuf, + NULL); +#else _gcry_blowfish_amd64_encrypt_block (context, outbuf, inbuf); +#endif } static void do_decrypt_block (BLOWFISH_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_decrypt_block, context, outbuf, inbuf, + NULL); +#else _gcry_blowfish_amd64_decrypt_block (context, outbuf, inbuf); +#endif +} + +static inline void +blowfish_amd64_ctr_enc(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_ctr_enc, ctx, out, in, ctr); +#else + _gcry_blowfish_amd64_ctr_enc(ctx, out, in, ctr); +#endif +} + +static inline void +blowfish_amd64_cbc_dec(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_cbc_dec, ctx, out, in, iv); +#else + _gcry_blowfish_amd64_cbc_dec(ctx, out, in, iv); +#endif +} + +static inline void +blowfish_amd64_cfb_dec(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_cfb_dec, ctx, out, in, iv); +#else + _gcry_blowfish_amd64_cfb_dec(ctx, out, in, iv); +#endif } static unsigned int @@ -605,7 +671,7 @@ _gcry_blowfish_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 4 block chunks. 
*/ while (nblocks >= 4) { - _gcry_blowfish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + blowfish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; @@ -674,7 +740,7 @@ _gcry_blowfish_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_blowfish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + blowfish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; @@ -734,7 +800,7 @@ _gcry_blowfish_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_blowfish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + blowfish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; From jussi.kivilinna at iki.fi Thu May 14 13:11:13 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:13 +0300 Subject: [PATCH 03/10] Enable AMD64 Camellia implementations on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111113.29891.99288.stgit@localhost6.localdomain6> * cipher/camellia-aesni-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/camellia-aesni-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/camellia-glue.c (USE_AESNI_AVX, USE_AESNI_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AESNI_AVX ||?USE_AESNI_AVX2] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (_gcry_camellia_aesni_avx_ctr_enc, _gcry_camellia_aesni_avx_cbc_dec) (_gcry_camellia_aesni_avx_cfb_dec, _gcry_camellia_aesni_avx_keygen) (_gcry_camellia_aesni_avx2_ctr_enc, _gcry_camellia_aesni_avx2_cbc_dec) (_gcry_camellia_aesni_avx2_cfb_dec): Add ASM_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna --- cipher/camellia-aesni-avx-amd64.S | 41 ++++++++++++++---------- cipher/camellia-aesni-avx2-amd64.S | 29 +++++++++++------ cipher/camellia-glue.c | 61 +++++++++++++++++++++++++----------- 3 files changed, 85 insertions(+), 46 deletions(-) diff --git a/cipher/camellia-aesni-avx-amd64.S b/cipher/camellia-aesni-avx-amd64.S index 6d157a7..c047a21 100644 --- a/cipher/camellia-aesni-avx-amd64.S +++ b/cipher/camellia-aesni-avx-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT) #ifdef __PIC__ @@ -29,6 +30,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + #define CAMELLIA_TABLE_BYTE_LEN 272 /* struct CAMELLIA_context: */ @@ -769,7 +776,7 @@ .text .align 8 -.type __camellia_enc_blk16, at function; +ELF(.type __camellia_enc_blk16, at function;) __camellia_enc_blk16: /* input: @@ -853,10 +860,10 @@ __camellia_enc_blk16: %xmm15, %rax, %rcx, 24); jmp .Lenc_done; -.size __camellia_enc_blk16,.-__camellia_enc_blk16; +ELF(.size __camellia_enc_blk16,.-__camellia_enc_blk16;) .align 8 -.type __camellia_dec_blk16, at function; +ELF(.type __camellia_dec_blk16, at function;) __camellia_dec_blk16: /* input: @@ -938,7 +945,7 @@ __camellia_dec_blk16: ((key_table + (24) * 8) + 4)(CTX)); jmp .Ldec_max24; -.size __camellia_dec_blk16,.-__camellia_dec_blk16; +ELF(.size __camellia_dec_blk16,.-__camellia_dec_blk16;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -948,7 +955,7 @@ __camellia_dec_blk16: .align 8 .globl _gcry_camellia_aesni_avx_ctr_enc -.type _gcry_camellia_aesni_avx_ctr_enc, at function; +ELF(.type _gcry_camellia_aesni_avx_ctr_enc, at function;) _gcry_camellia_aesni_avx_ctr_enc: /* input: @@ -1062,11 +1069,11 @@ _gcry_camellia_aesni_avx_ctr_enc: leave; ret; -.size _gcry_camellia_aesni_avx_ctr_enc,.-_gcry_camellia_aesni_avx_ctr_enc; +ELF(.size _gcry_camellia_aesni_avx_ctr_enc,.-_gcry_camellia_aesni_avx_ctr_enc;) .align 8 .globl _gcry_camellia_aesni_avx_cbc_dec -.type _gcry_camellia_aesni_avx_cbc_dec, at function; +ELF(.type _gcry_camellia_aesni_avx_cbc_dec, at function;) _gcry_camellia_aesni_avx_cbc_dec: /* input: @@ -1130,11 +1137,11 @@ _gcry_camellia_aesni_avx_cbc_dec: leave; ret; -.size _gcry_camellia_aesni_avx_cbc_dec,.-_gcry_camellia_aesni_avx_cbc_dec; +ELF(.size _gcry_camellia_aesni_avx_cbc_dec,.-_gcry_camellia_aesni_avx_cbc_dec;) .align 8 .globl _gcry_camellia_aesni_avx_cfb_dec -.type _gcry_camellia_aesni_avx_cfb_dec, at function; +ELF(.type _gcry_camellia_aesni_avx_cfb_dec, at function;) _gcry_camellia_aesni_avx_cfb_dec: /* input: @@ -1202,7 +1209,7 @@ _gcry_camellia_aesni_avx_cfb_dec: leave; ret; -.size _gcry_camellia_aesni_avx_cfb_dec,.-_gcry_camellia_aesni_avx_cfb_dec; +ELF(.size _gcry_camellia_aesni_avx_cfb_dec,.-_gcry_camellia_aesni_avx_cfb_dec;) /* * IN: @@ -1309,7 +1316,7 @@ _gcry_camellia_aesni_avx_cfb_dec: .text .align 8 -.type __camellia_avx_setup128, at function; +ELF(.type __camellia_avx_setup128, at function;) __camellia_avx_setup128: /* input: * %rdi: ctx, CTX; subkey storage at key_table(CTX) @@ -1650,10 +1657,10 @@ __camellia_avx_setup128: vzeroall; ret; -.size __camellia_avx_setup128,.-__camellia_avx_setup128; +ELF(.size __camellia_avx_setup128,.-__camellia_avx_setup128;) .align 8 -.type __camellia_avx_setup256, at function; +ELF(.type __camellia_avx_setup256, at function;) __camellia_avx_setup256: /* input: @@ -2127,11 +2134,11 @@ __camellia_avx_setup256: vzeroall; ret; -.size __camellia_avx_setup256,.-__camellia_avx_setup256; +ELF(.size __camellia_avx_setup256,.-__camellia_avx_setup256;) .align 8 .globl _gcry_camellia_aesni_avx_keygen -.type _gcry_camellia_aesni_avx_keygen, at function; +ELF(.type _gcry_camellia_aesni_avx_keygen, at function;) _gcry_camellia_aesni_avx_keygen: /* input: @@ -2159,7 +2166,7 @@ _gcry_camellia_aesni_avx_keygen: vpor %xmm2, %xmm1, %xmm1; jmp __camellia_avx_setup256; -.size _gcry_camellia_aesni_avx_keygen,.-_gcry_camellia_aesni_avx_keygen; +ELF(.size _gcry_camellia_aesni_avx_keygen,.-_gcry_camellia_aesni_avx_keygen;) #endif /*defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT)*/ #endif /*__x86_64*/ diff --git a/cipher/camellia-aesni-avx2-amd64.S 
b/cipher/camellia-aesni-avx2-amd64.S index 25f48bc..a3fa229 100644 --- a/cipher/camellia-aesni-avx2-amd64.S +++ b/cipher/camellia-aesni-avx2-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT) #ifdef __PIC__ @@ -29,6 +30,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + #define CAMELLIA_TABLE_BYTE_LEN 272 /* struct CAMELLIA_context: */ @@ -748,7 +755,7 @@ .text .align 8 -.type __camellia_enc_blk32, at function; +ELF(.type __camellia_enc_blk32, at function;) __camellia_enc_blk32: /* input: @@ -832,10 +839,10 @@ __camellia_enc_blk32: %ymm15, %rax, %rcx, 24); jmp .Lenc_done; -.size __camellia_enc_blk32,.-__camellia_enc_blk32; +ELF(.size __camellia_enc_blk32,.-__camellia_enc_blk32;) .align 8 -.type __camellia_dec_blk32, at function; +ELF(.type __camellia_dec_blk32, at function;) __camellia_dec_blk32: /* input: @@ -917,7 +924,7 @@ __camellia_dec_blk32: ((key_table + (24) * 8) + 4)(CTX)); jmp .Ldec_max24; -.size __camellia_dec_blk32,.-__camellia_dec_blk32; +ELF(.size __camellia_dec_blk32,.-__camellia_dec_blk32;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -927,7 +934,7 @@ __camellia_dec_blk32: .align 8 .globl _gcry_camellia_aesni_avx2_ctr_enc -.type _gcry_camellia_aesni_avx2_ctr_enc, at function; +ELF(.type _gcry_camellia_aesni_avx2_ctr_enc, at function;) _gcry_camellia_aesni_avx2_ctr_enc: /* input: @@ -1111,11 +1118,11 @@ _gcry_camellia_aesni_avx2_ctr_enc: leave; ret; -.size _gcry_camellia_aesni_avx2_ctr_enc,.-_gcry_camellia_aesni_avx2_ctr_enc; +ELF(.size _gcry_camellia_aesni_avx2_ctr_enc,.-_gcry_camellia_aesni_avx2_ctr_enc;) .align 8 .globl _gcry_camellia_aesni_avx2_cbc_dec -.type _gcry_camellia_aesni_avx2_cbc_dec, at function; +ELF(.type _gcry_camellia_aesni_avx2_cbc_dec, at function;) _gcry_camellia_aesni_avx2_cbc_dec: /* input: @@ -1183,11 +1190,11 @@ _gcry_camellia_aesni_avx2_cbc_dec: leave; ret; -.size _gcry_camellia_aesni_avx2_cbc_dec,.-_gcry_camellia_aesni_avx2_cbc_dec; +ELF(.size _gcry_camellia_aesni_avx2_cbc_dec,.-_gcry_camellia_aesni_avx2_cbc_dec;) .align 8 .globl _gcry_camellia_aesni_avx2_cfb_dec -.type _gcry_camellia_aesni_avx2_cfb_dec, at function; +ELF(.type _gcry_camellia_aesni_avx2_cfb_dec, at function;) _gcry_camellia_aesni_avx2_cfb_dec: /* input: @@ -1257,7 +1264,7 @@ _gcry_camellia_aesni_avx2_cfb_dec: leave; ret; -.size _gcry_camellia_aesni_avx2_cfb_dec,.-_gcry_camellia_aesni_avx2_cfb_dec; +ELF(.size _gcry_camellia_aesni_avx2_cfb_dec,.-_gcry_camellia_aesni_avx2_cfb_dec;) #endif /*defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT)*/ #endif /*__x86_64*/ diff --git a/cipher/camellia-glue.c b/cipher/camellia-glue.c index f18d135..5032321 100644 --- a/cipher/camellia-glue.c +++ b/cipher/camellia-glue.c @@ -75,7 +75,8 @@ /* USE_AESNI inidicates whether to compile with Intel AES-NI/AVX code. */ #undef USE_AESNI_AVX #if defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT) -# if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AESNI_AVX 1 # endif #endif @@ -83,7 +84,8 @@ /* USE_AESNI_AVX2 inidicates whether to compile with Intel AES-NI/AVX2 code. 
*/ #undef USE_AESNI_AVX2 #if defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT) -# if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AESNI_AVX2 1 # endif #endif @@ -100,6 +102,20 @@ typedef struct #endif /*USE_AESNI_AVX2*/ } CAMELLIA_context; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_AESNI_AVX) || defined(USE_AESNI_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + #ifdef USE_AESNI_AVX /* Assembler implementations of Camellia using AES-NI and AVX. Process data in 16 block same time. @@ -107,21 +123,21 @@ typedef struct extern void _gcry_camellia_aesni_avx_ctr_enc(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_cbc_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_cfb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, const unsigned char *key, - unsigned int keylen); + unsigned int keylen) ASM_FUNC_ABI; #endif #ifdef USE_AESNI_AVX2 @@ -131,17 +147,17 @@ extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, extern void _gcry_camellia_aesni_avx2_ctr_enc(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_cbc_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_cfb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif static const char *selftest(void); @@ -318,7 +334,7 @@ _gcry_camellia_ctr_enc(void *context, unsigned char *ctr, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -347,8 +363,11 @@ _gcry_camellia_ctr_enc(void *context, unsigned char *ctr, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... 
*/ @@ -409,7 +428,7 @@ _gcry_camellia_cbc_dec(void *context, unsigned char *iv, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK;; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -437,8 +456,11 @@ _gcry_camellia_cbc_dec(void *context, unsigned char *iv, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... */ @@ -491,7 +513,7 @@ _gcry_camellia_cfb_dec(void *context, unsigned char *iv, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -519,8 +541,11 @@ _gcry_camellia_cfb_dec(void *context, unsigned char *iv, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... */ From jussi.kivilinna at iki.fi Thu May 14 13:11:29 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:29 +0300 Subject: [PATCH 06/10] Enable AMD64 3DES implementation on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111129.29891.69965.stgit@localhost6.localdomain6> * cipher/des-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/des.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (tripledes_ecb_crypt) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (tripledes_amd64_ctr_enc, tripledes_amd64_cbc_dec) (tripledes_amd64_cfb_dec): New wrapper functions for bulk assembly functions. -- Signed-off-by: Jussi Kivilinna --- cipher/des-amd64.S | 29 +++++++++++++++---------- cipher/des.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 75 insertions(+), 15 deletions(-) diff --git a/cipher/des-amd64.S b/cipher/des-amd64.S index e8b2c56..307d211 100644 --- a/cipher/des-amd64.S +++ b/cipher/des-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(USE_DES) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_DES) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) #ifdef __PIC__ # define RIP (%rip) @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text #define s1 0 @@ -185,7 +192,7 @@ .align 8 .globl _gcry_3des_amd64_crypt_block -.type _gcry_3des_amd64_crypt_block, at function; +ELF(.type _gcry_3des_amd64_crypt_block, at function;) _gcry_3des_amd64_crypt_block: /* input: @@ -271,7 +278,7 @@ _gcry_3des_amd64_crypt_block: popq %rbp; ret; -.size _gcry_3des_amd64_crypt_block,.-_gcry_3des_amd64_crypt_block; +ELF(.size _gcry_3des_amd64_crypt_block,.-_gcry_3des_amd64_crypt_block;) /*********************************************************************** * 3-way 3DES @@ -458,7 +465,7 @@ _gcry_3des_amd64_crypt_block: movl right##d, 4(io); .align 8 -.type _gcry_3des_amd64_crypt_blk3, at function; +ELF(.type _gcry_3des_amd64_crypt_blk3, at function;) _gcry_3des_amd64_crypt_blk3: /* input: * %rdi: round keys, CTX @@ -528,11 +535,11 @@ _gcry_3des_amd64_crypt_blk3: final_permutation3(RR, RL); ret; -.size _gcry_3des_amd64_crypt_blk3,.-_gcry_3des_amd64_crypt_blk3; +ELF(.size _gcry_3des_amd64_crypt_blk3,.-_gcry_3des_amd64_crypt_blk3;) .align 8 .globl _gcry_3des_amd64_cbc_dec -.type _gcry_3des_amd64_cbc_dec, at function; +ELF(.type _gcry_3des_amd64_cbc_dec, at function;) _gcry_3des_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -604,11 +611,11 @@ _gcry_3des_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec; +ELF(.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec;) .align 8 .globl _gcry_3des_amd64_ctr_enc -.type _gcry_3des_amd64_ctr_enc, at function; +ELF(.type _gcry_3des_amd64_ctr_enc, at function;) _gcry_3des_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -682,11 +689,11 @@ _gcry_3des_amd64_ctr_enc: popq %rbp; ret; -.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec; +ELF(.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec;) .align 8 .globl _gcry_3des_amd64_cfb_dec -.type _gcry_3des_amd64_cfb_dec, at function; +ELF(.type _gcry_3des_amd64_cfb_dec, at function;) _gcry_3des_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -757,7 +764,7 @@ _gcry_3des_amd64_cfb_dec: popq %rbx; popq %rbp; ret; -.size _gcry_3des_amd64_cfb_dec,.-_gcry_3des_amd64_cfb_dec; +ELF(.size _gcry_3des_amd64_cfb_dec,.-_gcry_3des_amd64_cfb_dec;) .data .align 16 diff --git a/cipher/des.c b/cipher/des.c index d4863d1..be62763 100644 --- a/cipher/des.c +++ b/cipher/des.c @@ -127,7 +127,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -771,6 +772,24 @@ extern void _gcry_3des_amd64_cfb_dec(const void *keys, byte *out, #define TRIPLEDES_ECB_BURN_STACK (8 * sizeof(void *)) +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + /* * Electronic Codebook Mode Triple-DES encryption/decryption of data * according to 'mode'. Sometimes this mode is named 'EDE' mode @@ -784,11 +803,45 @@ tripledes_ecb_crypt (struct _tripledes_ctx *ctx, const byte * from, keys = mode ? 
ctx->decrypt_subkeys : ctx->encrypt_subkeys; +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_crypt_block, keys, to, from, NULL); +#else _gcry_3des_amd64_crypt_block(keys, to, from); +#endif return 0; } +static inline void +tripledes_amd64_ctr_enc(const void *keys, byte *out, const byte *in, byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_ctr_enc, keys, out, in, ctr); +#else + _gcry_3des_amd64_ctr_enc(keys, out, in, ctr); +#endif +} + +static inline void +tripledes_amd64_cbc_dec(const void *keys, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_cbc_dec, keys, out, in, iv); +#else + _gcry_3des_amd64_cbc_dec(keys, out, in, iv); +#endif +} + +static inline void +tripledes_amd64_cfb_dec(const void *keys, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_cfb_dec, keys, out, in, iv); +#else + _gcry_3des_amd64_cfb_dec(keys, out, in, iv); +#endif +} + #else /*USE_AMD64_ASM*/ #define TRIPLEDES_ECB_BURN_STACK 32 @@ -871,7 +924,7 @@ _gcry_3des_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_ctr_enc(ctx->encrypt_subkeys, outbuf, inbuf, ctr); + tripledes_amd64_ctr_enc(ctx->encrypt_subkeys, outbuf, inbuf, ctr); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; @@ -926,7 +979,7 @@ _gcry_3des_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_cbc_dec(ctx->decrypt_subkeys, outbuf, inbuf, iv); + tripledes_amd64_cbc_dec(ctx->decrypt_subkeys, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; @@ -974,7 +1027,7 @@ _gcry_3des_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_cfb_dec(ctx->encrypt_subkeys, outbuf, inbuf, iv); + tripledes_amd64_cfb_dec(ctx->encrypt_subkeys, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; From jussi.kivilinna at iki.fi Thu May 14 13:11:03 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:03 +0300 Subject: [PATCH 01/10] Enable AMD64 arcfour implementation on WIN64 Message-ID: <20150514111103.29891.37464.stgit@localhost6.localdomain6> * cipher/arcfour-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/arcfour.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (do_encrypt, do_decrypt) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Use assembly block to call AMD64 assembly function. -- Signed-off-by: Jussi Kivilinna --- cipher/arcfour-amd64.S | 13 ++++++++++--- cipher/arcfour.c | 17 ++++++++++++++++- 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/cipher/arcfour-amd64.S b/cipher/arcfour-amd64.S index 8b8031a..2e52ea0 100644 --- a/cipher/arcfour-amd64.S +++ b/cipher/arcfour-amd64.S @@ -15,12 +15,19 @@ #ifdef __x86_64__ #include -#if defined(USE_ARCFOUR) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_ARCFOUR) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text .align 16 .globl _gcry_arcfour_amd64 -.type _gcry_arcfour_amd64, at function +ELF(.type _gcry_arcfour_amd64, at function) _gcry_arcfour_amd64: push %rbp push %rbx @@ -91,7 +98,7 @@ _gcry_arcfour_amd64: pop %rbp ret .L__gcry_arcfour_amd64_end: -.size _gcry_arcfour_amd64,.L__gcry_arcfour_amd64_end-_gcry_arcfour_amd64 +ELF(.size _gcry_arcfour_amd64,.L__gcry_arcfour_amd64_end-_gcry_arcfour_amd64) #endif #endif diff --git a/cipher/arcfour.c b/cipher/arcfour.c index 27537bf..44e8ef4 100644 --- a/cipher/arcfour.c +++ b/cipher/arcfour.c @@ -33,7 +33,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -53,7 +54,21 @@ static void encrypt_stream (void *context, byte *outbuf, const byte *inbuf, size_t length) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + const void *fn = _gcry_arcfour_amd64; + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (context), + "+S" (length), + "+d" (inbuf), + "+c" (outbuf) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +#else _gcry_arcfour_amd64 (context, length, inbuf, outbuf ); +#endif } #else /*!USE_AMD64_ASM*/ From jussi.kivilinna at iki.fi Thu May 14 13:11:19 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:19 +0300 Subject: [PATCH 04/10] Enable AMD64 CAST5 implementation on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111118.29891.12226.stgit@localhost6.localdomain6> * cipher/cast5-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (RIP): Remove. (GET_EXTERN_POINTER): Use 'leaq' version on WIN64. (ELF): New macro to mask lines with ELF specific commands. * cipher/cast5.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (do_encrypt_block, do_decrypt_block) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (cast5_amd64_ctr_enc, cast5_amd64_cbc_dec) (cast5_amd64_cfb_dec): New wrapper functions for bulk assembly functions. -- Signed-off-by: Jussi Kivilinna --- cipher/cast5-amd64.S | 43 ++++++++++++++++++-------------- cipher/cast5.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 86 insertions(+), 24 deletions(-) diff --git a/cipher/cast5-amd64.S b/cipher/cast5-amd64.S index 41fbb74..a5f078e 100644 --- a/cipher/cast5-amd64.S +++ b/cipher/cast5-amd64.S @@ -20,14 +20,19 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_CAST5) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_CAST5) -#ifdef __PIC__ -# define RIP %rip +#if defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) || !defined(__PIC__) +# define GET_EXTERN_POINTER(name, reg) leaq name, reg +#else # define GET_EXTERN_POINTER(name, reg) movq name at GOTPCREL(%rip), reg +#endif + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) 
__VA_ARGS__ #else -# define RIP -# define GET_EXTERN_POINTER(name, reg) leaq name, reg +# define ELF(...) /*_*/ #endif .text @@ -180,7 +185,7 @@ .align 8 .globl _gcry_cast5_amd64_encrypt_block -.type _gcry_cast5_amd64_encrypt_block, at function; +ELF(.type _gcry_cast5_amd64_encrypt_block, at function;) _gcry_cast5_amd64_encrypt_block: /* input: @@ -216,11 +221,11 @@ _gcry_cast5_amd64_encrypt_block: popq %rbx; popq %rbp; ret; -.size _gcry_cast5_amd64_encrypt_block,.-_gcry_cast5_amd64_encrypt_block; +ELF(.size _gcry_cast5_amd64_encrypt_block,.-_gcry_cast5_amd64_encrypt_block;) .align 8 .globl _gcry_cast5_amd64_decrypt_block -.type _gcry_cast5_amd64_decrypt_block, at function; +ELF(.type _gcry_cast5_amd64_decrypt_block, at function;) _gcry_cast5_amd64_decrypt_block: /* input: @@ -256,7 +261,7 @@ _gcry_cast5_amd64_decrypt_block: popq %rbx; popq %rbp; ret; -.size _gcry_cast5_amd64_decrypt_block,.-_gcry_cast5_amd64_decrypt_block; +ELF(.size _gcry_cast5_amd64_decrypt_block,.-_gcry_cast5_amd64_decrypt_block;) /********************************************************************** 4-way cast5, four blocks parallel @@ -359,7 +364,7 @@ _gcry_cast5_amd64_decrypt_block: rorq $32, d; .align 8 -.type __cast5_enc_blk4, at function; +ELF(.type __cast5_enc_blk4, at function;) __cast5_enc_blk4: /* input: @@ -384,10 +389,10 @@ __cast5_enc_blk4: outbswap_block4(RLR0, RLR1, RLR2, RLR3); ret; -.size __cast5_enc_blk4,.-__cast5_enc_blk4; +ELF(.size __cast5_enc_blk4,.-__cast5_enc_blk4;) .align 8 -.type __cast5_dec_blk4, at function; +ELF(.type __cast5_dec_blk4, at function;) __cast5_dec_blk4: /* input: @@ -414,11 +419,11 @@ __cast5_dec_blk4: outbswap_block4(RLR0, RLR1, RLR2, RLR3); ret; -.size __cast5_dec_blk4,.-__cast5_dec_blk4; +ELF(.size __cast5_dec_blk4,.-__cast5_dec_blk4;) .align 8 .globl _gcry_cast5_amd64_ctr_enc -.type _gcry_cast5_amd64_ctr_enc, at function; +ELF(.type _gcry_cast5_amd64_ctr_enc, at function;) _gcry_cast5_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -472,11 +477,11 @@ _gcry_cast5_amd64_ctr_enc: popq %rbx; popq %rbp; ret -.size _gcry_cast5_amd64_ctr_enc,.-_gcry_cast5_amd64_ctr_enc; +ELF(.size _gcry_cast5_amd64_ctr_enc,.-_gcry_cast5_amd64_ctr_enc;) .align 8 .globl _gcry_cast5_amd64_cbc_dec -.type _gcry_cast5_amd64_cbc_dec, at function; +ELF(.type _gcry_cast5_amd64_cbc_dec, at function;) _gcry_cast5_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -526,11 +531,11 @@ _gcry_cast5_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_cast5_amd64_cbc_dec,.-_gcry_cast5_amd64_cbc_dec; +ELF(.size _gcry_cast5_amd64_cbc_dec,.-_gcry_cast5_amd64_cbc_dec;) .align 8 .globl _gcry_cast5_amd64_cfb_dec -.type _gcry_cast5_amd64_cfb_dec, at function; +ELF(.type _gcry_cast5_amd64_cfb_dec, at function;) _gcry_cast5_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -581,7 +586,7 @@ _gcry_cast5_amd64_cfb_dec: popq %rbp; ret; -.size _gcry_cast5_amd64_cfb_dec,.-_gcry_cast5_amd64_cfb_dec; +ELF(.size _gcry_cast5_amd64_cfb_dec,.-_gcry_cast5_amd64_cfb_dec;) #endif /*defined(USE_CAST5)*/ #endif /*__x86_64*/ diff --git a/cipher/cast5.c b/cipher/cast5.c index 115e1e6..94dcee7 100644 --- a/cipher/cast5.c +++ b/cipher/cast5.c @@ -48,7 +48,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -372,16 +373,72 @@ extern void _gcry_cast5_amd64_cbc_dec(CAST5_context *ctx, byte *out, extern void _gcry_cast5_amd64_cfb_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + static void do_encrypt_block (CAST5_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_encrypt_block, context, outbuf, inbuf, NULL); +#else _gcry_cast5_amd64_encrypt_block (context, outbuf, inbuf); +#endif } static void do_decrypt_block (CAST5_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_decrypt_block, context, outbuf, inbuf, NULL); +#else _gcry_cast5_amd64_decrypt_block (context, outbuf, inbuf); +#endif +} + +static void +cast5_amd64_ctr_enc(CAST5_context *ctx, byte *out, const byte *in, byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_ctr_enc, ctx, out, in, ctr); +#else + _gcry_cast5_amd64_ctr_enc (ctx, out, in, ctr); +#endif +} + +static void +cast5_amd64_cbc_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_cbc_dec, ctx, out, in, iv); +#else + _gcry_cast5_amd64_cbc_dec (ctx, out, in, iv); +#endif +} + +static void +cast5_amd64_cfb_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_cfb_dec, ctx, out, in, iv); +#else + _gcry_cast5_amd64_cfb_dec (ctx, out, in, iv); +#endif } static unsigned int @@ -396,7 +453,7 @@ static unsigned int decrypt_block (void *context, byte *outbuf, const byte *inbuf) { CAST5_context *c = (CAST5_context *) context; - _gcry_cast5_amd64_decrypt_block (c, outbuf, inbuf); + do_decrypt_block (c, outbuf, inbuf); return /*burn_stack*/ (2*8); } @@ -582,7 +639,7 @@ _gcry_cast5_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_cast5_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + cast5_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; @@ -651,7 +708,7 @@ _gcry_cast5_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_cast5_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + cast5_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; @@ -710,7 +767,7 @@ _gcry_cast5_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. 
*/ while (nblocks >= 4) { - _gcry_cast5_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + cast5_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; From jussi.kivilinna at iki.fi Thu May 14 13:11:24 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:24 +0300 Subject: [PATCH 05/10] Enable AMD64 ChaCha20 implementations on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111124.29891.43425.stgit@localhost6.localdomain6> * cipher/chacha20-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20-ssse3-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20.c (USE_SSE2, USE_SSSE3, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (chacha20_blocks_t, _gcry_chacha20_amd64_sse2_blocks) (_gcry_chacha20_amd64_ssse3_blocks, _gcry_chacha20_amd64_avx2_blocks) (_gcry_chacha20_armv7_neon_blocks, chacha20_blocks): Add ASM_FUNC_ABI. (chacha20_core): Add ASM_EXTRA_STACK. -- Signed-off-by: Jussi Kivilinna --- cipher/chacha20-avx2-amd64.S | 13 ++++++++++-- cipher/chacha20-sse2-amd64.S | 13 ++++++++++-- cipher/chacha20-ssse3-amd64.S | 13 ++++++++++-- cipher/chacha20.c | 43 +++++++++++++++++++++++++++++++---------- 4 files changed, 63 insertions(+), 19 deletions(-) diff --git a/cipher/chacha20-avx2-amd64.S b/cipher/chacha20-avx2-amd64.S index 1f33de8..12bed35 100644 --- a/cipher/chacha20-avx2-amd64.S +++ b/cipher/chacha20-avx2-amd64.S @@ -26,7 +26,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) && USE_CHACHA20 #ifdef __PIC__ @@ -35,11 +36,17 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .text .align 8 .globl _gcry_chacha20_amd64_avx2_blocks -.type _gcry_chacha20_amd64_avx2_blocks, at function; +ELF(.type _gcry_chacha20_amd64_avx2_blocks, at function;) _gcry_chacha20_amd64_avx2_blocks: .Lchacha_blocks_avx2_local: vzeroupper @@ -938,7 +945,7 @@ _gcry_chacha20_amd64_avx2_blocks: vzeroall movl $(63 + 512), %eax ret -.size _gcry_chacha20_amd64_avx2_blocks,.-_gcry_chacha20_amd64_avx2_blocks; +ELF(.size _gcry_chacha20_amd64_avx2_blocks,.-_gcry_chacha20_amd64_avx2_blocks;) .data .align 16 diff --git a/cipher/chacha20-sse2-amd64.S b/cipher/chacha20-sse2-amd64.S index 4811f40..2b9842c 100644 --- a/cipher/chacha20-sse2-amd64.S +++ b/cipher/chacha20-sse2-amd64.S @@ -26,13 +26,20 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && USE_CHACHA20 +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && USE_CHACHA20 + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text .align 8 .globl _gcry_chacha20_amd64_sse2_blocks -.type _gcry_chacha20_amd64_sse2_blocks, at function; +ELF(.type _gcry_chacha20_amd64_sse2_blocks, at function;) _gcry_chacha20_amd64_sse2_blocks: .Lchacha_blocks_sse2_local: pushq %rbx @@ -646,7 +653,7 @@ _gcry_chacha20_amd64_sse2_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_chacha20_amd64_sse2_blocks,.-_gcry_chacha20_amd64_sse2_blocks; +ELF(.size _gcry_chacha20_amd64_sse2_blocks,.-_gcry_chacha20_amd64_sse2_blocks;) #endif /*defined(USE_CHACHA20)*/ #endif /*__x86_64*/ diff --git a/cipher/chacha20-ssse3-amd64.S b/cipher/chacha20-ssse3-amd64.S index 50c2ff8..a1a843f 100644 --- a/cipher/chacha20-ssse3-amd64.S +++ b/cipher/chacha20-ssse3-amd64.S @@ -26,7 +26,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && USE_CHACHA20 #ifdef __PIC__ @@ -35,11 +36,17 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .text .align 8 .globl _gcry_chacha20_amd64_ssse3_blocks -.type _gcry_chacha20_amd64_ssse3_blocks, at function; +ELF(.type _gcry_chacha20_amd64_ssse3_blocks, at function;) _gcry_chacha20_amd64_ssse3_blocks: .Lchacha_blocks_ssse3_local: pushq %rbx @@ -614,7 +621,7 @@ _gcry_chacha20_amd64_ssse3_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_chacha20_amd64_ssse3_blocks,.-_gcry_chacha20_amd64_ssse3_blocks; +ELF(.size _gcry_chacha20_amd64_ssse3_blocks,.-_gcry_chacha20_amd64_ssse3_blocks;) .data .align 16; diff --git a/cipher/chacha20.c b/cipher/chacha20.c index 2eaeffd..e25e239 100644 --- a/cipher/chacha20.c +++ b/cipher/chacha20.c @@ -50,20 +50,23 @@ /* USE_SSE2 indicates whether to compile with Intel SSE2 code. */ #undef USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSE2 1 #endif /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) # define USE_SSSE3 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) # define USE_AVX2 1 #endif @@ -82,8 +85,23 @@ struct CHACHA20_context_s; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. 
*/ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if (defined(USE_SSE2) || defined(USE_SSSE3) || defined(USE_AVX2)) && \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + + typedef unsigned int (* chacha20_blocks_t)(u32 *state, const byte *src, - byte *dst, size_t bytes); + byte *dst, + size_t bytes) ASM_FUNC_ABI; typedef struct CHACHA20_context_s { @@ -97,28 +115,32 @@ typedef struct CHACHA20_context_s #ifdef USE_SSE2 unsigned int _gcry_chacha20_amd64_sse2_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_SSE2 */ #ifdef USE_SSSE3 unsigned int _gcry_chacha20_amd64_ssse3_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_SSSE3 */ #ifdef USE_AVX2 unsigned int _gcry_chacha20_amd64_avx2_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_AVX2 */ #ifdef USE_NEON unsigned int _gcry_chacha20_armv7_neon_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_NEON */ @@ -141,7 +163,7 @@ static const char *selftest (void); #ifndef USE_SSE2 -static unsigned int +ASM_FUNC_ABI static unsigned int chacha20_blocks (u32 *state, const byte *src, byte *dst, size_t bytes) { u32 pad[CHACHA20_INPUT_LENGTH]; @@ -269,7 +291,8 @@ chacha20_blocks (u32 *state, const byte *src, byte *dst, size_t bytes) static unsigned int chacha20_core(u32 *dst, struct CHACHA20_context_s *ctx) { - return ctx->blocks(ctx->input, NULL, (byte *)dst, CHACHA20_BLOCK_SIZE); + return ctx->blocks(ctx->input, NULL, (byte *)dst, CHACHA20_BLOCK_SIZE) + + ASM_EXTRA_STACK; } From jussi.kivilinna at iki.fi Thu May 14 13:11:34 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:34 +0300 Subject: [PATCH 07/10] Enable AMD64 Poly1305 implementations on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111134.29891.48913.stgit@localhost6.localdomain6> * cipher/poly1305-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-internal.h (POLY1305_SYSV_FUNC_ABI): New. (POLY1305_USE_SSE2, POLY1305_USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (OPS_FUNC_ABI): New. (poly1305_ops_t): Use OPS_FUNC_ABI. * cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, _gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, _gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_init_ext_ref32) (poly1305_blocks_ref32, poly1305_finish_ext_ref32) (poly1305_init_ext_ref8, poly1305_blocks_ref8) (poly1305_finish_ext_ref8): Use OPS_FUNC_ABI. 
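The OPS_FUNC_ABI / ASM_FUNC_ABI annotation used throughout this series can be illustrated with a small standalone sketch. Everything below is hypothetical example code, not libgcrypt source: MY_ASM_FUNC_ABI, my_blocks and my_blocks_fn_t are made-up names, and the "assembly" routine is simulated by a plain C function so the sketch compiles and runs on any x86_64 GCC target. The point is that a routine written for the SysV calling convention stays callable from Win64-built C once its declaration, and any function-pointer type referring to it, carries the sysv_abi attribute.

#include <stddef.h>
#include <stdio.h>

/* On x86_64 MinGW (where HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS would be
 * defined) the default calling convention is the Microsoft one, so the
 * SysV attribute is needed; elsewhere it expands to nothing. */
#if defined(_WIN64) && defined(__GNUC__)
# define MY_ASM_FUNC_ABI __attribute__((sysv_abi))
#else
# define MY_ASM_FUNC_ABI
#endif

/* Stand-in for a SysV-ABI assembly block function; a C body is used here
 * so the example is self-contained and links everywhere. */
MY_ASM_FUNC_ABI static unsigned int
my_blocks (unsigned int *state, const unsigned char *in, unsigned char *out,
           size_t bytes)
{
  size_t i;

  for (i = 0; i < bytes; i++)
    out[i] = in[i] ^ (unsigned char)(state[0] + i);  /* dummy transform */

  return 64;  /* pretend stack-burn estimate, like the real routines return */
}

/* Function-pointer type carrying the same annotation, analogous to
 * chacha20_blocks_t and the poly1305_ops_t members above. */
typedef unsigned int (* my_blocks_fn_t) (unsigned int *state,
                                         const unsigned char *in,
                                         unsigned char *out,
                                         size_t bytes) MY_ASM_FUNC_ABI;

int
main (void)
{
  my_blocks_fn_t fn = my_blocks;
  unsigned int state[16] = { 1 };
  unsigned char in[8] = "abcdefg";
  unsigned char out[8];

  /* The indirect call uses the SysV convention on both Win64 and ELF
   * targets, so SysV assembly could be plugged in behind fn unchanged. */
  unsigned int burn = fn (state, in, out, sizeof (in));

  printf ("burn estimate: %u\n", burn);
  return 0;
}

As the comments in these patches note, the ABI conversion needs extra stack on Win64 to store XMM6-XMM15, which is what ASM_EXTRA_STACK (10 * 16 bytes) adds to the burn-stack figures; assembly that uses no vector registers (arcfour, CAST5, 3DES, Twofish) instead goes through the small call_sysv_fn() inline-asm wrapper, which only clobbers r8-r11, the flags and memory.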
-- Signed-off-by: Jussi Kivilinna --- cipher/poly1305-avx2-amd64.S | 22 +++++++++++++++------- cipher/poly1305-internal.h | 27 ++++++++++++++++++++++----- cipher/poly1305-sse2-amd64.S | 22 +++++++++++++++------- cipher/poly1305.c | 33 ++++++++++++++++++--------------- 4 files changed, 70 insertions(+), 34 deletions(-) diff --git a/cipher/poly1305-avx2-amd64.S b/cipher/poly1305-avx2-amd64.S index 0ba7e76..9362a5a 100644 --- a/cipher/poly1305-avx2-amd64.S +++ b/cipher/poly1305-avx2-amd64.S @@ -25,15 +25,23 @@ #include -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + .text .align 8 .globl _gcry_poly1305_amd64_avx2_init_ext -.type _gcry_poly1305_amd64_avx2_init_ext, at function; +ELF(.type _gcry_poly1305_amd64_avx2_init_ext, at function;) _gcry_poly1305_amd64_avx2_init_ext: .Lpoly1305_init_ext_avx2_local: xor %edx, %edx @@ -391,12 +399,12 @@ _gcry_poly1305_amd64_avx2_init_ext: popq %r13 popq %r12 ret -.size _gcry_poly1305_amd64_avx2_init_ext,.-_gcry_poly1305_amd64_avx2_init_ext; +ELF(.size _gcry_poly1305_amd64_avx2_init_ext,.-_gcry_poly1305_amd64_avx2_init_ext;) .align 8 .globl _gcry_poly1305_amd64_avx2_blocks -.type _gcry_poly1305_amd64_avx2_blocks, at function; +ELF(.type _gcry_poly1305_amd64_avx2_blocks, at function;) _gcry_poly1305_amd64_avx2_blocks: .Lpoly1305_blocks_avx2_local: vzeroupper @@ -717,12 +725,12 @@ _gcry_poly1305_amd64_avx2_blocks: leave addq $8, %rax ret -.size _gcry_poly1305_amd64_avx2_blocks,.-_gcry_poly1305_amd64_avx2_blocks; +ELF(.size _gcry_poly1305_amd64_avx2_blocks,.-_gcry_poly1305_amd64_avx2_blocks;) .align 8 .globl _gcry_poly1305_amd64_avx2_finish_ext -.type _gcry_poly1305_amd64_avx2_finish_ext, at function; +ELF(.type _gcry_poly1305_amd64_avx2_finish_ext, at function;) _gcry_poly1305_amd64_avx2_finish_ext: .Lpoly1305_finish_ext_avx2_local: vzeroupper @@ -949,6 +957,6 @@ _gcry_poly1305_amd64_avx2_finish_ext: popq %rbp addq $(8*5), %rax ret -.size _gcry_poly1305_amd64_avx2_finish_ext,.-_gcry_poly1305_amd64_avx2_finish_ext; +ELF(.size _gcry_poly1305_amd64_avx2_finish_ext,.-_gcry_poly1305_amd64_avx2_finish_ext;) #endif diff --git a/cipher/poly1305-internal.h b/cipher/poly1305-internal.h index dfc0c04..bcbe5df 100644 --- a/cipher/poly1305-internal.h +++ b/cipher/poly1305-internal.h @@ -44,24 +44,30 @@ #define POLY1305_REF_ALIGNMENT sizeof(void *) +#undef POLY1305_SYSV_FUNC_ABI + /* POLY1305_USE_SSE2 indicates whether to compile with AMD64 SSE2 code. */ #undef POLY1305_USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define POLY1305_USE_SSE2 1 # define POLY1305_SSE2_BLOCKSIZE 32 # define POLY1305_SSE2_STATESIZE 248 # define POLY1305_SSE2_ALIGNMENT 16 +# define POLY1305_SYSV_FUNC_ABI 1 #endif /* POLY1305_USE_AVX2 indicates whether to compile with AMD64 AVX2 code. 
*/ #undef POLY1305_USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) # define POLY1305_USE_AVX2 1 # define POLY1305_AVX2_BLOCKSIZE 64 # define POLY1305_AVX2_STATESIZE 328 # define POLY1305_AVX2_ALIGNMENT 32 +# define POLY1305_SYSV_FUNC_ABI 1 #endif @@ -112,6 +118,17 @@ #endif +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef OPS_FUNC_ABI +#if defined(POLY1305_SYSV_FUNC_ABI) && \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) +# define OPS_FUNC_ABI __attribute__((sysv_abi)) +#else +# define OPS_FUNC_ABI +#endif + + typedef struct poly1305_key_s { byte b[POLY1305_KEYLEN]; @@ -121,10 +138,10 @@ typedef struct poly1305_key_s typedef struct poly1305_ops_s { size_t block_size; - void (*init_ext) (void *ctx, const poly1305_key_t * key); - unsigned int (*blocks) (void *ctx, const byte * m, size_t bytes); + void (*init_ext) (void *ctx, const poly1305_key_t * key) OPS_FUNC_ABI; + unsigned int (*blocks) (void *ctx, const byte * m, size_t bytes) OPS_FUNC_ABI; unsigned int (*finish_ext) (void *ctx, const byte * m, size_t remaining, - byte mac[POLY1305_TAGLEN]); + byte mac[POLY1305_TAGLEN]) OPS_FUNC_ABI; } poly1305_ops_t; diff --git a/cipher/poly1305-sse2-amd64.S b/cipher/poly1305-sse2-amd64.S index 106b119..219eb07 100644 --- a/cipher/poly1305-sse2-amd64.S +++ b/cipher/poly1305-sse2-amd64.S @@ -25,14 +25,22 @@ #include -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text .align 8 .globl _gcry_poly1305_amd64_sse2_init_ext -.type _gcry_poly1305_amd64_sse2_init_ext, at function; +ELF(.type _gcry_poly1305_amd64_sse2_init_ext, at function;) _gcry_poly1305_amd64_sse2_init_ext: .Lpoly1305_init_ext_x86_local: xor %edx, %edx @@ -273,12 +281,12 @@ _gcry_poly1305_amd64_sse2_init_ext: popq %r13 popq %r12 ret -.size _gcry_poly1305_amd64_sse2_init_ext,.-_gcry_poly1305_amd64_sse2_init_ext; +ELF(.size _gcry_poly1305_amd64_sse2_init_ext,.-_gcry_poly1305_amd64_sse2_init_ext;) .align 8 .globl _gcry_poly1305_amd64_sse2_finish_ext -.type _gcry_poly1305_amd64_sse2_finish_ext, at function; +ELF(.type _gcry_poly1305_amd64_sse2_finish_ext, at function;) _gcry_poly1305_amd64_sse2_finish_ext: .Lpoly1305_finish_ext_x86_local: pushq %rbp @@ -424,12 +432,12 @@ _gcry_poly1305_amd64_sse2_finish_ext: popq %rbp addq $8, %rax ret -.size _gcry_poly1305_amd64_sse2_finish_ext,.-_gcry_poly1305_amd64_sse2_finish_ext; +ELF(.size _gcry_poly1305_amd64_sse2_finish_ext,.-_gcry_poly1305_amd64_sse2_finish_ext;) .align 8 .globl _gcry_poly1305_amd64_sse2_blocks -.type _gcry_poly1305_amd64_sse2_blocks, at function; +ELF(.type _gcry_poly1305_amd64_sse2_blocks, at function;) _gcry_poly1305_amd64_sse2_blocks: .Lpoly1305_blocks_x86_local: pushq %rbp @@ -1030,6 +1038,6 @@ _gcry_poly1305_amd64_sse2_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_poly1305_amd64_sse2_blocks,.-_gcry_poly1305_amd64_sse2_blocks; +ELF(.size _gcry_poly1305_amd64_sse2_blocks,.-_gcry_poly1305_amd64_sse2_blocks;) #endif diff --git a/cipher/poly1305.c b/cipher/poly1305.c index 28dbbf8..1adf0e7 100644 --- a/cipher/poly1305.c +++ b/cipher/poly1305.c @@ -40,12 +40,13 @@ static const char *selftest (void); #ifdef POLY1305_USE_SSE2 -void _gcry_poly1305_amd64_sse2_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_amd64_sse2_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_sse2_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_sse2_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_amd64_sse2_ops = { POLY1305_SSE2_BLOCKSIZE, @@ -59,12 +60,13 @@ static const poly1305_ops_t poly1305_amd64_sse2_ops = { #ifdef POLY1305_USE_AVX2 -void _gcry_poly1305_amd64_avx2_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_amd64_avx2_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_avx2_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_avx2_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_amd64_avx2_ops = { POLY1305_AVX2_BLOCKSIZE, @@ -78,12 +80,13 @@ static const poly1305_ops_t poly1305_amd64_avx2_ops = { #ifdef POLY1305_USE_NEON -void _gcry_poly1305_armv7_neon_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_armv7_neon_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_armv7_neon_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_armv7_neon_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_armv7_neon_ops = { POLY1305_NEON_BLOCKSIZE, @@ -110,7 +113,7 @@ typedef struct 
poly1305_state_ref32_s } poly1305_state_ref32_t; -static void +static OPS_FUNC_ABI void poly1305_init_ext_ref32 (void *state, const poly1305_key_t * key) { poly1305_state_ref32_t *st = (poly1305_state_ref32_t *) state; @@ -142,7 +145,7 @@ poly1305_init_ext_ref32 (void *state, const poly1305_key_t * key) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_blocks_ref32 (void *state, const byte * m, size_t bytes) { poly1305_state_ref32_t *st = (poly1305_state_ref32_t *) state; @@ -230,7 +233,7 @@ poly1305_blocks_ref32 (void *state, const byte * m, size_t bytes) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_finish_ext_ref32 (void *state, const byte * m, size_t remaining, byte mac[POLY1305_TAGLEN]) { @@ -370,7 +373,7 @@ typedef struct poly1305_state_ref8_t } poly1305_state_ref8_t; -static void +static OPS_FUNC_ABI void poly1305_init_ext_ref8 (void *state, const poly1305_key_t * key) { poly1305_state_ref8_t *st = (poly1305_state_ref8_t *) state; @@ -471,7 +474,7 @@ poly1305_freeze_ref8 (byte h[17]) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_blocks_ref8 (void *state, const byte * m, size_t bytes) { poly1305_state_ref8_t *st = (poly1305_state_ref8_t *) state; @@ -519,7 +522,7 @@ poly1305_blocks_ref8 (void *state, const byte * m, size_t bytes) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_finish_ext_ref8 (void *state, const byte * m, size_t remaining, byte mac[POLY1305_TAGLEN]) { From jussi.kivilinna at iki.fi Thu May 14 13:11:39 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:39 +0300 Subject: [PATCH 08/10] Enable AMD64 Salsa20 implementation on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111139.29891.88180.stgit@localhost6.localdomain6> * cipher/salsa20-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/salsa20.c (USE_AMD64): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AMD64] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup) (_gcry_salsa20_amd64_encrypt_blocks): Add ASM_FUNC_ABI. [USE_AMD64] (salsa20_core): Add ASM_EXTRA_STACK. (salsa20_do_encrypt_stream) [USE_AMD64]: Add ASM_EXTRA_STACK. -- Signed-off-by: Jussi Kivilinna --- cipher/salsa20-amd64.S | 17 ++++++++++++----- cipher/salsa20.c | 26 +++++++++++++++++++++----- 2 files changed, 33 insertions(+), 10 deletions(-) diff --git a/cipher/salsa20-amd64.S b/cipher/salsa20-amd64.S index 7046dbb..470c32a 100644 --- a/cipher/salsa20-amd64.S +++ b/cipher/salsa20-amd64.S @@ -25,13 +25,20 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SALSA20) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SALSA20) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text .align 8 .globl _gcry_salsa20_amd64_keysetup -.type _gcry_salsa20_amd64_keysetup, at function; +ELF(.type _gcry_salsa20_amd64_keysetup, at function;) _gcry_salsa20_amd64_keysetup: movl 0(%rsi),%r8d movl 4(%rsi),%r9d @@ -83,7 +90,7 @@ _gcry_salsa20_amd64_keysetup: .align 8 .globl _gcry_salsa20_amd64_ivsetup -.type _gcry_salsa20_amd64_ivsetup, at function; +ELF(.type _gcry_salsa20_amd64_ivsetup, at function;) _gcry_salsa20_amd64_ivsetup: movl 0(%rsi),%r8d movl 4(%rsi),%esi @@ -97,7 +104,7 @@ _gcry_salsa20_amd64_ivsetup: .align 8 .globl _gcry_salsa20_amd64_encrypt_blocks -.type _gcry_salsa20_amd64_encrypt_blocks, at function; +ELF(.type _gcry_salsa20_amd64_encrypt_blocks, at function;) _gcry_salsa20_amd64_encrypt_blocks: /* * Modifications to original implementation: @@ -918,7 +925,7 @@ _gcry_salsa20_amd64_encrypt_blocks: add $64,%rdi add $64,%rsi jmp .L_bytes_are_64_128_or_192 -.size _gcry_salsa20_amd64_encrypt_blocks,.-_gcry_salsa20_amd64_encrypt_blocks; +ELF(.size _gcry_salsa20_amd64_encrypt_blocks,.-_gcry_salsa20_amd64_encrypt_blocks;) #endif /*defined(USE_SALSA20)*/ #endif /*__x86_64*/ diff --git a/cipher/salsa20.c b/cipher/salsa20.c index d75fe51..fa3d23b 100644 --- a/cipher/salsa20.c +++ b/cipher/salsa20.c @@ -43,7 +43,8 @@ /* USE_AMD64 indicates whether to compile with AMD64 code. */ #undef USE_AMD64 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64 1 #endif @@ -118,12 +119,25 @@ static const char *selftest (void); #ifdef USE_AMD64 + +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + /* AMD64 assembly implementations of Salsa20. 
*/ -void _gcry_salsa20_amd64_keysetup(u32 *ctxinput, const void *key, int keybits); -void _gcry_salsa20_amd64_ivsetup(u32 *ctxinput, const void *iv); +void _gcry_salsa20_amd64_keysetup(u32 *ctxinput, const void *key, int keybits) + ASM_FUNC_ABI; +void _gcry_salsa20_amd64_ivsetup(u32 *ctxinput, const void *iv) + ASM_FUNC_ABI; unsigned int _gcry_salsa20_amd64_encrypt_blocks(u32 *ctxinput, const void *src, void *dst, - size_t len, int rounds); + size_t len, int rounds) ASM_FUNC_ABI; static void salsa20_keysetup(SALSA20_context_t *ctx, const byte *key, int keylen) @@ -141,7 +155,8 @@ static unsigned int salsa20_core (u32 *dst, SALSA20_context_t *ctx, unsigned int rounds) { memset(dst, 0, SALSA20_BLOCK_SIZE); - return _gcry_salsa20_amd64_encrypt_blocks(ctx->input, dst, dst, 1, rounds); + return _gcry_salsa20_amd64_encrypt_blocks(ctx->input, dst, dst, 1, rounds) + + ASM_EXTRA_STACK; } #else /* USE_AMD64 */ @@ -418,6 +433,7 @@ salsa20_do_encrypt_stream (SALSA20_context_t *ctx, size_t nblocks = length / SALSA20_BLOCK_SIZE; burn = _gcry_salsa20_amd64_encrypt_blocks(ctx->input, inbuf, outbuf, nblocks, rounds); + burn += ASM_EXTRA_STACK; length -= SALSA20_BLOCK_SIZE * nblocks; outbuf += SALSA20_BLOCK_SIZE * nblocks; inbuf += SALSA20_BLOCK_SIZE * nblocks; From jussi.kivilinna at iki.fi Thu May 14 13:11:44 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:44 +0300 Subject: [PATCH 09/10] Enable AMD64 Serpent implementations on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111144.29891.33565.stgit@localhost6.localdomain6> * cipher/serpent-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/serpent-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20.c (USE_SSE2, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSE2 || USE_AVX2] (ASM_FUNC_ABI): New. (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec) (_gcry_serpent_sse2_cfb_dec, _gcry_serpent_avx2_ctr_enc) (_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Add ASM_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna --- cipher/serpent-avx2-amd64.S | 29 ++++++++++++++++++----------- cipher/serpent-sse2-amd64.S | 29 ++++++++++++++++++----------- cipher/serpent.c | 30 ++++++++++++++++++++++-------- 3 files changed, 58 insertions(+), 30 deletions(-) diff --git a/cipher/serpent-avx2-amd64.S b/cipher/serpent-avx2-amd64.S index 03d29ae..3f59f06 100644 --- a/cipher/serpent-avx2-amd64.S +++ b/cipher/serpent-avx2-amd64.S @@ -20,9 +20,16 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SERPENT) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SERPENT) && \ defined(ENABLE_AVX2_SUPPORT) +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + #ifdef __PIC__ # define RIP (%rip) #else @@ -404,7 +411,7 @@ .text .align 8 -.type __serpent_enc_blk16, at function; +ELF(.type __serpent_enc_blk16, at function;) __serpent_enc_blk16: /* input: * %rdi: ctx, CTX @@ -489,10 +496,10 @@ __serpent_enc_blk16: transpose_4x4(RB4, RB1, RB2, RB0, RB3, RTMP0, RTMP1); ret; -.size __serpent_enc_blk16,.-__serpent_enc_blk16; +ELF(.size __serpent_enc_blk16,.-__serpent_enc_blk16;) .align 8 -.type __serpent_dec_blk16, at function; +ELF(.type __serpent_dec_blk16, at function;) __serpent_dec_blk16: /* input: * %rdi: ctx, CTX @@ -579,7 +586,7 @@ __serpent_dec_blk16: transpose_4x4(RB0, RB1, RB2, RB3, RB4, RTMP0, RTMP1); ret; -.size __serpent_dec_blk16,.-__serpent_dec_blk16; +ELF(.size __serpent_dec_blk16,.-__serpent_dec_blk16;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -589,7 +596,7 @@ __serpent_dec_blk16: .align 8 .globl _gcry_serpent_avx2_ctr_enc -.type _gcry_serpent_avx2_ctr_enc, at function; +ELF(.type _gcry_serpent_avx2_ctr_enc, at function;) _gcry_serpent_avx2_ctr_enc: /* input: * %rdi: ctx, CTX @@ -695,11 +702,11 @@ _gcry_serpent_avx2_ctr_enc: vzeroall; ret -.size _gcry_serpent_avx2_ctr_enc,.-_gcry_serpent_avx2_ctr_enc; +ELF(.size _gcry_serpent_avx2_ctr_enc,.-_gcry_serpent_avx2_ctr_enc;) .align 8 .globl _gcry_serpent_avx2_cbc_dec -.type _gcry_serpent_avx2_cbc_dec, at function; +ELF(.type _gcry_serpent_avx2_cbc_dec, at function;) _gcry_serpent_avx2_cbc_dec: /* input: * %rdi: ctx, CTX @@ -746,11 +753,11 @@ _gcry_serpent_avx2_cbc_dec: vzeroall; ret -.size _gcry_serpent_avx2_cbc_dec,.-_gcry_serpent_avx2_cbc_dec; +ELF(.size _gcry_serpent_avx2_cbc_dec,.-_gcry_serpent_avx2_cbc_dec;) .align 8 .globl _gcry_serpent_avx2_cfb_dec -.type _gcry_serpent_avx2_cfb_dec, at function; +ELF(.type _gcry_serpent_avx2_cfb_dec, at function;) _gcry_serpent_avx2_cfb_dec: /* input: * %rdi: ctx, CTX @@ -799,7 +806,7 @@ _gcry_serpent_avx2_cfb_dec: vzeroall; ret -.size _gcry_serpent_avx2_cfb_dec,.-_gcry_serpent_avx2_cfb_dec; +ELF(.size _gcry_serpent_avx2_cfb_dec,.-_gcry_serpent_avx2_cfb_dec;) .data .align 16 diff --git a/cipher/serpent-sse2-amd64.S b/cipher/serpent-sse2-amd64.S index 395f660..adbf4e2 100644 --- a/cipher/serpent-sse2-amd64.S +++ b/cipher/serpent-sse2-amd64.S @@ -20,7 +20,14 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SERPENT) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SERPENT) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif #ifdef __PIC__ # define RIP (%rip) @@ -427,7 +434,7 @@ .text .align 8 -.type __serpent_enc_blk8, at function; +ELF(.type __serpent_enc_blk8, at function;) __serpent_enc_blk8: /* input: * %rdi: ctx, CTX @@ -512,10 +519,10 @@ __serpent_enc_blk8: transpose_4x4(RB4, RB1, RB2, RB0, RB3, RTMP0, RTMP1); ret; -.size __serpent_enc_blk8,.-__serpent_enc_blk8; +ELF(.size __serpent_enc_blk8,.-__serpent_enc_blk8;) .align 8 -.type __serpent_dec_blk8, at function; +ELF(.type __serpent_dec_blk8, at function;) __serpent_dec_blk8: /* input: * %rdi: ctx, CTX @@ -602,11 +609,11 @@ __serpent_dec_blk8: transpose_4x4(RB0, RB1, RB2, RB3, RB4, RTMP0, RTMP1); ret; -.size __serpent_dec_blk8,.-__serpent_dec_blk8; +ELF(.size __serpent_dec_blk8,.-__serpent_dec_blk8;) .align 8 .globl _gcry_serpent_sse2_ctr_enc -.type _gcry_serpent_sse2_ctr_enc, at function; +ELF(.type _gcry_serpent_sse2_ctr_enc, at function;) _gcry_serpent_sse2_ctr_enc: /* input: * %rdi: ctx, CTX @@ -732,11 +739,11 @@ _gcry_serpent_sse2_ctr_enc: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_ctr_enc,.-_gcry_serpent_sse2_ctr_enc; +ELF(.size _gcry_serpent_sse2_ctr_enc,.-_gcry_serpent_sse2_ctr_enc;) .align 8 .globl _gcry_serpent_sse2_cbc_dec -.type _gcry_serpent_sse2_cbc_dec, at function; +ELF(.type _gcry_serpent_sse2_cbc_dec, at function;) _gcry_serpent_sse2_cbc_dec: /* input: * %rdi: ctx, CTX @@ -793,11 +800,11 @@ _gcry_serpent_sse2_cbc_dec: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_cbc_dec,.-_gcry_serpent_sse2_cbc_dec; +ELF(.size _gcry_serpent_sse2_cbc_dec,.-_gcry_serpent_sse2_cbc_dec;) .align 8 .globl _gcry_serpent_sse2_cfb_dec -.type _gcry_serpent_sse2_cfb_dec, at function; +ELF(.type _gcry_serpent_sse2_cfb_dec, at function;) _gcry_serpent_sse2_cfb_dec: /* input: * %rdi: ctx, CTX @@ -857,7 +864,7 @@ _gcry_serpent_sse2_cfb_dec: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_cfb_dec,.-_gcry_serpent_sse2_cfb_dec; +ELF(.size _gcry_serpent_sse2_cfb_dec,.-_gcry_serpent_sse2_cfb_dec;) #endif /*defined(USE_SERPENT)*/ #endif /*__x86_64*/ diff --git a/cipher/serpent.c b/cipher/serpent.c index 0be49da..7d0e112 100644 --- a/cipher/serpent.c +++ b/cipher/serpent.c @@ -34,13 +34,15 @@ /* USE_SSE2 indicates whether to compile with AMD64 SSE2 code. */ #undef USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSE2 1 #endif /* USE_AVX2 indicates whether to compile with AMD64 AVX2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # if defined(ENABLE_AVX2_SUPPORT) # define USE_AVX2 1 # endif @@ -86,6 +88,18 @@ typedef struct serpent_context } serpent_context_t; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#if defined(USE_SSE2) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# else +# define ASM_FUNC_ABI +# endif +#endif + + #ifdef USE_SSE2 /* Assembler implementations of Serpent using SSE2. Process 8 block in parallel. 
@@ -93,17 +107,17 @@ typedef struct serpent_context extern void _gcry_serpent_sse2_ctr_enc(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_cbc_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_cfb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 @@ -113,17 +127,17 @@ extern void _gcry_serpent_sse2_cfb_dec(serpent_context_t *ctx, extern void _gcry_serpent_avx2_ctr_enc(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_cbc_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_cfb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif #ifdef USE_NEON From jussi.kivilinna at iki.fi Thu May 14 13:11:49 2015 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 14 May 2015 14:11:49 +0300 Subject: [PATCH 10/10] Enable AMD64 Twofish implementation on WIN64 In-Reply-To: <20150514111103.29891.37464.stgit@localhost6.localdomain6> References: <20150514111103.29891.37464.stgit@localhost6.localdomain6> Message-ID: <20150514111149.29891.85792.stgit@localhost6.localdomain6> * cipher/twofish-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/twofish.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (twofish_amd64_encrypt_block, twofish_amd64_decrypt_block) (twofish_amd64_ctr_enc, twofish_amd64_cbc_dec) (twofish_amd64_cfb_dec): New wrapper functions for AMD64 assembly functions. -- Signed-off-by: Jussi Kivilinna --- cipher/twofish-amd64.S | 37 +++++++++++++-------- cipher/twofish.c | 84 +++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 100 insertions(+), 21 deletions(-) diff --git a/cipher/twofish-amd64.S b/cipher/twofish-amd64.S index a225307..ea88b94 100644 --- a/cipher/twofish-amd64.S +++ b/cipher/twofish-amd64.S @@ -20,7 +20,14 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_TWOFISH) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_TWOFISH) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif #ifdef __PIC__ # define RIP %rip @@ -166,7 +173,7 @@ .align 8 .globl _gcry_twofish_amd64_encrypt_block -.type _gcry_twofish_amd64_encrypt_block, at function; +ELF(.type _gcry_twofish_amd64_encrypt_block, at function;) _gcry_twofish_amd64_encrypt_block: /* input: @@ -205,11 +212,11 @@ _gcry_twofish_amd64_encrypt_block: addq $(3 * 8), %rsp; ret; -.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block; +ELF(.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block;) .align 8 .globl _gcry_twofish_amd64_decrypt_block -.type _gcry_twofish_amd64_decrypt_block, at function; +ELF(.type _gcry_twofish_amd64_decrypt_block, at function;) _gcry_twofish_amd64_decrypt_block: /* input: @@ -248,7 +255,7 @@ _gcry_twofish_amd64_decrypt_block: addq $(3 * 8), %rsp; ret; -.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block; +ELF(.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block;) #undef CTX @@ -462,7 +469,7 @@ _gcry_twofish_amd64_decrypt_block: outunpack3(RAB, 2); .align 8 -.type __twofish_enc_blk3, at function; +ELF(.type __twofish_enc_blk3, at function;) __twofish_enc_blk3: /* input: @@ -485,10 +492,10 @@ __twofish_enc_blk3: outunpack_enc3(); ret; -.size __twofish_enc_blk3,.-__twofish_enc_blk3; +ELF(.size __twofish_enc_blk3,.-__twofish_enc_blk3;) .align 8 -.type __twofish_dec_blk3, at function; +ELF(.type __twofish_dec_blk3, at function;) __twofish_dec_blk3: /* input: @@ -511,11 +518,11 @@ __twofish_dec_blk3: outunpack_dec3(); ret; -.size __twofish_dec_blk3,.-__twofish_dec_blk3; +ELF(.size __twofish_dec_blk3,.-__twofish_dec_blk3;) .align 8 .globl _gcry_twofish_amd64_ctr_enc -.type _gcry_twofish_amd64_ctr_enc, at function; +ELF(.type _gcry_twofish_amd64_ctr_enc, at function;) _gcry_twofish_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -593,11 +600,11 @@ _gcry_twofish_amd64_ctr_enc: addq $(8 * 8), %rsp; ret; -.size _gcry_twofish_amd64_ctr_enc,.-_gcry_twofish_amd64_ctr_enc; +ELF(.size _gcry_twofish_amd64_ctr_enc,.-_gcry_twofish_amd64_ctr_enc;) .align 8 .globl _gcry_twofish_amd64_cbc_dec -.type _gcry_twofish_amd64_cbc_dec, at function; +ELF(.type _gcry_twofish_amd64_cbc_dec, at function;) _gcry_twofish_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -659,11 +666,11 @@ _gcry_twofish_amd64_cbc_dec: addq $(9 * 8), %rsp; ret; -.size _gcry_twofish_amd64_cbc_dec,.-_gcry_twofish_amd64_cbc_dec; +ELF(.size _gcry_twofish_amd64_cbc_dec,.-_gcry_twofish_amd64_cbc_dec;) .align 8 .globl _gcry_twofish_amd64_cfb_dec -.type _gcry_twofish_amd64_cfb_dec, at function; +ELF(.type _gcry_twofish_amd64_cfb_dec, at function;) _gcry_twofish_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -725,7 +732,7 @@ _gcry_twofish_amd64_cfb_dec: addq $(8 * 8), %rsp; ret; -.size _gcry_twofish_amd64_cfb_dec,.-_gcry_twofish_amd64_cfb_dec; +ELF(.size _gcry_twofish_amd64_cfb_dec,.-_gcry_twofish_amd64_cfb_dec;) #endif /*USE_TWOFISH*/ #endif /*__x86_64*/ diff --git a/cipher/twofish.c b/cipher/twofish.c index ecd76e3..ce83fad 100644 --- a/cipher/twofish.c +++ b/cipher/twofish.c @@ -53,7 +53,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -754,6 +755,77 @@ extern void _gcry_twofish_amd64_cbc_dec(const TWOFISH_context *c, byte *out, extern void _gcry_twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + +static inline void +twofish_amd64_encrypt_block(const TWOFISH_context *c, byte *out, const byte *in) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_encrypt_block, c, out, in, NULL); +#else + _gcry_twofish_amd64_encrypt_block(c, out, in); +#endif +} + +static inline void +twofish_amd64_decrypt_block(const TWOFISH_context *c, byte *out, const byte *in) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_decrypt_block, c, out, in, NULL); +#else + _gcry_twofish_amd64_decrypt_block(c, out, in); +#endif +} + +static inline void +twofish_amd64_ctr_enc(const TWOFISH_context *c, byte *out, const byte *in, + byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_ctr_enc, c, out, in, ctr); +#else + _gcry_twofish_amd64_ctr_enc(c, out, in, ctr); +#endif +} + +static inline void +twofish_amd64_cbc_dec(const TWOFISH_context *c, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_cbc_dec, c, out, in, iv); +#else + _gcry_twofish_amd64_cbc_dec(c, out, in, iv); +#endif +} + +static inline void +twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_cfb_dec, c, out, in, iv); +#else + _gcry_twofish_amd64_cfb_dec(c, out, in, iv); +#endif +} + #elif defined(USE_ARM_ASM) /* Assembly implementations of Twofish. */ @@ -833,7 +905,7 @@ static unsigned int twofish_encrypt (void *context, byte *out, const byte *in) { TWOFISH_context *ctx = context; - _gcry_twofish_amd64_encrypt_block(ctx, out, in); + twofish_amd64_encrypt_block(ctx, out, in); return /*burn_stack*/ (4*sizeof (void*)); } @@ -900,7 +972,7 @@ static unsigned int twofish_decrypt (void *context, byte *out, const byte *in) { TWOFISH_context *ctx = context; - _gcry_twofish_amd64_decrypt_block(ctx, out, in); + twofish_amd64_decrypt_block(ctx, out, in); return /*burn_stack*/ (4*sizeof (void*)); } @@ -980,7 +1052,7 @@ _gcry_twofish_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_twofish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + twofish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; @@ -1038,7 +1110,7 @@ _gcry_twofish_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. 
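The call_sysv_fn() thunk in the hunk above is one of two Win64 strategies used across this patch series. The AMD64 assembly keeps the System V calling convention (first arguments in %rdi, %rsi, %rdx, %rcx), while MinGW-built callers follow the Microsoft x64 ABI. For integer-only assembly such as Twofish, 3DES and CAST5 a small inline-asm trampoline moves the arguments into the SysV registers itself, which avoids the XMM6-XMM15 save/restore that GCC would emit around a __attribute__((sysv_abi)) call; the SIMD implementations (Serpent, Salsa20, Poly1305, ChaCha20, Camellia) instead annotate their prototypes and let GCC perform the conversion. A rough, self-contained sketch of the two patterns, with asm_blk_xform as a hypothetical stand-in for the real _gcry_* entry points:

/* (a) SIMD case: declare the SysV-ABI assembly routine as such and let the
 *     compiler convert at the call site (it will also spill XMM6-XMM15). */
extern void asm_blk_xform (void *ctx, unsigned char *out,
                           const unsigned char *in)
     __attribute__ ((sysv_abi));

/* (b) Integer-only case: hand-rolled trampoline, mirroring call_sysv_fn()
 *     above; only the SysV argument registers and call-clobbered integer
 *     registers are touched, so no XMM spill is generated. */
static inline void
call_sysv4 (const void *fn, const void *a1, const void *a2,
            const void *a3, const void *a4)
{
  __asm__ volatile ("callq *%0\n\t"
                    : "+a" (fn), "+D" (a1), "+S" (a2), "+d" (a3), "+c" (a4)
                    :
                    : "cc", "memory", "r8", "r9", "r10", "r11");
}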
*/ while (nblocks >= 3) { - _gcry_twofish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + twofish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; @@ -1087,7 +1159,7 @@ _gcry_twofish_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_twofish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + twofish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; From cvs at cvs.gnupg.org Sun May 17 15:17:44 2015 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sun, 17 May 2015 15:17:44 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-226-g9b0c6c8 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 9b0c6c8141ae9bd056392a3f6b5704b505fc8501 (commit) via eb0ed576893b6c7990dbcb568510f831d246cea6 (commit) via 12bc93ca8187b8061c2e705427ef22f5a71d29b0 (commit) via 8d7de4dbf7732c6eb9e9853ad7c19c89075ace6f (commit) via b65e9e71d5ee992db5c96793c6af999545daad28 (commit) via 9597cfddf03c467825da152be5ca0d12a8c30d88 (commit) via 6a6646df80386204675d8b149ab60e74d7ca124c (commit) via 9a4fb3709864bf3e3918800d44ff576590cd4e92 (commit) via e05682093ffb003b589a697428d918d755ac631d (commit) via c46b015bedba7ce0db68929bd33a86a54ab3d919 (commit) via ee8fc4edcb3466b03246c8720b90731bf274ff1d (commit) from bac42c68b069f17abcca810a21439c7233815747 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 9b0c6c8141ae9bd056392a3f6b5704b505fc8501 Author: Jussi Kivilinna Date: Thu May 14 13:07:34 2015 +0300 Enable AMD64 Twofish implementation on WIN64 * cipher/twofish-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/twofish.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (twofish_amd64_encrypt_block, twofish_amd64_decrypt_block) (twofish_amd64_ctr_enc, twofish_amd64_cbc_dec) (twofish_amd64_cfb_dec): New wrapper functions for AMD64 assembly functions. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/twofish-amd64.S b/cipher/twofish-amd64.S index a225307..ea88b94 100644 --- a/cipher/twofish-amd64.S +++ b/cipher/twofish-amd64.S @@ -20,7 +20,14 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_TWOFISH) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_TWOFISH) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif #ifdef __PIC__ # define RIP %rip @@ -166,7 +173,7 @@ .align 8 .globl _gcry_twofish_amd64_encrypt_block -.type _gcry_twofish_amd64_encrypt_block, at function; +ELF(.type _gcry_twofish_amd64_encrypt_block, at function;) _gcry_twofish_amd64_encrypt_block: /* input: @@ -205,11 +212,11 @@ _gcry_twofish_amd64_encrypt_block: addq $(3 * 8), %rsp; ret; -.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block; +ELF(.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block;) .align 8 .globl _gcry_twofish_amd64_decrypt_block -.type _gcry_twofish_amd64_decrypt_block, at function; +ELF(.type _gcry_twofish_amd64_decrypt_block, at function;) _gcry_twofish_amd64_decrypt_block: /* input: @@ -248,7 +255,7 @@ _gcry_twofish_amd64_decrypt_block: addq $(3 * 8), %rsp; ret; -.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block; +ELF(.size _gcry_twofish_amd64_encrypt_block,.-_gcry_twofish_amd64_encrypt_block;) #undef CTX @@ -462,7 +469,7 @@ _gcry_twofish_amd64_decrypt_block: outunpack3(RAB, 2); .align 8 -.type __twofish_enc_blk3, at function; +ELF(.type __twofish_enc_blk3, at function;) __twofish_enc_blk3: /* input: @@ -485,10 +492,10 @@ __twofish_enc_blk3: outunpack_enc3(); ret; -.size __twofish_enc_blk3,.-__twofish_enc_blk3; +ELF(.size __twofish_enc_blk3,.-__twofish_enc_blk3;) .align 8 -.type __twofish_dec_blk3, at function; +ELF(.type __twofish_dec_blk3, at function;) __twofish_dec_blk3: /* input: @@ -511,11 +518,11 @@ __twofish_dec_blk3: outunpack_dec3(); ret; -.size __twofish_dec_blk3,.-__twofish_dec_blk3; +ELF(.size __twofish_dec_blk3,.-__twofish_dec_blk3;) .align 8 .globl _gcry_twofish_amd64_ctr_enc -.type _gcry_twofish_amd64_ctr_enc, at function; +ELF(.type _gcry_twofish_amd64_ctr_enc, at function;) _gcry_twofish_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -593,11 +600,11 @@ _gcry_twofish_amd64_ctr_enc: addq $(8 * 8), %rsp; ret; -.size _gcry_twofish_amd64_ctr_enc,.-_gcry_twofish_amd64_ctr_enc; +ELF(.size _gcry_twofish_amd64_ctr_enc,.-_gcry_twofish_amd64_ctr_enc;) .align 8 .globl _gcry_twofish_amd64_cbc_dec -.type _gcry_twofish_amd64_cbc_dec, at function; +ELF(.type _gcry_twofish_amd64_cbc_dec, at function;) _gcry_twofish_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -659,11 +666,11 @@ _gcry_twofish_amd64_cbc_dec: addq $(9 * 8), %rsp; ret; -.size _gcry_twofish_amd64_cbc_dec,.-_gcry_twofish_amd64_cbc_dec; +ELF(.size _gcry_twofish_amd64_cbc_dec,.-_gcry_twofish_amd64_cbc_dec;) .align 8 .globl _gcry_twofish_amd64_cfb_dec -.type _gcry_twofish_amd64_cfb_dec, at function; +ELF(.type _gcry_twofish_amd64_cfb_dec, at function;) _gcry_twofish_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -725,7 +732,7 @@ _gcry_twofish_amd64_cfb_dec: addq $(8 * 8), %rsp; ret; -.size _gcry_twofish_amd64_cfb_dec,.-_gcry_twofish_amd64_cfb_dec; +ELF(.size _gcry_twofish_amd64_cfb_dec,.-_gcry_twofish_amd64_cfb_dec;) #endif /*USE_TWOFISH*/ #endif /*__x86_64*/ diff --git a/cipher/twofish.c b/cipher/twofish.c index ecd76e3..ce83fad 100644 --- a/cipher/twofish.c +++ b/cipher/twofish.c @@ -53,7 +53,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -754,6 +755,77 @@ extern void _gcry_twofish_amd64_cbc_dec(const TWOFISH_context *c, byte *out, extern void _gcry_twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + +static inline void +twofish_amd64_encrypt_block(const TWOFISH_context *c, byte *out, const byte *in) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_encrypt_block, c, out, in, NULL); +#else + _gcry_twofish_amd64_encrypt_block(c, out, in); +#endif +} + +static inline void +twofish_amd64_decrypt_block(const TWOFISH_context *c, byte *out, const byte *in) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_decrypt_block, c, out, in, NULL); +#else + _gcry_twofish_amd64_decrypt_block(c, out, in); +#endif +} + +static inline void +twofish_amd64_ctr_enc(const TWOFISH_context *c, byte *out, const byte *in, + byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_ctr_enc, c, out, in, ctr); +#else + _gcry_twofish_amd64_ctr_enc(c, out, in, ctr); +#endif +} + +static inline void +twofish_amd64_cbc_dec(const TWOFISH_context *c, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_cbc_dec, c, out, in, iv); +#else + _gcry_twofish_amd64_cbc_dec(c, out, in, iv); +#endif +} + +static inline void +twofish_amd64_cfb_dec(const TWOFISH_context *c, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn(_gcry_twofish_amd64_cfb_dec, c, out, in, iv); +#else + _gcry_twofish_amd64_cfb_dec(c, out, in, iv); +#endif +} + #elif defined(USE_ARM_ASM) /* Assembly implementations of Twofish. */ @@ -833,7 +905,7 @@ static unsigned int twofish_encrypt (void *context, byte *out, const byte *in) { TWOFISH_context *ctx = context; - _gcry_twofish_amd64_encrypt_block(ctx, out, in); + twofish_amd64_encrypt_block(ctx, out, in); return /*burn_stack*/ (4*sizeof (void*)); } @@ -900,7 +972,7 @@ static unsigned int twofish_decrypt (void *context, byte *out, const byte *in) { TWOFISH_context *ctx = context; - _gcry_twofish_amd64_decrypt_block(ctx, out, in); + twofish_amd64_decrypt_block(ctx, out, in); return /*burn_stack*/ (4*sizeof (void*)); } @@ -980,7 +1052,7 @@ _gcry_twofish_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_twofish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + twofish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; @@ -1038,7 +1110,7 @@ _gcry_twofish_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. 
*/ while (nblocks >= 3) { - _gcry_twofish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + twofish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; @@ -1087,7 +1159,7 @@ _gcry_twofish_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_twofish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + twofish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * TWOFISH_BLOCKSIZE; commit eb0ed576893b6c7990dbcb568510f831d246cea6 Author: Jussi Kivilinna Date: Thu May 14 13:07:48 2015 +0300 Enable AMD64 Serpent implementations on WIN64 * cipher/serpent-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/serpent-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20.c (USE_SSE2, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSE2 || USE_AVX2] (ASM_FUNC_ABI): New. (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec) (_gcry_serpent_sse2_cfb_dec, _gcry_serpent_avx2_ctr_enc) (_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Add ASM_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/serpent-avx2-amd64.S b/cipher/serpent-avx2-amd64.S index 03d29ae..3f59f06 100644 --- a/cipher/serpent-avx2-amd64.S +++ b/cipher/serpent-avx2-amd64.S @@ -20,9 +20,16 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SERPENT) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SERPENT) && \ defined(ENABLE_AVX2_SUPPORT) +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + #ifdef __PIC__ # define RIP (%rip) #else @@ -404,7 +411,7 @@ .text .align 8 -.type __serpent_enc_blk16, at function; +ELF(.type __serpent_enc_blk16, at function;) __serpent_enc_blk16: /* input: * %rdi: ctx, CTX @@ -489,10 +496,10 @@ __serpent_enc_blk16: transpose_4x4(RB4, RB1, RB2, RB0, RB3, RTMP0, RTMP1); ret; -.size __serpent_enc_blk16,.-__serpent_enc_blk16; +ELF(.size __serpent_enc_blk16,.-__serpent_enc_blk16;) .align 8 -.type __serpent_dec_blk16, at function; +ELF(.type __serpent_dec_blk16, at function;) __serpent_dec_blk16: /* input: * %rdi: ctx, CTX @@ -579,7 +586,7 @@ __serpent_dec_blk16: transpose_4x4(RB0, RB1, RB2, RB3, RB4, RTMP0, RTMP1); ret; -.size __serpent_dec_blk16,.-__serpent_dec_blk16; +ELF(.size __serpent_dec_blk16,.-__serpent_dec_blk16;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -589,7 +596,7 @@ __serpent_dec_blk16: .align 8 .globl _gcry_serpent_avx2_ctr_enc -.type _gcry_serpent_avx2_ctr_enc, at function; +ELF(.type _gcry_serpent_avx2_ctr_enc, at function;) _gcry_serpent_avx2_ctr_enc: /* input: * %rdi: ctx, CTX @@ -695,11 +702,11 @@ _gcry_serpent_avx2_ctr_enc: vzeroall; ret -.size _gcry_serpent_avx2_ctr_enc,.-_gcry_serpent_avx2_ctr_enc; +ELF(.size _gcry_serpent_avx2_ctr_enc,.-_gcry_serpent_avx2_ctr_enc;) .align 8 .globl _gcry_serpent_avx2_cbc_dec -.type _gcry_serpent_avx2_cbc_dec, at function; +ELF(.type _gcry_serpent_avx2_cbc_dec, at function;) _gcry_serpent_avx2_cbc_dec: /* input: * %rdi: ctx, CTX @@ -746,11 +753,11 @@ _gcry_serpent_avx2_cbc_dec: vzeroall; ret -.size _gcry_serpent_avx2_cbc_dec,.-_gcry_serpent_avx2_cbc_dec; +ELF(.size _gcry_serpent_avx2_cbc_dec,.-_gcry_serpent_avx2_cbc_dec;) .align 8 .globl _gcry_serpent_avx2_cfb_dec -.type _gcry_serpent_avx2_cfb_dec, at function; +ELF(.type _gcry_serpent_avx2_cfb_dec, at function;) _gcry_serpent_avx2_cfb_dec: /* input: * %rdi: ctx, CTX @@ -799,7 +806,7 @@ _gcry_serpent_avx2_cfb_dec: vzeroall; ret -.size _gcry_serpent_avx2_cfb_dec,.-_gcry_serpent_avx2_cfb_dec; +ELF(.size _gcry_serpent_avx2_cfb_dec,.-_gcry_serpent_avx2_cfb_dec;) .data .align 16 diff --git a/cipher/serpent-sse2-amd64.S b/cipher/serpent-sse2-amd64.S index 395f660..adbf4e2 100644 --- a/cipher/serpent-sse2-amd64.S +++ b/cipher/serpent-sse2-amd64.S @@ -20,7 +20,14 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SERPENT) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SERPENT) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif #ifdef __PIC__ # define RIP (%rip) @@ -427,7 +434,7 @@ .text .align 8 -.type __serpent_enc_blk8, at function; +ELF(.type __serpent_enc_blk8, at function;) __serpent_enc_blk8: /* input: * %rdi: ctx, CTX @@ -512,10 +519,10 @@ __serpent_enc_blk8: transpose_4x4(RB4, RB1, RB2, RB0, RB3, RTMP0, RTMP1); ret; -.size __serpent_enc_blk8,.-__serpent_enc_blk8; +ELF(.size __serpent_enc_blk8,.-__serpent_enc_blk8;) .align 8 -.type __serpent_dec_blk8, at function; +ELF(.type __serpent_dec_blk8, at function;) __serpent_dec_blk8: /* input: * %rdi: ctx, CTX @@ -602,11 +609,11 @@ __serpent_dec_blk8: transpose_4x4(RB0, RB1, RB2, RB3, RB4, RTMP0, RTMP1); ret; -.size __serpent_dec_blk8,.-__serpent_dec_blk8; +ELF(.size __serpent_dec_blk8,.-__serpent_dec_blk8;) .align 8 .globl _gcry_serpent_sse2_ctr_enc -.type _gcry_serpent_sse2_ctr_enc, at function; +ELF(.type _gcry_serpent_sse2_ctr_enc, at function;) _gcry_serpent_sse2_ctr_enc: /* input: * %rdi: ctx, CTX @@ -732,11 +739,11 @@ _gcry_serpent_sse2_ctr_enc: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_ctr_enc,.-_gcry_serpent_sse2_ctr_enc; +ELF(.size _gcry_serpent_sse2_ctr_enc,.-_gcry_serpent_sse2_ctr_enc;) .align 8 .globl _gcry_serpent_sse2_cbc_dec -.type _gcry_serpent_sse2_cbc_dec, at function; +ELF(.type _gcry_serpent_sse2_cbc_dec, at function;) _gcry_serpent_sse2_cbc_dec: /* input: * %rdi: ctx, CTX @@ -793,11 +800,11 @@ _gcry_serpent_sse2_cbc_dec: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_cbc_dec,.-_gcry_serpent_sse2_cbc_dec; +ELF(.size _gcry_serpent_sse2_cbc_dec,.-_gcry_serpent_sse2_cbc_dec;) .align 8 .globl _gcry_serpent_sse2_cfb_dec -.type _gcry_serpent_sse2_cfb_dec, at function; +ELF(.type _gcry_serpent_sse2_cfb_dec, at function;) _gcry_serpent_sse2_cfb_dec: /* input: * %rdi: ctx, CTX @@ -857,7 +864,7 @@ _gcry_serpent_sse2_cfb_dec: pxor RNOT, RNOT; ret -.size _gcry_serpent_sse2_cfb_dec,.-_gcry_serpent_sse2_cfb_dec; +ELF(.size _gcry_serpent_sse2_cfb_dec,.-_gcry_serpent_sse2_cfb_dec;) #endif /*defined(USE_SERPENT)*/ #endif /*__x86_64*/ diff --git a/cipher/serpent.c b/cipher/serpent.c index 0be49da..7d0e112 100644 --- a/cipher/serpent.c +++ b/cipher/serpent.c @@ -34,13 +34,15 @@ /* USE_SSE2 indicates whether to compile with AMD64 SSE2 code. */ #undef USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSE2 1 #endif /* USE_AVX2 indicates whether to compile with AMD64 AVX2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # if defined(ENABLE_AVX2_SUPPORT) # define USE_AVX2 1 # endif @@ -86,6 +88,18 @@ typedef struct serpent_context } serpent_context_t; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#if defined(USE_SSE2) || defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# else +# define ASM_FUNC_ABI +# endif +#endif + + #ifdef USE_SSE2 /* Assembler implementations of Serpent using SSE2. Process 8 block in parallel. 
@@ -93,17 +107,17 @@ typedef struct serpent_context extern void _gcry_serpent_sse2_ctr_enc(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_cbc_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_serpent_sse2_cfb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif #ifdef USE_AVX2 @@ -113,17 +127,17 @@ extern void _gcry_serpent_sse2_cfb_dec(serpent_context_t *ctx, extern void _gcry_serpent_avx2_ctr_enc(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_cbc_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_serpent_avx2_cfb_dec(serpent_context_t *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif #ifdef USE_NEON commit 12bc93ca8187b8061c2e705427ef22f5a71d29b0 Author: Jussi Kivilinna Date: Thu May 14 12:37:21 2015 +0300 Enable AMD64 Salsa20 implementation on WIN64 * cipher/salsa20-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/salsa20.c (USE_AMD64): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AMD64] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup) (_gcry_salsa20_amd64_encrypt_blocks): Add ASM_FUNC_ABI. [USE_AMD64] (salsa20_core): Add ASM_EXTRA_STACK. (salsa20_do_encrypt_stream) [USE_AMD64]: Add ASM_EXTRA_STACK. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/salsa20-amd64.S b/cipher/salsa20-amd64.S index 7046dbb..470c32a 100644 --- a/cipher/salsa20-amd64.S +++ b/cipher/salsa20-amd64.S @@ -25,13 +25,20 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_SALSA20) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_SALSA20) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text .align 8 .globl _gcry_salsa20_amd64_keysetup -.type _gcry_salsa20_amd64_keysetup, at function; +ELF(.type _gcry_salsa20_amd64_keysetup, at function;) _gcry_salsa20_amd64_keysetup: movl 0(%rsi),%r8d movl 4(%rsi),%r9d @@ -83,7 +90,7 @@ _gcry_salsa20_amd64_keysetup: .align 8 .globl _gcry_salsa20_amd64_ivsetup -.type _gcry_salsa20_amd64_ivsetup, at function; +ELF(.type _gcry_salsa20_amd64_ivsetup, at function;) _gcry_salsa20_amd64_ivsetup: movl 0(%rsi),%r8d movl 4(%rsi),%esi @@ -97,7 +104,7 @@ _gcry_salsa20_amd64_ivsetup: .align 8 .globl _gcry_salsa20_amd64_encrypt_blocks -.type _gcry_salsa20_amd64_encrypt_blocks, at function; +ELF(.type _gcry_salsa20_amd64_encrypt_blocks, at function;) _gcry_salsa20_amd64_encrypt_blocks: /* * Modifications to original implementation: @@ -918,7 +925,7 @@ _gcry_salsa20_amd64_encrypt_blocks: add $64,%rdi add $64,%rsi jmp .L_bytes_are_64_128_or_192 -.size _gcry_salsa20_amd64_encrypt_blocks,.-_gcry_salsa20_amd64_encrypt_blocks; +ELF(.size _gcry_salsa20_amd64_encrypt_blocks,.-_gcry_salsa20_amd64_encrypt_blocks;) #endif /*defined(USE_SALSA20)*/ #endif /*__x86_64*/ diff --git a/cipher/salsa20.c b/cipher/salsa20.c index d75fe51..fa3d23b 100644 --- a/cipher/salsa20.c +++ b/cipher/salsa20.c @@ -43,7 +43,8 @@ /* USE_AMD64 indicates whether to compile with AMD64 code. */ #undef USE_AMD64 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64 1 #endif @@ -118,12 +119,25 @@ static const char *selftest (void); #ifdef USE_AMD64 + +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + /* AMD64 assembly implementations of Salsa20. */ -void _gcry_salsa20_amd64_keysetup(u32 *ctxinput, const void *key, int keybits); -void _gcry_salsa20_amd64_ivsetup(u32 *ctxinput, const void *iv); +void _gcry_salsa20_amd64_keysetup(u32 *ctxinput, const void *key, int keybits) + ASM_FUNC_ABI; +void _gcry_salsa20_amd64_ivsetup(u32 *ctxinput, const void *iv) + ASM_FUNC_ABI; unsigned int _gcry_salsa20_amd64_encrypt_blocks(u32 *ctxinput, const void *src, void *dst, - size_t len, int rounds); + size_t len, int rounds) ASM_FUNC_ABI; static void salsa20_keysetup(SALSA20_context_t *ctx, const byte *key, int keylen) @@ -141,7 +155,8 @@ static unsigned int salsa20_core (u32 *dst, SALSA20_context_t *ctx, unsigned int rounds) { memset(dst, 0, SALSA20_BLOCK_SIZE); - return _gcry_salsa20_amd64_encrypt_blocks(ctx->input, dst, dst, 1, rounds); + return _gcry_salsa20_amd64_encrypt_blocks(ctx->input, dst, dst, 1, rounds) + + ASM_EXTRA_STACK; } #else /* USE_AMD64 */ @@ -418,6 +433,7 @@ salsa20_do_encrypt_stream (SALSA20_context_t *ctx, size_t nblocks = length / SALSA20_BLOCK_SIZE; burn = _gcry_salsa20_amd64_encrypt_blocks(ctx->input, inbuf, outbuf, nblocks, rounds); + burn += ASM_EXTRA_STACK; length -= SALSA20_BLOCK_SIZE * nblocks; outbuf += SALSA20_BLOCK_SIZE * nblocks; inbuf += SALSA20_BLOCK_SIZE * nblocks; commit 8d7de4dbf7732c6eb9e9853ad7c19c89075ace6f Author: Jussi Kivilinna Date: Thu May 14 12:39:39 2015 +0300 Enable AMD64 Poly1305 implementations on WIN64 * cipher/poly1305-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. 
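The ASM_EXTRA_STACK constant added to salsa20.c above (ChaCha20 receives the same treatment later in this push) accounts for the cost of that ABI conversion: when a Win64 caller invokes a sysv_abi function, GCC spills the ten non-volatile XMM registers XMM6-XMM15 (10 * 16 = 160 bytes) onto the caller's stack, and that area must also be covered by the stack burning performed after the call. A minimal sketch of the pattern, with hypothetical names standing in for the Salsa20 core:

#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS
# define ASM_FUNC_ABI    __attribute__ ((sysv_abi))
# define ASM_EXTRA_STACK (10 * 16)   /* XMM6..XMM15 spill area in the caller */
#else
# define ASM_FUNC_ABI
# define ASM_EXTRA_STACK 0
#endif

/* Hypothetical stand-in for _gcry_salsa20_amd64_encrypt_blocks; it returns
 * the number of stack bytes the assembly itself dirtied. */
extern unsigned int asm_encrypt_blocks (uint32_t *state, const void *src,
                                        void *dst, size_t len, int rounds)
     ASM_FUNC_ABI;

static unsigned int
cipher_core (uint32_t *state, void *dst, size_t len, int rounds)
{
  /* Add the caller-side XMM spill so the subsequent stack burn wipes it. */
  return asm_encrypt_blocks (state, dst, dst, len, rounds) + ASM_EXTRA_STACK;
}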
(ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-internal.h (POLY1305_SYSV_FUNC_ABI): New. (POLY1305_USE_SSE2, POLY1305_USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (OPS_FUNC_ABI): New. (poly1305_ops_t): Use OPS_FUNC_ABI. * cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, _gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, _gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_init_ext_ref32) (poly1305_blocks_ref32, poly1305_finish_ext_ref32) (poly1305_init_ext_ref8, poly1305_blocks_ref8) (poly1305_finish_ext_ref8): Use OPS_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/poly1305-avx2-amd64.S b/cipher/poly1305-avx2-amd64.S index 0ba7e76..9362a5a 100644 --- a/cipher/poly1305-avx2-amd64.S +++ b/cipher/poly1305-avx2-amd64.S @@ -25,15 +25,23 @@ #include -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + + .text .align 8 .globl _gcry_poly1305_amd64_avx2_init_ext -.type _gcry_poly1305_amd64_avx2_init_ext, at function; +ELF(.type _gcry_poly1305_amd64_avx2_init_ext, at function;) _gcry_poly1305_amd64_avx2_init_ext: .Lpoly1305_init_ext_avx2_local: xor %edx, %edx @@ -391,12 +399,12 @@ _gcry_poly1305_amd64_avx2_init_ext: popq %r13 popq %r12 ret -.size _gcry_poly1305_amd64_avx2_init_ext,.-_gcry_poly1305_amd64_avx2_init_ext; +ELF(.size _gcry_poly1305_amd64_avx2_init_ext,.-_gcry_poly1305_amd64_avx2_init_ext;) .align 8 .globl _gcry_poly1305_amd64_avx2_blocks -.type _gcry_poly1305_amd64_avx2_blocks, at function; +ELF(.type _gcry_poly1305_amd64_avx2_blocks, at function;) _gcry_poly1305_amd64_avx2_blocks: .Lpoly1305_blocks_avx2_local: vzeroupper @@ -717,12 +725,12 @@ _gcry_poly1305_amd64_avx2_blocks: leave addq $8, %rax ret -.size _gcry_poly1305_amd64_avx2_blocks,.-_gcry_poly1305_amd64_avx2_blocks; +ELF(.size _gcry_poly1305_amd64_avx2_blocks,.-_gcry_poly1305_amd64_avx2_blocks;) .align 8 .globl _gcry_poly1305_amd64_avx2_finish_ext -.type _gcry_poly1305_amd64_avx2_finish_ext, at function; +ELF(.type _gcry_poly1305_amd64_avx2_finish_ext, at function;) _gcry_poly1305_amd64_avx2_finish_ext: .Lpoly1305_finish_ext_avx2_local: vzeroupper @@ -949,6 +957,6 @@ _gcry_poly1305_amd64_avx2_finish_ext: popq %rbp addq $(8*5), %rax ret -.size _gcry_poly1305_amd64_avx2_finish_ext,.-_gcry_poly1305_amd64_avx2_finish_ext; +ELF(.size _gcry_poly1305_amd64_avx2_finish_ext,.-_gcry_poly1305_amd64_avx2_finish_ext;) #endif diff --git a/cipher/poly1305-internal.h b/cipher/poly1305-internal.h index dfc0c04..bcbe5df 100644 --- a/cipher/poly1305-internal.h +++ b/cipher/poly1305-internal.h @@ -44,24 +44,30 @@ #define POLY1305_REF_ALIGNMENT sizeof(void *) +#undef POLY1305_SYSV_FUNC_ABI + /* POLY1305_USE_SSE2 indicates whether to compile with AMD64 SSE2 code. 
*/ #undef POLY1305_USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define POLY1305_USE_SSE2 1 # define POLY1305_SSE2_BLOCKSIZE 32 # define POLY1305_SSE2_STATESIZE 248 # define POLY1305_SSE2_ALIGNMENT 16 +# define POLY1305_SYSV_FUNC_ABI 1 #endif /* POLY1305_USE_AVX2 indicates whether to compile with AMD64 AVX2 code. */ #undef POLY1305_USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) # define POLY1305_USE_AVX2 1 # define POLY1305_AVX2_BLOCKSIZE 64 # define POLY1305_AVX2_STATESIZE 328 # define POLY1305_AVX2_ALIGNMENT 32 +# define POLY1305_SYSV_FUNC_ABI 1 #endif @@ -112,6 +118,17 @@ #endif +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef OPS_FUNC_ABI +#if defined(POLY1305_SYSV_FUNC_ABI) && \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) +# define OPS_FUNC_ABI __attribute__((sysv_abi)) +#else +# define OPS_FUNC_ABI +#endif + + typedef struct poly1305_key_s { byte b[POLY1305_KEYLEN]; @@ -121,10 +138,10 @@ typedef struct poly1305_key_s typedef struct poly1305_ops_s { size_t block_size; - void (*init_ext) (void *ctx, const poly1305_key_t * key); - unsigned int (*blocks) (void *ctx, const byte * m, size_t bytes); + void (*init_ext) (void *ctx, const poly1305_key_t * key) OPS_FUNC_ABI; + unsigned int (*blocks) (void *ctx, const byte * m, size_t bytes) OPS_FUNC_ABI; unsigned int (*finish_ext) (void *ctx, const byte * m, size_t remaining, - byte mac[POLY1305_TAGLEN]); + byte mac[POLY1305_TAGLEN]) OPS_FUNC_ABI; } poly1305_ops_t; diff --git a/cipher/poly1305-sse2-amd64.S b/cipher/poly1305-sse2-amd64.S index 106b119..219eb07 100644 --- a/cipher/poly1305-sse2-amd64.S +++ b/cipher/poly1305-sse2-amd64.S @@ -25,14 +25,22 @@ #include -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text .align 8 .globl _gcry_poly1305_amd64_sse2_init_ext -.type _gcry_poly1305_amd64_sse2_init_ext, at function; +ELF(.type _gcry_poly1305_amd64_sse2_init_ext, at function;) _gcry_poly1305_amd64_sse2_init_ext: .Lpoly1305_init_ext_x86_local: xor %edx, %edx @@ -273,12 +281,12 @@ _gcry_poly1305_amd64_sse2_init_ext: popq %r13 popq %r12 ret -.size _gcry_poly1305_amd64_sse2_init_ext,.-_gcry_poly1305_amd64_sse2_init_ext; +ELF(.size _gcry_poly1305_amd64_sse2_init_ext,.-_gcry_poly1305_amd64_sse2_init_ext;) .align 8 .globl _gcry_poly1305_amd64_sse2_finish_ext -.type _gcry_poly1305_amd64_sse2_finish_ext, at function; +ELF(.type _gcry_poly1305_amd64_sse2_finish_ext, at function;) _gcry_poly1305_amd64_sse2_finish_ext: .Lpoly1305_finish_ext_x86_local: pushq %rbp @@ -424,12 +432,12 @@ _gcry_poly1305_amd64_sse2_finish_ext: popq %rbp addq $8, %rax ret -.size _gcry_poly1305_amd64_sse2_finish_ext,.-_gcry_poly1305_amd64_sse2_finish_ext; +ELF(.size _gcry_poly1305_amd64_sse2_finish_ext,.-_gcry_poly1305_amd64_sse2_finish_ext;) .align 8 .globl _gcry_poly1305_amd64_sse2_blocks -.type _gcry_poly1305_amd64_sse2_blocks, at function; +ELF(.type _gcry_poly1305_amd64_sse2_blocks, at function;) _gcry_poly1305_amd64_sse2_blocks: .Lpoly1305_blocks_x86_local: pushq %rbp @@ -1030,6 +1038,6 @@ _gcry_poly1305_amd64_sse2_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_poly1305_amd64_sse2_blocks,.-_gcry_poly1305_amd64_sse2_blocks; +ELF(.size _gcry_poly1305_amd64_sse2_blocks,.-_gcry_poly1305_amd64_sse2_blocks;) #endif diff --git a/cipher/poly1305.c b/cipher/poly1305.c index 28dbbf8..1adf0e7 100644 --- a/cipher/poly1305.c +++ b/cipher/poly1305.c @@ -40,12 +40,13 @@ static const char *selftest (void); #ifdef POLY1305_USE_SSE2 -void _gcry_poly1305_amd64_sse2_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_amd64_sse2_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_sse2_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_sse2_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_amd64_sse2_ops = { POLY1305_SSE2_BLOCKSIZE, @@ -59,12 +60,13 @@ static const poly1305_ops_t poly1305_amd64_sse2_ops = { #ifdef POLY1305_USE_AVX2 -void _gcry_poly1305_amd64_avx2_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_amd64_avx2_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_avx2_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_amd64_avx2_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_amd64_avx2_ops = { POLY1305_AVX2_BLOCKSIZE, @@ -78,12 +80,13 @@ static const poly1305_ops_t poly1305_amd64_avx2_ops = { #ifdef POLY1305_USE_NEON -void _gcry_poly1305_armv7_neon_init_ext(void *state, const poly1305_key_t *key); +void _gcry_poly1305_armv7_neon_init_ext(void *state, const poly1305_key_t *key) + OPS_FUNC_ABI; unsigned int _gcry_poly1305_armv7_neon_finish_ext(void *state, const byte *m, size_t remaining, - byte mac[16]); + byte mac[16]) OPS_FUNC_ABI; unsigned int _gcry_poly1305_armv7_neon_blocks(void *ctx, const byte *m, - size_t bytes); + size_t bytes) OPS_FUNC_ABI; static const poly1305_ops_t poly1305_armv7_neon_ops = { POLY1305_NEON_BLOCKSIZE, @@ -110,7 +113,7 @@ typedef struct 
poly1305_state_ref32_s } poly1305_state_ref32_t; -static void +static OPS_FUNC_ABI void poly1305_init_ext_ref32 (void *state, const poly1305_key_t * key) { poly1305_state_ref32_t *st = (poly1305_state_ref32_t *) state; @@ -142,7 +145,7 @@ poly1305_init_ext_ref32 (void *state, const poly1305_key_t * key) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_blocks_ref32 (void *state, const byte * m, size_t bytes) { poly1305_state_ref32_t *st = (poly1305_state_ref32_t *) state; @@ -230,7 +233,7 @@ poly1305_blocks_ref32 (void *state, const byte * m, size_t bytes) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_finish_ext_ref32 (void *state, const byte * m, size_t remaining, byte mac[POLY1305_TAGLEN]) { @@ -370,7 +373,7 @@ typedef struct poly1305_state_ref8_t } poly1305_state_ref8_t; -static void +static OPS_FUNC_ABI void poly1305_init_ext_ref8 (void *state, const poly1305_key_t * key) { poly1305_state_ref8_t *st = (poly1305_state_ref8_t *) state; @@ -471,7 +474,7 @@ poly1305_freeze_ref8 (byte h[17]) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_blocks_ref8 (void *state, const byte * m, size_t bytes) { poly1305_state_ref8_t *st = (poly1305_state_ref8_t *) state; @@ -519,7 +522,7 @@ poly1305_blocks_ref8 (void *state, const byte * m, size_t bytes) } -static unsigned int +static OPS_FUNC_ABI unsigned int poly1305_finish_ext_ref8 (void *state, const byte * m, size_t remaining, byte mac[POLY1305_TAGLEN]) { commit b65e9e71d5ee992db5c96793c6af999545daad28 Author: Jussi Kivilinna Date: Thu May 14 10:31:18 2015 +0300 Enable AMD64 3DES implementation on WIN64 * cipher/des-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/des.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (tripledes_ecb_crypt) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (tripledes_amd64_ctr_enc, tripledes_amd64_cbc_dec) (tripledes_amd64_cfb_dec): New wrapper functions for bulk assembly functions. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/des-amd64.S b/cipher/des-amd64.S index e8b2c56..307d211 100644 --- a/cipher/des-amd64.S +++ b/cipher/des-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(USE_DES) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_DES) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) #ifdef __PIC__ # define RIP (%rip) @@ -28,6 +29,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text #define s1 0 @@ -185,7 +192,7 @@ .align 8 .globl _gcry_3des_amd64_crypt_block -.type _gcry_3des_amd64_crypt_block, at function; +ELF(.type _gcry_3des_amd64_crypt_block, at function;) _gcry_3des_amd64_crypt_block: /* input: @@ -271,7 +278,7 @@ _gcry_3des_amd64_crypt_block: popq %rbp; ret; -.size _gcry_3des_amd64_crypt_block,.-_gcry_3des_amd64_crypt_block; +ELF(.size _gcry_3des_amd64_crypt_block,.-_gcry_3des_amd64_crypt_block;) /*********************************************************************** * 3-way 3DES @@ -458,7 +465,7 @@ _gcry_3des_amd64_crypt_block: movl right##d, 4(io); .align 8 -.type _gcry_3des_amd64_crypt_blk3, at function; +ELF(.type _gcry_3des_amd64_crypt_blk3, at function;) _gcry_3des_amd64_crypt_blk3: /* input: * %rdi: round keys, CTX @@ -528,11 +535,11 @@ _gcry_3des_amd64_crypt_blk3: final_permutation3(RR, RL); ret; -.size _gcry_3des_amd64_crypt_blk3,.-_gcry_3des_amd64_crypt_blk3; +ELF(.size _gcry_3des_amd64_crypt_blk3,.-_gcry_3des_amd64_crypt_blk3;) .align 8 .globl _gcry_3des_amd64_cbc_dec -.type _gcry_3des_amd64_cbc_dec, at function; +ELF(.type _gcry_3des_amd64_cbc_dec, at function;) _gcry_3des_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -604,11 +611,11 @@ _gcry_3des_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec; +ELF(.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec;) .align 8 .globl _gcry_3des_amd64_ctr_enc -.type _gcry_3des_amd64_ctr_enc, at function; +ELF(.type _gcry_3des_amd64_ctr_enc, at function;) _gcry_3des_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -682,11 +689,11 @@ _gcry_3des_amd64_ctr_enc: popq %rbp; ret; -.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec; +ELF(.size _gcry_3des_amd64_cbc_dec,.-_gcry_3des_amd64_cbc_dec;) .align 8 .globl _gcry_3des_amd64_cfb_dec -.type _gcry_3des_amd64_cfb_dec, at function; +ELF(.type _gcry_3des_amd64_cfb_dec, at function;) _gcry_3des_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -757,7 +764,7 @@ _gcry_3des_amd64_cfb_dec: popq %rbx; popq %rbp; ret; -.size _gcry_3des_amd64_cfb_dec,.-_gcry_3des_amd64_cfb_dec; +ELF(.size _gcry_3des_amd64_cfb_dec,.-_gcry_3des_amd64_cfb_dec;) .data .align 16 diff --git a/cipher/des.c b/cipher/des.c index d4863d1..be62763 100644 --- a/cipher/des.c +++ b/cipher/des.c @@ -127,7 +127,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -771,6 +772,24 @@ extern void _gcry_3des_amd64_cfb_dec(const void *keys, byte *out, #define TRIPLEDES_ECB_BURN_STACK (8 * sizeof(void *)) +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + /* * Electronic Codebook Mode Triple-DES encryption/decryption of data * according to 'mode'. Sometimes this mode is named 'EDE' mode @@ -784,11 +803,45 @@ tripledes_ecb_crypt (struct _tripledes_ctx *ctx, const byte * from, keys = mode ? 
ctx->decrypt_subkeys : ctx->encrypt_subkeys; +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_crypt_block, keys, to, from, NULL); +#else _gcry_3des_amd64_crypt_block(keys, to, from); +#endif return 0; } +static inline void +tripledes_amd64_ctr_enc(const void *keys, byte *out, const byte *in, byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_ctr_enc, keys, out, in, ctr); +#else + _gcry_3des_amd64_ctr_enc(keys, out, in, ctr); +#endif +} + +static inline void +tripledes_amd64_cbc_dec(const void *keys, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_cbc_dec, keys, out, in, iv); +#else + _gcry_3des_amd64_cbc_dec(keys, out, in, iv); +#endif +} + +static inline void +tripledes_amd64_cfb_dec(const void *keys, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_3des_amd64_cfb_dec, keys, out, in, iv); +#else + _gcry_3des_amd64_cfb_dec(keys, out, in, iv); +#endif +} + #else /*USE_AMD64_ASM*/ #define TRIPLEDES_ECB_BURN_STACK 32 @@ -871,7 +924,7 @@ _gcry_3des_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_ctr_enc(ctx->encrypt_subkeys, outbuf, inbuf, ctr); + tripledes_amd64_ctr_enc(ctx->encrypt_subkeys, outbuf, inbuf, ctr); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; @@ -926,7 +979,7 @@ _gcry_3des_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_cbc_dec(ctx->decrypt_subkeys, outbuf, inbuf, iv); + tripledes_amd64_cbc_dec(ctx->decrypt_subkeys, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; @@ -974,7 +1027,7 @@ _gcry_3des_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 3 block chunks. */ while (nblocks >= 3) { - _gcry_3des_amd64_cfb_dec(ctx->encrypt_subkeys, outbuf, inbuf, iv); + tripledes_amd64_cfb_dec(ctx->encrypt_subkeys, outbuf, inbuf, iv); nblocks -= 3; outbuf += 3 * DES_BLOCKSIZE; commit 9597cfddf03c467825da152be5ca0d12a8c30d88 Author: Jussi Kivilinna Date: Tue May 5 21:02:43 2015 +0300 Enable AMD64 ChaCha20 implementations on WIN64 * cipher/chacha20-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20-ssse3-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/chacha20.c (USE_SSE2, USE_SSSE3, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (chacha20_blocks_t, _gcry_chacha20_amd64_sse2_blocks) (_gcry_chacha20_amd64_ssse3_blocks, _gcry_chacha20_amd64_avx2_blocks) (_gcry_chacha20_armv7_neon_blocks, chacha20_blocks): Add ASM_FUNC_ABI. (chacha20_core): Add ASM_EXTRA_STACK. 
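Adding ASM_FUNC_ABI to chacha20_blocks_t itself (and OPS_FUNC_ABI to the poly1305_ops_t members in the Poly1305 commit above) matters because these ciphers dispatch through function pointers: the assembly back ends and the plain C fallback are all reached via the same pointer, so every implementation behind it must share one calling convention, and the C fallback is therefore compiled as sysv_abi too. Roughly, with hypothetical names:

#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS
# define ASM_FUNC_ABI __attribute__ ((sysv_abi))
#else
# define ASM_FUNC_ABI
#endif

/* The calling convention is part of the pointer type, so assembly and C
 * implementations stay interchangeable behind it. */
typedef unsigned int (*blocks_fn_t) (uint32_t *state, const uint8_t *src,
                                     uint8_t *dst, size_t bytes) ASM_FUNC_ABI;

/* The C fallback has to advertise the same ABI as the assembly routines. */
ASM_FUNC_ABI static unsigned int
c_blocks (uint32_t *state, const uint8_t *src, uint8_t *dst, size_t bytes)
{
  (void)state; (void)src; (void)dst; (void)bytes;
  return 0;   /* a real implementation would generate the keystream here */
}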
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/chacha20-avx2-amd64.S b/cipher/chacha20-avx2-amd64.S index 1f33de8..12bed35 100644 --- a/cipher/chacha20-avx2-amd64.S +++ b/cipher/chacha20-avx2-amd64.S @@ -26,7 +26,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) && USE_CHACHA20 #ifdef __PIC__ @@ -35,11 +36,17 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + .text .align 8 .globl _gcry_chacha20_amd64_avx2_blocks -.type _gcry_chacha20_amd64_avx2_blocks, at function; +ELF(.type _gcry_chacha20_amd64_avx2_blocks, at function;) _gcry_chacha20_amd64_avx2_blocks: .Lchacha_blocks_avx2_local: vzeroupper @@ -938,7 +945,7 @@ _gcry_chacha20_amd64_avx2_blocks: vzeroall movl $(63 + 512), %eax ret -.size _gcry_chacha20_amd64_avx2_blocks,.-_gcry_chacha20_amd64_avx2_blocks; +ELF(.size _gcry_chacha20_amd64_avx2_blocks,.-_gcry_chacha20_amd64_avx2_blocks;) .data .align 16 diff --git a/cipher/chacha20-sse2-amd64.S b/cipher/chacha20-sse2-amd64.S index 4811f40..2b9842c 100644 --- a/cipher/chacha20-sse2-amd64.S +++ b/cipher/chacha20-sse2-amd64.S @@ -26,13 +26,20 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && USE_CHACHA20 +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && USE_CHACHA20 + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif .text .align 8 .globl _gcry_chacha20_amd64_sse2_blocks -.type _gcry_chacha20_amd64_sse2_blocks, at function; +ELF(.type _gcry_chacha20_amd64_sse2_blocks, at function;) _gcry_chacha20_amd64_sse2_blocks: .Lchacha_blocks_sse2_local: pushq %rbx @@ -646,7 +653,7 @@ _gcry_chacha20_amd64_sse2_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_chacha20_amd64_sse2_blocks,.-_gcry_chacha20_amd64_sse2_blocks; +ELF(.size _gcry_chacha20_amd64_sse2_blocks,.-_gcry_chacha20_amd64_sse2_blocks;) #endif /*defined(USE_CHACHA20)*/ #endif /*__x86_64*/ diff --git a/cipher/chacha20-ssse3-amd64.S b/cipher/chacha20-ssse3-amd64.S index 50c2ff8..a1a843f 100644 --- a/cipher/chacha20-ssse3-amd64.S +++ b/cipher/chacha20-ssse3-amd64.S @@ -26,7 +26,8 @@ #ifdef __x86_64__ #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) && USE_CHACHA20 #ifdef __PIC__ @@ -35,11 +36,17 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + .text .align 8 .globl _gcry_chacha20_amd64_ssse3_blocks -.type _gcry_chacha20_amd64_ssse3_blocks, at function; +ELF(.type _gcry_chacha20_amd64_ssse3_blocks, at function;) _gcry_chacha20_amd64_ssse3_blocks: .Lchacha_blocks_ssse3_local: pushq %rbx @@ -614,7 +621,7 @@ _gcry_chacha20_amd64_ssse3_blocks: pxor %xmm8, %xmm8 pxor %xmm0, %xmm0 ret -.size _gcry_chacha20_amd64_ssse3_blocks,.-_gcry_chacha20_amd64_ssse3_blocks; +ELF(.size _gcry_chacha20_amd64_ssse3_blocks,.-_gcry_chacha20_amd64_ssse3_blocks;) .data .align 16; diff --git a/cipher/chacha20.c b/cipher/chacha20.c index 2eaeffd..e25e239 100644 --- a/cipher/chacha20.c +++ b/cipher/chacha20.c @@ -50,20 +50,23 @@ /* USE_SSE2 indicates whether to compile with Intel SSE2 code. */ #undef USE_SSE2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_SSE2 1 #endif /* USE_SSSE3 indicates whether to compile with Intel SSSE3 code. */ #undef USE_SSSE3 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(HAVE_GCC_INLINE_ASM_SSSE3) # define USE_SSSE3 1 #endif /* USE_AVX2 indicates whether to compile with Intel AVX2 code. */ #undef USE_AVX2 -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AVX2_SUPPORT) # define USE_AVX2 1 #endif @@ -82,8 +85,23 @@ struct CHACHA20_context_s; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. 
*/ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if (defined(USE_SSE2) || defined(USE_SSSE3) || defined(USE_AVX2)) && \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +#else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +#endif + + typedef unsigned int (* chacha20_blocks_t)(u32 *state, const byte *src, - byte *dst, size_t bytes); + byte *dst, + size_t bytes) ASM_FUNC_ABI; typedef struct CHACHA20_context_s { @@ -97,28 +115,32 @@ typedef struct CHACHA20_context_s #ifdef USE_SSE2 unsigned int _gcry_chacha20_amd64_sse2_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_SSE2 */ #ifdef USE_SSSE3 unsigned int _gcry_chacha20_amd64_ssse3_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_SSSE3 */ #ifdef USE_AVX2 unsigned int _gcry_chacha20_amd64_avx2_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_AVX2 */ #ifdef USE_NEON unsigned int _gcry_chacha20_armv7_neon_blocks(u32 *state, const byte *in, - byte *out, size_t bytes); + byte *out, + size_t bytes) ASM_FUNC_ABI; #endif /* USE_NEON */ @@ -141,7 +163,7 @@ static const char *selftest (void); #ifndef USE_SSE2 -static unsigned int +ASM_FUNC_ABI static unsigned int chacha20_blocks (u32 *state, const byte *src, byte *dst, size_t bytes) { u32 pad[CHACHA20_INPUT_LENGTH]; @@ -269,7 +291,8 @@ chacha20_blocks (u32 *state, const byte *src, byte *dst, size_t bytes) static unsigned int chacha20_core(u32 *dst, struct CHACHA20_context_s *ctx) { - return ctx->blocks(ctx->input, NULL, (byte *)dst, CHACHA20_BLOCK_SIZE); + return ctx->blocks(ctx->input, NULL, (byte *)dst, CHACHA20_BLOCK_SIZE) + + ASM_EXTRA_STACK; } commit 6a6646df80386204675d8b149ab60e74d7ca124c Author: Jussi Kivilinna Date: Tue May 5 20:46:10 2015 +0300 Enable AMD64 CAST5 implementation on WIN64 * cipher/cast5-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (RIP): Remove. (GET_EXTERN_POINTER): Use 'leaq' version on WIN64. (ELF): New macro to mask lines with ELF specific commands. * cipher/cast5.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (do_encrypt_block, do_decrypt_block) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (cast5_amd64_ctr_enc, cast5_amd64_cbc_dec) (cast5_amd64_cfb_dec): New wrapper functions for bulk assembly functions. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cast5-amd64.S b/cipher/cast5-amd64.S index 41fbb74..a5f078e 100644 --- a/cipher/cast5-amd64.S +++ b/cipher/cast5-amd64.S @@ -20,14 +20,19 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && defined(USE_CAST5) +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_CAST5) -#ifdef __PIC__ -# define RIP %rip +#if defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) || !defined(__PIC__) +# define GET_EXTERN_POINTER(name, reg) leaq name, reg +#else # define GET_EXTERN_POINTER(name, reg) movq name at GOTPCREL(%rip), reg +#endif + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ #else -# define RIP -# define GET_EXTERN_POINTER(name, reg) leaq name, reg +# define ELF(...) 
/*_*/ #endif .text @@ -180,7 +185,7 @@ .align 8 .globl _gcry_cast5_amd64_encrypt_block -.type _gcry_cast5_amd64_encrypt_block, at function; +ELF(.type _gcry_cast5_amd64_encrypt_block, at function;) _gcry_cast5_amd64_encrypt_block: /* input: @@ -216,11 +221,11 @@ _gcry_cast5_amd64_encrypt_block: popq %rbx; popq %rbp; ret; -.size _gcry_cast5_amd64_encrypt_block,.-_gcry_cast5_amd64_encrypt_block; +ELF(.size _gcry_cast5_amd64_encrypt_block,.-_gcry_cast5_amd64_encrypt_block;) .align 8 .globl _gcry_cast5_amd64_decrypt_block -.type _gcry_cast5_amd64_decrypt_block, at function; +ELF(.type _gcry_cast5_amd64_decrypt_block, at function;) _gcry_cast5_amd64_decrypt_block: /* input: @@ -256,7 +261,7 @@ _gcry_cast5_amd64_decrypt_block: popq %rbx; popq %rbp; ret; -.size _gcry_cast5_amd64_decrypt_block,.-_gcry_cast5_amd64_decrypt_block; +ELF(.size _gcry_cast5_amd64_decrypt_block,.-_gcry_cast5_amd64_decrypt_block;) /********************************************************************** 4-way cast5, four blocks parallel @@ -359,7 +364,7 @@ _gcry_cast5_amd64_decrypt_block: rorq $32, d; .align 8 -.type __cast5_enc_blk4, at function; +ELF(.type __cast5_enc_blk4, at function;) __cast5_enc_blk4: /* input: @@ -384,10 +389,10 @@ __cast5_enc_blk4: outbswap_block4(RLR0, RLR1, RLR2, RLR3); ret; -.size __cast5_enc_blk4,.-__cast5_enc_blk4; +ELF(.size __cast5_enc_blk4,.-__cast5_enc_blk4;) .align 8 -.type __cast5_dec_blk4, at function; +ELF(.type __cast5_dec_blk4, at function;) __cast5_dec_blk4: /* input: @@ -414,11 +419,11 @@ __cast5_dec_blk4: outbswap_block4(RLR0, RLR1, RLR2, RLR3); ret; -.size __cast5_dec_blk4,.-__cast5_dec_blk4; +ELF(.size __cast5_dec_blk4,.-__cast5_dec_blk4;) .align 8 .globl _gcry_cast5_amd64_ctr_enc -.type _gcry_cast5_amd64_ctr_enc, at function; +ELF(.type _gcry_cast5_amd64_ctr_enc, at function;) _gcry_cast5_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -472,11 +477,11 @@ _gcry_cast5_amd64_ctr_enc: popq %rbx; popq %rbp; ret -.size _gcry_cast5_amd64_ctr_enc,.-_gcry_cast5_amd64_ctr_enc; +ELF(.size _gcry_cast5_amd64_ctr_enc,.-_gcry_cast5_amd64_ctr_enc;) .align 8 .globl _gcry_cast5_amd64_cbc_dec -.type _gcry_cast5_amd64_cbc_dec, at function; +ELF(.type _gcry_cast5_amd64_cbc_dec, at function;) _gcry_cast5_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -526,11 +531,11 @@ _gcry_cast5_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_cast5_amd64_cbc_dec,.-_gcry_cast5_amd64_cbc_dec; +ELF(.size _gcry_cast5_amd64_cbc_dec,.-_gcry_cast5_amd64_cbc_dec;) .align 8 .globl _gcry_cast5_amd64_cfb_dec -.type _gcry_cast5_amd64_cfb_dec, at function; +ELF(.type _gcry_cast5_amd64_cfb_dec, at function;) _gcry_cast5_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -581,7 +586,7 @@ _gcry_cast5_amd64_cfb_dec: popq %rbp; ret; -.size _gcry_cast5_amd64_cfb_dec,.-_gcry_cast5_amd64_cfb_dec; +ELF(.size _gcry_cast5_amd64_cfb_dec,.-_gcry_cast5_amd64_cfb_dec;) #endif /*defined(USE_CAST5)*/ #endif /*__x86_64*/ diff --git a/cipher/cast5.c b/cipher/cast5.c index 115e1e6..94dcee7 100644 --- a/cipher/cast5.c +++ b/cipher/cast5.c @@ -48,7 +48,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. 
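Besides the usual ELF() masking, the CAST5 assembly also changes how it takes the address of its lookup tables: on ELF builds compiled as PIC the address has to be fetched through the GOT, whereas Win64 has no ELF GOT and non-PIC builds need none either, so the symbol can be loaded with a plain leaq and the old RIP helper macro goes away. Condensed from the hunk above, using the same macro names as cipher/cast5-amd64.S:

#if defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS) || !defined(__PIC__)
# define GET_EXTERN_POINTER(name, reg)  leaq name, reg
#else  /* ELF PIC: fetch the address via the global offset table */
# define GET_EXTERN_POINTER(name, reg)  movq name@GOTPCREL(%rip), reg
#endif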
*/ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -372,16 +373,72 @@ extern void _gcry_cast5_amd64_cbc_dec(CAST5_context *ctx, byte *out, extern void _gcry_cast5_amd64_cfb_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + static void do_encrypt_block (CAST5_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_encrypt_block, context, outbuf, inbuf, NULL); +#else _gcry_cast5_amd64_encrypt_block (context, outbuf, inbuf); +#endif } static void do_decrypt_block (CAST5_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_decrypt_block, context, outbuf, inbuf, NULL); +#else _gcry_cast5_amd64_decrypt_block (context, outbuf, inbuf); +#endif +} + +static void +cast5_amd64_ctr_enc(CAST5_context *ctx, byte *out, const byte *in, byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_ctr_enc, ctx, out, in, ctr); +#else + _gcry_cast5_amd64_ctr_enc (ctx, out, in, ctr); +#endif +} + +static void +cast5_amd64_cbc_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_cbc_dec, ctx, out, in, iv); +#else + _gcry_cast5_amd64_cbc_dec (ctx, out, in, iv); +#endif +} + +static void +cast5_amd64_cfb_dec(CAST5_context *ctx, byte *out, const byte *in, byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_cast5_amd64_cfb_dec, ctx, out, in, iv); +#else + _gcry_cast5_amd64_cfb_dec (ctx, out, in, iv); +#endif } static unsigned int @@ -396,7 +453,7 @@ static unsigned int decrypt_block (void *context, byte *outbuf, const byte *inbuf) { CAST5_context *c = (CAST5_context *) context; - _gcry_cast5_amd64_decrypt_block (c, outbuf, inbuf); + do_decrypt_block (c, outbuf, inbuf); return /*burn_stack*/ (2*8); } @@ -582,7 +639,7 @@ _gcry_cast5_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_cast5_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + cast5_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; @@ -651,7 +708,7 @@ _gcry_cast5_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_cast5_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + cast5_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; @@ -710,7 +767,7 @@ _gcry_cast5_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. 
*/ while (nblocks >= 4) { - _gcry_cast5_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + cast5_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * CAST5_BLOCKSIZE; commit 9a4fb3709864bf3e3918800d44ff576590cd4e92 Author: Jussi Kivilinna Date: Thu May 14 13:33:07 2015 +0300 Enable AMD64 Camellia implementations on WIN64 * cipher/camellia-aesni-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/camellia-aesni-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/camellia-glue.c (USE_AESNI_AVX, USE_AESNI_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_AESNI_AVX ||?USE_AESNI_AVX2] (ASM_FUNC_ABI, ASM_EXTRA_STACK): New. (_gcry_camellia_aesni_avx_ctr_enc, _gcry_camellia_aesni_avx_cbc_dec) (_gcry_camellia_aesni_avx_cfb_dec, _gcry_camellia_aesni_avx_keygen) (_gcry_camellia_aesni_avx2_ctr_enc, _gcry_camellia_aesni_avx2_cbc_dec) (_gcry_camellia_aesni_avx2_cfb_dec): Add ASM_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/camellia-aesni-avx-amd64.S b/cipher/camellia-aesni-avx-amd64.S index 6d157a7..c047a21 100644 --- a/cipher/camellia-aesni-avx-amd64.S +++ b/cipher/camellia-aesni-avx-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT) #ifdef __PIC__ @@ -29,6 +30,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + #define CAMELLIA_TABLE_BYTE_LEN 272 /* struct CAMELLIA_context: */ @@ -769,7 +776,7 @@ .text .align 8 -.type __camellia_enc_blk16, at function; +ELF(.type __camellia_enc_blk16, at function;) __camellia_enc_blk16: /* input: @@ -853,10 +860,10 @@ __camellia_enc_blk16: %xmm15, %rax, %rcx, 24); jmp .Lenc_done; -.size __camellia_enc_blk16,.-__camellia_enc_blk16; +ELF(.size __camellia_enc_blk16,.-__camellia_enc_blk16;) .align 8 -.type __camellia_dec_blk16, at function; +ELF(.type __camellia_dec_blk16, at function;) __camellia_dec_blk16: /* input: @@ -938,7 +945,7 @@ __camellia_dec_blk16: ((key_table + (24) * 8) + 4)(CTX)); jmp .Ldec_max24; -.size __camellia_dec_blk16,.-__camellia_dec_blk16; +ELF(.size __camellia_dec_blk16,.-__camellia_dec_blk16;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -948,7 +955,7 @@ __camellia_dec_blk16: .align 8 .globl _gcry_camellia_aesni_avx_ctr_enc -.type _gcry_camellia_aesni_avx_ctr_enc, at function; +ELF(.type _gcry_camellia_aesni_avx_ctr_enc, at function;) _gcry_camellia_aesni_avx_ctr_enc: /* input: @@ -1062,11 +1069,11 @@ _gcry_camellia_aesni_avx_ctr_enc: leave; ret; -.size _gcry_camellia_aesni_avx_ctr_enc,.-_gcry_camellia_aesni_avx_ctr_enc; +ELF(.size _gcry_camellia_aesni_avx_ctr_enc,.-_gcry_camellia_aesni_avx_ctr_enc;) .align 8 .globl _gcry_camellia_aesni_avx_cbc_dec -.type _gcry_camellia_aesni_avx_cbc_dec, at function; +ELF(.type _gcry_camellia_aesni_avx_cbc_dec, at function;) _gcry_camellia_aesni_avx_cbc_dec: /* input: @@ -1130,11 +1137,11 @@ _gcry_camellia_aesni_avx_cbc_dec: leave; ret; -.size _gcry_camellia_aesni_avx_cbc_dec,.-_gcry_camellia_aesni_avx_cbc_dec; +ELF(.size _gcry_camellia_aesni_avx_cbc_dec,.-_gcry_camellia_aesni_avx_cbc_dec;) .align 8 .globl _gcry_camellia_aesni_avx_cfb_dec -.type 
_gcry_camellia_aesni_avx_cfb_dec, at function; +ELF(.type _gcry_camellia_aesni_avx_cfb_dec, at function;) _gcry_camellia_aesni_avx_cfb_dec: /* input: @@ -1202,7 +1209,7 @@ _gcry_camellia_aesni_avx_cfb_dec: leave; ret; -.size _gcry_camellia_aesni_avx_cfb_dec,.-_gcry_camellia_aesni_avx_cfb_dec; +ELF(.size _gcry_camellia_aesni_avx_cfb_dec,.-_gcry_camellia_aesni_avx_cfb_dec;) /* * IN: @@ -1309,7 +1316,7 @@ _gcry_camellia_aesni_avx_cfb_dec: .text .align 8 -.type __camellia_avx_setup128, at function; +ELF(.type __camellia_avx_setup128, at function;) __camellia_avx_setup128: /* input: * %rdi: ctx, CTX; subkey storage at key_table(CTX) @@ -1650,10 +1657,10 @@ __camellia_avx_setup128: vzeroall; ret; -.size __camellia_avx_setup128,.-__camellia_avx_setup128; +ELF(.size __camellia_avx_setup128,.-__camellia_avx_setup128;) .align 8 -.type __camellia_avx_setup256, at function; +ELF(.type __camellia_avx_setup256, at function;) __camellia_avx_setup256: /* input: @@ -2127,11 +2134,11 @@ __camellia_avx_setup256: vzeroall; ret; -.size __camellia_avx_setup256,.-__camellia_avx_setup256; +ELF(.size __camellia_avx_setup256,.-__camellia_avx_setup256;) .align 8 .globl _gcry_camellia_aesni_avx_keygen -.type _gcry_camellia_aesni_avx_keygen, at function; +ELF(.type _gcry_camellia_aesni_avx_keygen, at function;) _gcry_camellia_aesni_avx_keygen: /* input: @@ -2159,7 +2166,7 @@ _gcry_camellia_aesni_avx_keygen: vpor %xmm2, %xmm1, %xmm1; jmp __camellia_avx_setup256; -.size _gcry_camellia_aesni_avx_keygen,.-_gcry_camellia_aesni_avx_keygen; +ELF(.size _gcry_camellia_aesni_avx_keygen,.-_gcry_camellia_aesni_avx_keygen;) #endif /*defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT)*/ #endif /*__x86_64*/ diff --git a/cipher/camellia-aesni-avx2-amd64.S b/cipher/camellia-aesni-avx2-amd64.S index 25f48bc..a3fa229 100644 --- a/cipher/camellia-aesni-avx2-amd64.S +++ b/cipher/camellia-aesni-avx2-amd64.S @@ -20,7 +20,8 @@ #ifdef __x86_64 #include -#if defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT) #ifdef __PIC__ @@ -29,6 +30,12 @@ # define RIP #endif +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif + #define CAMELLIA_TABLE_BYTE_LEN 272 /* struct CAMELLIA_context: */ @@ -748,7 +755,7 @@ .text .align 8 -.type __camellia_enc_blk32, at function; +ELF(.type __camellia_enc_blk32, at function;) __camellia_enc_blk32: /* input: @@ -832,10 +839,10 @@ __camellia_enc_blk32: %ymm15, %rax, %rcx, 24); jmp .Lenc_done; -.size __camellia_enc_blk32,.-__camellia_enc_blk32; +ELF(.size __camellia_enc_blk32,.-__camellia_enc_blk32;) .align 8 -.type __camellia_dec_blk32, at function; +ELF(.type __camellia_dec_blk32, at function;) __camellia_dec_blk32: /* input: @@ -917,7 +924,7 @@ __camellia_dec_blk32: ((key_table + (24) * 8) + 4)(CTX)); jmp .Ldec_max24; -.size __camellia_dec_blk32,.-__camellia_dec_blk32; +ELF(.size __camellia_dec_blk32,.-__camellia_dec_blk32;) #define inc_le128(x, minus_one, tmp) \ vpcmpeqq minus_one, x, tmp; \ @@ -927,7 +934,7 @@ __camellia_dec_blk32: .align 8 .globl _gcry_camellia_aesni_avx2_ctr_enc -.type _gcry_camellia_aesni_avx2_ctr_enc, at function; +ELF(.type _gcry_camellia_aesni_avx2_ctr_enc, at function;) _gcry_camellia_aesni_avx2_ctr_enc: /* input: @@ -1111,11 +1118,11 @@ _gcry_camellia_aesni_avx2_ctr_enc: leave; ret; -.size _gcry_camellia_aesni_avx2_ctr_enc,.-_gcry_camellia_aesni_avx2_ctr_enc; +ELF(.size _gcry_camellia_aesni_avx2_ctr_enc,.-_gcry_camellia_aesni_avx2_ctr_enc;) .align 8 .globl _gcry_camellia_aesni_avx2_cbc_dec -.type _gcry_camellia_aesni_avx2_cbc_dec, at function; +ELF(.type _gcry_camellia_aesni_avx2_cbc_dec, at function;) _gcry_camellia_aesni_avx2_cbc_dec: /* input: @@ -1183,11 +1190,11 @@ _gcry_camellia_aesni_avx2_cbc_dec: leave; ret; -.size _gcry_camellia_aesni_avx2_cbc_dec,.-_gcry_camellia_aesni_avx2_cbc_dec; +ELF(.size _gcry_camellia_aesni_avx2_cbc_dec,.-_gcry_camellia_aesni_avx2_cbc_dec;) .align 8 .globl _gcry_camellia_aesni_avx2_cfb_dec -.type _gcry_camellia_aesni_avx2_cfb_dec, at function; +ELF(.type _gcry_camellia_aesni_avx2_cfb_dec, at function;) _gcry_camellia_aesni_avx2_cfb_dec: /* input: @@ -1257,7 +1264,7 @@ _gcry_camellia_aesni_avx2_cfb_dec: leave; ret; -.size _gcry_camellia_aesni_avx2_cfb_dec,.-_gcry_camellia_aesni_avx2_cfb_dec; +ELF(.size _gcry_camellia_aesni_avx2_cfb_dec,.-_gcry_camellia_aesni_avx2_cfb_dec;) #endif /*defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT)*/ #endif /*__x86_64*/ diff --git a/cipher/camellia-glue.c b/cipher/camellia-glue.c index f18d135..5032321 100644 --- a/cipher/camellia-glue.c +++ b/cipher/camellia-glue.c @@ -75,7 +75,8 @@ /* USE_AESNI inidicates whether to compile with Intel AES-NI/AVX code. */ #undef USE_AESNI_AVX #if defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX_SUPPORT) -# if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AESNI_AVX 1 # endif #endif @@ -83,7 +84,8 @@ /* USE_AESNI_AVX2 inidicates whether to compile with Intel AES-NI/AVX2 code. */ #undef USE_AESNI_AVX2 #if defined(ENABLE_AESNI_SUPPORT) && defined(ENABLE_AVX2_SUPPORT) -# if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +# if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AESNI_AVX2 1 # endif #endif @@ -100,6 +102,20 @@ typedef struct #endif /*USE_AESNI_AVX2*/ } CAMELLIA_context; +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. 
*/ +#undef ASM_FUNC_ABI +#undef ASM_EXTRA_STACK +#if defined(USE_AESNI_AVX) || defined(USE_AESNI_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# define ASM_EXTRA_STACK (10 * 16) +# else +# define ASM_FUNC_ABI +# define ASM_EXTRA_STACK 0 +# endif +#endif + #ifdef USE_AESNI_AVX /* Assembler implementations of Camellia using AES-NI and AVX. Process data in 16 block same time. @@ -107,21 +123,21 @@ typedef struct extern void _gcry_camellia_aesni_avx_ctr_enc(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_cbc_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_cfb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, const unsigned char *key, - unsigned int keylen); + unsigned int keylen) ASM_FUNC_ABI; #endif #ifdef USE_AESNI_AVX2 @@ -131,17 +147,17 @@ extern void _gcry_camellia_aesni_avx_keygen(CAMELLIA_context *ctx, extern void _gcry_camellia_aesni_avx2_ctr_enc(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *ctr); + unsigned char *ctr) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_cbc_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; extern void _gcry_camellia_aesni_avx2_cfb_dec(CAMELLIA_context *ctx, unsigned char *out, const unsigned char *in, - unsigned char *iv); + unsigned char *iv) ASM_FUNC_ABI; #endif static const char *selftest(void); @@ -318,7 +334,7 @@ _gcry_camellia_ctr_enc(void *context, unsigned char *ctr, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -347,8 +363,11 @@ _gcry_camellia_ctr_enc(void *context, unsigned char *ctr, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... */ @@ -409,7 +428,7 @@ _gcry_camellia_cbc_dec(void *context, unsigned char *iv, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK;; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -437,8 +456,11 @@ _gcry_camellia_cbc_dec(void *context, unsigned char *iv, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... 
*/ @@ -491,7 +513,7 @@ _gcry_camellia_cfb_dec(void *context, unsigned char *iv, if (did_use_aesni_avx2) { int avx2_burn_stack_depth = 32 * CAMELLIA_BLOCK_SIZE + 16 + - 2 * sizeof(void *); + 2 * sizeof(void *) + ASM_EXTRA_STACK; if (burn_stack_depth < avx2_burn_stack_depth) burn_stack_depth = avx2_burn_stack_depth; @@ -519,8 +541,11 @@ _gcry_camellia_cfb_dec(void *context, unsigned char *iv, if (did_use_aesni_avx) { - if (burn_stack_depth < 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *)) - burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + 2 * sizeof(void *); + int avx_burn_stack_depth = 16 * CAMELLIA_BLOCK_SIZE + + 2 * sizeof(void *) + ASM_EXTRA_STACK; + + if (burn_stack_depth < avx_burn_stack_depth) + burn_stack_depth = avx_burn_stack_depth; } /* Use generic code to handle smaller chunks... */ commit e05682093ffb003b589a697428d918d755ac631d Author: Jussi Kivilinna Date: Sun May 3 17:28:40 2015 +0300 Enable AMD64 Blowfish implementation on WIN64 * cipher/blowfish-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/blowfish.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (call_sysv_fn): New. (do_encrypt, do_encrypt_block, do_decrypt_block) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Call assembly function through 'call_sysv_fn'. (blowfish_amd64_ctr_enc, blowfish_amd64_cbc_dec) (blowfish_amd64_cfb_dec): New wrapper functions for bulk assembly functions. .. Signed-off-by: Jussi Kivilinna diff --git a/cipher/blowfish-amd64.S b/cipher/blowfish-amd64.S index 87b676f..21b63fc 100644 --- a/cipher/blowfish-amd64.S +++ b/cipher/blowfish-amd64.S @@ -20,7 +20,15 @@ #ifdef __x86_64 #include -#if defined(USE_BLOWFISH) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_BLOWFISH) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) 
/*_*/ +#endif .text @@ -120,7 +128,7 @@ movq RX0, (RIO); .align 8 -.type __blowfish_enc_blk1, at function; +ELF(.type __blowfish_enc_blk1, at function;) __blowfish_enc_blk1: /* input: @@ -145,11 +153,11 @@ __blowfish_enc_blk1: movq %r11, %rbp; ret; -.size __blowfish_enc_blk1,.-__blowfish_enc_blk1; +ELF(.size __blowfish_enc_blk1,.-__blowfish_enc_blk1;) .align 8 .globl _gcry_blowfish_amd64_do_encrypt -.type _gcry_blowfish_amd64_do_encrypt, at function; +ELF(.type _gcry_blowfish_amd64_do_encrypt, at function;) _gcry_blowfish_amd64_do_encrypt: /* input: @@ -171,11 +179,11 @@ _gcry_blowfish_amd64_do_encrypt: movl RX0d, (RX2); ret; -.size _gcry_blowfish_amd64_do_encrypt,.-_gcry_blowfish_amd64_do_encrypt; +ELF(.size _gcry_blowfish_amd64_do_encrypt,.-_gcry_blowfish_amd64_do_encrypt;) .align 8 .globl _gcry_blowfish_amd64_encrypt_block -.type _gcry_blowfish_amd64_encrypt_block, at function; +ELF(.type _gcry_blowfish_amd64_encrypt_block, at function;) _gcry_blowfish_amd64_encrypt_block: /* input: @@ -195,11 +203,11 @@ _gcry_blowfish_amd64_encrypt_block: write_block(); ret; -.size _gcry_blowfish_amd64_encrypt_block,.-_gcry_blowfish_amd64_encrypt_block; +ELF(.size _gcry_blowfish_amd64_encrypt_block,.-_gcry_blowfish_amd64_encrypt_block;) .align 8 .globl _gcry_blowfish_amd64_decrypt_block -.type _gcry_blowfish_amd64_decrypt_block, at function; +ELF(.type _gcry_blowfish_amd64_decrypt_block, at function;) _gcry_blowfish_amd64_decrypt_block: /* input: @@ -231,7 +239,7 @@ _gcry_blowfish_amd64_decrypt_block: movq %r11, %rbp; ret; -.size _gcry_blowfish_amd64_decrypt_block,.-_gcry_blowfish_amd64_decrypt_block; +ELF(.size _gcry_blowfish_amd64_decrypt_block,.-_gcry_blowfish_amd64_decrypt_block;) /********************************************************************** 4-way blowfish, four blocks parallel @@ -319,7 +327,7 @@ _gcry_blowfish_amd64_decrypt_block: bswapq RX3; .align 8 -.type __blowfish_enc_blk4, at function; +ELF(.type __blowfish_enc_blk4, at function;) __blowfish_enc_blk4: /* input: @@ -343,10 +351,10 @@ __blowfish_enc_blk4: outbswap_block4(); ret; -.size __blowfish_enc_blk4,.-__blowfish_enc_blk4; +ELF(.size __blowfish_enc_blk4,.-__blowfish_enc_blk4;) .align 8 -.type __blowfish_dec_blk4, at function; +ELF(.type __blowfish_dec_blk4, at function;) __blowfish_dec_blk4: /* input: @@ -372,11 +380,11 @@ __blowfish_dec_blk4: outbswap_block4(); ret; -.size __blowfish_dec_blk4,.-__blowfish_dec_blk4; +ELF(.size __blowfish_dec_blk4,.-__blowfish_dec_blk4;) .align 8 .globl _gcry_blowfish_amd64_ctr_enc -.type _gcry_blowfish_amd64_ctr_enc, at function; +ELF(.type _gcry_blowfish_amd64_ctr_enc, at function;) _gcry_blowfish_amd64_ctr_enc: /* input: * %rdi: ctx, CTX @@ -429,11 +437,11 @@ _gcry_blowfish_amd64_ctr_enc: popq %rbp; ret; -.size _gcry_blowfish_amd64_ctr_enc,.-_gcry_blowfish_amd64_ctr_enc; +ELF(.size _gcry_blowfish_amd64_ctr_enc,.-_gcry_blowfish_amd64_ctr_enc;) .align 8 .globl _gcry_blowfish_amd64_cbc_dec -.type _gcry_blowfish_amd64_cbc_dec, at function; +ELF(.type _gcry_blowfish_amd64_cbc_dec, at function;) _gcry_blowfish_amd64_cbc_dec: /* input: * %rdi: ctx, CTX @@ -477,11 +485,11 @@ _gcry_blowfish_amd64_cbc_dec: popq %rbp; ret; -.size _gcry_blowfish_amd64_cbc_dec,.-_gcry_blowfish_amd64_cbc_dec; +ELF(.size _gcry_blowfish_amd64_cbc_dec,.-_gcry_blowfish_amd64_cbc_dec;) .align 8 .globl _gcry_blowfish_amd64_cfb_dec -.type _gcry_blowfish_amd64_cfb_dec, at function; +ELF(.type _gcry_blowfish_amd64_cfb_dec, at function;) _gcry_blowfish_amd64_cfb_dec: /* input: * %rdi: ctx, CTX @@ -527,7 +535,7 @@ 
_gcry_blowfish_amd64_cfb_dec: popq %rbx; popq %rbp; ret; -.size _gcry_blowfish_amd64_cfb_dec,.-_gcry_blowfish_amd64_cfb_dec; +ELF(.size _gcry_blowfish_amd64_cfb_dec,.-_gcry_blowfish_amd64_cfb_dec;) #endif /*defined(USE_BLOWFISH)*/ #endif /*__x86_64*/ diff --git a/cipher/blowfish.c b/cipher/blowfish.c index ae470d8..a3fc26c 100644 --- a/cipher/blowfish.c +++ b/cipher/blowfish.c @@ -45,7 +45,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) && \ +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && \ (BLOWFISH_ROUNDS == 16) # define USE_AMD64_ASM 1 #endif @@ -280,22 +281,87 @@ extern void _gcry_blowfish_amd64_cbc_dec(BLOWFISH_context *ctx, byte *out, extern void _gcry_blowfish_amd64_cfb_dec(BLOWFISH_context *ctx, byte *out, const byte *in, byte *iv); +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +static inline void +call_sysv_fn (const void *fn, const void *arg1, const void *arg2, + const void *arg3, const void *arg4) +{ + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. */ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (arg1), + "+S" (arg2), + "+d" (arg3), + "+c" (arg4) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +} +#endif + static void do_encrypt ( BLOWFISH_context *bc, u32 *ret_xl, u32 *ret_xr ) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_do_encrypt, bc, ret_xl, ret_xr, NULL); +#else _gcry_blowfish_amd64_do_encrypt (bc, ret_xl, ret_xr); +#endif } static void do_encrypt_block (BLOWFISH_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_encrypt_block, context, outbuf, inbuf, + NULL); +#else _gcry_blowfish_amd64_encrypt_block (context, outbuf, inbuf); +#endif } static void do_decrypt_block (BLOWFISH_context *context, byte *outbuf, const byte *inbuf) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_decrypt_block, context, outbuf, inbuf, + NULL); +#else _gcry_blowfish_amd64_decrypt_block (context, outbuf, inbuf); +#endif +} + +static inline void +blowfish_amd64_ctr_enc(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *ctr) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_ctr_enc, ctx, out, in, ctr); +#else + _gcry_blowfish_amd64_ctr_enc(ctx, out, in, ctr); +#endif +} + +static inline void +blowfish_amd64_cbc_dec(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_cbc_dec, ctx, out, in, iv); +#else + _gcry_blowfish_amd64_cbc_dec(ctx, out, in, iv); +#endif +} + +static inline void +blowfish_amd64_cfb_dec(BLOWFISH_context *ctx, byte *out, const byte *in, + byte *iv) +{ +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + call_sysv_fn (_gcry_blowfish_amd64_cfb_dec, ctx, out, in, iv); +#else + _gcry_blowfish_amd64_cfb_dec(ctx, out, in, iv); +#endif } static unsigned int @@ -605,7 +671,7 @@ _gcry_blowfish_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, /* Process data in 4 block chunks. 
*/ while (nblocks >= 4) { - _gcry_blowfish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); + blowfish_amd64_ctr_enc(ctx, outbuf, inbuf, ctr); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; @@ -674,7 +740,7 @@ _gcry_blowfish_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_blowfish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); + blowfish_amd64_cbc_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; @@ -734,7 +800,7 @@ _gcry_blowfish_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, /* Process data in 4 block chunks. */ while (nblocks >= 4) { - _gcry_blowfish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); + blowfish_amd64_cfb_dec(ctx, outbuf, inbuf, iv); nblocks -= 4; outbuf += 4 * BLOWFISH_BLOCKSIZE; commit c46b015bedba7ce0db68929bd33a86a54ab3d919 Author: Jussi Kivilinna Date: Sun May 3 17:06:56 2015 +0300 Enable AMD64 arcfour implementation on WIN64 * cipher/arcfour-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/arcfour.c (USE_AMD64_ASM): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (do_encrypt, do_decrypt) [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]: Use assembly block to call AMD64 assembly function. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/arcfour-amd64.S b/cipher/arcfour-amd64.S index 8b8031a..2e52ea0 100644 --- a/cipher/arcfour-amd64.S +++ b/cipher/arcfour-amd64.S @@ -15,12 +15,19 @@ #ifdef __x86_64__ #include -#if defined(USE_ARCFOUR) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(USE_ARCFOUR) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif .text .align 16 .globl _gcry_arcfour_amd64 -.type _gcry_arcfour_amd64, at function +ELF(.type _gcry_arcfour_amd64, at function) _gcry_arcfour_amd64: push %rbp push %rbx @@ -91,7 +98,7 @@ _gcry_arcfour_amd64: pop %rbp ret .L__gcry_arcfour_amd64_end: -.size _gcry_arcfour_amd64,.L__gcry_arcfour_amd64_end-_gcry_arcfour_amd64 +ELF(.size _gcry_arcfour_amd64,.L__gcry_arcfour_amd64_end-_gcry_arcfour_amd64) #endif #endif diff --git a/cipher/arcfour.c b/cipher/arcfour.c index 27537bf..44e8ef4 100644 --- a/cipher/arcfour.c +++ b/cipher/arcfour.c @@ -33,7 +33,8 @@ /* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */ #undef USE_AMD64_ASM -#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) # define USE_AMD64_ASM 1 #endif @@ -53,7 +54,21 @@ static void encrypt_stream (void *context, byte *outbuf, const byte *inbuf, size_t length) { +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS + const void *fn = _gcry_arcfour_amd64; + /* Call SystemV ABI function without storing non-volatile XMM registers, + * as target function does not use vector instruction sets. 
*/ + asm volatile ("callq *%0\n\t" + : "+a" (fn), + "+D" (context), + "+S" (length), + "+d" (inbuf), + "+c" (outbuf) + : + : "cc", "memory", "r8", "r9", "r10", "r11"); +#else _gcry_arcfour_amd64 (context, length, inbuf, outbuf ); +#endif } #else /*!USE_AMD64_ASM*/ commit ee8fc4edcb3466b03246c8720b90731bf274ff1d Author: Jussi Kivilinna Date: Thu May 14 10:02:51 2015 +0300 Update documentation for Poly1305-ChaCha20 AEAD, RFC-7539 * cipher/cipher-poly1305.c: Add RFC-7539 to header. * doc/gcrypt.texi: Update Poly1305 AEAD documentation with mention of RFC-7539; Drop Salsa from supported stream ciphers for Poly1305 AEAD. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-poly1305.c b/cipher/cipher-poly1305.c index f283333..965a7b6 100644 --- a/cipher/cipher-poly1305.c +++ b/cipher/cipher-poly1305.c @@ -1,4 +1,4 @@ -/* cipher-pol1305.c - Poly1305 based AEAD cipher mode +/* cipher-poly1305.c - Poly1305 based AEAD cipher mode, RFC-7539 * Copyright (C) 2014 Jussi Kivilinna * * This file is part of Libgcrypt. diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 8683ca8..ab4f685 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -1643,9 +1643,10 @@ Associated Data (AEAD) block cipher mode, which is specified in 'NIST Special Publication 800-38D'. @item GCRY_CIPHER_MODE_POLY1305 - at cindex Poly1305 based AEAD mode -Poly1305 is an Authenticated Encryption with Associated Data (AEAD) -mode, which can be used with ChaCha20 and Salsa20 stream ciphers. + at cindex Poly1305 based AEAD mode with ChaCha20 +This mode implements the Poly1305 Authenticated Encryption with Associated +Data (AEAD) mode according to RFC-7539. This mode can be used with ChaCha20 +stream cipher. @item GCRY_CIPHER_MODE_OCB @cindex OCB, OCB3 @@ -1687,7 +1688,7 @@ and the according constants. Note that some modes are incompatible with some algorithms - in particular, stream mode (@code{GCRY_CIPHER_MODE_STREAM}) only works with stream ciphers. Poly1305 AEAD mode (@code{GCRY_CIPHER_MODE_POLY1305}) only works with -ChaCha and Salsa stream ciphers. The block cipher modes +ChaCha20 stream cipher. 
The block cipher modes (@code{GCRY_CIPHER_MODE_ECB}, @code{GCRY_CIPHER_MODE_CBC}, @code{GCRY_CIPHER_MODE_CFB}, @code{GCRY_CIPHER_MODE_OFB} and @code{GCRY_CIPHER_MODE_CTR}) will work with any block cipher ----------------------------------------------------------------------- Summary of changes: cipher/arcfour-amd64.S | 13 ++++-- cipher/arcfour.c | 17 +++++++- cipher/blowfish-amd64.S | 46 ++++++++++++--------- cipher/blowfish.c | 74 +++++++++++++++++++++++++++++++-- cipher/camellia-aesni-avx-amd64.S | 41 +++++++++++-------- cipher/camellia-aesni-avx2-amd64.S | 29 ++++++++----- cipher/camellia-glue.c | 61 +++++++++++++++++++-------- cipher/cast5-amd64.S | 43 ++++++++++--------- cipher/cast5.c | 67 +++++++++++++++++++++++++++--- cipher/chacha20-avx2-amd64.S | 13 ++++-- cipher/chacha20-sse2-amd64.S | 13 ++++-- cipher/chacha20-ssse3-amd64.S | 13 ++++-- cipher/chacha20.c | 43 ++++++++++++++----- cipher/cipher-poly1305.c | 2 +- cipher/des-amd64.S | 29 ++++++++----- cipher/des.c | 61 +++++++++++++++++++++++++-- cipher/poly1305-avx2-amd64.S | 22 ++++++---- cipher/poly1305-internal.h | 27 +++++++++--- cipher/poly1305-sse2-amd64.S | 22 ++++++---- cipher/poly1305.c | 33 ++++++++------- cipher/salsa20-amd64.S | 17 +++++--- cipher/salsa20.c | 26 +++++++++--- cipher/serpent-avx2-amd64.S | 29 ++++++++----- cipher/serpent-sse2-amd64.S | 29 ++++++++----- cipher/serpent.c | 30 ++++++++++---- cipher/twofish-amd64.S | 37 ++++++++++------- cipher/twofish.c | 84 +++++++++++++++++++++++++++++++++++--- doc/gcrypt.texi | 9 ++-- 28 files changed, 699 insertions(+), 231 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From christian at grothoff.org Tue May 19 13:56:02 2015 From: christian at grothoff.org (Christian Grothoff) Date: Tue, 19 May 2015 13:56:02 +0200 Subject: triple DH In-Reply-To: <555B1CF4.7050207@gmail.com> References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> Message-ID: <555B24D2.4040405@grothoff.org> Hi! Bart just prompted me to look over libgcrypt's key generation for EdDSA vs. ECDHE (again). I noticed a two odd things. First, in 'ecc.c::nist_generate_key' you do (for EdDSA): rndbuf = _gcry_random_bytes_secure (32, random_level); rndbuf[0] &= 0x7f; /* Clear bit 255. */ rndbuf[0] |= 0x40; /* Set bit 254. */ rndbuf[31] &= 0xf8; /* Clear bits 2..0 so that d mod 8 == 0 */ _gcry_mpi_set_buffer (sk->d, rndbuf, 32, 0); The bit operations may seem to be to follow the EdDSA spec, but that's actually false. Those bit operations must be done AFTER the hashing, and you do those there as well, in ecc-edsa.c::508: reverse_buffer (hash_d, 32); /* Only the first half of the hash. */ hash_d[0] = (hash_d[0] & 0x7f) | 0x40; hash_d[31] &= 0xf8; _gcry_mpi_set_buffer (a, hash_d, 32, 0); So in ecc:c::nist_generate_key() they seem to be misplaced and just draining a bit of entropy from the key generation process (effectively reducing key size from 256 bits of entropy to 251). Now, what I was actually tring to do was establish why ECDHE key generation is 3x slower than EdDSA key generation (both on Ed25519). 
We use the following code:

// Slow 'ECDHE' version:
  if (0 != (rc = gcry_sexp_build (&s_keyparam, NULL,
                                  "(genkey(ecc(curve Ed25519)"
                                  "(flags)))")))
  {
    LOG_GCRY (GNUNET_ERROR_TYPE_ERROR, "gcry_sexp_build", rc);
    return NULL;
  }
  if (0 != (rc = gcry_pk_genkey (&priv_sexp, s_keyparam)))
  {
    LOG_GCRY (GNUNET_ERROR_TYPE_ERROR, "gcry_pk_genkey", rc);
    gcry_sexp_release (s_keyparam);
    return NULL;
  }

// Fast 'EdDSA' version:
  if (0 != (rc = gcry_sexp_build (&s_keyparam, NULL,
                                  "(genkey(ecc(curve Ed25519)"
                                  "(flags eddsa)))")))
  {
    LOG_GCRY (GNUNET_ERROR_TYPE_ERROR, "gcry_sexp_build", rc);
    return NULL;
  }
  if (0 != (rc = gcry_pk_genkey (&priv_sexp, s_keyparam)))
  {
    LOG_GCRY (GNUNET_ERROR_TYPE_ERROR, "gcry_pk_genkey", rc);
    gcry_sexp_release (s_keyparam);
    return NULL;
  }

The benchmarking results are rather dramatic:

On 05/19/2015 01:22 PM, Bart Polot wrote:
> Still happens in svn head:
>
> [bart at voyager ~/g/src/util] (master *% u+1)$ ./perf_crypto_asymmetric
> Init: 54 µs
> EdDSA create key: 3502 µs <---
> EdDSA get pubilc: 3395 µs
> EdDSA sign HashCode: 7924 µs
> EdDSA verify HashCode: 6731 µs
> ECDH create key: 11054 µs <---
> ECDH get public: 2353 µs
> ECDH do DH: 2684 µs
> [bart at voyager ~/g/src/util] (master *% u+1)$ pacman -Q libgcrypt
> libgcrypt 1.6.3-2
> [bart at voyager ~/g/src/util] (master *% u+1)$
>
> Why is this?

In ecc.c:158, we see that

  if (E->dialect == ECC_DIALECT_ED25519)
    point_set (&sk->Q, &Q);
  else
    {
      // ... lots of code
    }

the key generation logic diverges here.  The reason is that for NIST
curves (and other non-Curve25519) some logic is needed to ensure that
the Q has the right sign.  So I understand why this code is there, but
why is it needed on Curve25519?  AFAIK for ECDHE on Curve25519 we still
don't need this.

If I set the 'eddsa' flag when generating the ECDHE key, everything
still works fine (done so in GNUnet SVN 35742), so that's an easy
workaround.  Still, feels 'wrong' to use such a hack.

Happy hacking!

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From wk at gnupg.org  Wed May 20 12:41:54 2015
From: wk at gnupg.org (Werner Koch)
Date: Wed, 20 May 2015 12:41:54 +0200
Subject: triple DH
In-Reply-To: <555B24D2.4040405@grothoff.org> (Christian Grothoff's message of
 "Tue, 19 May 2015 13:56:02 +0200")
References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com>
 <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com>
 <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com>
 <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com>
 <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com>
 <555B24D2.4040405@grothoff.org>
Message-ID: <878ucjbkel.fsf@vigenere.g10code.de>

On Tue, 19 May 2015 13:56, christian at grothoff.org said:

> Why is this? In ecc.c:158, we see that
>
> if (E->dialect == ECC_DIALECT_ED25519)
>   point_set (&sk->Q, &Q);
> else
>   {
>     // ... lots of code
>   }
>
> the key generation logic diverges here. The reason is that for NIST
> curves (and other non-Curve25519)

The comment a few lines above explains it:

  /* We want the Q=(x,y) be a "compliant key" in terms of the
   * http://tools.ietf.org/html/draft-jivsov-ecc-compact, which simply
   * means that we choose either Q=(x,y) or -Q=(x,p-y) such that we

Thus this is about generating keys in a way to allow point compression
in a non-patent encumbered way.  Meanwhile the point compression patent
expired and thus this does not make much sense anymore.
I'll ask Andrey Jivsov on how we can proceed here. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From wk at gnupg.org Thu May 21 11:17:37 2015 From: wk at gnupg.org (Werner Koch) Date: Thu, 21 May 2015 11:17:37 +0200 Subject: triple DH In-Reply-To: <878ucjbkel.fsf@vigenere.g10code.de> (Werner Koch's message of "Wed, 20 May 2015 12:41:54 +0200") References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> <878ucjbkel.fsf@vigenere.g10code.de> Message-ID: <87zj4y8f2m.fsf@vigenere.g10code.de> On Wed, 20 May 2015 12:41, wk at gnupg.org said: > Thus this is about generating keys in a way to allow point compression > in a non-patent encumbered way. Meanwhile the point compression patent The reason for the lower speed can not be attributed to Jivsov's trick but the fact that we convert to affine coordinates twice (which requires an inversion). The attached patch remove the double conversion. Does this help? Shalom-Salam, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-ecc-Avoid-twice-conversion-to-affine-coordinates-in-.patch Type: text/x-diff Size: 8213 bytes Desc: not available URL: From christian at grothoff.org Thu May 21 13:36:57 2015 From: christian at grothoff.org (Christian Grothoff) Date: Thu, 21 May 2015 13:36:57 +0200 Subject: triple DH In-Reply-To: <87zj4y8f2m.fsf@vigenere.g10code.de> References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> <878ucjbkel.fsf@vigenere.g10code.de> <87zj4y8f2m.fsf@vigenere.g10code.de> Message-ID: <555DC359.4000308@grothoff.org> On 05/21/2015 11:17 AM, Werner Koch wrote: > On Wed, 20 May 2015 12:41, wk at gnupg.org said: > >> Thus this is about generating keys in a way to allow point compression >> in a non-patent encumbered way. Meanwhile the point compression patent > The reason for the lower speed can not be attributed to Jivsov's trick > but the fact that we convert to affine coordinates twice (which requires > an inversion). The attached patch remove the double conversion. > > Does this help? > > Short answer: no. Long answer: Measurements show that for ECDHE nist_generate_key() calls 38x gcry_mpi_ec_mul_point via _gcry_ecc_ecdsa_sign and 77x via gcry_ecc_ecdsa_verify and 38x via gcry_ecc_eddsa_genkey while for EdDSA nist_generate_key() calls 12x gcry_mpi_ec_mul_point via _gcry_ecc_ecdsa_sign and 23x via gcry_ecc_ecdsa_verify, and 12x via gcry_ecc_eddsa_genkey Detailed measurement plots were too big for the list (> 40k). -Christian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From cvs at cvs.gnupg.org Thu May 21 16:58:43 2015 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Thu, 21 May 2015 16:58:43 +0200 Subject: [git] GCRYPT - branch, master, updated. 
libgcrypt-1.6.0-229-g2bddd94 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 2bddd947fd1c11b4ec461576db65a5e34fea1b07 (commit) via 102d68b3bd77813a3ff989526855bb1e283bf9d7 (commit) via 8124e357b732a719696bfd5271def4e528f2a1e1 (commit) from 9b0c6c8141ae9bd056392a3f6b5704b505fc8501 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 2bddd947fd1c11b4ec461576db65a5e34fea1b07 Author: Werner Koch Date: Thu May 21 16:24:36 2015 +0200 ecc: Add key generation flag "no-keytest". * src/cipher.h (PUBKEY_FLAG_NO_KEYTEST): New. * cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Add flag "no-keytest". Return an error for invalid flags of length 10. * cipher/ecc.c (nist_generate_key): Replace arg random_level by flags set random level depending on flags. * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Ditto. * cipher/ecc.c (ecc_generate): Pass flags to generate fucntion and remove var random_level. (nist_generate_key): Implement "no-keytest" flag. * tests/keygen.c (check_ecc_keys): Add tests for transient-key and no-keytest. -- After key creation we usually run a test to check whether the keys really work. However for transient keys this might be too time consuming and given that a failed test would anyway abort the process the optional use of a flag to skip the test is appropriate. Using Ed25519 for EdDSA and the "no-keytest" flags halves the time to create such a key. This was measured by looping the last test from check_ecc_keys() 1000 times with and without the flag. Due to a bug in the flags parser unknown flags with a length of 10 characters were not detected. Thus the "no-keytest" flag can be employed by all software even for libraries before this. That bug is however solved with this version. Signed-off-by: Werner Koch diff --git a/NEWS b/NEWS index 4c74533..d90ee6d 100644 --- a/NEWS +++ b/NEWS @@ -23,6 +23,10 @@ Noteworthy changes in version 1.7.0 (unreleased) * Added OCB mode. + * New flag "no-keytest" for ECC key generation. Due to a bug in the + parser that flag will also be accepted but ignored by older version + of Libgcrypt. + * Interface changes relative to the 1.6.0 release: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gcry_mac_get_algo NEW. diff --git a/cipher/ecc-common.h b/cipher/ecc-common.h index 83bf20d..f0d97ea 100644 --- a/cipher/ecc-common.h +++ b/cipher/ecc-common.h @@ -123,7 +123,7 @@ gpg_err_code_t _gcry_ecc_eddsa_compute_h_d (unsigned char **r_digest, gpg_err_code_t _gcry_ecc_eddsa_genkey (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, - gcry_random_level_t random_level); + int flags); gpg_err_code_t _gcry_ecc_eddsa_sign (gcry_mpi_t input, ECC_secret_key *sk, gcry_mpi_t r_r, gcry_mpi_t s, diff --git a/cipher/ecc-eddsa.c b/cipher/ecc-eddsa.c index a12ebab..4323d8e 100644 --- a/cipher/ecc-eddsa.c +++ b/cipher/ecc-eddsa.c @@ -465,15 +465,28 @@ _gcry_ecc_eddsa_compute_h_d (unsigned char **r_digest, } -/* Ed25519 version of the key generation. */ +/** + * _gcry_ecc_eddsa_genkey - EdDSA version of the key generation. + * + * @sk: A struct to receive the secret key. + * @E: Parameters of the curve. + * @ctx: Elliptic curve computation context. 
+ * @flags: Flags controlling aspects of the creation. + * + * Return: An error code. + * + * The only @flags bit used by this function is %PUBKEY_FLAG_TRANSIENT + * to use a faster RNG. + */ gpg_err_code_t _gcry_ecc_eddsa_genkey (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, - gcry_random_level_t random_level) + int flags) { gpg_err_code_t rc; int b = 256/8; /* The only size we currently support. */ gcry_mpi_t a, x, y; mpi_point_struct Q; + gcry_random_level_t random_level; char *dbuf; size_t dlen; gcry_buffer_t hvec[1]; @@ -482,6 +495,11 @@ _gcry_ecc_eddsa_genkey (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, point_init (&Q); memset (hvec, 0, sizeof hvec); + if ((flags & PUBKEY_FLAG_TRANSIENT_KEY)) + random_level = GCRY_STRONG_RANDOM; + else + random_level = GCRY_VERY_STRONG_RANDOM; + a = mpi_snew (0); x = mpi_new (0); y = mpi_new (0); diff --git a/cipher/ecc.c b/cipher/ecc.c index 262fcd8..5ffe84b 100644 --- a/cipher/ecc.c +++ b/cipher/ecc.c @@ -1,6 +1,6 @@ /* ecc.c - Elliptic Curve Cryptography * Copyright (C) 2007, 2008, 2010, 2011 Free Software Foundation, Inc. - * Copyright (C) 2013 g10 Code GmbH + * Copyright (C) 2013, 2015 g10 Code GmbH * * This file is part of Libgcrypt. * @@ -106,12 +106,11 @@ _gcry_register_pk_ecc_progress (void (*cb) (void *, const char *, /** - * nist_generate_key - Standard version of the key generation. - * + * nist_generate_key - Standard version of the ECC key generation. * @sk: A struct to receive the secret key. * @E: Parameters of the curve. * @ctx: Elliptic curve computation context. - * @random_level: The quality of the random. + * @flags: Flags controlling aspects of the creation. * @nbits: Only for testing * @r_x: On success this receives an allocated MPI with the affine * x-coordinate of the poblic key. On error NULL is stored. @@ -119,19 +118,29 @@ _gcry_register_pk_ecc_progress (void (*cb) (void *, const char *, * * Return: An error code. * + * The @flags bits used by this function are %PUBKEY_FLAG_TRANSIENT to + * use a faster RNG, and %PUBKEY_FLAG_NO_KEYTEST to skip the assertion + * that the key works as expected. + * * FIXME: Check whether N is needed. */ static gpg_err_code_t nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, - gcry_random_level_t random_level, unsigned int nbits, + int flags, unsigned int nbits, gcry_mpi_t *r_x, gcry_mpi_t *r_y) { mpi_point_struct Q; + gcry_random_level_t random_level; gcry_mpi_t x, y; const unsigned int pbits = mpi_get_nbits (E->p); point_init (&Q); + if ((flags & PUBKEY_FLAG_TRANSIENT_KEY)) + random_level = GCRY_STRONG_RANDOM; + else + random_level = GCRY_VERY_STRONG_RANDOM; + /* Generate a secret. */ if (ctx->dialect == ECC_DIALECT_ED25519) { @@ -226,7 +235,9 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, point_free (&Q); /* Now we can test our keys (this should never fail!). */ - if (sk->E.model != MPI_EC_MONTGOMERY) + if ((flags & PUBKEY_FLAG_NO_KEYTEST)) + ; /* User requested to skip the test. 
*/ + else if (sk->E.model != MPI_EC_MONTGOMERY) test_keys (sk, nbits - 64); else test_ecdh_only_keys (sk, nbits - 64); @@ -492,7 +503,6 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) gcry_mpi_t Qy = NULL; char *curve_name = NULL; gcry_sexp_t l1; - gcry_random_level_t random_level; mpi_ec_t ctx = NULL; gcry_sexp_t curve_info = NULL; gcry_sexp_t curve_flags = NULL; @@ -560,17 +570,12 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) log_printpnt ("ecgen curve G", &E.G, NULL); } - if ((flags & PUBKEY_FLAG_TRANSIENT_KEY)) - random_level = GCRY_STRONG_RANDOM; - else - random_level = GCRY_VERY_STRONG_RANDOM; - ctx = _gcry_mpi_ec_p_internal_new (E.model, E.dialect, 0, E.p, E.a, E.b); if ((flags & PUBKEY_FLAG_EDDSA)) - rc = _gcry_ecc_eddsa_genkey (&sk, &E, ctx, random_level); + rc = _gcry_ecc_eddsa_genkey (&sk, &E, ctx, flags); else - rc = nist_generate_key (&sk, &E, ctx, random_level, nbits, &Qx, &Qy); + rc = nist_generate_key (&sk, &E, ctx, flags, nbits, &Qx, &Qy); if (rc) goto leave; diff --git a/cipher/pubkey-util.c b/cipher/pubkey-util.c index 514f1eb..afa3454 100644 --- a/cipher/pubkey-util.c +++ b/cipher/pubkey-util.c @@ -1,7 +1,7 @@ /* pubkey-util.c - Supporting functions for all pubkey modules. * Copyright (C) 1998, 1999, 2000, 2002, 2003, 2005, * 2007, 2008, 2011 Free Software Foundation, Inc. - * Copyright (C) 2013 g10 Code GmbH + * Copyright (C) 2013, 2015 g10 Code GmbH * * This file is part of Libgcrypt. * @@ -155,6 +155,10 @@ _gcry_pk_util_parse_flaglist (gcry_sexp_t list, case 10: if (!memcmp (s, "igninvflag", 10)) igninvflag = 1; + else if (!memcmp (s, "no-keytest", 10)) + flags |= PUBKEY_FLAG_NO_KEYTEST; + else if (!igninvflag) + rc = GPG_ERR_INV_FLAG; break; case 11: diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index ab4f685..f13695a 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -2327,6 +2327,13 @@ random number generator. This flag may be used for keys which are only used for a short time or per-message and do not require full cryptographic strength. + at item no-keytest + at cindex no-keytest +This flag skips internal failsafe tests to assert that a generated key +is properly working. It currently has an effect only for standard ECC +key generation. It is mostly useful along with transient-key to +achieve fastest ECC key generation. + @item use-x931 @cindex X9.31 Force the use of the ANSI X9.31 key generation algorithm instead of diff --git a/src/cipher.h b/src/cipher.h index 7ad0b2c..ef183fd 100644 --- a/src/cipher.h +++ b/src/cipher.h @@ -40,6 +40,7 @@ #define PUBKEY_FLAG_NOCOMP (1 << 11) #define PUBKEY_FLAG_EDDSA (1 << 12) #define PUBKEY_FLAG_GOST (1 << 13) +#define PUBKEY_FLAG_NO_KEYTEST (1 << 14) enum pk_operation diff --git a/tests/keygen.c b/tests/keygen.c index 4aff9c9..8b9a1d5 100644 --- a/tests/keygen.c +++ b/tests/keygen.c @@ -1,5 +1,6 @@ /* keygen.c - key generation regression tests * Copyright (C) 2003, 2005, 2012 Free Software Foundation, Inc. + * Copyright (C) 2013, 2015 g10 Code GmbH * * This file is part of Libgcrypt. * @@ -14,8 +15,7 @@ * GNU Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public - * License along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA + * License along with this program; if not, see . 
*/ #ifdef HAVE_CONFIG_H @@ -432,7 +432,43 @@ check_ecc_keys (void) show_sexp ("ECC key:\n", key); check_generated_ecc_key (key); + gcry_sexp_release (key); + + + if (verbose) + show ("creating ECC key using curve Ed25519 for ECDSA (transient-key)\n"); + rc = gcry_sexp_build (&keyparm, NULL, + "(genkey(ecc(curve Ed25519)(flags transient-key)))"); + if (rc) + die ("error creating S-expression: %s\n", gpg_strerror (rc)); + rc = gcry_pk_genkey (&key, keyparm); + gcry_sexp_release (keyparm); + if (rc) + die ("error generating ECC key using curve Ed25519 for ECDSA" + " (transient-key): %s\n", + gpg_strerror (rc)); + if (verbose > 1) + show_sexp ("ECC key:\n", key); + check_generated_ecc_key (key); + gcry_sexp_release (key); + if (verbose) + show ("creating ECC key using curve Ed25519 for ECDSA " + "(transient-key no-keytest)\n"); + rc = gcry_sexp_build (&keyparm, NULL, + "(genkey(ecc(curve Ed25519)" + "(flags transient-key no-keytest)))"); + if (rc) + die ("error creating S-expression: %s\n", gpg_strerror (rc)); + rc = gcry_pk_genkey (&key, keyparm); + gcry_sexp_release (keyparm); + if (rc) + die ("error generating ECC key using curve Ed25519 for ECDSA" + " (transient-key no-keytest): %s\n", + gpg_strerror (rc)); + if (verbose > 1) + show_sexp ("ECC key:\n", key); + check_generated_ecc_key (key); gcry_sexp_release (key); } commit 102d68b3bd77813a3ff989526855bb1e283bf9d7 Author: Werner Koch Date: Thu May 21 11:12:42 2015 +0200 ecc: Avoid double conversion to affine coordinates in keygen. * cipher/ecc.c (nist_generate_key): Add args r_x and r_y. (ecc_generate): Rename vars. Convert to affine coordinates only if not returned by the lower level generation function. -- nist_generate_key already needs to convert to affine coordinates to implement Jivsov's trick. Thus we can return them and avoid calling it in ecc_generate again. Signed-off-by: Werner Koch diff --git a/cipher/ecc.c b/cipher/ecc.c index 2f5e401..262fcd8 100644 --- a/cipher/ecc.c +++ b/cipher/ecc.c @@ -105,12 +105,30 @@ _gcry_register_pk_ecc_progress (void (*cb) (void *, const char *, -/* Standard version of the key generation. */ +/** + * nist_generate_key - Standard version of the key generation. + * + * @sk: A struct to receive the secret key. + * @E: Parameters of the curve. + * @ctx: Elliptic curve computation context. + * @random_level: The quality of the random. + * @nbits: Only for testing + * @r_x: On success this receives an allocated MPI with the affine + * x-coordinate of the poblic key. On error NULL is stored. + * @r_y: Ditto for the y-coordinate. + * + * Return: An error code. + * + * FIXME: Check whether N is needed. 
+ */ static gpg_err_code_t nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, - gcry_random_level_t random_level, unsigned int nbits) + gcry_random_level_t random_level, unsigned int nbits, + gcry_mpi_t *r_x, gcry_mpi_t *r_y) { mpi_point_struct Q; + gcry_mpi_t x, y; + const unsigned int pbits = mpi_get_nbits (E->p); point_init (&Q); @@ -146,6 +164,11 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, sk->E.h = mpi_copy (E->h); point_init (&sk->Q); + x = mpi_new (pbits); + y = mpi_new (pbits); + if (_gcry_mpi_ec_get_affine (x, y, &Q, ctx)) + log_fatal ("ecgen: Failed to get affine coordinates for %s\n", "Q"); + /* We want the Q=(x,y) be a "compliant key" in terms of the * http://tools.ietf.org/html/draft-jivsov-ecc-compact, which simply * means that we choose either Q=(x,y) or -Q=(x,p-y) such that we @@ -159,16 +182,10 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, point_set (&sk->Q, &Q); else { - gcry_mpi_t x, y, negative; - const unsigned int pbits = mpi_get_nbits (E->p); + gcry_mpi_t negative; - x = mpi_new (pbits); - y = mpi_new (pbits); negative = mpi_new (pbits); - if (_gcry_mpi_ec_get_affine (x, y, &Q, ctx)) - log_fatal ("ecgen: Failed to get affine coordinates for %s\n", "Q"); - if (E->model == MPI_EC_WEIERSTRASS) mpi_sub (negative, E->p, y); /* negative = p - y */ else @@ -178,12 +195,18 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, { /* We need to end up with -Q; this assures that new Q's y is the smallest one */ - mpi_sub (sk->d, E->n, sk->d); /* d = order - d */ if (E->model == MPI_EC_WEIERSTRASS) - mpi_point_snatch_set (&sk->Q, x, negative, - mpi_alloc_set_ui (1)); + { + mpi_free (y); + y = negative; + } else - mpi_point_snatch_set (&sk->Q, negative, y, mpi_alloc_set_ui (1)); + { + mpi_free (x); + x = negative; + } + mpi_sub (sk->d, E->n, sk->d); /* d = order - d */ + mpi_point_set (&sk->Q, x, y, mpi_const (MPI_C_ONE)); if (DBG_CIPHER) log_debug ("ecgen converted Q to a compliant point\n"); @@ -191,23 +214,16 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx, else /* p - y >= p */ { /* No change is needed exactly 50% of the time: just copy. */ + mpi_free (negative); point_set (&sk->Q, &Q); if (DBG_CIPHER) log_debug ("ecgen didn't need to convert Q to a compliant point\n"); - - mpi_free (negative); - if (E->model == MPI_EC_WEIERSTRASS) - mpi_free (x); - else - mpi_free (y); } - - if (E->model == MPI_EC_WEIERSTRASS) - mpi_free (y); - else - mpi_free (x); } + *r_x = x; + *r_y = y; + point_free (&Q); /* Now we can test our keys (this should never fail!). 
*/ if (sk->E.model != MPI_EC_MONTGOMERY) @@ -470,8 +486,10 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) unsigned int nbits; elliptic_curve_t E; ECC_secret_key sk; - gcry_mpi_t x = NULL; - gcry_mpi_t y = NULL; + gcry_mpi_t Gx = NULL; + gcry_mpi_t Gy = NULL; + gcry_mpi_t Qx = NULL; + gcry_mpi_t Qy = NULL; char *curve_name = NULL; gcry_sexp_t l1; gcry_random_level_t random_level; @@ -548,26 +566,27 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) random_level = GCRY_VERY_STRONG_RANDOM; ctx = _gcry_mpi_ec_p_internal_new (E.model, E.dialect, 0, E.p, E.a, E.b); - x = mpi_new (0); - y = mpi_new (0); if ((flags & PUBKEY_FLAG_EDDSA)) rc = _gcry_ecc_eddsa_genkey (&sk, &E, ctx, random_level); else - rc = nist_generate_key (&sk, &E, ctx, random_level, nbits); + rc = nist_generate_key (&sk, &E, ctx, random_level, nbits, &Qx, &Qy); if (rc) goto leave; /* Copy data to the result. */ - if (_gcry_mpi_ec_get_affine (x, y, &sk.E.G, ctx)) + Gx = mpi_new (0); + Gy = mpi_new (0); + if (_gcry_mpi_ec_get_affine (Gx, Gy, &sk.E.G, ctx)) log_fatal ("ecgen: Failed to get affine coordinates for %s\n", "G"); - base = _gcry_ecc_ec2os (x, y, sk.E.p); + base = _gcry_ecc_ec2os (Gx, Gy, sk.E.p); if (sk.E.dialect == ECC_DIALECT_ED25519 && !(flags & PUBKEY_FLAG_NOCOMP)) { unsigned char *encpk; unsigned int encpklen; - rc = _gcry_ecc_eddsa_encodepoint (&sk.Q, ctx, x, y, + /* (Gx and Gy are used as scratch variables) */ + rc = _gcry_ecc_eddsa_encodepoint (&sk.Q, ctx, Gx, Gy, !!(flags & PUBKEY_FLAG_COMP), &encpk, &encpklen); if (rc) @@ -578,9 +597,16 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) } else { - if (_gcry_mpi_ec_get_affine (x, y, &sk.Q, ctx)) - log_fatal ("ecgen: Failed to get affine coordinates for %s\n", "Q"); - public = _gcry_ecc_ec2os (x, y, sk.E.p); + if (!Qx) + { + /* This is the case for a key from _gcry_ecc_eddsa_generate + with no compression. */ + Qx = mpi_new (0); + Qy = mpi_new (0); + if (_gcry_mpi_ec_get_affine (Qx, Qy, &sk.Q, ctx)) + log_fatal ("ecgen: Failed to get affine coordinates for %s\n", "Q"); + } + public = _gcry_ecc_ec2os (Qx, Qy, sk.E.p); } secret = sk.d; sk.d = NULL; if (E.name) @@ -614,7 +640,8 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) curve_info, curve_flags, sk.E.p, sk.E.a, sk.E.b, base, sk.E.n, sk.E.h, public, curve_info, curve_flags, - sk.E.p, sk.E.a, sk.E.b, base, sk.E.n, sk.E.h, public, secret); + sk.E.p, sk.E.a, sk.E.b, base, sk.E.n, sk.E.h, public, + secret); else rc = sexp_build (r_skey, NULL, "(key-data" @@ -654,8 +681,10 @@ ecc_generate (const gcry_sexp_t genparms, gcry_sexp_t *r_skey) mpi_free (sk.d); } _gcry_ecc_curve_free (&E); - mpi_free (x); - mpi_free (y); + mpi_free (Gx); + mpi_free (Gy); + mpi_free (Qx); + mpi_free (Qy); _gcry_mpi_ec_free (ctx); sexp_release (curve_flags); sexp_release (curve_info); commit 8124e357b732a719696bfd5271def4e528f2a1e1 Author: Werner Koch Date: Mon May 4 16:46:02 2015 +0200 random: Change initial extra seeding from 2400 bits to 128 bits. * random/random-csprng.c (read_pool): Reduce initial seeding. -- See discussion starting at https://lists.gnupg.org/pipermail/gnupg-devel/2015-April/029750.html and also in May. 
Signed-off-by: Werner Koch diff --git a/random/random-csprng.c b/random/random-csprng.c index 332744b..da50fda 100644 --- a/random/random-csprng.c +++ b/random/random-csprng.c @@ -973,8 +973,8 @@ read_pool (byte *buffer, size_t length, int level) pool_balance = 0; needed = length - pool_balance; - if (needed < POOLSIZE/2) - needed = POOLSIZE/2; + if (needed < 16) /* At least 128 bits. */ + needed = 16; else if( needed > POOLSIZE ) BUG (); read_random_source (RANDOM_ORIGIN_EXTRAPOLL, needed, ----------------------------------------------------------------------- Summary of changes: NEWS | 4 ++ cipher/ecc-common.h | 2 +- cipher/ecc-eddsa.c | 22 ++++++++- cipher/ecc.c | 128 +++++++++++++++++++++++++++++++------------------ cipher/pubkey-util.c | 6 ++- doc/gcrypt.texi | 7 +++ random/random-csprng.c | 4 +- src/cipher.h | 1 + tests/keygen.c | 40 +++++++++++++++- 9 files changed, 159 insertions(+), 55 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From wk at gnupg.org Thu May 21 17:05:31 2015 From: wk at gnupg.org (Werner Koch) Date: Thu, 21 May 2015 17:05:31 +0200 Subject: triple DH In-Reply-To: <555DC359.4000308@grothoff.org> (Christian Grothoff's message of "Thu, 21 May 2015 13:36:57 +0200") References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> <878ucjbkel.fsf@vigenere.g10code.de> <87zj4y8f2m.fsf@vigenere.g10code.de> <555DC359.4000308@grothoff.org> Message-ID: <877fs27yys.fsf@vigenere.g10code.de> On Thu, 21 May 2015 13:36, christian at grothoff.org said: > ECDHE nist_generate_key() calls 38x gcry_mpi_ec_mul_point via > _gcry_ecc_ecdsa_sign and 77x via gcry_ecc_ecdsa_verify and 38x via > gcry_ecc_eddsa_genkey Frankly, I don't understand this report: Why is gcry_ecc_edddsa_genkey reported - it is only used if you request an EdDSA key using the eddsa flag. Anyway, the tests take quite some time. I have pushed another change: ecc: Add key generation flag "no-keytest". * src/cipher.h (PUBKEY_FLAG_NO_KEYTEST): New. * cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Add flag "no-keytest". Return an error for invalid flags of length 10. * cipher/ecc.c (nist_generate_key): Replace arg random_level by flags set random level depending on flags. * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Ditto. * cipher/ecc.c (ecc_generate): Pass flags to generate fucntion and remove var random_level. (nist_generate_key): Implement "no-keytest" flag. * tests/keygen.c (check_ecc_keys): Add tests for transient-key and no-keytest. -- After key creation we usually run a test to check whether the keys really work. However for transient keys this might be too time consuming and given that a failed test would anyway abort the process the optional use of a flag to skip the test is appropriate. Using Ed25519 for EdDSA and the "no-keytest" flags halves the time to create such a key. This was measured by looping the last test from check_ecc_keys() 1000 times with and without the flag. Due to a bug in the flags parser unknown flags with a length of 10 characters were not detected. 
Thus the "no-keytest" flag can be employed by all software even for libraries before this. That bug is however solved with this version. I also pushed the tweak for the RNG which was discussed earlier this month. If that improves things for you, shall I backport them to 1.6 ? Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From wk at gnupg.org Thu May 21 17:12:04 2015 From: wk at gnupg.org (Werner Koch) Date: Thu, 21 May 2015 17:12:04 +0200 Subject: Ed25519 key generation (was: triple DH) In-Reply-To: <555B24D2.4040405@grothoff.org> (Christian Grothoff's message of "Tue, 19 May 2015 13:56:02 +0200") References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> Message-ID: <87382q7ynv.fsf_-_@vigenere.g10code.de> On Tue, 19 May 2015 13:56, christian at grothoff.org said: > I noticed a two odd things. First, in 'ecc.c::nist_generate_key' you do > (for EdDSA): > > rndbuf = _gcry_random_bytes_secure (32, random_level); > rndbuf[0] &= 0x7f; /* Clear bit 255. */ > rndbuf[0] |= 0x40; /* Set bit 254. */ > rndbuf[31] &= 0xf8; /* Clear bits 2..0 so that d mod 8 == 0 */ > _gcry_mpi_set_buffer (sk->d, rndbuf, 32, 0); > > The bit operations may seem to be to follow the EdDSA spec, but that's > actually false. Those They are part of the Ed25519 curve specification. You find them in nist_generate_key for plain use of the curve and slighly different in _gcry_ecc_eddsa_genkey for generating a curve for use with EdDSA. Only one of these functions is ever used by the opt level ecc_generate(): if ((flags & PUBKEY_FLAG_EDDSA)) rc = _gcry_ecc_eddsa_genkey (&sk, &E, ctx, flags); else rc = nist_generate_key (&sk, &E, ctx, flags, nbits, &Qx, &Qy); Shalom-Salam, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From eugene.zelenko at gmail.com Fri May 22 00:41:05 2015 From: eugene.zelenko at gmail.com (Eugene Zelenko) Date: Thu, 21 May 2015 15:41:05 -0700 Subject: Problem with building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler In-Reply-To: References: Message-ID: Hi! Other prototype/declaration discrepancies: "ec.c", line 544.1: 1506-343 (S) Redeclaration of _gcry_mpi_ec_set_mpi differs from previous declaration on line 295 of "../src/mpi.h". "ec.c", line 554.1: 1506-343 (S) Redeclaration of _gcry_mpi_ec_set_point differs from previous declaration on line 297 of "../src/mpi.h". "ecc-curves.c", line 746.1: 1506-343 (S) Redeclaration of _gcry_mpi_ec_new differs from previous declaration on line 302 of "../src/mpi.h". "global.c", line 343.1: 1506-343 (S) Redeclaration of _gcry_vcontrol differs from previous declaration on line 95 of "g10lib.h". "sexp.c", line 2427.1: 1506-343 (S) Redeclaration of _gcry_sexp_extract_param differs from previous declaration on line 331 of "gcrypt-int.h". I compiled on AIX 6.1. Configuration options: ./configure --prefix=${PWD}/../install \ --enable-static --disable-shared --disable-asm \ --with-gpg-error-prefix=${Root}/libgpg-error-1.18/install \ --with-libgpg-error-prefix=${Root}/libgpg-error-1.18/install \ CC=xlc CFLAGS="-q64 -O3 -qansialias -w -qtune=auto" \ CXX=xlC CXXFLAGS="-q64 -D_UNIX64 -O3 -qansialias -w -qtune=auto" \ LDFLAGS="-b64" AR_FLAGS="-X64 cr" MAKE=gmake Eugene. 
From christian at grothoff.org Fri May 22 09:29:57 2015 From: christian at grothoff.org (Christian Grothoff) Date: Fri, 22 May 2015 09:29:57 +0200 Subject: triple DH In-Reply-To: <877fs27yys.fsf@vigenere.g10code.de> References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> <878ucjbkel.fsf@vigenere.g10code.de> <87zj4y8f2m.fsf@vigenere.g10code.de> <555DC359.4000308@grothoff.org> <877fs27yys.fsf@vigenere.g10code.de> Message-ID: <555EDAF5.1050306@grothoff.org> On 05/21/2015 05:05 PM, Werner Koch wrote: > If that improves things for you, shall I backport them to 1.6? I re-ran my benchmarks and can confirm that with this flag, the key generation slowdown is gone. However, as I can use the hack of passing the 'eddsa' flag to quickly generate an ECDHE-key, and as that fortunate parser-bug makes sure that 1.6 accepts the new code, I don't see a strong reason to put in the effort to backport it. Thanks! Christian From wk at gnupg.org Fri May 22 09:27:58 2015 From: wk at gnupg.org (Werner Koch) Date: Fri, 22 May 2015 09:27:58 +0200 Subject: Problem with building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler In-Reply-To: (Eugene Zelenko's message of "Thu, 21 May 2015 15:41:05 -0700") References: Message-ID: <87twv55awx.fsf@vigenere.g10code.de> On Fri, 22 May 2015 00:41, eugene.zelenko at gmail.com said: > "sexp.c", line 2427.1: 1506-343 (S) Redeclaration of > _gcry_sexp_extract_param differs from previous declaration on line 331 > of "gcrypt-int.h". > ./configure --prefix=${PWD}/../install \ > --enable-static --disable-shared --disable-asm \ > --with-gpg-error-prefix=${Root}/libgpg-error-1.18/install \ > --with-libgpg-error-prefix=${Root}/libgpg-error-1.18/install \ > CC=xlc CFLAGS="-q64 -O3 -qansialias -w -qtune=auto" \ By overriding CFLAGS you also override most of the warning options. See configure.ac. In case there are certain options which are required on a certain target it is possible to _add_ them to the default CFLAGS (grep for hpux in configure.ac for an example). Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz.
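An aside on the flags-parser bug that the triple DH thread keeps referring to: the commit log above ("Return an error for invalid flags of length 10") suggests that flags are dispatched by their length. The sketch below is hypothetical, simplified code with made-up flag constants, not the actual _gcry_pk_util_parse_flaglist, but it shows how an unknown 10-character flag such as "no-keytest" could slip through unrejected in 1.6:

    #include <string.h>
    #include <gpg-error.h>

    #define FLAG_EDDSA       (1 << 0)   /* hypothetical bit values */
    #define FLAG_NO_KEYTEST  (1 << 1)

    static gpg_err_code_t
    parse_one_flag (const char *s, size_t n, unsigned int *flags)
    {
      switch (n)
        {
        case 5:
          if (!memcmp (s, "eddsa", 5))
            { *flags |= FLAG_EDDSA; return 0; }
          break;      /* unknown 5-character flag: rejected below */
        case 10:
          if (!memcmp (s, "no-keytest", 10))
            { *flags |= FLAG_NO_KEYTEST; return 0; }
          return 0;   /* the 1.6 behaviour: unknown 10-character flags pass */
        default:
          break;
        }
      return GPG_ERR_INV_FLAG;
    }

With the missing rejection added, an unknown flag of length 10 is reported like any other invalid flag, which is what the fix does.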
From wk at gnupg.org Fri May 22 10:28:35 2015 From: wk at gnupg.org (Werner Koch) Date: Fri, 22 May 2015 10:28:35 +0200 Subject: triple DH In-Reply-To: <555EDAF5.1050306@grothoff.org> (Christian Grothoff's message of "Fri, 22 May 2015 09:29:57 +0200") References: <555A5433.7090209@grothoff.org> <555A57C6.9070809@gmail.com> <555A62D4.2010604@grothoff.org> <555A69B3.1040800@gmail.com> <555AFDD6.9080208@grothoff.org> <555B0F45.3090405@gmail.com> <555B18BD.6040704@grothoff.org> <555B18A3.7030202@gmail.com> <555B1B2F.3040908@grothoff.org> <555B1CF4.7050207@gmail.com> <555B24D2.4040405@grothoff.org> <878ucjbkel.fsf@vigenere.g10code.de> <87zj4y8f2m.fsf@vigenere.g10code.de> <555DC359.4000308@grothoff.org> <877fs27yys.fsf@vigenere.g10code.de> <555EDAF5.1050306@grothoff.org> Message-ID: <87egm9583w.fsf@vigenere.g10code.de> On Fri, 22 May 2015 09:29, christian at grothoff.org said: > However, as I can use the hack of passing the 'eddsa' flag to quickly > generate an ECDHE-key, That is too ugly a hack. > and as that fortunate parser-bug makes sure that 1.6 accepts the new > code, I don't see a > strong reason to put in the effort to backport it. Already done; it will be in 1.6.4. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. From eugene.zelenko at gmail.com Fri May 22 21:01:37 2015 From: eugene.zelenko at gmail.com (Eugene Zelenko) Date: Fri, 22 May 2015 12:01:37 -0700 Subject: Problem with building libgcrypt 1.6.3 on AIX 6.1 with IBM compiler In-Reply-To: <87twv55awx.fsf@vigenere.g10code.de> References: <87twv55awx.fsf@vigenere.g10code.de> Message-ID: Hi, Werner! I removed all FLAGS from the configure options and I still have the same problem. Anyway, I think function prototypes and definitions should be consistent. With best regards, Eugene. On Fri, May 22, 2015 at 12:27 AM, Werner Koch wrote: > On Fri, 22 May 2015 00:41, eugene.zelenko at gmail.com said: > >> "sexp.c", line 2427.1: 1506-343 (S) Redeclaration of >> _gcry_sexp_extract_param differs from previous declaration on line 331 >> of "gcrypt-int.h". > >> ./configure --prefix=${PWD}/../install \ >> --enable-static --disable-shared --disable-asm \ >> --with-gpg-error-prefix=${Root}/libgpg-error-1.18/install \ >> --with-libgpg-error-prefix=${Root}/libgpg-error-1.18/install \ >> CC=xlc CFLAGS="-q64 -O3 -qansialias -w -qtune=auto" \ > > By overriding CFLAGS you also override most of the warning options. See > configure.ac. In case there are certain options which are required on a > certain target it is possible to _add_ them to the default CFLAGS (grep > for hpux in configure.ac for an example). > > > Salam-Shalom, > > Werner > > > -- > Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. >
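To round off the key-generation thread, here is a minimal, untested sketch of how an application would request an Ed25519 key with the flags discussed above, using only the public libgcrypt API. The "no-keytest" flag needs a library that already contains Werner's change (or an older 1.6 that silently ignores it, as noted above); everything else is standard libgcrypt usage:

    #include <stdio.h>
    #include <gcrypt.h>

    int
    main (void)
    {
      gcry_sexp_t parms = NULL, keypair = NULL;
      gcry_error_t err;

      /* Usual libgcrypt initialization for a simple test program.  */
      if (!gcry_check_version (GCRYPT_VERSION))
        return 1;
      gcry_control (GCRYCTL_DISABLE_SECMEM, 0);
      gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);

      /* "transient-key" selects a cheaper random level; "no-keytest"
         skips the self-test otherwise run after generation.  */
      err = gcry_sexp_build (&parms, NULL,
                             "(genkey(ecc(curve Ed25519)"
                             "(flags eddsa transient-key no-keytest)))");
      if (!err)
        err = gcry_pk_genkey (&keypair, parms);
      if (err)
        {
          fprintf (stderr, "key generation failed: %s\n",
                   gcry_strerror (err));
          return 1;
        }

      /* ... use the key pair ... */
      gcry_sexp_release (keypair);
      gcry_sexp_release (parms);
      return 0;
    }

Whether skipping the self-test is acceptable depends on the application; for long-lived keys the default behaviour of testing the freshly generated key remains the safer choice.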