[PATCH v2 2/2] Add SM4 ARMv8/AArch64 assembly implementation

Jussi Kivilinna jussi.kivilinna at iki.fi
Tue Feb 22 19:22:45 CET 2022


On 22.2.2022 14.18, Tianjia Zhang wrote:
> * cipher/Makefile.am: Add 'sm4-aarch64.S'.
> * cipher/sm4-aarch64.S: New.
> * cipher/sm4.c (USE_AARCH64_SIMD): New.
> (SM4_context) [USE_AARCH64_SIMD]: Add 'use_aarch64_simd'.
> [USE_AARCH64_SIMD] (_gcry_sm4_aarch64_crypt)
> (_gcry_sm4_aarch64_ctr_enc, _gcry_sm4_aarch64_cbc_dec)
> (_gcry_sm4_aarch64_cfb_dec, _gcry_sm4_aarch64_crypt_blk1_8)
> (sm4_aarch64_crypt_blk1_8): New.
> (sm4_setkey): Enable ARMv8/AArch64 if supported by HW.
> (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
> (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AARCH64_SIMD]:
> Add ARMv8/AArch64 bulk functions.
> * configure.ac: Add 'sm4-aarch64.lo'.
> --
> 
> This patch adds ARMv8/AArch64 bulk encryption/decryption. Bulk
> functions process eight blocks in parallel.
> 
> Benchmark on T-Head Yitian-710 2.75 GHz:
> 
> Before:
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          CBC enc |     12.10 ns/B     78.81 MiB/s     33.28 c/B      2750
>          CBC dec |      7.19 ns/B     132.6 MiB/s     19.77 c/B      2750
>          CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
>          CFB dec |      7.24 ns/B     131.8 MiB/s     19.90 c/B      2750
>          CTR enc |      7.24 ns/B     131.7 MiB/s     19.90 c/B      2750
>          CTR dec |      7.24 ns/B     131.7 MiB/s     19.91 c/B      2750
>          GCM enc |      9.49 ns/B     100.4 MiB/s     26.11 c/B      2750
>          GCM dec |      9.49 ns/B     100.5 MiB/s     26.10 c/B      2750
>         GCM auth |      2.25 ns/B     423.1 MiB/s      6.20 c/B      2750
>          OCB enc |      7.35 ns/B     129.8 MiB/s     20.20 c/B      2750
>          OCB dec |      7.36 ns/B     129.6 MiB/s     20.23 c/B      2750
>         OCB auth |      7.29 ns/B     130.8 MiB/s     20.04 c/B      2749
> 
> After (~55% faster):
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
>          CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
>          CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
>          CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
>          CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
>          CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
>          GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
>          GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
>         GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
>          OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
>          OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
>         OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750
> 
> Signed-off-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
> ---
>   cipher/Makefile.am   |   2 +-
>   cipher/sm4-aarch64.S | 642 +++++++++++++++++++++++++++++++++++++++++++
>   cipher/sm4.c         | 129 +++++++++
>   configure.ac         |   3 +
>   4 files changed, 775 insertions(+), 1 deletion(-)
>   create mode 100644 cipher/sm4-aarch64.S
> 
> diff --git a/cipher/Makefile.am b/cipher/Makefile.am
> index 264b3d30..6c1c7693 100644
> --- a/cipher/Makefile.am
> +++ b/cipher/Makefile.am
> @@ -116,7 +116,7 @@ EXTRA_libcipher_la_SOURCES = \
>   	scrypt.c \
>   	seed.c \
>   	serpent.c serpent-sse2-amd64.S \
> -	sm4.c sm4-aesni-avx-amd64.S sm4-aesni-avx2-amd64.S \
> +	sm4.c sm4-aesni-avx-amd64.S sm4-aesni-avx2-amd64.S sm4-aarch64.S \
>   	serpent-avx2-amd64.S serpent-armv7-neon.S \
>   	sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \
>   	sha1-avx2-bmi2-amd64.S sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \
> diff --git a/cipher/sm4-aarch64.S b/cipher/sm4-aarch64.S
> new file mode 100644
> index 00000000..8d29be37
> --- /dev/null
> +++ b/cipher/sm4-aarch64.S
> @@ -0,0 +1,642 @@
> +/* sm4-aarch64.S  -  ARMv8/AArch64 accelerated SM4 cipher
> + *
> + * Copyright (C) 2022 Alibaba Group.
> + * Copyright (C) 2022 Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
> + *
> + * This file is part of Libgcrypt.
> + *
> + * Libgcrypt is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as
> + * published by the Free Software Foundation; either version 2.1 of
> + * the License, or (at your option) any later version.
> + *
> + * Libgcrypt is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "asm-common-aarch64.h"
> +
> +#if defined(__AARCH64EL__) && \
> +    defined(HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS) && \
> +    defined(HAVE_GCC_INLINE_ASM_AARCH64_NEON) && \
> +    defined(USE_SM4)
> +
> +.cpu generic+simd
> +
> +/* Constants */
> +
> +.text
> +.align 16

Alignment to 65536 bytes seems excessive: on AArch64, ".align 16"
requests 2^16-byte alignment. Did you mean to use 16-byte alignment
here (".align 4" or ".balign 16")?
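
For reference, a minimal sketch of the two spellings, assuming GNU as
targeting AArch64, where the ".align" operand is a power-of-two exponent
rather than a byte count. It is an illustration only, not a drop-in hunk
for sm4-aarch64.S:

```asm
/* Sketch only; assumes GNU as targeting AArch64. */

.text
.balign 16      /* operand is a byte count: 16-byte alignment */

/* Equivalent on AArch64, where .align takes an exponent: */
.text
.align 4        /* 2^4 = 16 bytes */

/* The form currently in the patch, for comparison:
 *   .align 16     2^16 = 65536 bytes
 */
```

".balign" always takes a byte count regardless of target, so it may be
the less ambiguous choice of the two.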

Otherwise the patches look good. On Cortex-A53, I see a ~36%
performance improvement for CFB/CBC/CTR/OCB.

-Jussi