[PATCH 2/2] Add SM4 ARMv8/AArch64/CE assembly implementation

Tianjia Zhang tianjia.zhang at linux.alibaba.com
Mon Feb 28 13:25:36 CET 2022


Hi Jussi,

On 2/26/22 3:10 PM, Jussi Kivilinna wrote:
> On 25.2.2022 9.41, Tianjia Zhang wrote:
>> * cipher/Makefile.am: Add 'sm4-armv8-aarch64-ce.S'.
>> * cipher/sm4-armv8-aarch64-ce.S: New.
>> * cipher/sm4.c (USE_ARM_CE): New.
>> (SM4_context) [USE_ARM_CE]: Add 'use_arm_ce'.
>> [USE_ARM_CE] (_gcry_sm4_armv8_ce_expand_key)
>> (_gcry_sm4_armv8_ce_crypt, _gcry_sm4_armv8_ce_ctr_enc)
>> (_gcry_sm4_armv8_ce_cbc_dec, _gcry_sm4_armv8_ce_cfb_dec)
>> (_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_8): New.
>> (sm4_expand_key) [USE_ARM_CE]: Use ARMv8/AArch64/CE key setup.
>> (sm4_setkey): Enable ARMv8/AArch64/CE if supported by HW.
>> (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
>> (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_ARM_CE]:
>> Add ARMv8/AArch64/CE bulk functions.
>> * configure.ac: Add 'sm4-armv8-aarch64-ce.lo'.
>> -- 
>>
>> This patch adds ARMv8/AArch64/CE bulk encryption/decryption. Bulk
>> functions process eight blocks in parallel.
>>
>> Benchmark on T-Head Yitian-710 2.75 GHz:
>>
>> Before:
>>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>>          CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
>>          CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
>>          CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
>>          CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
>>          CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
>>          CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
>>          GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
>>          GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
>>         GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
>>          OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
>>          OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
>>         OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750
>>
>> After (16x - 19x faster than ARMv8/AArch64 impl):
>>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>>          CBC enc |     12.10 ns/B     78.81 MiB/s     33.27 c/B      2750
>>          CBC dec |     0.243 ns/B      3921 MiB/s     0.669 c/B      2750
> 
> This implementation is so much faster than generic C that
> `_gcry_sm4_armv8_ce_crypt_blk1_8` could be used in `sm4_encrypt` and
> `sm4_decrypt` to speed up single-block operations (CBC encryption, etc.) ...
> 
>    static unsigned int
>    sm4_encrypt (void *context, byte *outbuf, const byte *inbuf)
>    {
>      SM4_context *ctx = context;
> 
>    #ifdef USE_ARM_CE
>      if (ctx->use_arm_ce)
>        return sm4_armv8_ce_crypt_blk1_8 (ctx->rkey_enc, outbuf, inbuf, 1);
>    #endif
>    ...
> 

Great suggestion, I will do that.
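
The dispatch pattern suggested above can be tried in isolation with a
minimal sketch. Everything named `*_sketch_*` or `stub_*` below is a
hypothetical stand-in and not libgcrypt code: the stubs only copy the
block and tag its first byte so that the chosen path is observable, and
their return values are placeholders for the stack burn depth.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical, reduced stand-in for SM4_context: only the fields
   needed to show the dispatch. */
typedef struct
{
  unsigned int rkey_enc[32];
  unsigned int use_arm_ce : 1;
} SM4_sketch_context;

/* Stub for the CE helper that handles 1 to 8 blocks.  It performs no
   encryption: it copies the input and marks the first byte with 0xCE. */
static unsigned int
stub_ce_crypt_blk1_8 (const unsigned int *rk, byte *out, const byte *in,
                      unsigned int nblks)
{
  (void) rk;
  assert (nblks >= 1 && nblks <= 8);
  memcpy (out, in, 16 * nblks);
  out[0] ^= 0xCE;
  return 0; /* placeholder burn depth */
}

/* Stub for the generic C single-block path; marks the first byte
   with 0x01 instead. */
static unsigned int
stub_generic_crypt_block (const unsigned int *rk, byte *out, const byte *in)
{
  (void) rk;
  memcpy (out, in, 16);
  out[0] ^= 0x01;
  return 0; /* placeholder burn depth */
}

/* Single-block entry point: use the CE path with a block count of 1
   when the context says the hardware supports it, otherwise fall
   back to the generic path. */
static unsigned int
sm4_sketch_encrypt (void *context, byte *outbuf, const byte *inbuf)
{
  SM4_sketch_context *ctx = context;

  if (ctx->use_arm_ce)
    return stub_ce_crypt_blk1_8 (ctx->rkey_enc, outbuf, inbuf, 1);

  return stub_generic_crypt_block (ctx->rkey_enc, outbuf, inbuf);
}
```

The real change would keep the `#ifdef USE_ARM_CE` guard from the
snippet quoted above; it is left out here only so the sketch compiles on
any platform.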

>>          CFB enc |     12.14 ns/B     78.52 MiB/s     33.39 c/B      2750
>>          CFB dec |     0.241 ns/B      3963 MiB/s     0.662 c/B      2750
>>          CTR enc |     0.298 ns/B      3201 MiB/s     0.819 c/B      2749
>>          CTR dec |     0.298 ns/B      3197 MiB/s     0.820 c/B      2750
>>          GCM enc |     0.488 ns/B      1956 MiB/s      1.34 c/B      2749
>>          GCM dec |     0.487 ns/B      1959 MiB/s      1.34 c/B      2750
>>         GCM auth |     0.189 ns/B      5049 MiB/s     0.519 c/B      2749
>>          OCB enc |     0.461 ns/B      2069 MiB/s      1.27 c/B      2750
>>          OCB dec |     0.495 ns/B      1928 MiB/s      1.36 c/B      2750
>>         OCB auth |     0.385 ns/B      2479 MiB/s      1.06 c/B      2750
>>
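
The per-mode speed-up can be recomputed from the nanosecs/byte columns
of the two tables. The following is a minimal sketch with a few rows
copied from the figures above; the struct and function names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* One benchmark row: ns/B before and after the patch, copied from
   the tables in this mail. */
struct bench_row
{
  const char *mode;
  double before_ns;
  double after_ns;
};

static const struct bench_row bench_rows[] = {
  { "CBC dec",  4.63, 0.243 },
  { "CFB dec",  4.64, 0.241 },
  { "CTR enc",  4.69, 0.298 },
  { "GCM enc",  4.88, 0.488 },
  { "OCB enc",  4.86, 0.461 },
  { "OCB auth", 4.79, 0.385 },
};

/* Speed-up factor: time per byte before divided by time per byte
   after. */
static double
bench_speedup (const struct bench_row *row)
{
  assert (row->after_ns > 0.0);
  return row->before_ns / row->after_ns;
}
```

Looping over `bench_rows` and printing `bench_speedup (&bench_rows[i])`
gives the factor for each listed mode.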
>> Signed-off-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
>> ---
>>   cipher/Makefile.am            |   1 +
>>   cipher/sm4-armv8-aarch64-ce.S | 614 ++++++++++++++++++++++++++++++++++
>>   cipher/sm4.c                  | 142 ++++++++
>>   configure.ac                  |   1 +
>>   4 files changed, 758 insertions(+)
>>   create mode 100644 cipher/sm4-armv8-aarch64-ce.S
>>
>> diff --git a/cipher/Makefile.am b/cipher/Makefile.am
>> index a7cbf3fc..3339c463 100644
>> --- a/cipher/Makefile.am
>> +++ b/cipher/Makefile.am
>> @@ -117,6 +117,7 @@ EXTRA_libcipher_la_SOURCES = \
>>       seed.c \
>>       serpent.c serpent-sse2-amd64.S \
>>       sm4.c sm4-aesni-avx-amd64.S sm4-aesni-avx2-amd64.S sm4-aarch64.S \
>> +    sm4-armv8-aarch64-ce.S \
>>       serpent-avx2-amd64.S serpent-armv7-neon.S \
>>       sha1.c sha1-ssse3-amd64.S sha1-avx-amd64.S sha1-avx-bmi2-amd64.S \
>>       sha1-avx2-bmi2-amd64.S sha1-armv7-neon.S sha1-armv8-aarch32-ce.S \
>> diff --git a/cipher/sm4-armv8-aarch64-ce.S b/cipher/sm4-armv8-aarch64-ce.S
>> new file mode 100644
>> index 00000000..943f0143
>> --- /dev/null
>> +++ b/cipher/sm4-armv8-aarch64-ce.S
>> @@ -0,0 +1,614 @@
>> +/* sm4-armv8-aarch64-ce.S  -  ARMv8/AArch64/CE accelerated SM4 cipher
>> + *
>> + * Copyright (C) 2022 Alibaba Group.
>> + * Copyright (C) 2022 Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
>> + *
>> + * This file is part of Libgcrypt.
>> + *
>> + * Libgcrypt is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU Lesser General Public License as
>> + * published by the Free Software Foundation; either version 2.1 of
>> + * the License, or (at your option) any later version.
>> + *
>> + * Libgcrypt is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "asm-common-aarch64.h"
>> +
>> +#if defined(__AARCH64EL__) && \
>> +    defined(HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS) && \
>> +    defined(HAVE_GCC_INLINE_ASM_AARCH64_CRYPTO) && \
>> +    defined(USE_SM4)
>> +
>> +.cpu generic+simd+crypto
>> +
>> +.irp b, 0, 1, 2, 3, 4, 5, 6, 7, 16, 24, 25, 26, 27, 28, 29, 30, 31
>> +    .set .Lv\b\().4s, \b
>> +.endr
>> +
>> +.macro sm4e, vd, vn
>> +    .inst 0xcec08400 | (.L\vn << 5) | .L\vd
>> +.endm
>> +
>> +.macro sm4ekey, vd, vn, vm
>> +    .inst 0xce60c800 | (.L\vm << 16) | (.L\vn << 5) | .L\vd
>> +.endm
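
For readers following the two `.inst` macros quoted above, the opcode
composition they perform can be mirrored in C. This is only a sketch of
the arithmetic: the base constants and shift amounts are copied from the
macros, while the function names are hypothetical and the register
arguments are plain vector register numbers (0 to 31).

```c
#include <assert.h>
#include <stdint.h>

/* Base opcodes copied from the sm4e and sm4ekey macros. */
#define SM4E_BASE    0xcec08400u
#define SM4EKEY_BASE 0xce60c800u

/* sm4e vd, vn: destination register in bits 0-4, source register
   in bits 5-9. */
static uint32_t
encode_sm4e (unsigned int vd, unsigned int vn)
{
  assert (vd < 32 && vn < 32);
  return SM4E_BASE | (vn << 5) | vd;
}

/* sm4ekey vd, vn, vm: same layout as above, plus the second source
   register in bits 16-20. */
static uint32_t
encode_sm4ekey (unsigned int vd, unsigned int vn, unsigned int vm)
{
  assert (vd < 32 && vn < 32 && vm < 32);
  return SM4EKEY_BASE | (vm << 16) | (vn << 5) | vd;
}
```

With all registers set to 0 the functions return the bare base opcodes,
which is a convenient sanity check against the constants in the macros.
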
> 
> We have target architectures where the assembler does not support these
> macros (MacOSX for example). It's better to detect whether these
> instructions are supported with a new check in `configure.ac`. For
> example, see how this is done for `HAVE_GCC_INLINE_ASM_AARCH64_CRYPTO`.
> 
> -Jussi

The SM crypto extension is an optional ARMv8 extension, and for various
reasons most current mainstream ARM CPUs do not implement it. In the
next version of the patch I will add a check that detects whether the
SM3/SM4 instruction extension is supported.

Best regards,
Tianjia


