[PATCH] Add SM4 ARMv8/AArch64 assembly implementation

Tianjia Zhang tianjia.zhang at linux.alibaba.com
Fri Feb 18 13:00:48 CET 2022


Hi Jussi,

On 2/18/22 2:12 AM, Jussi Kivilinna wrote:
> Hello,
> 
> Looks good, just few comments below...
> 
> On 16.2.2022 15.12, Tianjia Zhang wrote:
>> * cipher/Makefile.am: Add 'sm4-aarch64.S'.
>> * cipher/sm4-aarch64.S: New.
>> * cipher/sm4.c (USE_AARCH64_SIMD): New.
>> (SM4_context) [USE_AARCH64_SIMD]: Add 'use_aarch64_simd'.
>> [USE_AARCH64_SIMD] (_gcry_sm4_aarch64_crypt)
>> (_gcry_sm4_aarch64_cbc_dec, _gcry_sm4_aarch64_cfb_dec)
>> (_gcry_sm4_aarch64_ctr_enc): New.
>> (sm4_setkey): Enable ARMv8/AArch64 if supported by HW.
>> (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec)
>> (_gcry_sm4_cfb_dec) [USE_AESNI_AVX2]: Add ARMv8/AArch64 bulk functions.
> 
> USE_AARCH64_SIMD here.
> 
>> * configure.ac: Add 'sm4-aarch64.lo'.

>> +/* Register macros */
>> +
>> +#define RTMP0 v8
>> +#define RTMP1 v9
>> +#define RTMP2 v10
>> +#define RTMP3 v11
>> +#define RTMP4 v12
>> +
>> +#define RX0   v13
>> +#define RKEY  v14
>> +#define RIDX  v15
> 
> Vector registers v8 to v15 are being used, so the functions need
> to store and restore the d8-d15 registers as they are ABI callee
> saved. Check the "VPUSH_ABI" and "VPOP_ABI" macros in
> "cipher-gcm-armv8-aarch64-ce.S". Those could be moved to
> "asm-common-aarch64.h" so that the macros can be shared between
> different files.
> 
>> +
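
Right, I missed the callee-saved registers, will fix. For reference,
a rough sketch of what the shared macros could look like once moved
into asm-common-aarch64.h (sketch only; whatever CFI annotations the
existing macros in cipher-gcm-armv8-aarch64-ce.S carry would move
along with them):

/* Save/restore the low 64-bit halves d8-d15, which are callee saved
 * under the AArch64 procedure call standard.  Sketch only. */
#define VPUSH_ABI \
        stp d8, d9, [sp, #-16]!; \
        stp d10, d11, [sp, #-16]!; \
        stp d12, d13, [sp, #-16]!; \
        stp d14, d15, [sp, #-16]!;

#define VPOP_ABI \
        ldp d14, d15, [sp], #16; \
        ldp d12, d13, [sp], #16; \
        ldp d10, d11, [sp], #16; \
        ldp d8, d9, [sp], #16;

VPUSH_ABI would then go at the entry of each function in
sm4-aarch64.S and VPOP_ABI before each ret.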

>> +    mov x6, 8;
>> +.Lroundloop:
>> +    ld1 {RKEY.4s}, [x0], #16;
>> +    ROUND(0, v0, v1, v2, v3);
>> +    ROUND(1, v1, v2, v3, v0);
>> +    ROUND(2, v2, v3, v0, v1);
>> +    ROUND(3, v3, v0, v1, v2);
>> +
>> +    subs x6, x6, #1;
> 
> Bit of micro-optimization, but this could be moved after
> "ld1 {RKEY.4s}" above.
> 
>> +    bne .Lroundloop;
>> +
>> +    rotate_clockwise_90(v0, v1, v2, v3);
>> +    rev32 v0.16b, v0.16b;
>> +    rev32 v1.16b, v1.16b;
>> +    rev32 v2.16b, v2.16b;
>> +    rev32 v3.16b, v3.16b;
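
Makes sense. The round loop would then become (same logic, just
moving the counter update off the slot right before the branch):

.Lroundloop:
    ld1 {RKEY.4s}, [x0], #16;
    subs x6, x6, #1;  /* decrement early; ROUND uses only SIMD ops,
                         so the flags are still valid at bne */
    ROUND(0, v0, v1, v2, v3);
    ROUND(1, v1, v2, v3, v0);
    ROUND(2, v2, v3, v0, v1);
    ROUND(3, v3, v0, v1, v2);

    bne .Lroundloop;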

>>   #endif /* USE_AESNI_AVX2 */
>> +#ifdef USE_AARCH64_SIMD
>> +extern void _gcry_sm4_aarch64_crypt(const u32 *rk, byte *out,
>> +                    const byte *in,
>> +                    int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_cbc_dec(const u32 *rk_dec, byte *out,
>> +                      const byte *in,
>> +                      byte *iv,
>> +                      int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_cfb_dec(const u32 *rk_enc, byte *out,
>> +                      const byte *in,
>> +                      byte *iv,
>> +                      int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_ctr_enc(const u32 *rk_enc, byte *out,
>> +                      const byte *in,
>> +                      byte *ctr,
>> +                      int nblocks);
> 
> Use 'size_t' for nblocks. With 'int', Clang may assume that the
> target function reads only the low 32 bits of 'nblocks' (that is,
> that it accesses the argument only through the W3 register) and
> therefore leave garbage in the upper 32 bits of the X3 register
> here.
> 
> -Jussi

Thanks for your suggestions. I will fix the issues you mentioned,
introduce 8-way acceleration support in the v2 patch, and send a
separate patch to move VPUSH_ABI/VPOP_ABI into the
asm-common-aarch64.h header file.
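
For the nblocks parameter, something like this for the v2 prototypes
(only _gcry_sm4_aarch64_crypt shown; the cbc/cfb/ctr variants change
the same way), so that the caller must materialize the full 64-bit
count in X3:

/* In cipher/sm4.c: pass the block count as size_t instead of int. */
extern void _gcry_sm4_aarch64_crypt(const u32 *rk, byte *out,
                                    const byte *in,
                                    size_t nblocks);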

Best regards,
Tianjia


