[PATCH] Add SM4 ARMv8/AArch64 assembly implementation
Tianjia Zhang
tianjia.zhang at linux.alibaba.com
Fri Feb 18 13:00:48 CET 2022
Hi Jussi,
On 2/18/22 2:12 AM, Jussi Kivilinna wrote:
> Hello,
>
> Looks good, just a few comments below...
>
> On 16.2.2022 15.12, Tianjia Zhang wrote:
>> * cipher/Makefile.am: Add 'sm4-aarch64.S'.
>> * cipher/sm4-aarch64.S: New.
>> * cipher/sm4.c (USE_AARCH64_SIMD): New.
>> (SM4_context) [USE_AARCH64_SIMD]: Add 'use_aarch64_simd'.
>> [USE_AARCH64_SIMD] (_gcry_sm4_aarch64_crypt)
>> (_gcry_sm4_aarch64_cbc_dec, _gcry_sm4_aarch64_cfb_dec)
>> (_gcry_sm4_aarch64_ctr_enc): New.
>> (sm4_setkey): Enable ARMv8/AArch64 if supported by HW.
>> (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec)
>> (_gcry_sm4_cfb_dec) [USE_AESNI_AVX2]: Add ARMv8/AArch64 bulk functions.
>
> USE_AARCH64_SIMD here.
>
>> * configure.ac: Add 'sm4-aarch64.lo'.
>> +/* Register macros */
>> +
>> +#define RTMP0 v8
>> +#define RTMP1 v9
>> +#define RTMP2 v10
>> +#define RTMP3 v11
>> +#define RTMP4 v12
>> +
>> +#define RX0 v13
>> +#define RKEY v14
>> +#define RIDX v15
>
> Vector registers v8 to v15 are being used, so the functions need
> to store and restore the d8-d15 registers, as they are ABI
> callee-saved. Check the "VPUSH_ABI" and "VPOP_ABI" macros in
> "cipher-gcm-armv8-aarch64-ce.S". Those could be moved to
> "asm-common-aarch64.h" so that the macros can be shared between
> different files.
>
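Agreed, I will add VPUSH_ABI/VPOP_ABI around the SIMD code. For
reference, a minimal sketch of the shared macros (modeled on the ones
in cipher-gcm-armv8-aarch64-ce.S; the real ones also carry the CFI
annotations, which I will preserve when moving them into
asm-common-aarch64.h):

/* Push/pop the ABI callee-saved d8-d15 registers (the low halves
 * of v8-v15, per AAPCS64). */
#define VPUSH_ABI \
        stp d8, d9, [sp, #-16]!; \
        stp d10, d11, [sp, #-16]!; \
        stp d12, d13, [sp, #-16]!; \
        stp d14, d15, [sp, #-16]!;

#define VPOP_ABI \
        ldp d14, d15, [sp], #16; \
        ldp d12, d13, [sp], #16; \
        ldp d10, d11, [sp], #16; \
        ldp d8, d9, [sp], #16;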
>> +
>> + mov x6, 8;
>> +.Lroundloop:
>> + ld1 {RKEY.4s}, [x0], #16;
>> + ROUND(0, v0, v1, v2, v3);
>> + ROUND(1, v1, v2, v3, v0);
>> + ROUND(2, v2, v3, v0, v1);
>> + ROUND(3, v3, v0, v1, v2);
>> +
>> + subs x6, x6, #1;
>
> A bit of a micro-optimization, but this could be moved to just
> after the "ld1 {RKEY.4s}" above.
>
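Good point, will do. With the decrement hoisted, the loop becomes
(sketch):

.Lroundloop:
        ld1 {RKEY.4s}, [x0], #16;
        subs x6, x6, #1;        /* hoisted: ROUND issues only SIMD
                                   instructions, so the flags are
                                   still intact at the bne */
        ROUND(0, v0, v1, v2, v3);
        ROUND(1, v1, v2, v3, v0);
        ROUND(2, v2, v3, v0, v1);
        ROUND(3, v3, v0, v1, v2);
        bne .Lroundloop;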
>> + bne .Lroundloop;
>> +
>> + rotate_clockwise_90(v0, v1, v2, v3);
>> + rev32 v0.16b, v0.16b;
>> + rev32 v1.16b, v1.16b;
>> + rev32 v2.16b, v2.16b;
>> + rev32 v3.16b, v3.16b;
>> #endif /* USE_AESNI_AVX2 */
>> +#ifdef USE_AARCH64_SIMD
>> +extern void _gcry_sm4_aarch64_crypt(const u32 *rk, byte *out,
>> + const byte *in,
>> + int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_cbc_dec(const u32 *rk_dec, byte *out,
>> + const byte *in,
>> + byte *iv,
>> + int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_cfb_dec(const u32 *rk_enc, byte *out,
>> + const byte *in,
>> + byte *iv,
>> + int nblocks);
>> +
>> +extern void _gcry_sm4_aarch64_ctr_enc(const u32 *rk_enc, byte *out,
>> + const byte *in,
>> + byte *ctr,
>> + int nblocks);
>
> Use 'size_t' for nblocks. Clang can assume that 'int' means the
> target function uses only the low 32 bits of 'nblocks' (that it
> accesses the value only through the W3 register) and may leave
> garbage in the upper 32 bits of the X3 register here.
>
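Right, I will switch all four prototypes to size_t in v2, e.g.:

extern void _gcry_sm4_aarch64_crypt(const u32 *rk, byte *out,
                                    const byte *in,
                                    size_t nblocks);

and the same for the cbc_dec/cfb_dec/ctr_enc variants, so the full
64-bit X3 register is well defined on entry.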
> -Jussi
Thanks for your suggestions. I will fix the issues you mentioned and
introduce 8-way acceleration support in the v2 patch, and I will send
a separate patch that moves VPUSH_ABI/VPOP_ABI into the
asm-common-aarch64.h header file.
Best regards,
Tianjia