[PATCH 3/3] sm4-aesni-avx2: add generic 1 to 16 block bulk processing function

Tianjia Zhang tianjia.zhang at linux.alibaba.com
Tue Apr 26 10:35:15 CEST 2022


Hi Jussi,

On 4/25/22 2:47 AM, Jussi Kivilinna wrote:
> * cipher/sm4-aesni-avx2-amd64.S: Remove unnecessary vzeroupper at
> function entries.
> (_gcry_sm4_aesni_avx2_crypt_blk1_16): New.
> * cipher/sm4.c (_gcry_sm4_aesni_avx2_crypt_blk1_16)
> (sm4_aesni_avx2_crypt_blk1_16): New.
> (sm4_get_crypt_blk1_16_fn) [USE_AESNI_AVX2]: Add
> 'sm4_aesni_avx2_crypt_blk1_16'.
> --
> 
> Benchmark AMD Ryzen 5800X:
> 
> Before:
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          XTS enc |      1.48 ns/B     643.2 MiB/s      7.19 c/B      4850
>          XTS dec |      1.48 ns/B     644.3 MiB/s      7.18 c/B      4850
> 
> After (1.37x faster):
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          XTS enc |      1.07 ns/B     888.7 MiB/s      5.21 c/B      4850
>          XTS dec |      1.07 ns/B     889.4 MiB/s      5.20 c/B      4850
> 
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna at iki.fi>
> ---

Benchmark on Intel i5-6200U 2.30GHz:

Before:
  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         XTS enc |      2.95 ns/B     323.0 MiB/s      8.25 c/B      2792
         XTS dec |      2.95 ns/B     323.0 MiB/s      8.24 c/B      2792

After (1.64x faster):
  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         XTS enc |      1.79 ns/B     531.4 MiB/s      5.01 c/B      2791
         XTS dec |      1.79 ns/B     531.6 MiB/s      5.01 c/B      2791
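
For reference, the speedup figure above can be re-derived from the table values. This is only a minimal sketch: the helper names are made up for illustration and are not part of libgcrypt or its benchmark tool, and the cycles/byte relation assumes the reported "auto Mhz" value is the effective clock during the run.

```python
# Hypothetical helpers for re-deriving the figures in the tables above.

def speedup(before_mib_per_s, after_mib_per_s):
    """Throughput ratio: how many times faster the 'after' run is."""
    return after_mib_per_s / before_mib_per_s

def cycles_per_byte(ns_per_byte, mhz):
    """ns/B times cycles-per-ns (MHz / 1000) gives cycles/byte."""
    return ns_per_byte * mhz / 1000.0

# XTS enc on the i5-6200U, values copied from the tables above.
xts_enc_speedup = speedup(323.0, 531.4)
xts_enc_cpb_before = cycles_per_byte(2.95, 2792)

print(f"XTS enc speedup: {xts_enc_speedup:.3f}x")
print(f"XTS enc before:  {xts_enc_cpb_before:.2f} c/B (from rounded ns/B)")
```

The cycles/byte value recomputed this way differs slightly from the table because ns/B is already rounded to two decimals there.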

Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>

Best regards,
Tianjia


