[PATCH 3/3] sm4-aesni-avx2: add generic 1 to 16 block bulk processing function
Tianjia Zhang
tianjia.zhang at linux.alibaba.com
Tue Apr 26 10:35:15 CEST 2022
Hi Jussi,

On 4/25/22 2:47 AM, Jussi Kivilinna wrote:
> * cipher/sm4-aesni-avx2-amd64.S: Remove unnecessary vzeroupper at
> function entries.
> (_gcry_sm4_aesni_avx2_crypt_blk1_16): New.
> * cipher/sm4.c (_gcry_sm4_aesni_avx2_crypt_blk1_16)
> (sm4_aesni_avx2_crypt_blk1_16): New.
> (sm4_get_crypt_blk1_16_fn) [USE_AESNI_AVX2]: Add
> 'sm4_aesni_avx2_crypt_blk1_16'.
> --
>
> Benchmark AMD Ryzen 5800X:
>
> Before:
>  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>         XTS enc |      1.48 ns/B     643.2 MiB/s      7.19 c/B      4850
>         XTS dec |      1.48 ns/B     644.3 MiB/s      7.18 c/B      4850
>
> After (1.37x faster):
>  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>         XTS enc |      1.07 ns/B     888.7 MiB/s      5.21 c/B      4850
>         XTS dec |      1.07 ns/B     889.4 MiB/s      5.20 c/B      4850
>
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna at iki.fi>
> ---

Benchmark on Intel i5-6200U 2.30GHz:

Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      2.95 ns/B     323.0 MiB/s      8.25 c/B      2792
        XTS dec |      2.95 ns/B     323.0 MiB/s      8.24 c/B      2792

After (1.64x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      1.79 ns/B     531.4 MiB/s      5.01 c/B      2791
        XTS dec |      1.79 ns/B     531.6 MiB/s      5.01 c/B      2791
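
For reference, the 1.64x figure can be re-derived from the mebibytes/sec
column above. A minimal Python sketch, assuming the speedup is simply the
ratio of throughput after to throughput before (the `speedup` helper is a
hypothetical name used only for illustration; it is not part of libgcrypt
or its benchmark tool):

```python
def speedup(before_mib_s: float, after_mib_s: float) -> float:
    """Return the throughput ratio after/before (higher means faster)."""
    if before_mib_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return after_mib_s / before_mib_s


# Intel i5-6200U XTS throughput figures quoted above, in MiB/s.
xts_enc = speedup(323.0, 531.4)
xts_dec = speedup(323.0, 531.6)
print(f"XTS enc: {xts_enc:.3f}x, XTS dec: {xts_dec:.3f}x")
```

Both ratios land slightly above 1.64, which matches the figure quoted
above.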
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>

Best regards,
Tianjia