[PATCH 1/3] sm4: add XTS bulk processing
    Tianjia Zhang 
    tianjia.zhang at linux.alibaba.com
       
    Tue Apr 26 10:33:56 CEST 2022
    
    
  
Hi Jussi,
On 4/25/22 2:47 AM, Jussi Kivilinna wrote:
> * cipher/sm4.c (_gcry_sm4_xts_crypt): New.
> (sm4_setkey): Set XTS bulk function.
> --
> 
> Benchmark on Ryzen 5800X:
> 
> Before:
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          XTS enc |      7.28 ns/B     131.0 MiB/s     35.31 c/B      4850
>          XTS dec |      7.29 ns/B     130.9 MiB/s     35.34 c/B      4850
> 
> After (4.8x faster):
>   SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>          XTS enc |      1.49 ns/B     638.6 MiB/s      7.24 c/B      4850
>          XTS dec |      1.49 ns/B     639.3 MiB/s      7.24 c/B      4850
> 
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna at iki.fi>
> ---
>   cipher/sm4.c | 35 +++++++++++++++++++++++++++++++++++
>   1 file changed, 35 insertions(+)
> 
> diff --git a/cipher/sm4.c b/cipher/sm4.c
> index 4815b184..600850e2 100644
> --- a/cipher/sm4.c
> +++ b/cipher/sm4.c
> @@ -97,6 +97,9 @@ static void _gcry_sm4_cbc_dec (void *context, unsigned char *iv,
>   static void _gcry_sm4_cfb_dec (void *context, unsigned char *iv,
>   			       void *outbuf_arg, const void *inbuf_arg,
>   			       size_t nblocks);
> +static void _gcry_sm4_xts_crypt (void *context, unsigned char *tweak,
> +                                 void *outbuf_arg, const void *inbuf_arg,
> +                                 size_t nblocks, int encrypt);
>   static size_t _gcry_sm4_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg,
>   				   const void *inbuf_arg, size_t nblocks,
>   				   int encrypt);
> @@ -492,6 +495,7 @@ sm4_setkey (void *context, const byte *key, const unsigned keylen,
>     bulk_ops->cbc_dec = _gcry_sm4_cbc_dec;
>     bulk_ops->cfb_dec = _gcry_sm4_cfb_dec;
>     bulk_ops->ctr_enc = _gcry_sm4_ctr_enc;
> +  bulk_ops->xts_crypt = _gcry_sm4_xts_crypt;
>     bulk_ops->ocb_crypt = _gcry_sm4_ocb_crypt;
>     bulk_ops->ocb_auth  = _gcry_sm4_ocb_auth;
>   
> @@ -954,6 +958,37 @@ _gcry_sm4_cfb_dec(void *context, unsigned char *iv,
>       _gcry_burn_stack(burn_stack_depth);
>   }
>   
> +/* Bulk encryption/decryption of complete blocks in XTS mode. */
> +static void
> +_gcry_sm4_xts_crypt (void *context, unsigned char *tweak, void *outbuf_arg,
> +                     const void *inbuf_arg, size_t nblocks, int encrypt)
> +{
> +  SM4_context *ctx = context;
> +  unsigned char *outbuf = outbuf_arg;
> +  const unsigned char *inbuf = inbuf_arg;
> +  int burn_stack_depth = 0;
> +
> +  /* Process remaining blocks. */
> +  if (nblocks)
> +    {
> +      crypt_blk1_8_fn_t crypt_blk1_8 = sm4_get_crypt_blk1_8_fn(ctx);
> +      u32 *rk = encrypt ? ctx->rkey_enc : ctx->rkey_dec;
> +      unsigned char tmpbuf[16 * 8];
> +      unsigned int tmp_used = 16;
> +      size_t nburn;
> +
> +      nburn = bulk_xts_crypt_128(rk, crypt_blk1_8, outbuf, inbuf, nblocks,
> +                                 tweak, tmpbuf, sizeof(tmpbuf) / 16,
> +                                 &tmp_used);
> +      burn_stack_depth = nburn > burn_stack_depth ? nburn : burn_stack_depth;
> +
> +      wipememory(tmpbuf, tmp_used);
> +    }
> +
> +  if (burn_stack_depth)
> +    _gcry_burn_stack(burn_stack_depth);
> +}
> +
>   /* Bulk encryption/decryption of complete blocks in OCB mode. */
>   static size_t
>   _gcry_sm4_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg,
Thanks for the reply, this is a great job, I did some performance tests
and reviews, but unfortunately I haven't found a machine that supports
GFNI features at the moment, so for patch 1/3:
Benchmark on Intel i5-6200U 2.30GHz:
Before:
  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         XTS enc |     13.41 ns/B     71.10 MiB/s     37.45 c/B      2792
         XTS dec |     13.43 ns/B     71.03 MiB/s     37.49 c/B      2792
After (4.54x faster):
  SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         XTS enc |      2.96 ns/B     322.7 MiB/s      8.25 c/B      2792
         XTS dec |      2.96 ns/B     322.5 MiB/s      8.26 c/B      2792
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
Best regards,
Tianjia
    
    
More information about the Gcrypt-devel
mailing list