Optimization for SM4 and x86-64/AES-NI implementations

Jussi Kivilinna jussi.kivilinna at iki.fi
Tue Jun 16 21:28:22 CEST 2020


This patch set adds optimizations for the C implementation of the SM4
cipher, along with AES-NI accelerated AVX and AVX2 assembly
implementations. The performance improvement for the whole patch set is
presented below. Intermediate results are listed in each patch separately.

In summary, on x86-64, the generic C implementation is ~2 to ~4 times
faster than the original C implementation. The AES-NI implementations
speed up parallelizable cipher modes, where AES-NI/AVX is ~11 times
faster and AES-NI/AVX2 is ~18 times faster than the original C
implementation.
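
The ratios in the summary can be checked by dividing "before" ns/B by
"after" ns/B. The following is only a rough sketch: the numbers are
copied by hand from a few rows of the Ryzen 7 3700X tables in this
message, and the names (before, after, speedup) are made up for
illustration. The "after" run is assumed to use the AVX2 path on this
CPU, so the ~11x AES-NI/AVX figure cannot be reproduced from these rows.

```python
# ns/B values copied from a few rows of the Before/After tables in this
# message. Illustrative helper only; not part of the patch set.
before = {
    "ECB enc": 17.69,
    "CBC dec": 18.05,
    "CTR enc": 18.17,
    "CCM enc": 33.76,
    "EAX dec": 36.56,
    "OCB enc": 18.19,
}
after = {
    "ECB enc": 8.32,
    "CBC dec": 0.984,
    "CTR enc": 1.00,
    "CCM enc": 9.93,
    "EAX dec": 9.91,
    "OCB enc": 0.999,
}


def speedup(mode):
    """Return how many times faster the 'after' run is for one mode."""
    return before[mode] / after[mode]


if __name__ == "__main__":
    for mode in before:
        print(f"{mode}: {speedup(mode):.1f}x")
```

Modes that cannot be parallelized (ECB here runs the C code, CCM and EAX
include a serial MAC pass) land in the ~2x to ~4x range, while CBC dec,
CTR and OCB land near 18x.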

Benchmark on AMD Ryzen 7 3700X:

Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |     17.69 ns/B     53.92 MiB/s     76.50 c/B      4326
        ECB dec |     17.74 ns/B     53.77 MiB/s     76.72 c/B      4325
        CBC enc |     18.14 ns/B     52.56 MiB/s     78.47 c/B      4325
        CBC dec |     18.05 ns/B     52.83 MiB/s     78.09 c/B      4326
        CFB enc |     18.19 ns/B     52.44 MiB/s     78.67 c/B      4326
        CFB dec |     18.16 ns/B     52.53 MiB/s     78.53 c/B      4326
        OFB enc |     16.82 ns/B     56.70 MiB/s     72.96 c/B      4338
        OFB dec |     16.87 ns/B     56.53 MiB/s     72.96 c/B      4325
        CTR enc |     18.17 ns/B     52.47 MiB/s     78.62 c/B      4326
        CTR dec |     18.02 ns/B     52.94 MiB/s     77.92 c/B      4325
        XTS enc |     17.70 ns/B     53.87 MiB/s     76.11 c/B      4300
        XTS dec |     17.65 ns/B     54.04 MiB/s     76.28 c/B      4323±1
        CCM enc |     33.76 ns/B     28.25 MiB/s     146.9 c/B      4350
        CCM dec |     34.07 ns/B     27.99 MiB/s     147.4 c/B      4326
       CCM auth |     16.97 ns/B     56.19 MiB/s     73.41 c/B      4325
        EAX enc |     34.02 ns/B     28.03 MiB/s     147.1 c/B      4325
        EAX dec |     36.56 ns/B     26.08 MiB/s     159.1 c/B      4350
       EAX auth |     17.02 ns/B     56.03 MiB/s     73.62 c/B      4325
        GCM enc |     16.76 ns/B     56.90 MiB/s     72.50 c/B      4325
        GCM dec |     18.01 ns/B     52.94 MiB/s     78.37 c/B      4350
       GCM auth |     0.120 ns/B      7975 MiB/s     0.517 c/B      4325
        OCB enc |     18.19 ns/B     52.43 MiB/s     78.68 c/B      4325
        OCB dec |     18.15 ns/B     52.54 MiB/s     78.51 c/B      4325
       OCB auth |     16.87 ns/B     56.54 MiB/s     72.95 c/B      4325

After:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |      8.32 ns/B     114.6 MiB/s     36.01 c/B      4325
        ECB dec |      8.31 ns/B     114.7 MiB/s     35.75 c/B      4300
        CBC enc |      8.94 ns/B     106.7 MiB/s     38.67 c/B      4325
        CBC dec |     0.984 ns/B     969.2 MiB/s      4.23 c/B      4300
        CFB enc |      8.92 ns/B     107.0 MiB/s     38.57 c/B      4325
        CFB dec |     0.989 ns/B     964.1 MiB/s      4.23 c/B      4275
        OFB enc |      8.45 ns/B     112.8 MiB/s     36.35 c/B      4300
        OFB dec |      8.40 ns/B     113.5 MiB/s     36.34 c/B      4325
        CTR enc |      1.00 ns/B     952.6 MiB/s      4.31 c/B      4300
        CTR dec |     0.999 ns/B     954.9 MiB/s      4.29 c/B      4300
        XTS enc |      8.81 ns/B     108.3 MiB/s     38.11 c/B      4326
        XTS dec |      8.81 ns/B     108.3 MiB/s     38.09 c/B      4325
        CCM enc |      9.93 ns/B     96.07 MiB/s     42.69 c/B      4300
        CCM dec |      9.91 ns/B     96.20 MiB/s     42.89 c/B      4326
       CCM auth |      8.89 ns/B     107.3 MiB/s     38.45 c/B      4326
        EAX enc |      9.91 ns/B     96.27 MiB/s     42.85 c/B      4325
        EAX dec |      9.91 ns/B     96.19 MiB/s     42.80 c/B      4317
       EAX auth |      8.95 ns/B     106.6 MiB/s     38.71 c/B      4325
        GCM enc |      1.11 ns/B     856.8 MiB/s      4.79 c/B      4300
        GCM dec |      1.12 ns/B     849.4 MiB/s      4.80 c/B      4275
       GCM auth |     0.117 ns/B      8154 MiB/s     0.509 c/B      4350
        OCB enc |     0.999 ns/B     954.8 MiB/s      4.29 c/B      4300
        OCB dec |      1.00 ns/B     952.1 MiB/s      4.31 c/B      4300
       OCB auth |     0.989 ns/B     964.4 MiB/s      4.25 c/B      4300
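
For readers comparing columns: the three figures in each row should be
related by their units alone. The sketch below assumes exactly that
(MiB/s derived from ns/B, and c/B derived from ns/B and the detected
clock in MHz). The function names are hypothetical, and the benchmark
tool may round or measure slightly differently, so small deviations from
the table are expected.

```python
# Unit conversions between the benchmark table columns.
# Assumption: the columns are tied together only by their units.
MIB = 1 << 20  # bytes in one mebibyte


def mib_per_sec(ns_per_byte):
    """Convert nanoseconds per byte to mebibytes per second."""
    return 1e9 / ns_per_byte / MIB


def cycles_per_byte(ns_per_byte, mhz):
    """Convert nanoseconds per byte to CPU cycles per byte at a clock in MHz."""
    return ns_per_byte * mhz / 1000.0


if __name__ == "__main__":
    # "Before" ECB enc row: 17.69 ns/B at 4326 MHz
    print(f"{mib_per_sec(17.69):.2f} MiB/s, {cycles_per_byte(17.69, 4326):.2f} c/B")
    # "After" CBC dec row: 0.984 ns/B at 4300 MHz
    print(f"{mib_per_sec(0.984):.1f} MiB/s, {cycles_per_byte(0.984, 4300):.2f} c/B")
```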




