Optimization for SM4 and x86-64/AES-NI implementations
Jussi Kivilinna
jussi.kivilinna at iki.fi
Tue Jun 16 21:28:22 CEST 2020
This patch set adds optimizations for the C implementation of the SM4
cipher, along with AES-NI accelerated AVX and AVX2 assembly
implementations. The performance improvement for the whole patch set is
presented below. Intermediate results are listed in each patch separately.
In summary, on x86-64 the generic C implementation is ~2 to ~4 times
faster than the original C implementation. The AES-NI implementations
speed up parallelizable cipher modes; there AES-NI/AVX is ~11 times faster
and AES-NI/AVX2 ~18 times faster than the original C implementation.
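To illustrate what "parallelizable" means here, the following is a minimal
Python sketch, not code from the patch set. It uses a toy 16-byte block
transform (the hypothetical toy_encrypt_block/toy_decrypt_block, which are
NOT SM4 and not secure) to show that CBC encryption must run block by
block, while CBC decryption can process ciphertext blocks in any order.
That independence is what lets a SIMD implementation handle several blocks
at once.

```python
BLOCK = 16  # block size in bytes (SM4 also uses 128-bit blocks)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for a block cipher (NOT SM4, not secure):
    # XOR with the key, then rotate left by one byte.
    x = xor(block, key)
    return x[1:] + x[:1]


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Inverse of toy_encrypt_block: rotate right, then XOR with the key.
    x = block[-1:] + block[:-1]
    return xor(x, key)


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each block needs the previous ciphertext block as input,
    # so the loop is inherently serial.
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_any_order(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # All inputs (ciphertext blocks) are known up front, so the block
    # decryptions are independent. They are done in reverse order here
    # only to demonstrate that the order does not matter.
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    prevs = [iv] + blocks[:-1]
    out = [b""] * len(blocks)
    for idx in reversed(range(len(blocks))):
        out[idx] = xor(toy_decrypt_block(key, blocks[idx]), prevs[idx])
    return b"".join(out)


key = bytes(range(16))
iv = bytes(16)
plaintext = bytes(range(64))
ciphertext = cbc_encrypt(key, iv, plaintext)
print(cbc_decrypt_any_order(key, iv, ciphertext) == plaintext)
```

This matches the pattern in the results below: CBC and CFB decryption, CTR,
GCM and OCB gain the most, while CBC/CFB encryption and OFB only gain from
the faster C implementation.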
Benchmark on AMD Ryzen 7 3700X:
Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 17.69 ns/B 53.92 MiB/s 76.50 c/B 4326
ECB dec | 17.74 ns/B 53.77 MiB/s 76.72 c/B 4325
CBC enc | 18.14 ns/B 52.56 MiB/s 78.47 c/B 4325
CBC dec | 18.05 ns/B 52.83 MiB/s 78.09 c/B 4326
CFB enc | 18.19 ns/B 52.44 MiB/s 78.67 c/B 4326
CFB dec | 18.16 ns/B 52.53 MiB/s 78.53 c/B 4326
OFB enc | 16.82 ns/B 56.70 MiB/s 72.96 c/B 4338
OFB dec | 16.87 ns/B 56.53 MiB/s 72.96 c/B 4325
CTR enc | 18.17 ns/B 52.47 MiB/s 78.62 c/B 4326
CTR dec | 18.02 ns/B 52.94 MiB/s 77.92 c/B 4325
XTS enc | 17.70 ns/B 53.87 MiB/s 76.11 c/B 4300
XTS dec | 17.65 ns/B 54.04 MiB/s 76.28 c/B 4323±1
CCM enc | 33.76 ns/B 28.25 MiB/s 146.9 c/B 4350
CCM dec | 34.07 ns/B 27.99 MiB/s 147.4 c/B 4326
CCM auth | 16.97 ns/B 56.19 MiB/s 73.41 c/B 4325
EAX enc | 34.02 ns/B 28.03 MiB/s 147.1 c/B 4325
EAX dec | 36.56 ns/B 26.08 MiB/s 159.1 c/B 4350
EAX auth | 17.02 ns/B 56.03 MiB/s 73.62 c/B 4325
GCM enc | 16.76 ns/B 56.90 MiB/s 72.50 c/B 4325
GCM dec | 18.01 ns/B 52.94 MiB/s 78.37 c/B 4350
GCM auth | 0.120 ns/B 7975 MiB/s 0.517 c/B 4325
OCB enc | 18.19 ns/B 52.43 MiB/s 78.68 c/B 4325
OCB dec | 18.15 ns/B 52.54 MiB/s 78.51 c/B 4325
OCB auth | 16.87 ns/B 56.54 MiB/s 72.95 c/B 4325
After:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 8.32 ns/B 114.6 MiB/s 36.01 c/B 4325
ECB dec | 8.31 ns/B 114.7 MiB/s 35.75 c/B 4300
CBC enc | 8.94 ns/B 106.7 MiB/s 38.67 c/B 4325
CBC dec | 0.984 ns/B 969.2 MiB/s 4.23 c/B 4300
CFB enc | 8.92 ns/B 107.0 MiB/s 38.57 c/B 4325
CFB dec | 0.989 ns/B 964.1 MiB/s 4.23 c/B 4275
OFB enc | 8.45 ns/B 112.8 MiB/s 36.35 c/B 4300
OFB dec | 8.40 ns/B 113.5 MiB/s 36.34 c/B 4325
CTR enc | 1.00 ns/B 952.6 MiB/s 4.31 c/B 4300
CTR dec | 0.999 ns/B 954.9 MiB/s 4.29 c/B 4300
XTS enc | 8.81 ns/B 108.3 MiB/s 38.11 c/B 4326
XTS dec | 8.81 ns/B 108.3 MiB/s 38.09 c/B 4325
CCM enc | 9.93 ns/B 96.07 MiB/s 42.69 c/B 4300
CCM dec | 9.91 ns/B 96.20 MiB/s 42.89 c/B 4326
CCM auth | 8.89 ns/B 107.3 MiB/s 38.45 c/B 4326
EAX enc | 9.91 ns/B 96.27 MiB/s 42.85 c/B 4325
EAX dec | 9.91 ns/B 96.19 MiB/s 42.80 c/B 4317
EAX auth | 8.95 ns/B 106.6 MiB/s 38.71 c/B 4325
GCM enc | 1.11 ns/B 856.8 MiB/s 4.79 c/B 4300
GCM dec | 1.12 ns/B 849.4 MiB/s 4.80 c/B 4275
GCM auth | 0.117 ns/B 8154 MiB/s 0.509 c/B 4350
OCB enc | 0.999 ns/B 954.8 MiB/s 4.29 c/B 4300
OCB dec | 1.00 ns/B 952.1 MiB/s 4.31 c/B 4300
OCB auth | 0.989 ns/B 964.4 MiB/s 4.25 c/B 4300
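As a cross-check of the summary figures, the speed-ups can be recomputed
from the ns/B columns above. The following Python sketch copies a few rows
from the two tables. The helper names and the unit-conversion formulas
(MiB/s = 1e9 / (ns/B) / 2^20, c/B = ns/B * MHz / 1000) are my own
assumptions about how the columns relate, not taken from the benchmark
tool.

```python
# ns/B values copied from the "Before" and "After" tables above.
before = {"ECB enc": 17.69, "CBC dec": 18.05, "CTR enc": 18.17,
          "EAX dec": 36.56, "OCB enc": 18.19}
after = {"ECB enc": 8.32, "CBC dec": 0.984, "CTR enc": 1.00,
         "EAX dec": 9.91, "OCB enc": 0.999}

# Speed-up factor per mode: old time per byte divided by new time per byte.
speedup = {mode: before[mode] / after[mode] for mode in before}


def mib_per_sec(ns_per_byte: float) -> float:
    # Assumed relation between the first two columns.
    return 1e9 / ns_per_byte / 2**20


def cycles_per_byte(ns_per_byte: float, mhz: float) -> float:
    # Assumed relation between ns/B, the clock column and c/B.
    return ns_per_byte * mhz / 1000.0


for mode, factor in speedup.items():
    print(f"{mode}: {factor:.2f}x")
```

With these rows, the non-parallelizable modes land between roughly 2x and
4x, and the modes using the AES-NI/AVX2 code land at roughly 18x, in line
with the summary.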