FYI: fast gcm/ghash for arm neon
Jussi Kivilinna
jussi.kivilinna at iki.fi
Mon Mar 11 18:05:06 CET 2019
Hello,
On 10.3.2019 10.38, Yuriy M. Kaminskiy wrote:
> Currently ghash/gcm performance on arm in both gcrypt and nettle is a bit abysmal:
> === bench-slopes-nettle ===
> GCM auth | 28.43 ns/B 33.54 MiB/s 39.81 c/B 1400.2
> === bench-slopes-gcrypt ===
> GCM auth | 21.86 ns/B 43.62 MiB/s 30.52 c/B 1396.0
> === bench-slopes-openssl [1.1.1a] ===
> GCM auth | 5.99 ns/B 159.3 MiB/s 8.38 c/B 1399.6
> === cut ===
> Current openssl/cryptograms code is based on ideas from
> https://hal.inria.fr/hal-01506572 (licensed CC BY 4.0)
> and there are linked implementation
> https://conradoplg.cryptoland.net/software/ecc-and-ae-for-arm-neon/
> (licensed LGPL 2.1+), which I guess should be acceptable to borrow.
Thanks for providing links to these. My focus for AES/GCM has been on
the ARM crypto extension instruction set, so I hadn't looked into an
ARM/NEON implementation. When the CPU supports the crypto instructions,
gcrypt performs significantly better and gives results similar to openssl:
Cortex-A53, 32-bit:
bench-slope-gcrypt: libgcrypt: 1.8.3
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.65 ns/B     360.3 MiB/s      2.16 c/B     816.0
        GCM dec |      2.65 ns/B     360.1 MiB/s      2.16 c/B     816.0
       GCM auth |      1.08 ns/B     885.9 MiB/s     0.878 c/B     816.0
bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018
 aes-128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      3.05 ns/B     313.1 MiB/s      2.49 c/B     816.0
        GCM dec |      3.04 ns/B     313.3 MiB/s      2.48 c/B     816.0
       GCM auth |      1.23 ns/B     777.2 MiB/s      1.00 c/B     816.0
Cortex-A53, 64-bit:
bench-slope-gcrypt: libgcrypt: 1.8.3
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.69 ns/B     354.4 MiB/s      2.20 c/B     816.0
        GCM dec |      2.70 ns/B     353.8 MiB/s      2.20 c/B     816.0
       GCM auth |      1.24 ns/B     771.1 MiB/s      1.01 c/B     816.0
bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018
 aes-128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.86 ns/B     333.7 MiB/s      2.33 c/B     816.0
        GCM dec |      2.86 ns/B     333.6 MiB/s      2.33 c/B     816.0
       GCM auth |      1.06 ns/B     903.3 MiB/s     0.861 c/B     816.0
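For anyone comparing these figures across machines, the columns can be cross-checked against each other. The sketch below is not part of bench-slope. It assumes that "auto Mhz" is the clock frequency the benchmark detected, and the function names are made up for illustration. Because the printed ns/B values are rounded, recomputed values only match the tables approximately.

```python
# Rough cross-check of bench-slope style columns (illustrative helper names).
MIB = 1024 * 1024  # bytes per mebibyte


def mib_per_sec(ns_per_byte):
    """Throughput in MiB/s implied by a nanoseconds-per-byte figure."""
    return 1e9 / ns_per_byte / MIB


def cycles_per_byte(ns_per_byte, mhz):
    """Cycles per byte: ns/B times cycles per nanosecond (MHz / 1000)."""
    return ns_per_byte * mhz / 1000.0


def speedup(slow_ns_per_byte, fast_ns_per_byte):
    """How many times faster the second figure is than the first."""
    return slow_ns_per_byte / fast_ns_per_byte


if __name__ == "__main__":
    # gcrypt GCM enc on the 32-bit Cortex-A53 run: 2.65 ns/B at 816.0 MHz
    print("MiB/s: %.1f" % mib_per_sec(2.65))
    print("c/B:   %.2f" % cycles_per_byte(2.65, 816.0))
    # Quoted GCM auth figures without crypto extensions: gcrypt vs openssl
    print("openssl/gcrypt speedup: %.2fx" % speedup(21.86, 5.99))
```

Cycles per byte is the more useful column when the clock rates differ, as they do between the quoted run (about 1400 MHz) and the Cortex-A53 runs (816 MHz).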
Adding an ARM/NEON implementation would make sense for low-end ARM
CPUs, since those do not provide these crypto instructions.
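To illustrate which CPUs would end up on which code path, here is a minimal sketch of the selection logic. It is not how libgcrypt detects hardware features. It assumes a Linux system where /proc/cpuinfo has a "Features" line listing flags such as "neon" (32-bit), "asimd" (64-bit) and "pmull" (polynomial multiply from the crypto extensions). The backend names "crypto-ext", "neon" and "generic" are hypothetical labels.

```python
# Sketch: choose a GHASH code path from Linux /proc/cpuinfo text.
# Backend names are hypothetical labels, not libgcrypt identifiers.


def cpu_features(cpuinfo_text):
    """Collect the flags listed on 'Features' lines of cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            flags.update(value.split())
    return flags


def pick_ghash_backend(flags):
    """Prefer crypto extensions, then NEON, then plain C code."""
    if "pmull" in flags:
        return "crypto-ext"
    if "neon" in flags or "asimd" in flags:
        return "neon"
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or file unreadable: fall back to generic
    print(pick_ghash_backend(cpu_features(text)))
```

A CPU that reports "neon" but not "pmull" is the case where a NEON GHASH would help; CPUs with "pmull" would keep using the existing crypto-extension code.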
-Jussi