FYI: fast gcm/ghash for arm neon

Jussi Kivilinna jussi.kivilinna at iki.fi
Mon Mar 11 18:05:06 CET 2019


Hello,

On 10.3.2019 10.38, Yuriy M. Kaminskiy wrote:
> Currently ghash/gcm performance on arm in both gcrypt and nettle is a bit abysmal:
> === bench-slopes-nettle ===
>        GCM auth |     28.43 ns/B     33.54 MiB/s     39.81 c/B    1400.2
> === bench-slopes-gcrypt ===
>        GCM auth |     21.86 ns/B     43.62 MiB/s     30.52 c/B    1396.0
> === bench-slopes-openssl [1.1.1a] ===
>        GCM auth |      5.99 ns/B     159.3 MiB/s      8.38 c/B    1399.6
> === cut ===
> Current openssl/cryptogams code is based on ideas from
> https://hal.inria.fr/hal-01506572 (licensed CC BY 4.0)
> and there is a linked implementation
> https://conradoplg.cryptoland.net/software/ecc-and-ae-for-arm-neon/
> (licensed LGPL 2.1+), which I guess should be acceptable to borrow.
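For context, the "GCM auth" rows measure authenticating data without encrypting it, which is essentially GHASH throughput. Each 16-byte block is XORed into an accumulator, and the accumulator is then multiplied by the hash key H in GF(2^128). Below is a minimal bit-at-a-time sketch in C that follows Algorithm 1 of NIST SP 800-38D. The helper names (be128, gf128_mul, ghash, hex16) are hypothetical and are not libgcrypt's or nettle's API. This naive form is much slower than the table-driven code libraries typically ship. It only shows the arithmetic that NEON and crypto-extension implementations speed up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: a 128-bit GCM block as two big-endian halves. */
typedef struct { uint64_t hi, lo; } be128;

static be128 load_be128(const uint8_t b[16]) {
  be128 r = {0, 0};
  for (int i = 0; i < 8; i++) {
    r.hi = (r.hi << 8) | b[i];
    r.lo = (r.lo << 8) | b[i + 8];
  }
  return r;
}

static void store_be128(uint8_t b[16], be128 x) {
  for (int i = 0; i < 8; i++) {
    b[7 - i] = (uint8_t)(x.hi >> (8 * i));
    b[15 - i] = (uint8_t)(x.lo >> (8 * i));
  }
}

/* X * Y in GF(2^128) with GCM's bit order (SP 800-38D, Algorithm 1).
   Bit 0 is the most significant bit of byte 0; R = 0xE1 || 0^120. */
static be128 gf128_mul(be128 x, be128 y) {
  be128 z = {0, 0}, v = y;
  for (int i = 0; i < 128; i++) {
    uint64_t word = (i < 64) ? x.hi : x.lo;
    if ((word >> (63 - (i & 63))) & 1) { /* bit i of X set: Z ^= V */
      z.hi ^= v.hi;
      z.lo ^= v.lo;
    }
    uint64_t lsb = v.lo & 1;             /* rightmost bit of V */
    v.lo = (v.lo >> 1) | (v.hi << 63);   /* V >>= 1 across both halves */
    v.hi >>= 1;
    if (lsb) v.hi ^= 0xE100000000000000ULL; /* reduce by R */
  }
  return z;
}

/* GHASH over whole blocks: Y_i = (Y_{i-1} ^ X_i) * H, with Y_0 = 0. */
static void ghash(const uint8_t h[16], const uint8_t *data, size_t nblocks,
                  uint8_t out[16]) {
  be128 H = load_be128(h), Y = {0, 0};
  for (size_t i = 0; i < nblocks; i++) {
    be128 X = load_be128(data + 16 * i);
    Y.hi ^= X.hi;
    Y.lo ^= X.lo;
    Y = gf128_mul(Y, H);
  }
  store_be128(out, Y);
}

/* Hypothetical helper: parse 32 hex digits into a 16-byte block. */
static void hex16(const char *hex, uint8_t out[16]) {
  assert(strlen(hex) == 32);
  for (int i = 0; i < 16; i++) sscanf(hex + 2 * i, "%2hhx", &out[i]);
}
```

With H = 66e94bd4ef8a2c3b884cfa59ca342b2e and the two blocks from Test Case 2 of the GCM specification (the ciphertext 0388dace60b6a392f328c2b971b2fe78 followed by the length block 00000000000000000000000000000080), ghash() yields f38cbb1ad69223dcc3457ae5b6b0f885.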

Thanks for providing links to these. My focus for AES/GCM has been on
the ARM crypto extension instruction set, so I hadn't looked into an ARM/NEON
implementation. When the CPU supports the crypto instructions, gcrypt
performs significantly better and gives results similar to openssl:

Cortex-A53, 32-bit:
bench-slope-gcrypt: libgcrypt: 1.8.3
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.65 ns/B     360.3 MiB/s      2.16 c/B     816.0
        GCM dec |      2.65 ns/B     360.1 MiB/s      2.16 c/B     816.0
       GCM auth |      1.08 ns/B     885.9 MiB/s     0.878 c/B     816.0

bench-slope-openssl: OpenSSL 1.1.1  11 Sep 2018
 aes-128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      3.05 ns/B     313.1 MiB/s      2.49 c/B     816.0
        GCM dec |      3.04 ns/B     313.3 MiB/s      2.48 c/B     816.0
       GCM auth |      1.23 ns/B     777.2 MiB/s      1.00 c/B     816.0

Cortex-A53, 64-bit:
bench-slope-gcrypt: libgcrypt: 1.8.3
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.69 ns/B     354.4 MiB/s      2.20 c/B     816.0
        GCM dec |      2.70 ns/B     353.8 MiB/s      2.20 c/B     816.0
       GCM auth |      1.24 ns/B     771.1 MiB/s      1.01 c/B     816.0

bench-slope-openssl: OpenSSL 1.1.1  11 Sep 2018
 aes-128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |      2.86 ns/B     333.7 MiB/s      2.33 c/B     816.0
        GCM dec |      2.86 ns/B     333.6 MiB/s      2.33 c/B     816.0
       GCM auth |      1.06 ns/B     903.3 MiB/s     0.861 c/B     816.0
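GHASH code built on the ARMv8 crypto extensions typically relies on PMULL/PMULL2 (VMULL.P64 in AArch32). Each of these computes a 64x64 to 128-bit carry-less (polynomial) multiply in one instruction. A full GF(2^128) block multiply can then be assembled from three or four such products plus a reduction modulo x^128 + x^7 + x^2 + x + 1. The portable C sketch below shows only what one such instruction computes. The names clmul64 and poly128 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: hi holds bits 64..127, lo bits 0..63. */
typedef struct { uint64_t hi, lo; } poly128;

/* 64x64 -> 128-bit carry-less multiply over GF(2): like integer
   multiplication, but partial products are combined with XOR, so no carries. */
static poly128 clmul64(uint64_t a, uint64_t b) {
  poly128 r = {0, 0};
  for (unsigned i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i) r.hi ^= a >> (64 - i); /* bits shifted past bit 63 */
    }
  }
  return r;
}
```

For example, clmul64(3, 3) is 5, because (x + 1)^2 = x^2 + 1 when coefficients are taken mod 2.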

Adding an ARM/NEON implementation would make sense for low-end ARM
CPUs, since those do not provide these crypto instructions.
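Without the crypto extensions, plain NEON still has VMULL.P8, which performs eight 8x8 to 16-bit polynomial multiplies at once, and the linked paper builds wider carry-less products out of it. The sketch below shows only the underlying idea with a naive schoolbook composition: a 64x64-bit carry-less product is the XOR of 64 byte-by-byte products, each shifted left by 8*(i+j) bits. It is plain C rather than NEON intrinsics, the names are hypothetical, and the paper describes a more economical arrangement of these products.

```c
#include <stdint.h>

/* 8x8 -> 16-bit carry-less multiply: what one lane of VMULL.P8 computes. */
static uint16_t clmul8(uint8_t a, uint8_t b) {
  uint16_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((b >> i) & 1) r ^= (uint16_t)((uint16_t)a << i);
  return r;
}

/* Hypothetical 128-bit result type: hi holds bits 64..127, lo bits 0..63. */
typedef struct { uint64_t hi, lo; } u128_pair;

/* XOR a 16-bit partial product into the result at a byte-aligned offset. */
static void xor_at(u128_pair *r, uint16_t v, unsigned shift) {
  if (shift < 64) {
    r->lo ^= (uint64_t)v << shift;
    if (shift > 48) r->hi ^= (uint64_t)v >> (64 - shift); /* spill-over */
  } else {
    r->hi ^= (uint64_t)v << (shift - 64);
  }
}

/* 64x64 -> 128-bit carry-less multiply built only from 8x8 products. */
static u128_pair clmul64_via_p8(uint64_t a, uint64_t b) {
  u128_pair r = {0, 0};
  for (unsigned i = 0; i < 8; i++)
    for (unsigned j = 0; j < 8; j++) {
      uint8_t ai = (uint8_t)(a >> (8 * i));
      uint8_t bj = (uint8_t)(b >> (8 * j));
      xor_at(&r, clmul8(ai, bj), 8 * (i + j));
    }
  return r;
}
```

For example, squaring 0xFFFFFFFFFFFFFFFF gives 0x5555555555555555 in both halves, since squaring over GF(2) just spreads the bits apart.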

-Jussi



More information about the Gcrypt-devel mailing list