Low level ops?
Jussi Kivilinna
jussi.kivilinna at iki.fi
Tue Jun 19 18:28:37 CEST 2018
Hello,
On 19.06.2018 08:27, Stef Bon wrote:
> Op wo 13 jun. 2018 om 21:21 schreef Stef Bon <stefbon at gmail.com>:
>>
>>
>> So as I see it, it would be worthwhile to try to reduce the overhead for
>> AES-CBC/DEC, since it varies from 99% to 12.5%, and the size of most
>> SSH messages is between 128 and 1024 bytes.
>>
>> You mention parallel mode for AES-CBC/DEC. Is it possible to use this
>> from the api?
>> And do you know what applies to chacha20-poly1305@openssh.com?
>>
>
> Hi,
>
> Can you please take a look at my remarks? I think that it's useful to
> reduce the overhead for the mentioned ciphers.
I made changes over the weekend to reduce the overhead for cipher
operations. When I tried to get those patches to the mailing list, they
just would not get through. I've spent the past two nights trying to
figure out what the ____ is wrong with my mail setup.
Anyway, the overhead for AESNI/CBC decryption, for example, has dropped
from ~80 cycles per call to ~30 cycles. The remaining 30 cycles seem
to be mainly caused by the optimized AESNI/CBC decryption function
itself. The AESNI/CBC encryption function is less complex, and the
overhead for it is now 9 cycles per call (was 40 cycles).
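
To relate these per-call figures to the 128 to 1024 byte SSH messages
mentioned in the quoted text, the fixed overhead can be spread over the
message length. A small sketch in plain C, assuming one cipher call per
message (the function name is made up for this example and is not part
of libgcrypt):

```c
#include <stddef.h>

/* Fixed per-call overhead spread over one message, in cycles per byte.
 * Assumes the whole message is handed over in a single call; this counts
 * only the call overhead, not the cipher work itself. */
double overhead_cycles_per_byte(double cycles_per_call, size_t msg_len)
{
    return cycles_per_call / (double)msg_len;
}
```

For a 128-byte message, the CBC decryption overhead goes from
80/128 = 0.625 to 30/128 = about 0.23 cycles per byte; for a 1024-byte
message, from about 0.078 to about 0.029.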
> And what about chacha20-poly1305@openssh.com?
If you check the chacha20-poly1305 in OpenSSH, you see that for each
packet you need to perform one extra chacha20 block encryption, which
alone is going to cost over 400 cycles.
If you want to see how to implement chacha20-poly1305@openssh.com with
libgcrypt, check the following commit, where I've changed OpenSSH to use
libgcrypt:
https://github.com/jkivilin/openssh-portable/commit/dd4d06bb47cbbbe3607b9be30f17f1495adbeb12
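
For illustration only: assuming the extra block is the one OpenSSH uses
to derive the per-packet Poly1305 key (as described in OpenSSH's
PROTOCOL.chacha20poly1305), the step can be sketched in plain C as
below. This is not taken from the commit above and does not use
libgcrypt; the function names are made up for the example, and a real
implementation would use an optimized ChaCha20 rather than this scalar
one.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {              \
    a += b; d ^= a; d = ROTL32(d, 16);   \
    c += d; b ^= c; b = ROTL32(b, 12);   \
    a += b; d ^= a; d = ROTL32(d, 8);    \
    c += d; b ^= c; b = ROTL32(b, 7);    \
} while (0)

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One 64-byte block of the original ChaCha20 layout
 * (64-bit block counter, 64-bit nonce). */
static void chacha20_block(const uint8_t key[32], uint64_t counter,
                           const uint8_t nonce[8], uint8_t out[64])
{
    uint32_t s[16], x[16];
    int i;

    s[0] = 0x61707865; s[1] = 0x3320646e;   /* "expand 32-byte k" */
    s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (i = 0; i < 8; i++)
        s[4 + i] = load32_le(key + 4 * i);
    s[12] = (uint32_t)counter;
    s[13] = (uint32_t)(counter >> 32);
    s[14] = load32_le(nonce);
    s[15] = load32_le(nonce + 4);

    memcpy(x, s, sizeof(x));
    for (i = 0; i < 10; i++) {              /* 10 double rounds = 20 rounds */
        QR(x[0], x[4], x[8],  x[12]);       /* column rounds */
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);       /* diagonal rounds */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 16; i++) {              /* add input state, store LE */
        uint32_t v = x[i] + s[i];
        out[4 * i + 0] = (uint8_t)v;
        out[4 * i + 1] = (uint8_t)(v >> 8);
        out[4 * i + 2] = (uint8_t)(v >> 16);
        out[4 * i + 3] = (uint8_t)(v >> 24);
    }
}

/* Per-packet Poly1305 key: the first 32 bytes of keystream block 0, with
 * the packet sequence number as a big-endian 64-bit nonce. This costs one
 * full ChaCha20 block per packet before any payload is processed. */
void derive_poly1305_key(const uint8_t key[32], uint32_t seqnr,
                         uint8_t poly_key[32])
{
    uint8_t nonce[8] = {0};
    uint8_t block[64];

    nonce[4] = (uint8_t)(seqnr >> 24);
    nonce[5] = (uint8_t)(seqnr >> 16);
    nonce[6] = (uint8_t)(seqnr >> 8);
    nonce[7] = (uint8_t)seqnr;
    chacha20_block(key, 0, nonce, block);
    memcpy(poly_key, block, 32);
}
```

The point of the sketch is that a whole 64-byte block is computed for
every packet regardless of payload size, which is why this cost does not
shrink for short messages.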
> And about controlling the parallel handling through the API?
Parallel handling is automatic for cipher modes that can be
parallelized (depending on your CPU's feature set and what
implementations are available). These are CTR-mode, CBC-decryption,
CFB-decryption, XTS, and OCB. EAX, GCM and CCM modes use CTR for
encryption/decryption and benefit from CTR-mode optimizations too.
Chacha20 and Salsa20 stream ciphers also have parallel block
optimizations.
To utilize this, you need to provide input buffers larger than the
block size to libgcrypt. For AESNI implementations, you get the best
performance starting with a buffer size of 8 blocks, or 8*16=128
bytes. For Chacha20, you need 4 blocks, or 4*64=256 bytes.
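
As a rough illustration of these sizes, here is a small plain-C check of
how much of a buffer fills complete parallel batches. The helper and
constant names are made up for this example; libgcrypt selects the code
path by itself, so an application only needs to pass the whole buffer in
one call instead of looping block by block. How the leftover bytes are
handled internally is not covered here.

```c
#include <stddef.h>

/* Block sizes in bytes and the parallel widths quoted above. */
enum {
    AES_BLOCK_SIZE           = 16,
    AES_PARALLEL_BLOCKS      = 8,   /* AESNI: 8 blocks at a time */
    CHACHA20_BLOCK_SIZE      = 64,
    CHACHA20_PARALLEL_BLOCKS = 4    /* Chacha20: 4 blocks at a time */
};

/* Smallest buffer, in bytes, that fills one parallel batch. */
size_t parallel_batch_bytes(size_t block_size, size_t parallel_blocks)
{
    return block_size * parallel_blocks;
}

/* Portion of buf_len, in bytes, that consists of complete batches. */
size_t bytes_in_full_batches(size_t buf_len, size_t batch_bytes)
{
    return buf_len - (buf_len % batch_bytes);
}
```

With these numbers, a 1024-byte message consists entirely of full AES
batches (8 of them), while a 200-byte message is too short to fill even
one 256-byte Chacha20 batch.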
-Jussi
>
> Thanks,
>
> Stef
>