Low level ops?

Stef Bon stefbon at gmail.com
Wed Jun 13 21:21:19 CEST 2018


On Mon, 11 Jun 2018 at 22:10, Jussi Kivilinna <jussi.kivilinna at iki.fi> wrote:
>
> Hello,
>

>
> With parallelizable modes, such as CBC decryption and CTR, this test
> no longer measure actual overhead as underlying algorithm changes with
> different chunk sizes. With AES-NI on x86_64, CBC decryption is done
> with 8 parallel blocks (128 bytes), so results below for 16 to 64 chunks
> sizes show how slow non-parallel code is compared to parallel code.
> Results starting with 128 chunks sizes show overhead for parallel code:
>
> AES-CBC/DEC: no-overhead test, full benchmark buffers, speed 0.632 cycles/byte
> AES-CBC/DEC: overhead test, benchmark buffers processed in 16 byte chunks, speed: 6.56 cycles/byte, overhead +937.4%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 32 byte chunks, speed: 3.94 cycles/byte, overhead +523.3%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 64 byte chunks, speed: 1.88 cycles/byte, overhead +197.6%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 128 byte chunks, speed: 1.26 cycles/byte, overhead +99.4%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 256 byte chunks, speed: 0.946 cycles/byte, overhead +49.7%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 512 byte chunks, speed: 0.794 cycles/byte, overhead +25.5%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 1024 byte chunks, speed: 0.711 cycles/byte, overhead +12.5%
> AES-CBC/DEC: overhead test, benchmark buffers processed in 2048 byte chunks, speed: 0.664 cycles/byte, overhead +5.0%
>
> Per call overhead for 128 byte chunks, ~80 cycles.
>

Thanks for the results!

I had to take some time to read the results; I'm not used to
interpreting these numbers.
As I understand it, a larger chunk size makes the overhead
percentage lower.
That makes sense. It looks a lot like the overhead costs a fixed
number of cycles per call, independent of the chunk size, while the
overall time grows linearly with the amount of data, since the
overhead percentage is roughly halved every time the chunk size is
doubled.
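
As a back-of-the-envelope check on my side: taking the 0.632
cycles/byte from the no-overhead run and the ~80 cycles per call you
mention, the model would be roughly

    cycles/byte ~= 0.632 + 80 / chunk_size

and that matches your table well: 0.632 + 80/128 ~= 1.26 and
0.632 + 80/1024 ~= 0.71.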

With ssh/sftp you cannot control the size of the data. Mostly (I'm not
sure) the messages vary from 128 bytes for a stat to 4096 bytes for
directory listings and reading files.
So as I see it, it would be worthwhile to try to bring down the overhead
for AES-CBC/DEC, since it ranges from 99% down to 12.5% over exactly
that range: the size of most ssh messages is between 128 and 1024 bytes.

You mention parallel processing for AES-CBC/DEC. Is it possible to use
this from the API?
And do you know how this works out for chacha20-poly1305 at openssh.com?
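
If I understand your results right, the parallel code path is chosen
inside libgcrypt, so from the API side the main thing I can do is hand
it as many blocks as possible per call. Something like the sketch
below (my own guess, not verified; key, IV and packet contents are
placeholders and error checking is left out):

#include <stddef.h>
#include <gcrypt.h>

/* Decrypt a whole packet in one gcry_cipher_decrypt() call so that
 * libgcrypt can use its multi-block AES-NI path, instead of looping
 * over 16-byte chunks (the slow row in the table above).  len must be
 * a multiple of the 16-byte AES block size. */
static void decrypt_packet(const unsigned char key[16],
                           const unsigned char iv[16],
                           unsigned char *packet, size_t len)
{
    gcry_cipher_hd_t hd;

    gcry_cipher_open(&hd, GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_CBC, 0);
    gcry_cipher_setkey(hd, key, 16);
    gcry_cipher_setiv(hd, iv, 16);
    gcry_cipher_decrypt(hd, packet, len, NULL, 0);   /* in place */
    gcry_cipher_close(hd);
}

int main(void)
{
    unsigned char key[16] = { 0 };       /* placeholder key */
    unsigned char iv[16]  = { 0 };       /* placeholder IV */
    unsigned char packet[1024] = { 0 };  /* dummy "ciphertext" */

    if (!gcry_check_version(NULL))
        return 1;
    gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);

    decrypt_packet(key, iv, packet, sizeof packet);
    return 0;
}

That would mean buffering a full ssh packet before calling into
libgcrypt, instead of decrypting it block by block as it arrives.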

I'm working on parallel en/decryption of two or more messages at the same time.
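
The approach I have in mind is roughly one cipher handle per message,
so nothing has to be shared between threads. A rough sketch (my plan,
not tested; placeholder keys/IVs, no error checking, and assuming
libgcrypt has already been initialized as in the previous snippet):

#include <stddef.h>
#include <pthread.h>
#include <gcrypt.h>

struct msg_job {
    unsigned char key[16];
    unsigned char iv[16];
    unsigned char *buf;   /* ciphertext in, plaintext out (in place) */
    size_t len;           /* multiple of the 16-byte block size */
};

static void *decrypt_msg(void *arg)
{
    struct msg_job *j = arg;
    gcry_cipher_hd_t hd;

    /* Each thread uses its own handle; handles are never shared. */
    gcry_cipher_open(&hd, GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_CBC, 0);
    gcry_cipher_setkey(hd, j->key, 16);
    gcry_cipher_setiv(hd, j->iv, 16);
    gcry_cipher_decrypt(hd, j->buf, j->len, NULL, 0);
    gcry_cipher_close(hd);
    return NULL;
}

/* Usage: one pthread_create(&tid, NULL, decrypt_msg, &job) per
 * incoming message, and pthread_join(tid, NULL) when the plaintext
 * is needed. */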

Stef

> > It's also possible that these checks do not cost anything. I don't know.


