[PATCH] rijndael-aesni: use inline checksumming for OCB decryption

Thu Jun 1 20:25:39 CEST 2023

On 31.5.2023 7.48, Jussi Kivilinna wrote:
> On 30.5.2023 13.32, Werner Koch via Gcrypt-devel wrote:
>> On Sun, 28 May 2023 17:53, Jussi Kivilinna said:
>>
>>> Inline checksumming is far faster on Ryzen processors on i386
>>> builds than two-pass checksumming.
>>
>> That is indeed a large performance boost.  Did you have a chance to
>> benchmark it on some common Intel CPU?
>>
> 
> I have now tested with Intel Tiger Lake; performance dropped by 9%, which
> is an unexpectedly large change. I'll try a few different things to see if
> I can avoid such a drop.

The performance problem I'm seeing is limited to Zen4 and only to 32-bit
execution mode. Running the same code in 64-bit mode does not suffer from
this problem. It seems to be somehow related to mixed XMM/YMM register usage
and the vzeroupper instruction not working as expected.

For example, I disabled the 8-block AES-OCB-enc loop and added the following
instructions at the end of the 4-block loop:
		    "vpcmpeqd %%ymm0, %%ymm0, %%ymm0\n\t"
		    "vzeroupper\n\t"

With these instructions, I see approximately two times slower performance in
32-bit mode (0.851 cycles/byte vs 0.411 c/B). In 64-bit mode, the same
instructions slow execution by only about 1% (0.418 c/B vs 0.414 c/B).
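
As a quick sanity check, the slowdowns can be recomputed from the cycles/byte
values quoted above. This is only a minimal C sketch: the helper names are
made up for illustration and are not part of libgcrypt, and the numbers are
simply the four measurements from this mail.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ratio of modified to baseline cycles/byte. */
static double slowdown_factor(double modified_cpb, double baseline_cpb)
{
  assert(baseline_cpb > 0.0);
  return modified_cpb / baseline_cpb;
}

/* Hypothetical helper: the same slowdown as a percentage increase. */
static double slowdown_percent(double modified_cpb, double baseline_cpb)
{
  return (slowdown_factor(modified_cpb, baseline_cpb) - 1.0) * 100.0;
}

/* Print the comparison for both execution modes using the values above. */
static void print_slowdowns(void)
{
  printf("32-bit: %.2fx slower\n", slowdown_factor(0.851, 0.411));
  printf("64-bit: %.1f%% slower\n", slowdown_percent(0.418, 0.414));
}
```

Calling print_slowdowns() from a main() gives a factor of roughly 2.07 for
32-bit mode and an increase of about 1.0% for 64-bit mode, which matches the
"two times" and "about 1%" figures.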

So, I won't apply this patch after all.

-Jussi

> 
> -Jussi
> 
>>
>> Shalom-Salam,
>>
>>     Werner
>>
>>
>> _______________________________________________
>> Gcrypt-devel mailing list
>> Gcrypt-devel at gnupg.org
>> https://lists.gnupg.org/mailman/listinfo/gcrypt-devel