[PATCH] rijndael-aesni: use inline checksumming for OCB decryption
Jussi Kivilinna
jussi.kivilinna at iki.fi
Thu Jun 1 20:25:39 CEST 2023
On 31.5.2023 7.48, Jussi Kivilinna wrote:
> On 30.5.2023 13.32, Werner Koch via Gcrypt-devel wrote:
>> On Sun, 28 May 2023 17:53, Jussi Kivilinna said:
>>
>>> Inline checksumming is far faster on Ryzen processors on i386
>>> builds than two-pass checksumming.
>>
>> That is indeed a large performance boost. Did you have a chance to
>> benchmark it on some common Intel CPU?
>>
>
> I have now tested with Intel Tiger Lake; performance dropped by 9%, which
> is an unexpectedly large change. I'll try a few different things to see if
> I can avoid such a drop.
The performance problem I'm seeing is limited to Zen4, and only in 32-bit
execution mode. Running the same code in 64-bit mode does not suffer from
this problem. It seems to be somehow related to mixed XMM/YMM register usage
and the vzeroupper instruction not working as expected.
For example, I disabled the 8-block AES-OCB-enc loop and added the following
instructions at the end of the 4-block loop:
"vpcmpeqd %%ymm0, %%ymm0, %%ymm0\n\t"
"vzeroupper\n\t"
With these instructions, I see approximately two times slower performance in
32-bit mode (0.851 cycles/byte vs 0.411 c/B). In 64-bit mode, the above
instructions slow execution by only about 1% (0.418 c/B vs 0.414 c/B).
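A minimal sketch of how this kind of experiment could be reproduced outside
libgcrypt follows. It is not the code from the patch: `have_avx2` and
`measure_ticks` are hypothetical helpers, and a single legacy-SSE `paddd`
stands in for the real 4-block AES-OCB loop body. It only shows the shape of
the test: the same XMM work timed with and without the YMM write plus
vzeroupper. Note that `__rdtsc()` counts reference ticks rather than core
cycles, so the result is not directly comparable to the c/B figures above.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>  /* __rdtsc() */

/* Hypothetical helper: returns 1 if the CPU reports AVX2, which the
   256-bit form of vpcmpeqd requires, and 0 otherwise. */
int
have_avx2 (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2") ? 1 : 0;
}

/* Hypothetical helper: run ITERS iterations of a tiny legacy-SSE loop
   body (an assumption standing in for the AES-NI 4-block loop).  When
   DIRTY_YMM is nonzero, each iteration also writes a YMM register and
   then executes vzeroupper.  Returns elapsed TSC ticks, or 0 when the
   YMM variant is requested on a CPU without AVX2. */
uint64_t
measure_ticks (unsigned long iters, int dirty_ymm)
{
  uint64_t start, end;
  unsigned long i;

  assert (iters > 0);
  if (dirty_ymm && !have_avx2 ())
    return 0;

  start = __rdtsc ();
  if (dirty_ymm)
    {
      for (i = 0; i < iters; i++)
        __asm__ volatile ("paddd %%xmm1, %%xmm0\n\t"
                          "vpcmpeqd %%ymm0, %%ymm0, %%ymm0\n\t"
                          "vzeroupper\n\t"
                          :
                          :
                          : "xmm0", "xmm1", "memory");
    }
  else
    {
      for (i = 0; i < iters; i++)
        __asm__ volatile ("paddd %%xmm1, %%xmm0\n\t"
                          :
                          :
                          : "xmm0", "xmm1", "memory");
    }
  end = __rdtsc ();

  return end - start;
}

#else  /* Not x86: nothing to measure. */

int
have_avx2 (void)
{
  return 0;
}

uint64_t
measure_ticks (unsigned long iters, int dirty_ymm)
{
  (void) iters;
  (void) dirty_ymm;
  return 0;
}

#endif
```

Building this once with -m32 and once with -m64, then comparing
measure_ticks (n, 1) against measure_ticks (n, 0) for a large n, should show
whether the extra cost is specific to 32-bit mode on a given CPU. Whether
this toy loop triggers the same slowdown as the real AES-OCB loop is not
verified.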
So, I won't apply this patch after all.
-Jussi
>
> -Jussi
>
>>
>> Shalom-Salam,
>>
>> Werner
>>
>>
>> _______________________________________________
>> Gcrypt-devel mailing list
>> Gcrypt-devel at gnupg.org
>> https://lists.gnupg.org/mailman/listinfo/gcrypt-devel