[PATCH] cipher: Fix SM3 avx/bmi2 compilation error

Tianjia Zhang tianjia.zhang at linux.alibaba.com
Tue Dec 21 08:29:39 CET 2021


Hi Jussi,

On 12/21/21 12:49 AM, Jussi Kivilinna wrote:
> Hello,
> 
> On 20.12.2021 5.23, Tianjia Zhang via Gcrypt-devel wrote:
>> * cipher/sm3-avx-bmi2-amd64.S: Fix assembler errors.
>>
>> -- 
>>
>> Compiling with GNU assembler version 2.27-41 produces many errors
>> like the following:
>>
>>    sm3-avx-bmi2-amd64.S: Assembler messages:
>>    sm3-avx-bmi2-amd64.S:402: Error: 0xf3988a32 out range of signed
>>      32bit displacement
>>
>> Newer versions of the GNU assembler do not have this issue, so the
>> old version probably does not handle these constants well. To allow
>> libgcrypt to be compiled on more systems, this patch works around
>> the problem anyway: the lea instruction that calculated the sum of
>> three elements is replaced by a two-element sum followed by an
>> additional add operation. A benchmark on an Intel i5-6200U 2.30GHz
>> CPU showed no significant performance difference.
> 
> 
> Thanks for reporting. However, I think this can be fixed by changing
> K0-K63 macros from hex-format to signed decimal values. Patch attached.
> 

Thanks for your suggestion. This approach is feasible, and I will use
it to fix the issue.
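
To illustrate why the signed form should be equivalent, here is a minimal
C sketch (not taken from the attached patch; the function names
as_signed_disp and lea_sum are hypothetical and exist only for this
example). It reinterprets the constant from the error message as a signed
32-bit value and checks that a lea-style sum, which wraps modulo 2^32,
gives the same result either way:

```c
#include <stdint.h>

/* Hypothetical helper: map a 32-bit constant to the signed value that
 * fits a signed 32-bit displacement (values >= 2^31 become k - 2^32). */
int32_t as_signed_disp(uint32_t k)
{
    if (k > (uint32_t)INT32_MAX) {
        /* Computed in 64 bits to avoid implementation-defined narrowing. */
        return (int32_t)(-(int64_t)(UINT32_MAX - k) - 1);
    }
    return (int32_t)k;
}

/* Hypothetical model of "leal disp(t0, e, 1), t1": a three-element sum
 * that wraps modulo 2^32. Converting the negative displacement back to
 * uint32_t is well defined in C (reduction modulo 2^32). */
uint32_t lea_sum(uint32_t t0, uint32_t e, int32_t disp)
{
    return t0 + e + (uint32_t)disp;
}
```

With these definitions, as_signed_disp(0xf3988a32u) yields -208106958, and
lea_sum() with that displacement matches t0 + e + 0xf3988a32 computed in
unsigned 32-bit arithmetic, so writing the constant in signed decimal
should not change the computed value.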

Best regards,
Tianjia

>>
>> Signed-off-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
>> ---
>>   cipher/sm3-avx-bmi2-amd64.S | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/cipher/sm3-avx-bmi2-amd64.S b/cipher/sm3-avx-bmi2-amd64.S
>> index 93aecacb..4a075d76 100644
>> --- a/cipher/sm3-avx-bmi2-amd64.S
>> +++ b/cipher/sm3-avx-bmi2-amd64.S
>> @@ -206,7 +206,8 @@ ELF(.size _gcry_sm3_avx2_consts,.-_gcry_sm3_avx2_consts)
>>           /* rol(a, 12) => t0 */ \
>>             roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
>>           /* rol (t0 + e + t), 7) => t1 */ \
>> -          leal K##round(t0, e, 1), t1; \
>> +          addl3(t0, e, t1); \
>> +          addl $K##round, t1; \
> 
> This is 12% slower on AMD Zen3 (from 7.37 cycles/byte to 8.30 cpb).
> 
> -Jussi
> 
>>             roll2(7, t1); \
>>           /* h + w1 => h */ \
>>             addl wtype##_W1_ADDR(round, widx), h; \
>>


