[PATCH 1/1] whirlpool hash amd64 assembly
    Jussi Kivilinna 
    jussi.kivilinna at iki.fi
       
    Tue Sep  2 17:06:28 CEST 2014
    
    
  
On 02/09/14 04:02, And Sch wrote:
> That is very impressive. The goal is accomplished then, I just wanted a faster whirlpool hash in gnupg. I'm no good with assembly, so I have no hope of doing better than the compiler. You may want to title the assembly as sse-amd64 now.
> 
> Thanks
Did you have a chance to run the implementation on Atom? I'd be very interested to know how it performs there.
-Jussi
PS. Please keep the mailing list in CC.
> 
>> -----Original Message-----
>> From: jussi.kivilinna at iki.fi
>> Sent: Mon, 01 Sep 2014 19:15:03 +0300
>> To: gcrypt-devel at gnupg.org
>> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
>>
>> On 29/08/14 18:45, And Sch wrote:
>> <snip>
>>>
>>> That is more than twice as fast as the original on the Atom system.
>>>
>>> I tried to find a way to use macros to sort out parts of the loop, but
>>> any change in the order of the instructions slows it down a lot. There
>>> are also only 7 registers available at one time in most parts of the
>>> loop, so that makes macros and rearrangements even more difficult.
>>>
>>> I used a little endian version of the last patch I posted and gcc
>>> -funroll-loops to generate this assembly. I've looked through it and
>>> tried to organize it as best I can. Suggestions on how to clean it up
>>> further would be helpful.
>>>
>>
>> I don't agree that this is a good method for creating assembly
>> implementations. As I see it, the main point of assembly
>> implementations is that you can do optimizations that the compiler has no
>> way of finding. For example, you could load indexes into the
>> rax/rbx/rcx/rdx registers, which allow extracting not only the first index
>> byte but also the second byte with just one instruction. Or use XMM
>> registers to store the key[] and state[] arrays instead of the stack.
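
A rough C sketch of the byte-extraction idea quoted above. It is only an illustration, not code from the attached patch: the helper names low_byte, second_byte and split_index_bytes are hypothetical. The assumption is that a 64-bit word of table indexes sits in one of rax/rbx/rcx/rdx, where hand-written assembly can read the second byte directly through ah/bh/ch/dh instead of shifting first. Whether a given compiler emits that form for the C below is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* First index byte: in assembly this is a single zero-extending move
   from the low byte register (al/bl/cl/dl). */
static inline unsigned low_byte(uint64_t w)
{
    return (unsigned)(w & 0xff);
}

/* Second index byte: written in C as shift plus mask. Hand-written
   assembly can instead read ah/bh/ch/dh with one zero-extending move,
   which only the a/b/c/d registers provide. */
static inline unsigned second_byte(uint64_t w)
{
    return (unsigned)((w >> 8) & 0xff);
}

/* Split a 64-bit word into its eight index bytes, lowest byte first.
   Two bytes are taken per step, then the word is shifted by 16 bits
   to bring the next pair into the low positions. */
static void split_index_bytes(uint64_t w, unsigned out[8])
{
    for (int i = 0; i < 8; i += 2) {
        out[i]     = low_byte(w);
        out[i + 1] = second_byte(w);
        w >>= 16;
    }
}
```

With this layout, one 16-bit shift serves two table lookups, which is the saving the paragraph above describes.
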
>>
>> Well, I ended up making such an implementation, which I've attached. On
>> an Intel i5-4570 (3.6 GHz turbo), I get:
>>
>>> tests/bench-slope --cpu-mhz 3600 hash whirlpool
>> Hash:
>>                 |  nanosecs/byte   mebibytes/sec   cycles/byte
>>  WHIRLPOOL      |      4.28 ns/B     222.7 MiB/s     15.42 c/B
>>
>> -Jussi
>>
>> _______________________________________________
>> Gcrypt-devel mailing list
>> Gcrypt-devel at gnupg.org
>> http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
> 