[PATCH 1/1] whirlpool hash amd64 assembly

Jussi Kivilinna jussi.kivilinna at iki.fi
Mon Sep 1 18:15:03 CEST 2014


On 29/08/14 18:45, And Sch wrote:
<snip>
> 
> That is more than twice as fast as the original on the Atom system.
> 
> I tried to find a way to use macros to sort out parts of the loop, but any change in the order of the instructions slows it down a lot. There are also only 7 registers available at one time in most parts of the loop, so that makes macros and rearrangements even more difficult.
> 
> I used a little endian version of the last patch I posted and gcc -funroll-loops to generate this assembly. I've looked through it and tried to organize it as best I can. Suggestions on how to clean it up further would be helpful.
> 

I don't agree that this is a good method for creating assembly implementations. As I see it, the main point of an assembly implementation is that you can make optimizations that the compiler has no way of finding. For example, you could load the indexes into the rax/rbx/rcx/rdx registers, which allow extracting not only the first index byte but also the second byte with just one instruction. Or you could use XMM registers to store the key[] and state[] arrays instead of the stack.
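To illustrate the byte-extraction idea in C terms, here is a minimal sketch. It is not taken from the attached patch; the helper name split_bytes_pairwise is hypothetical. It pulls the eight table-index bytes out of a 64-bit word two at a time, which is the access pattern that the low-byte and high-byte halves of rax/rbx/rcx/rdx (al/ah and so on) make cheap. Whether a compiler actually emits those register forms for this code is not guaranteed, which is part of the argument for writing the assembly by hand.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): split a 64-bit word into
   its 8 index bytes, lowest byte first, two bytes per step. */
static void split_bytes_pairwise(uint64_t w, uint8_t out[8])
{
    for (int i = 0; i < 8; i += 2) {
        /* Low byte of the word; in assembly this can be a single
           zero-extending move from a register such as %al. */
        out[i] = (uint8_t)w;

        /* Second byte; with the word in rax/rbx/rcx/rdx this can also
           be a single zero-extending move, from %ah/%bh/%ch/%dh. */
        out[i + 1] = (uint8_t)(w >> 8);

        /* One shift by 16 exposes the next pair of index bytes. */
        w >>= 16;
    }
}
```

Calling split_bytes_pairwise(0x0807060504030201, out) fills out with the bytes 01 through 08 in order. Each byte would then index one of the Whirlpool lookup tables; that lookup step is omitted here.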

Well, I ended up making such an implementation, which I've attached. On an Intel i5-4570 (3.6 GHz turbo), I get:

> tests/bench-slope --cpu-mhz 3600 hash whirlpool
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      4.28 ns/B     222.7 MiB/s     15.42 c/B
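For reference, the three columns above are related by simple arithmetic. The sketch below assumes that cycles/byte is derived from nanosecs/byte and the clock rate passed via --cpu-mhz; the function names are hypothetical and are not part of bench-slope. Because the printed 4.28 ns/B is itself rounded, recomputing from it gives about 15.41 c/B and 222.8 MiB/s rather than exactly the printed 15.42 and 222.7.

```c
/* Hypothetical helpers, not part of bench-slope. */

/* cycles per byte = (nanoseconds per byte) * (clock rate in GHz) */
static double cycles_per_byte(double ns_per_byte, double cpu_mhz)
{
    return ns_per_byte * cpu_mhz / 1000.0;
}

/* MiB per second = (1e9 ns per second) / (ns per byte) / (2^20 bytes per MiB) */
static double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}
```

With the figures above, cycles_per_byte(4.28, 3600) and mib_per_sec(4.28) reproduce the other two columns to within the rounding of the printed values.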

-Jussi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01-whirl-amd64-asm.patch
Type: text/x-patch
Size: 15806 bytes
Desc: not available
URL: </pipermail/attachments/20140901/b0af8943/attachment-0001.bin>