sha1 hash using libgcrypt different from what returns sha1sum

Tue Nov 12 00:44:49 CET 2013

Denis Corbin wrote:
> first, I hope that I am not sending my questions to the wrong mailing list.
> 
> I maintain the dar (aka Disk ARchive) backup software that generates
> sha1 or md5 sums of files it generates using libgcrypt. Recently a user
> reported a hash failure on a large file when using sha1sum against the
> hash generated by dar, while at the same time the generated archive was
> not corrupted.
> 
> I could reproduce this problem, but only with large files (larger than
> 200 GB), and I could not clearly find a threshold below which the problem
> never exists and above which it always occurs.
> 
> dar is a somewhat large C++ program, so I've extracted the pertinent
> code into a simpler C program for easier review and criticism of
> the way dar calls libgcrypt. This C code (hashsum.c [1]) also reproduces
> the problem. I have re-read the libgcrypt documentation and could not
> find anything wrong... but it's still possible I've missed something...
> 
> To avoid generating large test files and to reduce the execution time of
> the tests, the make_sparse_file.c program ([1]) generates a so-called
> sparse file (if the underlying filesystem supports it). Note that using
> dd to create a real, plain equivalent file leads to the same difference
> of hash for large files. Note also that although the generated test
> file contains a long series of bytes set to zero, the problem also
> occurs with more random data patterns (like slices generated by dar).
> 
> The system used is Debian wheezy, on a 64-bit AMD host:
>     libgcrypt version based on 1.5.0 (with probable Debian patches)
>     sha1sum (GNU coreutils) 8.13
>     md5sum (GNU coreutils) 8.13
> 
> Using libgcrypt 1.5.3 gives the same result.
> 
> Well, I wonder whether this is due to wrong usage of libgcrypt in dar or,
> here, in hashsum.c? Or is it a bug in libgcrypt, or in sha1sum and md5sum?

It looks like libgcrypt uses a 'u32 nblocks' variable internally to count
64-byte blocks, and then converts it to the number of bits that is appended to
the last block in the sha1_final function.

This u32 counter overflows after processing (2**32)*64 bytes (== 2**38 B == 256
GiB).
Actually, since the number of bytes in the final block is added to an
[effectively] 64-bit variable, those "nblocks wraparound effects" will be
visible only with files over (2**38)+63 bytes, a very peculiar limit.
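
To make the arithmetic concrete, here is a minimal sketch in C. It is not
libgcrypt's code: the function names and the BLOCK_BYTES macro are made up for
illustration, and it only models a block counter held in 32 bits, with the
trailing bytes added in 64-bit arithmetic. The exact first affected size
depends on how the real code buffers trailing bytes, which this sketch does
not model. Since the padding of these hashes encodes the message length in
bits, a wrong length changes the digest even when the data is intact.

```c
#include <stdint.h>

/* SHA-1, SHA-256 and MD5 all consume 64-byte blocks. */
#define BLOCK_BYTES 64u

/* Hypothetical model of the suspected behaviour: full blocks are counted
 * in a 32-bit variable, trailing bytes are added in 64-bit arithmetic. */
uint64_t bits_with_u32_counter(uint64_t total_bytes)
{
    /* The cast wraps the block count modulo 2**32. */
    uint32_t nblocks = (uint32_t)(total_bytes / BLOCK_BYTES);
    uint64_t tail = total_bytes % BLOCK_BYTES;

    return ((uint64_t)nblocks * BLOCK_BYTES + tail) * 8u;
}

/* What the padding should encode: the true message length in bits. */
uint64_t bits_expected(uint64_t total_bytes)
{
    return total_bytes * 8u;
}
```

In this model a file of 2**38 - 1 bytes gives the same value from both
functions, while a file of 2**38 + 100 bytes gives 800 bits from the first
function instead of 2**41 + 800.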

I strongly believe this is a bug: I have not found any such behavior in the
standard. The only limit for SHA-1 is a message length of 2**64 bits (2**61
bytes).

There is exactly the same bug in the sha256 and md5 implementations (but,
curiously, there is *no* similar problem in sha512).

I checked git master; it should have the same problem.

> [1] source code and test results are available for download at:
> 	http://dar.linux.free.fr/libgcrypt-tests.tar.gz  (~ 3 KB)