Is it safe to rename file.gpg to `md5sum file`?

Robert J. Hansen rjh at sixdemonbag.org
Wed Dec 5 07:20:28 CET 2012


On 12/4/2012 3:03 PM, sben1783 wrote:
> Yes, I meant to use the MD5 checksum of the original file, not its
> original name. I'm still interested whether this would be "insecure"?

Let's not even use the word insecure, since that word is wholly
subjective: there's no agreed-upon definition for what it means.
Instead, let me ask a different question: what, precisely, are you
trying to accomplish?

> And, by the way, how could the hash of a filename be used to reconstruct
> the filename (as atom says "... makes recovery of the db possible ...")
> There is no such thing as inverse-md5sum, is there? You'd still need
> "brute force" to find the original name?

Sure, of course there's inverse-md5sum -- after a fashion.  Many files
bear names that are just a couple of short words: "Tuesday report.doc",
for instance.  So you go through a dictionary and compile a list of
every one- and two-word filename, separating the words with underscores
and spaces, and appending each of the top 100 file extensions.

There are about 5,000 words in common usage in English.  (A native
speaker will have a larger vocabulary, but you can get by quite well on
5,000 words.)  Every possible one- and two-word combination from this
list would amount to about 25 million entries in the database.
Multiplying by, say, four, to represent different conventions for
capitalization, spacing and whatnot, brings you to 100 million.
Multiply by the top 100 file extensions and you get 10 billion.  Each of
those records would require about 100 bytes of storage, for a total of
1 trillion bytes (one terabyte).

You could easily store it on a $100 hard drive.
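
The arithmetic above can be checked with a few lines of Python.  This is
only a back-of-the-envelope sketch: the constant names are mine, and the
inputs are the same rough guesses as in the text (5,000 words, four
naming conventions, 100 extensions, 100 bytes per record).

```python
# Rough size estimate for the filename dictionary.
# All inputs are the ballpark assumptions from the text, not measurements.
WORDS = 5_000            # words in common usage
VARIANTS = 4             # capitalization/spacing conventions
EXTENSIONS = 100         # most common file extensions
BYTES_PER_RECORD = 100   # assumed storage per (name, hash) record

# One-word names plus every ordered two-word pair.
base_names = WORDS + WORDS * WORDS
records = base_names * VARIANTS * EXTENSIONS
total_bytes = records * BYTES_PER_RECORD

print(f"base names : {base_names:,}")
print(f"records    : {records:,}")
print(f"storage    : {total_bytes / 1e12:.2f} TB")
```

The exact figures come out a hair above the rounded ones because the
one-word names are counted too, but the conclusion is the same: roughly
one terabyte.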

This is what's called a "dictionary attack."  There are other much
better ways to attack it: rainbow tables, for instance.  But this is
enough to show you that the MD5 hash of a short, guessable string is
nowhere near as hard to reverse as you might think.  If you're creating
filenames based on the MD5 hash of the entire file, that would be
(probably) nontrivial to reverse; if you're creating filenames based on
the MD5 hash of the original filename, you're playing with fire.
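
To make the dictionary attack concrete, here is a minimal sketch in
Python.  The three-word list, the two extensions and the helper names
(md5_hex, candidate_names, build_lookup) are made up for illustration; a
real attacker would use a full word list and more naming conventions.

```python
import hashlib
from itertools import product


def md5_hex(text):
    """Return the hex MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def candidate_names(words, extensions, separators=(" ", "_")):
    """Yield every one- and two-word filename for each extension."""
    for ext in extensions:
        for word in words:
            yield f"{word}.{ext}"
        for first, second in product(words, repeat=2):
            for sep in separators:
                yield f"{first}{sep}{second}.{ext}"


def build_lookup(words, extensions):
    """Precompute a hash -> filename table (the 'dictionary')."""
    return {md5_hex(name): name for name in candidate_names(words, extensions)}


# Toy inputs, purely illustrative.
words = ["Tuesday", "report", "budget"]
extensions = ["doc", "pdf"]
lookup = build_lookup(words, extensions)

# The attacker sees only the hash that was used as the new filename...
target = md5_hex("Tuesday report.doc")
# ...and recovers the original name with a single table lookup.
recovered = lookup.get(target)
print(recovered)
```

Building the table is the only expensive step, and it is done once;
after that, every hashed filename that falls inside the dictionary is
recovered instantly.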

That said: please don't use MD5.  Please use a better, stronger hash
algorithm like SHA-256, SHA-512 or SHA-3 instead.
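
If you do want to name a file after a hash of its contents, something
along these lines would do it with SHA-256.  This is a sketch, not a
recommendation of any particular tool: sha256_of_stream is a name I made
up, and the in-memory stream below merely stands in for a file opened in
binary mode.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for open("file", "rb"); the contents here are just b"abc".
name = sha256_of_stream(io.BytesIO(b"abc"))
print(name + ".gpg")
```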




