Passphrase Encoding and Entropy

Wed Jun 8 22:35:58 CEST 2005

[Oskar L.]
> Thanks for your anwser, but I'm afraid you mostly told me 
> what I already know. What I don't understand is how this 
> relates to breaking passphrases.
> For example, say I use the passphrase foobar. It has 6 
> characters, each represented by 8 bits, so it will be 
> represented by 46 bits. These 46 bits are then the key used 
> to symmetrically encrypt/decrypt my secret key, right? 
> (Another question; is salt added to it, and/or is it hashed?)

There are a number of options in OpenPGP for converting a pass phrase to
a symmetric encryption key. They are called "String-to-Key (S2K)
Specifiers" and are listed in RFC-2440. Some are salted, others
unstalted, and some use more than one iteration of the hash function.
Many different hash functions are supported, although MD5 and SHA-1 are
the most commonly used.

I believe the GnuPG default settings work like this: the passphrase is
stored as unicode (UTF-8 encoding maybe?) in memory, the random 64-bit
salt is read from the password file and tacked on to the begginnging of
your passphrase. The entire salt + passphrase string is hashed with
SHA-1. That generates the encryption key used to encrypt the private key
or file with 3DES. 

> Now if the attacker knows that I have only used the 23 
> characters a-z in the passphrase, then she/he can represent 
> all of them using 5 bits. But I don't understand how this 
> helps the attacker, since foobar represented this way would 
> obviously produce a completely different key (of only 30 bits).

It helps the attacker because the attacker only needs to try 26^6
combinations instead of 95^6 combinations. That is a big difference. The
all-lower case password would require ~2^27 guesses on average to crack.
The one using six of the full 95 characters on a U.S. key would require
about 2^38 guesses. That makes it ~2000 times harder to crack (but still
very weak considering all the multi-core multi-GHz computers cheaply
available out there).

The fact that the attacker must take the time to make the SHA-1 hash of
each guess doesn't really figure into the discussion, since making that
hash takes the same amount of time for every guess - it is a "constant
time" factor that drops out when comparing the relative strength of one
passphrase vs. another.

> I thought that all this was about time and the amount of data 
> you have to deal with. That, for example, when someone is 
> brute forcing a passphrase it would be 25% faster if all the 
> characters used were represented with only 6 bits instead of 
> 8. I understand why this would be faster, but not who it 
> could possibly work.

It works because the attacker just has to make their password-guessing
program smart enough to skip the unused characters. They start with
"a","b"... then to "aa","ab","ac".... and finally all the way up to
"zzzzzy","zzzzzz". If they had to cycle through all the combinations of
all the characters on the keyboard, instead of just lower case, there
would be 2000 times as many steps in this process. They wold have to go
through combinations like "4Av%#."

Regards,
	Ryan