Passphrase Encoding and Entropy

Oskar L. oskar at rbgi.net
Tue Jun 7 23:16:05 CEST 2005


"Martin Geisler" <mgeisler at mgeisler.net> wrote:

> When you have 64 different possibilities, all of equal likelihood,
> then you can code them using 6 bits. This is what the entropy tells
> you.
>
> The fact that A in the 7-bit ASCII standard is 1000001 is just a
> coincidence: they could just as well have put your chosen 64
> characters into the lower 6 bits, and then have the other 64 available
> characters use the high bit.
>
> In general it doesn't change anything if you encode your message (a
> passphrase in your example) in a different encoding: the amount of
> information stays the same if you still just select your characters
> from the same subset.
>
>
> So making a passphrase of ASCII characters, and then encoding it using
> UTF-16 doesn't make it more secure. Sure, UTF-16 gives you the
> potential to encode something like 2^16-1 characters, but to calculate
> the entropy you can disregard all characters which you will never
> choose.
>
> The formula for entropy explains this:
>
>   H(X) = - sum_{i=1}^n p(i) * log_2(p(i))
>
> Here the p(i)'s are the probabilities that your message will be "i".
> With a bigger space of possible messages (a bigger n) the sum
> contains more terms, but if you still select your message from the
> same small set, then most of the terms will be zero. So if a message
> of value "j" is, say, a Chinese passphrase, then p(j) = 0 would mean
> that we know that you'll never choose such a passphrase. And thus the
> term disappears from the sum (well, actually you get a problem with
> taking the logarithm of zero, but let's not go into that).
>
>
> See http://en.wikipedia.org/wiki/Information_entropy for more on how
> to calculate the entropy, but I hope this helped a bit.
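
The formula quoted above can be checked numerically. The following is a
minimal sketch, not taken from the thread: the function name entropy_bits
and the two example distributions are illustrative assumptions. It skips
terms with p(i) = 0, which is the usual convention for the "logarithm of
zero" problem mentioned in the quote.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; terms with probability zero are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 64 characters, all equally likely (the example from the quoted text).
uniform_64 = [1 / 64] * 64

# The same 64 characters embedded in a space of 2**16 possible symbols,
# where every other symbol is never chosen (probability zero).
padded = uniform_64 + [0.0] * (2**16 - 64)

h_small = entropy_bits(uniform_64)
h_padded = entropy_bits(padded)

print(h_small, h_padded)
```

Both values come out as 6 bits per character: enlarging the encoding
space adds only zero terms to the sum.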

Thanks for your answer, but I'm afraid you mostly told me what I already
know. What I don't understand is how this relates to breaking passphrases.
For example, say I use the passphrase foobar. It has 6 characters, each
represented by 8 bits, so it will be represented by 48 bits. These 48 bits
are then the key used to symmetrically encrypt/decrypt my secret key,
right? (Another question; is salt added to it, and/or is it hashed?)

Now if the attacker knows that I have only used the 26 characters a-z in
the passphrase, then she/he can represent all of them using 5 bits. But I
don't understand how this helps the attacker, since foobar represented
this way would obviously produce a completely different key (of only 30
bits).
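
One way to make the numbers in the two paragraphs above concrete, as a
minimal sketch. It assumes (this is not stated in the thread) that an
attacker who knows the alphabet enumerates candidate passphrases over the
26 letters a-z and encodes each guess in the ordinary 8-bit form, so every
guess is still 6 bytes; the bits per character then only measure how many
guesses there are. The names ALPHABET, LENGTH and candidate_passphrases
are illustrative.

```python
import itertools
import math
import string

ALPHABET = string.ascii_lowercase  # the 26 letters a-z
LENGTH = 6                         # length of "foobar"

def candidate_passphrases(alphabet=ALPHABET, length=LENGTH):
    """Yield every passphrase of the given length as 8-bit ASCII bytes."""
    for chars in itertools.product(alphabet, repeat=length):
        yield "".join(chars).encode("ascii")

# Number of guesses when the alphabet is known, versus the number of
# distinct 48-bit patterns that 6 bytes could hold in principle.
candidate_count = len(ALPHABET) ** LENGTH
bit_pattern_count = 2 ** (8 * LENGTH)

# Size of the restricted search space, expressed in bits.
effective_bits = math.log2(candidate_count)

first_guess = next(candidate_passphrases())

print(candidate_count, bit_pattern_count, round(effective_bits, 1))
print(first_guess, len(first_guess) * 8)
```

Under these assumptions there are 26^6 = 308915776 guesses (about 28.2
bits) against 2^48 = 281474976710656 bit patterns, and each guess is still
48 bits long when it is tried.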

I thought that all this was about time and the amount of data you have to
deal with. That, for example, when someone is brute forcing a passphrase
it would be 25% faster if all the characters used were represented with
only 6 bits instead of 8. I understand why this would be faster, but not
how it could possibly work.

Oskar



More information about the Gnupg-users mailing list