Passphrase Encoding and Entropy

Martin Geisler mgeisler at mgeisler.net
Mon Jun 6 13:48:58 CEST 2005


"Oskar L." <oskar at rbgi.net> writes:

> Also, let's say it is known that the characters in a passphrase have
> been selected from the 64 ASCII characters A-Z, a-z, 0-9, # and $.
> This will give each character an entropy of 6 bits (log2(64)), which
> if I understand correctly means that 6 of the 8 bits used to
> represent the character are unknown. But can you in real life tell
> which six, for example for the character A, which in binary is
> 01000001? The first zero will of course be known, but is there a
> second known digit?

When you have 64 different possibilities, all of equal likelihood,
then you can code them using 6 bits. This is what the entropy tells
you.

The fact that A in the 7-bit ASCII standard is 01000001 is just a
coincidence: they could just as well have put your chosen 64
characters into the lower 6 bits, and then have the other 64 available
characters use the high bit.
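
To make that concrete, here is a minimal Python sketch of such an
alternative code. The names ALPHABET and six_bit_code are made up for
this illustration, and the ordering of the 64 characters is an
arbitrary assumption; any fixed ordering would do.

```python
import math
import string

# The 64-character alphabet from the question: A-Z, a-z, 0-9, # and $.
# The ordering is an arbitrary choice made for this sketch.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "#$"

def six_bit_code(ch):
    """Return a 6-bit code for ch: its index in ALPHABET written in binary."""
    return format(ALPHABET.index(ch), "06b")

# Entropy per character when all 64 characters are equally likely.
bits_per_char = math.log2(len(ALPHABET))

print(bits_per_char)                 # 6.0
print(six_bit_code("A"))             # 000000 in this made-up code
print(format(ord("A"), "08b"))       # 01000001 in ASCII, padded to 8 bits
```

In this code all six bits of every character are "unknown" to an
attacker, which is the same amount of uncertainty as before; only the
representation changed.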

In general it doesn't change anything if you encode your message (a
passphrase in your example) in a different encoding: the amount of
information stays the same if you still just select your characters
from the same subset.


So making a passphrase of ASCII characters, and then encoding it using
UTF-16 doesn't make it more secure. Sure, UTF-16 gives you the
potential to encode something like 2^16-1 characters, but to calculate
the entropy you can disregard all characters which you will never
choose.
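
A small Python sketch of this point, using a made-up example
passphrase and assuming each character was picked uniformly and
independently from the 64-character set: the UTF-16 form is twice as
long, but the extra bytes are all zero and the number of candidate
passphrases is unchanged.

```python
import math

# Hypothetical passphrase, assumed to be drawn uniformly from the
# 64-character set A-Z, a-z, 0-9, # and $.
passphrase = "aB3#xY9$"

ascii_bytes = passphrase.encode("ascii")
utf16_bytes = passphrase.encode("utf-16-be")  # big-endian, no byte order mark

# UTF-16 needs twice the storage for these characters ...
print(len(ascii_bytes), len(utf16_bytes))     # 8 16

# ... but every added (high) byte is zero, so it carries no information.
high_bytes = set(utf16_bytes[0::2])
print(high_bytes)                             # {0}

# The encoding is reversible, so the set of candidates is the same.
assert utf16_bytes.decode("utf-16-be") == passphrase

# Entropy depends only on the alphabet size and the length.
alphabet_size = 64
entropy_bits = len(passphrase) * math.log2(alphabet_size)
print(entropy_bits)                           # 48.0
```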

The formula for entropy explains this:

  H(X) = - sum_{i=1}^n p(i) * log_2(p(i))

Here p(i) is the probability that your message will be "i".
With a bigger space of possible messages (a bigger n) the sum
contains more terms, but if you still select your message from the
same small set, then most of the terms will be zero. So if a message
of value "j" is, say, a Chinese passphrase, then p(j) = 0 would mean
that we know that you'll never choose such a passphrase. And thus the
term disappears from the sum (well, actually you get a problem with
taking the logarithm of zero; the usual convention is to treat
0 * log_2(0) as 0, but let's not go into that).
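
The formula is easy to try out. The following is only a sketch in
Python (the function name entropy is chosen for this example), and it
assumes the convention that terms with p(i) = 0 contribute nothing to
the sum. It compares 64 equally likely messages with the same 64
messages embedded in a space of 2^16 possible messages.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; terms with p = 0 are skipped (0 * log2(0) := 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 64 equally likely messages.
small_space = [1 / 64] * 64

# The same 64 messages inside a space of 2**16 possible messages,
# where all the other messages are never chosen.
large_space = [1 / 64] * 64 + [0.0] * (2**16 - 64)

print(entropy(small_space))   # 6.0
print(entropy(large_space))   # 6.0
```

Both calls give 6 bits: the extra 65472 terms are zero and do not
change the result.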


See http://en.wikipedia.org/wiki/Information_entropy for more on how
to calculate the entropy, but I hope this helped a bit.

-- 
Martin Geisler                                     GnuPG Key: 0x7E45DD38

PHP EXIF Library      |  PHP Weather             |  PHP Shell
http://pel.sf.net/    |  http://phpweather.net/  |  http://mgeisler.net/
Read/write EXIF data  |  Show current weather    |  A shell in a browser


More information about the Gnupg-users mailing list