10x compression factor

David Shaw dshaw at jabberwocky.com
Tue Feb 10 13:29:25 CET 2004


On Tue, Feb 10, 2004 at 08:32:24AM -0800, Steve Butler wrote:
> Plain text should have lots (and even more than lots) of spaces.  Many of
> them consecutively spaced <<grin>>.  There should be lots of other
> characters (even strings of characters) that repeat throughout the file.
> 
> Gzip should be able to do that trick fairly easily.  Pkzip could also do it.
> Since GnuPG uses similar compression routines (ZIP and ZLIB), it should do
> just as well.
> 
> I just generated a 1 Gbyte file that contains only spaces (no newline; no
> <CR><LF>).  That's 1,073,741,824 spaces (all in a row).
> 
> PKZIP compressed this down to 1,048,417 bytes (1024:1)  1Gbyte to just under
> 1 Mbyte.
> gzip  compressed this down to 1,042,077 bytes (1030:1)
> GnuPG encrypted  this down to 1,044,357 bytes (1028:1) (even with the key
> overhead)
> 
> So, 10:1 compression just means that the text files had more random text
> than a string of spaces.  But it wasn't completely random.  Most text files
> will fail most any statistical test for randomness.  Ergo, they compress
> really well!
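
Steve's point about randomness can be checked on a small scale.  The
following is only a rough Python sketch, not anything GnuPG itself
does: the word list and the 200,000-word length are made-up
assumptions standing in for "plain text", and zlib stands in for the
compressors discussed here.

```python
import os
import random
import zlib

random.seed(0)  # fixed seed so the synthetic text is reproducible

# Hypothetical stand-in for "plain text": words drawn from a tiny vocabulary.
WORDS = ["the", "file", "of", "spaces", "compress",
         "text", "and", "random", "key", "data"]
text = " ".join(random.choice(WORDS) for _ in range(200_000)).encode("ascii")

# The same number of bytes taken from the OS random source.
noise = os.urandom(len(text))


def ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size at the highest level."""
    return len(data) / len(zlib.compress(data, 9))


print(f"text-like: {ratio(text):.1f}:1")
print(f"random:    {ratio(noise):.2f}:1")
```

The text-like buffer should shrink noticeably, while the random bytes
should not shrink at all, since there is no redundancy to remove.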

If you want to talk about good compression ratios, remember that GnuPG
1.3 supports bzip2 compression.  A one gig file of spaces, just like
yours, using the highest compression level in GnuPG:

  ZIP:   1,043,671 (1028:1)
  ZLIB:  1,043,677 (1028:1)
  BZIP2: 833       (1289005:1)

So, bzip2 compresses 1 gigabyte to less than 1 kilobyte. :) Of course,
this is a really artificial situation (all spaces) that plays
directly into a major strength of bzip2, but it's amusing nonetheless.
Note that ZLIB actually does marginally worse than ZIP in this
situation, whereas under more common circumstances it would do a good
bit better.
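
For anyone who wants to see the same effect without a gigabyte of
disk, here is a scaled-down sketch in Python.  It uses the standard
library's zlib and bz2 modules rather than GnuPG, and a 16 MiB buffer
rather than 1 GiB (both are assumptions made to keep it quick), so the
exact byte counts will differ from the figures above.

```python
import bz2
import zlib

# 16 MiB of spaces: a scaled-down stand-in for the 1 GiB file (assumption).
SIZE = 16 * 1024 * 1024
data = b" " * SIZE

zlib_out = zlib.compress(data, 9)  # DEFLATE at the highest level
bz2_out = bz2.compress(data, 9)    # bzip2 with the largest block size

for name, out in (("zlib", zlib_out), ("bzip2", bz2_out)):
    print(f"{name}: {len(out)} bytes ({SIZE // len(out)}:1)")
```

The bzip2 output should come out far smaller than the zlib output,
because bzip2 run-length encodes long runs of identical bytes before
doing anything else.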

Compression ratios like this make for decompression bombs...
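
One common defence is to cap how much output you are willing to
accept.  A minimal sketch in Python, assuming zlib-format input; the
function name bounded_decompress and its limit parameter are
hypothetical, and this is not how GnuPG handles it:

```python
import zlib


def bounded_decompress(blob: bytes, limit: int) -> bytes:
    """Inflate zlib data, refusing to produce more than `limit` bytes."""
    d = zlib.decompressobj()
    # Ask for at most one byte over the limit; anything beyond that
    # stays compressed in d.unconsumed_tail and is never expanded.
    out = d.decompress(blob, limit + 1)
    if len(out) > limit:
        raise ValueError("output exceeds limit; possible decompression bomb")
    if not d.eof:
        raise ValueError("compressed stream is truncated")
    return out
```

Called on a small honest message it simply returns the data; called on
a stream of compressed spaces with a limit below the expanded size, it
raises ValueError after inflating at most limit + 1 bytes.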

David


