10x compression factor
David Shaw
dshaw at jabberwocky.com
Tue Feb 10 13:29:25 CET 2004
On Tue, Feb 10, 2004 at 08:32:24AM -0800, Steve Butler wrote:
> Plain text should have lots (and even more than lots) of spaces. Many of
> them consecutively spaced <<grin>>. There should be lots of other
> characters (even strings of characters) that repeat throughout the file.
>
> Gzip should be able to do that trick fairly easy. Pkzip could also do it.
> Since GnuPG uses similar compression routines (ZIP and ZLIB), it should do
> just as well.
>
> I just generated a 1 Gbyte file that contains only spaces (no newline; no
> <CR><LF>). That's 1,073,741,824 spaces (all in a row).
>
> PKZIP compressed this down to 1,048,417 bytes (1024:1) 1Gbyte to just under
> 1 Mbyte.
> gzip compressed this down to 1,042,077 bytes (1030:1)
> GnuPG encrypted this down to 1,044,357 bytes (1028:1) (even with the key
> overhead)
>
> So, 10:1 compression just means that the text files had more random text
> than a string of spaces. But it wasn't completely random. Most text files
> will fail most any statistical test for randomness. Ergo, they compress
> really well!
If you want to talk about good compression ratios, remember that GnuPG
1.3 supports bzip2 compression. A one gig file of spaces, just like
yours, using the highest compression level in GnuPG:
ZIP: 1,043,671 (1028:1)
ZLIB: 1,043,677 (1028:1)
BZIP2: 833 (1289005:1)
So, bzip2 compresses 1 gigabyte to less than 1 kilobyte. :) Of course,
this is a really artificial situation (all spaces) that plays
directly into a major strength of bzip2, but it's amusing nonetheless.
Note that ZLIB actually does marginally worse than ZIP in this
situation, whereas under more common circumstances it would do a good
bit better.
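
For anyone who wants to see the effect without writing a gigabyte of
spaces to disk, here is a rough Python sketch. It uses Python's own zlib
and bz2 modules rather than GnuPG, so the byte counts will not match the
figures above (there is no OpenPGP packet or encryption overhead here).
The helper name compressed_sizes is made up for illustration.

```python
import bz2
import zlib

CHUNK = b" " * (1 << 20)  # 1 MiB of spaces, fed repeatedly to both compressors


def compressed_sizes(total_mib):
    """Return (zlib_bytes, bzip2_bytes) after compressing total_mib MiB of spaces."""
    z = zlib.compressobj(9)      # highest deflate level
    b = bz2.BZ2Compressor(9)     # largest bzip2 block size
    z_len = b_len = 0
    for _ in range(total_mib):
        # Only the output lengths are kept, so memory use stays small.
        z_len += len(z.compress(CHUNK))
        b_len += len(b.compress(CHUNK))
    z_len += len(z.flush())
    b_len += len(b.flush())
    return z_len, b_len


if __name__ == "__main__":
    mib = 64
    z_len, b_len = compressed_sizes(mib)
    total = mib << 20
    print(f"zlib:  {z_len:>8} bytes ({total // z_len}:1)")
    print(f"bzip2: {b_len:>8} bytes ({total // b_len}:1)")
```

Raising mib to 1024 gives the full one gigabyte case, at the cost of a
longer run.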
Compression ratios like this make for decompression bombs...
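
As an illustration of the bomb problem, here is one way a program that
consumes compressed data might cap how much output it will accept. This
is only a sketch in Python using the standard bz2 module, not a
description of what GnuPG does. The name bounded_bunzip2 and the limit
parameter are hypothetical.

```python
import bz2


def bounded_bunzip2(blob, limit):
    """Decompress a bzip2 stream, raising ValueError if it expands past limit bytes."""
    d = bz2.BZ2Decompressor()
    out = bytearray()
    data = blob
    while not d.eof:
        # Ask for at most one byte more than the limit so an overrun is
        # detected without ever holding the full expanded data.
        chunk = d.decompress(data, max_length=limit + 1 - len(out))
        data = b""  # remaining input is buffered inside the decompressor
        out += chunk
        if len(out) > limit:
            raise ValueError("decompressed data exceeds limit")
        if not chunk and d.needs_input:
            raise ValueError("truncated bzip2 stream")
    return bytes(out)
```

A caller would pick limit based on what it can reasonably handle, and
treat the ValueError as a refusal to process the input.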
David