10x compression factor

Chris Fox dissectingtable at comcast.net
Tue Feb 10 11:50:56 CET 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adrian 'Dagurashibanipal' von Bidder wrote:
| On Tuesday 10 February 2004 20.15, Steve Butler wrote:
|
|
|>As for novels, some I've (started to) read seemed a lot more repetitive
|>than technical specs.  Perhaps checking the compression ratio should be
|>part of the test for good novels. --tongue in cheek--
|
|
| There is actually a paper somewhere out there (meaning: in some CS journal)
| about determining the probable author of a text by comparing its
| compression rate to that of other texts from known authors.
|
| Of course, the whole thing was tongue in cheek, but IIRC they did have some
| numbers and the general direction of their research was not just made up.

NOT tongue in cheek: if you build letter-frequency tables over sequences as
short as four letters and drive a random generator from those tables, you
can often recognize the writer in the generated text by his writing
style.  With six letters it's very robust, but even with four you get
Faulkner-like text from a Faulkner letter-frequency table.
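
For anyone who wants to try it, here is a minimal sketch in Python of the
kind of generator described above. It is not any particular published
program. It assumes "four letters long" means a four-letter context that
predicts the next letter, and the names build_table and generate are made up
for illustration.

```python
import random
from collections import Counter, defaultdict


def build_table(text, order=4):
    """Count which character follows each run of `order` characters."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        table[context][text[i + order]] += 1
    return table


def generate(table, length=200, seed=None):
    """Emit text by repeatedly sampling the next character from the table."""
    rng = random.Random(seed)
    contexts = sorted(table)          # sorted so a fixed seed is repeatable
    context = rng.choice(contexts)    # start from a random known context
    order = len(context)
    out = list(context)
    for _ in range(length - order):
        followers = table.get(context)
        if not followers:
            # Dead end (context only seen at the very end of the source):
            # jump to a random known context and keep sampling from there.
            context = rng.choice(contexts)
            followers = table[context]
        chars = list(followers)
        weights = list(followers.values())
        nxt = rng.choices(chars, weights=weights)[0]
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)
```

To use it, read a long sample of one author into a string, call
build_table(sample, order=4), and print generate(table). Raising order to 6
corresponds to the six-letter case; it needs more source text before the
table stops simply replaying the original.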

- --
Chris Fox, Windows User, Linux User (#341856), non-partisan
Since free markets lead directly to monopoly, oligarchy, poverty,
unemployment, and Fascism, they cannot be said to "work" in any
meaningful sense.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFAKTYg9jaRInQzvmsRAoMeAKDaVs44zkNFAdFja+wLe7H0vYb3RQCgk+u5
0aBBz9dzZeOe5tPW43gOQWiIPwMFAUApNiC2gOp1BO9b9hECgx4An2GgirnrQuYh
iEsvENU5w6HVbVXgAKDee5D5+iCOg45mtpr4yG2daL3tSQ==
=IZAE
-----END PGP SIGNATURE-----



