Differentiating GPG data from random data

Tue Nov 25 15:09:09 CET 2008

On Mon, Nov 24, 2008 at 11:21 PM, David Shaw <dshaw at jabberwocky.com> wrote:

> Those bytes will more-or-less work, but as you say won't catch everything.
>  In OpenPGP, the first few octets cover the length and type of the packet,
> so those bytes hardcode a particular length, which is probably not what you
> want.  For example, the "85 02 0e 03" from your example is an old-style
> encrypted session key that is 526 bytes long, which will only match a
> particular key size.
>
> The problem is that OpenPGP has so many different ways to encode a
> particular packet, that writing a rule loose enough to match them all will
> inevitably have a huge number of false positives.  For example, hex 84, 85,
> 86, and C1 can all indicate an asymmetrically encrypted message.  85 is the
> most common (and 84 would be extremely uncommon), but they are all possible.
>  Some OpenPGP programs start with A8, A9, AA, or CA (though it is
> virtually always A8).  GPG will read such a message, but doesn't generate
> it.
>
> For your purpose, is it better to have false positives or false negatives?
>  That is, is it better to accidentally include some GPG files, or better to
> accidentally exclude some files?  That would help in figuring out how many
> bytes you want to match on.
>
> David
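
To make the point about the first few octets concrete, here is a minimal sketch of decoding a packet header. It assumes the layout described in RFC 4880 section 4.2; the function name parse_packet_header is only illustrative and does not come from GnuPG.

```python
def parse_packet_header(data: bytes):
    """Return (tag, body_length, header_size) for the packet at the start of data.

    body_length is None when the header gives no definite length
    (old-format indeterminate length or new-format partial body length).
    """
    if not data or not data[0] & 0x80:
        raise ValueError("not an OpenPGP packet header: bit 7 must be set")
    first = data[0]

    if first & 0x40:
        # New format: tag is the low 6 bits; length octets follow.
        tag = first & 0x3F
        if len(data) < 2:
            raise ValueError("truncated header")
        o1 = data[1]
        if o1 < 192:
            return tag, o1, 2  # one-octet length
        if o1 < 224:
            if len(data) < 3:
                raise ValueError("truncated header")
            return tag, ((o1 - 192) << 8) + data[2] + 192, 3  # two-octet length
        if o1 == 255:
            if len(data) < 6:
                raise ValueError("truncated header")
            return tag, int.from_bytes(data[2:6], "big"), 6  # five-octet length
        return tag, None, 2  # partial body length

    # Old format: tag is bits 5-2, length type is bits 1-0.
    tag = (first >> 2) & 0x0F
    length_type = first & 0x03
    if length_type == 3:
        return tag, None, 1  # indeterminate length
    n = 1 << length_type  # 1, 2, or 4 length octets
    if len(data) < 1 + n:
        raise ValueError("truncated header")
    return tag, int.from_bytes(data[1:1 + n], "big"), 1 + n
```

With this sketch, parse_packet_header(bytes.fromhex("85020e03")) gives (1, 526, 3): tag 1 (public-key encrypted session key), a 526-byte body, and a 3-octet header. The same packet written with 84, 86, or C1 carries the same tag but a different length encoding, which is why a fixed four-byte signature only matches one key size.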

Thank you for the information. It confirms what I thought after
reading the RFCs. It would be better for me to accidentally include
some GPG files rather than accidentally exclude files I'm searching
for. I can manually look at the files and use GnuPG to easily tell the
GPG ones from the non-GPG ones.
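
As a rough sketch of the trade-off, the following estimates how often random data would match a rule built from the first octets listed above. It assumes the non-GPG files are uniformly random bytes, which is an assumption on my part, and the names PKESK_OCTETS, ALTERNATIVE_STARTS, random_match_rate, and starts_like_gpg are only illustrative.

```python
from fractions import Fraction

# First octets named in the quoted message.
PKESK_OCTETS = {0x84, 0x85, 0x86, 0xC1}        # asymmetrically encrypted message
ALTERNATIVE_STARTS = {0xA8, 0xA9, 0xAA, 0xCA}  # tag 10 (marker packet) encodings


def random_match_rate(first_octets, extra_fixed_octets=0):
    """Chance that uniformly random data matches a rule accepting any of
    first_octets, followed by extra_fixed_octets octets with fixed values."""
    return Fraction(len(first_octets), 256) / 256 ** extra_fixed_octets


def starts_like_gpg(data: bytes, first_octets=PKESK_OCTETS | ALTERNATIVE_STARTS) -> bool:
    """Loose rule: flag the data as possibly GPG based on its first octet only."""
    return bool(data) and data[0] in first_octets


if __name__ == "__main__":
    print(random_match_rate(PKESK_OCTETS))                       # first octet only
    print(random_match_rate(PKESK_OCTETS | ALTERNATIVE_STARTS))  # all eight octets
    print(random_match_rate({0x85}, extra_fixed_octets=3))       # full "85 02 0e 03"
```

Under that assumption, the loose eight-octet rule flags 1 in 32 random files as GPG, while the full four-byte signature flags about 1 in 4 billion but misses every GPG file with a different key size or length encoding. Files flagged by the loose rule can then be checked by hand with GnuPG.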

Thanks again,
Ted