Question about use of --cipher-algo AES & --openpgp

Sun Nov 12 13:18:41 CET 2006

 "Peter S. May" <me at psmay.com> wrote:

<SNIP>

> I don't know how the internals of "file" work.  If I were trying to get
> a generic file-like program to grok OpenPGP, here's probably how I'd go
> about it:
> 
> * If the first non-blank line started "-----BEGIN PGP ", it would
> probably be reasonable to call it armored OpenPGP and perhaps look into
> it further, to figure out a subtype.
> * If the file program decides the file isn't any other type it
> recognizes, take a look at the first byte of the file, which must be a
> valid OpenPGP packet tag.  You could run some or all of these tests
> before passing the file on to GPGME, which would ultimately determine a
> file's reasonable OpenPGP compatibility.  Some assumptions based on bis-18:
> 
> (in pseudocode, of course)
> 
> function is_pgp_packet_tag (byte)
>   if byte & 0xC0 == 0xC0  // new format tag
>     tag_number = byte & 0x3f
>   else if byte & 0xC0 == 0x80 // old format tag
>     tag_number = (byte & 0x3c) >> 2
>   else
>     return false // first bit is always set
> 
>   if tag_number == 0
>     return false  // 0 is reserved
> 
>   // the rest of the assumptions may change with future
>   // versions of the spec and need to be kept up to date
>   if tag_number == 15 or tag_number == 16
>     return false  // 15 and 16 are not currently defined
>   if tag_number >= 20
>     return false
>     // Values 20 to 59 are not currently defined
>     // Values 60 to 63 are defined as private and GPG can't grok them
>
>   return true  // the tag passed every check above
> 
> After those checks, I would either pass the file on to GPGME or run one
> more heuristic first:  Read a packet header.  If it's valid, extract the
> length it specifies and jump forward that many bytes.  Then repeat.  If
> any of the tags are !is_pgp_packet_tag(), or if the last length
> specifier you find leads you past the end of the file, it's not OpenPGP.
>  Else, it has a significant chance of being formally correct.
> 
> Might be too complicated a check for file, but I think it would work.
> 
> PSM
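
For anyone who wants to experiment, here is a rough Python sketch of
the quoted checks: the tag test plus the "read a header, jump forward,
repeat" heuristic.  The function names are my own invention, and the
length decoding follows my reading of the OpenPGP packet header layout
(old-format length types, new-format one/two/five-octet and partial
body lengths), so treat it as an assumption to verify against the
spec, not as anything file or GnuPG actually does.

```python
def packet_tag(byte):
    """Return the packet tag number in an OpenPGP header byte, or None."""
    if (byte & 0xC0) == 0xC0:          # new-format header
        tag = byte & 0x3F
    elif (byte & 0xC0) == 0x80:        # old-format header
        tag = (byte & 0x3C) >> 2
    else:
        return None                    # the top bit is always set
    # 0 is reserved, 15 and 16 are undefined, 20 and up are undefined
    # or private (assumptions taken from the quoted checks).
    if tag == 0 or tag in (15, 16) or tag >= 20:
        return None
    return tag


def new_format_length(data, pos):
    """Decode a new-format length at pos: (length, next_pos, is_partial)."""
    first = data[pos]
    if first < 192:                    # one-octet length
        return first, pos + 1, False
    if first < 224:                    # two-octet length
        if pos + 2 > len(data):
            return None
        return ((first - 192) << 8) + data[pos + 1] + 192, pos + 2, False
    if first < 255:                    # partial body length, a power of two
        return 1 << (first & 0x1F), pos + 1, True
    if pos + 5 > len(data):            # 0xFF: four-octet length follows
        return None
    return int.from_bytes(data[pos + 1:pos + 5], "big"), pos + 5, False


def looks_like_openpgp(data):
    """True if packet headers chain exactly to the end of data (bytes)."""
    pos, end = 0, len(data)
    if end == 0:
        return False
    while pos < end:
        header = data[pos]
        if packet_tag(header) is None:
            return False
        pos += 1
        if header & 0x40:              # new format, may use partial lengths
            partial = True
            while partial:
                if pos >= end:
                    return False
                decoded = new_format_length(data, pos)
                if decoded is None:
                    return False
                length, pos, partial = decoded
                pos += length
                if pos > end:
                    return False
        else:                          # old format
            length_type = header & 0x03
            if length_type == 3:       # indeterminate: runs to end of file
                return True
            size = (1, 2, 4)[length_type]
            if pos + size > end:
                return False
            pos += size + int.from_bytes(data[pos:pos + size], "big")
            if pos > end:
                return False
    return True
```

You would feed it the raw bytes of a file opened in "rb" mode.  Note
that it only checks that the headers chain together; it says nothing
about whether the packet bodies make sense.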

I was originally going to respond to Peter May off-group only.
The more I think about it, the more that seems like the wrong thing
to do.  If what he has is something everybody can live with (I didn't
see any objections), not only for now but into the foreseeable
future, we are okay.  If you can't live with it, speak up now and
tell us WHERE we are going wrong!  This discussion, if continued,
will move off-group.

First, the file command does read into an --armor encrypted file,
and from what is on the very first line it KNOWS what it is:

$ file TOOMUCH.asc
TOOMUCH.asc: PGP armored data message
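
In effect that is a plain string match on the start of the file.  A
rough Python equivalent, assuming the armor header always begins with
"-----BEGIN PGP " (the function name is made up for illustration, and
the real magic entry may be stricter or looser than this):

```python
def looks_armored(text):
    """True if the first non-blank line opens an OpenPGP armor block."""
    for line in text.splitlines():
        if line.strip():
            # Only the first non-blank line is examined.
            return line.startswith("-----BEGIN PGP ")
    return False
```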

It is when you do NOT use --armor (-a) that file doesn't know what
to do with it.  The file command uses the magic database.  On my
system, and on most Linux systems, it lives here, though the
location differs from system to system:

$ ls -1 /usr/share/file
magic		# human readable for "file" command
magic.mgc	# binary USED by "file" command
magic.mime	# human readable for KMimeMagic
magic.mime.mgc	# binary USED by KMimeMagic

You don't edit these files directly.  They are created from source.
You will NOT see the magic.mime* files if you don't have KDE.  To
learn a little about magic, just do:

man magic	# this will tell where the magic files are
man file

You can see that the byte order can be easily handled as long as
it doesn't start to conflict with something else.  The file command
can't use GPGME (what do you do if it isn't there?).  file needs to
be self-contained except for its database.  If you look for ELF in
the "magic" file, the very first thing you see is:

# ORCA/EZ assembler:
#
# This will not identify ORCA/M source files, since those have
# some sort of date code instead of the two zero bytes at 6 and 7
# XXX Conflicts with ELF

file will NEVER identify that kind of file because of the conflict
with ELF.  Usually, if there are conflicts, the people submitting
the information will drop it if they have far fewer files.  It isn't
whoever came first that trumps the others.  When there is a
collision, the file type most likely to be seen wins out.  Most
people don't even know what an ORCA/EZ assembler file is.

I picked ELF for a reason.  If you look at how ELF does it you
can see how they handle SOME of the conditionals which need to be
handled for various big-endian / little-endian and chip bit sizes
to arrive at the proper string.  That would give you some idea of
how to pick the proper strings for the encryption types.  The only
problem is, ELF ALWAYS starts with the first four bytes "\177ELF".
We don't have that with a PGP encrypted file.  We have multiple
ways of starting, etc.  There is a slight possibility of unrolling
all that into MULTIPLE definitions, but not into just ONE.  It still
looks to me like what OpenPGP has done is incompatible with the file
program.
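
To get a feel for how much unrolling that would take, here is a rough
count, assuming the tag rules quoted at the top of this message, of
the distinct first-byte values a binary OpenPGP file could start
with.  The function name is made up for illustration.

```python
def valid_first_bytes():
    """List every byte value that passes the quoted packet-tag checks."""
    found = []
    for b in range(256):
        if (b & 0xC0) == 0xC0:         # new-format header
            tag = b & 0x3F
        elif (b & 0xC0) == 0x80:       # old-format header
            tag = (b & 0x3C) >> 2
        else:
            continue                   # top bit clear: not a packet tag
        # Tags 1 to 19 are accepted, except the undefined 15 and 16.
        if 1 <= tag <= 19 and tag not in (15, 16):
            found.append(b)
    return found


print(len(valid_first_bytes()))        # prints 73
```

That is 73 possible values for a single byte, and a one-byte test is
about the weakest thing you can put in a magic entry, so collisions
with other formats seem hard to avoid.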

If you want to look into it further, I suggest we go off-group to
do it, but ONLY if everybody is happy that your analysis is correct
and COMPLETE!  It looks awfully convoluted to me though (not your
analysis - their multiple ways for creating an encrypted file).
The file command was never designed with OpenPGP's way of creating
files in mind.  And if they add even more formats, it will become
even harder to put the information into the magic database
that file uses.  So people better make sure they
use the correct filename extension (.gpg or .pgp) when they create
an OpenPGP encrypted file.  That will probably be all we have to
go on to identify what it is.  We will need the OpenPGP programs
to do the rest of the identification.

HHH

PS If I didn't know better, I would say they designed the
   various file header formats to be incompatible with the file
   command.