TOFU Design

Fri Jul 17 14:24:12 CEST 2015

I'd like to informally present the high-level design that I'm working
on for TOFU in GnuPG and some open questions.  I'm interested in
feedback.  But if all you have to say is that you think TOFU is a bad
idea, please restrain yourself.

Thanks!

Neal

Threat Model
============

Recall that TOFU is trying to check the long-term *consistency* of
bindings between identities (as embodied by the OpenPGP user id) and
keys.  That is, the attack we are interested in preventing is: Mallory
creates a key with Alice's identity and sends a mail to Bob signed
with that key so as to trick him into thinking it came from Alice.  If
Bob communicated with Alice before, then TOFU can detect this attack,
because the key used to sign the message has changed.  For
convenience, I'll refer to this as the false key attack.  If there is
a better name for it, please correct me.

Identity
========

The first issue is defining an identity.  In OpenPGP, the identity is,
by convention, an RFC 2822 name-addr, which has the form "John Doe
<jdoe@example.com> (comment)" (without the quotes).  There are three
parts: the name, the email address and the comment.  The question is:
should we use the whole thing as the identity or just some pieces?

At the very least, we should completely ignore the comment field.
That is, a difference in the comment field of two name-addrs should
not mean different identities.  For example, "John Doe
<jdoe@example.com>" and "John Doe <jdoe@example.com> (work)" should be
considered to refer to the same identity.  The reason is that, because
the comment field is usually blank and its semantics are not
well defined, most unsophisticated users wouldn't realize that
differences in the comment field mean a different identity.  As such,
when TOFU prompts the user to accept a new identity / key binding
instead of flagging a conflict, most of these users would just blindly
accept the new binding.  The practical result is that allowing the
comment field to define identity makes users susceptible to the false
key attack.

We could use the name, but checking this should probably be reserved
for advanced users.  The first problem is that names are not globally
unique.  In fact, conflicts are frequent.  I think that most people in
the cultures that I'm familiar with probably know a few people with
the same name.  This would cause false positives (the implementation
identifying conflicts where, in reality, there are none).  After a few
incorrectly identified conflicts, most users will just click the
dialog away, which would defeat the purpose of this system.

Another reason to not use names is that people are liberal in
identifying equivalent names and it is hard to codify this type of
equivalence.  For instance, strcmp would indicate that "John Doe" and
"John A. Doe" are different identities.  Most people, however, would
say they refer to the same person.  As in the comment case, these
false negatives (missed conflicts) simplify the false key attack.

Finally, we'd have to regularize names.  At least for Latin and
Germanic languages, Unicode canonicalization (normalization),
whitespace compression and lowercasing should be enough.  But I'm not
sure about other languages where letters are combined.
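
To make the regularization steps above concrete, here is a minimal
sketch in Python.  The function name normalize_name and the choice of
NFKC plus casefold() are my assumptions for illustration; they are not
a decision about what GnuPG should do, and they do not settle the
question about scripts with combined letters.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return a canonical form of a display name for comparison.

    Hypothetical helper: NFKC and casefold() are illustrative choices.
    """
    # Compose combining sequences and fold compatibility characters so
    # that visually identical strings get the same code points.
    composed = unicodedata.normalize("NFKC", name)
    # Collapse runs of whitespace and strip leading/trailing blanks.
    collapsed = " ".join(composed.split())
    # Case-insensitive comparison; casefold() is more aggressive than
    # lower() (e.g. it maps the German sharp s to "ss").
    return collapsed.casefold()
```

With this, "  John   DOE " and "john doe" compare equal, as do the
precomposed and decomposed spellings of an accented letter.  "John Doe"
and "John A. Doe" still compare as different, which is exactly the
problem described above.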

For the same reasons, we don't want to use the combined name and email
address.

This leaves the email address.  Although it is possible to use Unicode
to make two email addresses appear visually similar yet compare
differently at the bit level, such discrepancies should be caught by
email clients, which should check that the sender and the signer are
identical.  If they are not, they should issue a warning.  (However,
only KMail actually implements this check, as far as I know.)

In conclusion: I think we should just use the regularized email
address and, perhaps, allow checking names for advanced users.  This is
similar to how ssh works.  Making sure the host key for a given IP
address doesn't change is nice for sophisticated users, but it results
in a lot of false positives due to the wide use of a small portion of
the private IP space (e.g., 192.168.1.0/24) and dongles containing the
MAC address, which results in DHCP assigning the same IP to different
hosts.

Note: it is unclear what to do when the OpenPGP User ID is not in RFC
2822 form or there is no email address.
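
As a rough illustration of "use only the regularized email address",
the following Python sketch pulls the address out of a user id and
ignores the name and comment.  The helper name extract_email, the
regular expression, and lowercasing the whole address (including the
local part) are assumptions of mine.  User ids without an address in
angle brackets yield None, which leaves the open question above
unanswered on purpose.

```python
import re
from typing import Optional

# Hypothetical pattern: the address is whatever sits between angle
# brackets and contains an "@".  This is a sketch, not an RFC 2822 parser.
_ANGLE_ADDR = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def extract_email(user_id: str) -> Optional[str]:
    """Return the lowercased email address of a user id, or None."""
    match = _ANGLE_ADDR.search(user_id)
    if match is None:
        # Not in name-addr form, or no email address at all.
        return None
    # Assumption: treat the address as case-insensitive.
    return match.group(1).lower()
```

Under this sketch, "John Doe <jdoe@example.com>" and
"John Doe <JDoe@Example.COM> (work)" map to the same identity.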

Verification
============

To verify a message, we check a database to see if the identity / key
binding has changed.  Since keys can have multiple user ids, we do
this for each valid user id.

For each user id, we first extract and normalize the email address.

  - If the email address and key binding are known, we are done.

  - If the email address is unknown, but the key is known, then the
    sender added a new uid and we ask the user whether to create a new
    binding (good, bad, decide later).

    Note: we shouldn't silently create the binding.  Consider an
    attacker who emails you (or, perhaps a mailing list you are
    subscribed to) a message signed with the key "John Doe
    <jdoe@example.com>".  Since it doesn't look suspicious, you
    download the key and add the binding to your TOFU database.  Then,
    the attacker adds a new UID "Glenn Greenwald
    <Glenn.Greenwald@theintercept.com>" and your software
    automatically refreshes the key.  Now, messages from the attacker
    with the identity "Glenn Greenwald
    <Glenn.Greenwald@theintercept.com>" will be trusted!

    Note: we could also prompt the user to call the sender and
    verify the fingerprint manually.  In this case we could offer
    multiple levels of verification.

  - If the email address is known, but the key isn't, the user might
    have a new key.  In this case, we indicate this to the user as
    well as some information about the old key and the new key and ask
    whether to create the new binding.

  - If no bindings contain either the email address or the key, then
    we ask the user whether to accept the new binding.

    One potentially helpful thing we could do at this point is to
    fetch the key from the key server and to conduct a search on the
    email address.  This would help identify potential attacks.
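
The four cases above can be sketched as a small classification
function.  This is only an illustration: the names BindingStatus and
classify_binding are made up, and the database is modelled as a plain
collection of (email, fingerprint) pairs.  The list above does not say
what to do when the email address and the key are both known but not
bound to each other; the sketch assumes that this is handled like the
"email known, key isn't" case, since it looks like a key change.

```python
from enum import Enum
from typing import Iterable, Tuple


class BindingStatus(Enum):
    KNOWN = "known"      # email and key are already bound: done
    NEW_UID = "new-uid"  # key known, email new: ask the user
    NEW_KEY = "new-key"  # email known with another key: show both keys
    UNKNOWN = "unknown"  # neither seen before: ask to accept


def classify_binding(bindings: Iterable[Tuple[str, str]],
                     email: str, fingerprint: str) -> BindingStatus:
    """Classify an (email, key) pair against the known bindings."""
    known = set(bindings)
    if (email, fingerprint) in known:
        return BindingStatus.KNOWN
    email_known = any(e == email for e, _ in known)
    key_known = any(f == fingerprint for _, f in known)
    if email_known:
        # Assumption: also covers "both known, but not together".
        return BindingStatus.NEW_KEY
    if key_known:
        return BindingStatus.NEW_UID
    return BindingStatus.UNKNOWN
```

Only KNOWN is silent.  Every other result leads to a prompt, which is
what prevents the added-UID attack described in the note above.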

The above method only works if the key is actually available.  Of
course, for new bindings, the key is probably not yet available and if
the user hasn't enabled auto-key-locate (which is disabled by
default), then we can't do the verification nor can we update the
database.  There are a couple of things that we could do here:

  - Issue a warning and suggest enabling auto-key-locate or running
    gpg2 --recv-keys KEYID and then reverifying the message.

  - Add the key to a list of pending keys.  This list can be processed
    by, e.g., parcimonie.  Then, the next time a message signed with
    this key is verified, the user will be prompted about the key.

Encryption
==========

When encrypting a message, we should check whether the recipient is
trusted.  This means iterating over each of the uids associated with
the key and checking whether a good binding exists in the database.
If not and the binding is simply unknown, we can proceed as above with
unknown bindings.  If the binding is bad, then we should show an error
message and abort.
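
A minimal sketch of this check in Python follows.  The function name
encryption_decision, the policy strings "good" and "bad", and the
three return values are assumptions.  The text above leaves open
whether a single good binding is enough; the sketch takes the
conservative reading that any bad binding aborts and any unknown
binding triggers the prompt used for unknown bindings.

```python
from typing import Dict, Iterable, Tuple


def encryption_decision(policies: Dict[Tuple[str, str], str],
                        emails: Iterable[str], fingerprint: str) -> str:
    """Return "abort", "ask" or "proceed" for a recipient key.

    policies maps (email, fingerprint) to "good" or "bad"; a missing
    entry means the binding is unknown.
    """
    emails = list(emails)
    if not emails:
        # Assumption: a key without any usable email address is unknown.
        return "ask"
    decision = "proceed"
    for email in emails:
        policy = policies.get((email, fingerprint))
        if policy == "bad":
            # One bad binding is enough to refuse to encrypt.
            return "abort"
        if policy != "good":
            # Unknown binding: handle as in the verification case.
            decision = "ask"
    return decision
```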

Export
======

Should TOFU bindings be exportable?  TOFU reveals the user's social
graph even more than the web of trust.  However, this would allow us
to implement something like the perspectives system [1].  This would
make the false key attack much harder: keys would be verified via
multiple network paths.  On the other hand, since the data is being
provided by untrusted users, it is possible for an attacker to poison
the data.  This needs a lot more thought.

   [1]   https://www.cs.cmu.edu/~dga/papers/perspectives-usenix2008/

Additional Metadata
===================

To make understanding binding conflicts easier, we can record
previously seen messages.  In particular, when we verify a message, we
also save the message's hash and the signature creation time keyed on
each identity / key binding.  We need the message's hash to prevent
adding the same message multiple times.  The signature creation time
is taken from the signature packet in the message.

When there is a conflict or the user has added a new user id to a key,
we can show the history of the key(s) (e.g., number of messages signed
by this key per month).  If a user has received messages over many
years from one key and none from another, then the new key requires
further scrutiny.
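
To illustrate the bookkeeping, here is a small in-memory sketch for a
single identity / key binding.  The class name BindingHistory, the use
of SHA-256, and the month format are assumptions.  It only shows the
two ideas above: the hash prevents counting the same message twice,
and the signature creation time drives the per-month statistics.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone


class BindingHistory:
    """Messages seen for one identity / key binding (illustrative)."""

    def __init__(self) -> None:
        # Message digest -> signature creation time (Unix seconds).
        self._seen = {}

    def record(self, message: bytes, sig_time: int) -> bool:
        """Record a verified message; return False if already seen."""
        digest = hashlib.sha256(message).hexdigest()
        if digest in self._seen:
            return False
        self._seen[digest] = sig_time
        return True

    def per_month(self) -> Counter:
        """Count recorded messages per month (UTC), keyed "YYYY-MM"."""
        return Counter(
            datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m")
            for t in self._seen.values()
        )
```

A conflict dialog could then show per_month() for the old and the new
key side by side.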

Implementation Details
======================

We are going to use SQLite to store the data rather than a custom
binary format.  SQLite is highly portable and has the nice ACID
properties.  This should significantly simplify the implementation.
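
As a rough idea of what this could look like, here is a sketch using
Python's sqlite3 module.  The table name, the column names, and the
policy values ("good", "bad", "ask" for "decide later") are
placeholders of mine, not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per identity / key binding.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bindings (
    id          INTEGER PRIMARY KEY,
    email       TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    policy      TEXT NOT NULL CHECK (policy IN ('good', 'bad', 'ask')),
    UNIQUE (email, fingerprint)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed initialize) the TOFU database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def set_policy(conn, email, fingerprint, policy):
    """Create the binding if it is new, then set its policy."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO bindings (email, fingerprint, policy) "
            "VALUES (?, ?, ?)", (email, fingerprint, policy))
        conn.execute(
            "UPDATE bindings SET policy = ? "
            "WHERE email = ? AND fingerprint = ?",
            (policy, email, fingerprint))


def get_policy(conn, email, fingerprint):
    """Return the policy of a binding, or None if it is unknown."""
    row = conn.execute(
        "SELECT policy FROM bindings WHERE email = ? AND fingerprint = ?",
        (email, fingerprint)).fetchone()
    return row[0] if row else None
```

The UNIQUE constraint keeps one row per binding, and the transaction
means a crash between the two statements cannot leave a half-written
record.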

It would be nice to make the database synchronizable.  For instance,
when using unison, I can't synchronize my keyring since if I change
both the keyring on my laptop and on my desktop, there is no easy way
to merge them.  By storing the data related to each binding in a
separate file, it should be possible to synchronize most files.  (This
is based on the assumption that updates between two synchronizations by
both computers to the same file are significantly less likely than
updates by both computers in general.)