Understanding TOFU statistics

Wed Sep 14 15:56:31 CEST 2016

When using the TOFU trust model, gpg shows statistics about observed
signatures made by and mails encrypted to the key in question.  For
instance, consider the following:

  $ gpg -r 0xF2AD85AC1E42B367 -a -e
  gpg: wk at g10code.com: Verified 6 signatures in the past 21 days, and encrypted
       15 messages in the past 35 days.
  gpg: wk at gnupg.org: Verified 6 signatures in the past 21 days, and encrypted
       15 messages in the past 35 days.
  gpg: werner at eifzilla.de: Verified 6 signatures in the past 21 days, and
       encrypted 15 messages in the past 35 days.
  gpg: Warning: if you think you've seen more signatures by this key and these
       user ids, then this key might be a forgery!  Carefully examine the email
       addresses for small variations.  If the key is suspect, then use
         gpg --tofu-policy bad 80615870F5BAD690333686D0F2AD85AC1E42B367
       to mark it as being bad.

In this email, I want to briefly explain why the statistics are
important and, above all, why we need to show statistics on a
per user id / key binding basis and not just on a per-key basis.  But first,
please recall: the only reason to use TOFU is to protect against an
active adversary; using --trust-model always is sufficient to protect
against passive adversaries.

The point of TOFU is to protect the user id / key bindings.  If we
detect a conflict, that is, there are multiple keys associated with a
single user id (or rather, email), then the user may be under attack
(MitM or forgery).  But, a conflict can also arise when key rotation
is not done properly (specifically, if there is no cross signature
between the keys).
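
To make the notion of a conflict concrete, here is a minimal Python
sketch.  It is not how gpg implements this: the function name, the
(email, fingerprint) pair layout and the sample data are all
hypothetical, and it ignores cross signatures entirely.

```python
from collections import defaultdict

def find_conflicts(bindings):
    """Return the emails that are bound to more than one key.

    `bindings` is an iterable of (email, fingerprint) pairs.  Both the
    data layout and this helper are hypothetical illustrations.
    """
    keys_by_email = defaultdict(set)
    for email, fingerprint in bindings:
        # Simplification: compare addresses case-insensitively.
        keys_by_email[email.lower()].add(fingerprint)
    return {email: sorted(fprs)
            for email, fprs in keys_by_email.items() if len(fprs) > 1}

# Made-up observations; the fingerprints are placeholders.
observed = [
    ("alice@example.org", "AAAA1111"),
    ("bob@example.org", "BBBB2222"),
    ("Alice@example.org", "CCCC3333"),  # second key for the same email
]
conflicts = find_conflicts(observed)
print(conflicts)
```

A real check would also have to treat two keys joined by a cross
signature as a proper rotation rather than as a conflict.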

The first reason to show the statistics is to help the user
distinguish the currently used key from the new key in the case of a
conflict.  Note: just because one key has higher signature and
encryption counts doesn't mean that it is the right key.  As such, we
encourage the user to contact the assumed owner.

Second, statistics can help identify phishing / forgeries.  To
circumvent TOFU, an intelligent attacker won't simply use a known
email address, but will use a mimicry, that is, a visually similar
user id (e.g., a user id in which each Latin "a" is replaced with its
Cyrillic look-alike).  Since the mimicry is a new binding, its counts
start at zero.  So, because we always show the statistics, and a
warning when the counts are small, the user will hopefully notice if
the count for a usual communication partner has dropped to zero and
investigate the problem.
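
To illustrate why such a mimicry is hard to spot by eye, here is a
small Python sketch that flags user ids mixing scripts.  This is only
an illustration under my own assumptions, not something the text above
claims gpg does: the helper names are hypothetical, and guessing the
script from the Unicode character name is a rough heuristic.

```python
import unicodedata

def scripts_used(text):
    """Approximate the scripts in `text` from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed_script(user_id):
    """True if the letters in `user_id` come from more than one script."""
    return len(scripts_used(user_id)) > 1

genuine = "alice@example.org"
mimic = "\u0430lice@example.org"  # first letter is CYRILLIC SMALL LETTER A

for user_id in (genuine, mimic):
    print(repr(user_id), sorted(scripts_used(user_id)),
          looks_mixed_script(user_id))
```

Both strings look the same in most fonts, yet they are different user
ids.  A heuristic like this misses a mimicry written entirely in one
script, which is one more reason to rely on the counts.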

In the preceding two examples, showing the statistics on a per-key
basis is probably sufficient.  However, consider this scenario: an
attacker creates a key with the user id Alice, and exchanges several
messages with Bob.  At this point, Alice's key appears to be trusted
since there are a sufficient number of messages.  Now, the attacker
adds a new user id, Mallory, to the key and sends a message to Bob.
If we were to show just the key's statistics, then the key would
appear to be well known, which could trick Bob into being phished.
But, by showing the statistics for the individual bindings, the user
will see that the user id is actually new.
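
The difference between the two ways of counting can be sketched in a
few lines of Python.  The class, its method names and the placeholder
fingerprint are hypothetical; this is not gpg's actual TOFU database,
only the scenario above replayed on toy data.

```python
from collections import Counter

class TofuStats:
    """Toy signature counter keyed by (fingerprint, user id) binding."""

    def __init__(self):
        self.verified = Counter()

    def record_signature(self, fingerprint, user_id):
        self.verified[(fingerprint, user_id)] += 1

    def per_key(self, fingerprint):
        # Sum over all user ids on the key: this hides new bindings.
        return sum(n for (fpr, _), n in self.verified.items()
                   if fpr == fingerprint)

    def per_binding(self, fingerprint, user_id):
        # Count for exactly one user id / key binding.
        return self.verified[(fingerprint, user_id)]

KEY = "AAAA1111"  # placeholder fingerprint
stats = TofuStats()
for _ in range(20):
    stats.record_signature(KEY, "Alice")

# The attacker adds the user id "Mallory" and sends a first message.
print(stats.per_key(KEY))                 # 20: the key looks well known
print(stats.per_binding(KEY, "Mallory"))  # 0: the binding is brand new
```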

Note: if we know the user id that was used, then we only have to
display the statistics for that binding.  This is normally the case
for email messages: we can use the mail's "From" header and, if
available, the signer id in the signature packet.  But, if this
information is not available, we need to show the statistics for all
bindings, as far as I can tell.
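
As a sketch of that selection rule (again hypothetical Python, not
gpg's code): given the user ids on the key and, optionally, the email
taken from the "From" header or the signer id, pick the bindings whose
statistics should be shown.  Falling back to all bindings when the
address matches no user id is my assumption.

```python
def bindings_to_show(user_ids, signer_email=None):
    """Return the user ids whose statistics should be displayed.

    `user_ids` are the email addresses on the key; `signer_email` is
    the address from the "From" header or signer id, if known.
    """
    if signer_email is not None:
        matches = [u for u in user_ids
                   if u.lower() == signer_email.lower()]
        if matches:
            return matches
    # Unknown (or unmatched) signer: show every binding.
    return list(user_ids)

# Made-up addresses for illustration.
uids = ["alice@example.org", "alice@example.com"]
print(bindings_to_show(uids, "Alice@example.org"))
print(bindings_to_show(uids))
```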

:) Neal