[PATCH] gpg: Fix regexp sanitization.

John O'Meara john.fr.omeara at gmail.com
Thu Jan 19 07:40:58 CET 2017

Hi Damien and list,

I filed issue 2923, have been thinking about the trust signature domain
restrictions some more, and wanted to discuss a bit.

I don't know the original motivation for escaping everything except the dot.

The patch does fix the common use-case of a simple domain name (either not as a
regex or as gnupg creates a regex), which is all the bug report covered.
However, it does not allow more expressive uses.

Here is an example use case for supporting more of the regex: I might trust an
admin for multiple (sub)domains that they manage, and might be related, but
not for the rest of the infrastructure (or alternatively, I might trust a
senior admin for the whole company, but if the company has merged with
another, there might be multiple domains, and I don't want to trust the admin
for all of '.com'). As a concrete example, I might trust them for
sales.example.com and billing.example.com but not engineering.example.com. My
regex might then look like '<[^>]+[@.](sales|billing)\.example\.com>$'

If every special is escaped outside of the prefix/postfix, then this of course
will not work. I can't create two trust signatures, one with each domain,
because only the most recent is considered. It would feel awkward of me to
require the admin to have two UIDs, so that I could put a different tsig on
each UID, but even if they had two UIDs and I tsigned each differently, that
doesn't work (looks like gpg uses the most recent trust signature on the key
overall, not per UID in the key?). So currently (well, after this patch), I'd
need that admin to have two separate keys if I want to tsign for multiple
domains to work.

Still, we can't leave everything unescaped, because the regex in OpenPGP
doesn't match the regex in POSIX extended regex. From what I can gather
(mostly referencing the Boost documentation on POSIX extended regex and
section 8 of RFC 4880), in POSIX a backslash escape is literal in a character
set (what OpenPGP calls a range), and otherwise causes a special character to
become literal (of the set .[{()\*+?|^$ and note no ']' or '}'). A backslash
preceding any other character is undefined.

OpenPGP regex, on the other hand, states that a backslash makes the next char
literal no matter what (though doesn't discuss how backslash operates in a
range), and OpenPGP doesn't have character class, collating elements,
equivalence classes, or ranged repeats (using curly braces '{' and '}').

A simple converter then, I think, would be to add an escape before a '{'
unless there is one already, and remove any escapes in front of non-special
chars. I think it is unlikely that a legitimate OpenPGP regex would
accidentally look like a character class/collater/equivalence, so we might not
have to worry about that. I don't know what to do about backslashes in a range
(this might be covered in the book on regular expressions mentioned in the
informative references section of rfc 4880)

Of course, for all of this to be generally useful, there'd also need to be a
(non-default) mode when making a trust signature to prevent gnupg from
escaping those characters when saving.

Does this seem reasonable? I can take a stab at a patch if desired, but I
didn't want to work on that if I'm way off here.


John O'Meara

Damien Goutte-Gattat <dgouttegattat at incenp.org> wrote:

> * g10/trustdb.c (sanitize_regexp): Do not escape normal characters.
> --
> The current sanitization code escapes ALL characters in the
> regular expression, including characters that do not have any
> special meaning and only match themselves. Only the dot (.)
> is not escaped.
> This leads to, e.g., 'example.com' being sanitized into
> '\e\x\a\m\p\l\e.\c\o\m', which will then fail to match against
> 'alice at example.com'.
> This patch updates the function to escape only the meaningful
> characters (minus the dot).
> Signed-off-by: Damien Goutte-Gattat <dgouttegattat at incenp.org>
> ---
>  g10/trustdb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> diff --git a/g10/trustdb.c b/g10/trustdb.c
> index c113b7e..e5f32da 100644
> --- a/g10/trustdb.c
> +++ b/g10/trustdb.c
> @@ -1498,7 +1498,7 @@ sanitize_regexp(const char *old)
>      {
>        if(!escaped && old[start]=='\\')
>  	escaped=1;
> -      else if(!escaped && old[start]!='.')
> +      else if(!escaped && strchr("[]$^()|*+?{}", old[start]))
>  	new[idx++]='\\';
>        else
>  	escaped=0;
> -- 
> 2.9.0

More information about the Gnupg-devel mailing list