Upgrading from gpg1 to gpg2: lots of trouble, need help

Mon Dec 18 10:01:02 CET 2017

Hi,

Happy Holidays!

I'm migrating from gpg1 to gpg2 and am having lots of
trouble. I apologise for the long email but it's been a
saga, and others may encounter the same problems I did.
I have some (possibly stupid) suggestions and some
questions that I need answers to.

For most of my decryption use cases I can't use a
pinentry program. Instead, I have to start gpg-agent in
advance (despite what its manpage says) with
--allow-preset-passphrase so that I can then use
gpg-preset-passphrase so that when gpg is run later, it
can decrypt unaided.

Previously, on ubuntu14 and debian8, with (I think)
gpg-1.4.x and gpg-agent-2.0.x it worked fine but I had
great trouble getting it to work on ubuntu16 (with
gpg2-2.1.11) and debian9 (with gpg-2.1.18) and on
macos-10.11.6 (with macports gpg-2.2.3).

Suggestion 1
------------

Some of my troubles were due to gpg-preset-passphrase
needing the keygrip and no longer working with the
fingerprint as the cache id. It would accept the
fingerprint without error but when I tried to decrypt,
gpg would just hang there until I killed it. It wasn't
until I discovered that I needed to use the keygrip
that gpg could decrypt. This happened on the mac with
gpg-2.2.3.

If gpg-preset-passphrase doesn't work with fingerprints
anymore, maybe it could identify when a fingerprint has
been used and let the user know that they need to use
the keygrip instead. An error message to that effect
would have saved me a lot of time. Or it could just
fetch the keygrip that corresponds to the supplied
fingerprint. But maybe this isn't possible.
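
A rough sketch of that fingerprint-to-keygrip lookup, done outside
gpg-preset-passphrase. It assumes that gpg's --with-colons listing
puts each key's own capabilities in field 12 of the "sec"/"ssb"
record and the keygrip in field 10 of the "grp" record that follows
it; that is a reading of the listing format, not something confirmed
here. The function name is made up for illustration.

```shell
# Hypothetical helper: read `gpg --with-colons --with-keygrip` output on
# stdin and print the keygrip of each key or subkey that can encrypt.
# Assumption: field 12 of a sec/ssb record lists that key's own
# capabilities in lowercase, and field 10 of the next grp record is
# its keygrip.
encryption_keygrips()
{
  awk -F: '
    $1 == "sec" || $1 == "ssb" { want = ($12 ~ /e/) }
    $1 == "grp" && want        { print $10; want = 0 }
  '
}

# Possible usage (commented out because it needs a real keyring):
#   gpg2 --batch --with-colons --with-keygrip \
#     --list-secret-keys "$fingerprint" | encryption_keygrips
```

The result could then be passed to gpg-preset-passphrase --preset in
place of the fingerprint.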

Suggestion 2
------------

I think most of my other troubles arose because the
keyring migration has to happen before gpg tries to
decrypt anything, and it hadn't happened yet. I
remember at some point while testing something manually
the keyring migration happened and then gpg started
working. But it's all a bit of a blur. I spent several
days and nights on this and my brain was quite frazzled
at the time. Keyring migration seems to happen
automatically when performing some operations but not
all. Possibly because I'm using gpg-preset-passphrase.
Maybe it could be triggered in more places.

And another thing...
--------------------

I also discovered that I need to disable systemd's
handling of gpg-agent (on debian9 with gpg-2.1.18) if I
want to control when gpg-agent starts and stops and
which options are passed to it. I know this is not
recommended but I've had so much trouble in the past
with systemd thinking that it knows when a "user" has
"logged out" and then deciding to "clean up", causing me
masses of grief, that I just can't bring myself to trust
it to know what it's doing.

I've disabled systemd's handling of gpg-agent on the
debian9 hosts with:

  systemctl --global mask --now gpg-agent.service
  systemctl --global mask --now gpg-agent.socket
  systemctl --global mask --now gpg-agent-ssh.socket
  systemctl --global mask --now gpg-agent-extra.socket
  systemctl --global mask --now gpg-agent-browser.socket

(from /usr/share/doc/gnupg-agent/README.Debian)

I know someone on the internet has said they are unhappy
about people doing this, and about having to support
people who do it, but please just pretend that it's a
non-systemd system. Not everything is Linux after all.
GnuPG should still work.

Question 1
----------

The most important use case I have is where a host will
ssh to another host which performs decryption on its
behalf. The second host has to be prepared first by me
starting a gpg-agent and presetting the passphrase for
a limited time so that it is ready to decrypt when the
other host connects.

On the decrypting host, I run a command that does
something like:

  sudo -u thing --set-home -- gpgconf --kill gpg-agent

  screen -- \
  sudo -u thing --set-home -- \
  gpg-agent --homedir /etc/thing/.gnupg \
    --allow-preset-passphrase \
    --default-cache-ttl 3600 \
    --max-cache-ttl 3600 \
	--daemon -- \
  bash --login

(Then /etc/thing/.bash_login runs gpg-preset-passphrase)

While these screen/sudo/gpg-agent/bash processes are
running, the first host can connect with ssh and run a
single command that will decrypt and retrieve some
data. I can detach from the screen session knowing that
this access will last for 3600 seconds or until I come
back and terminate the screen/sudo/gpg-agent/bash
session.

I've managed to get this working again on the ubuntu16
host with gpg-2.1.11 but on the debian9 host with
gpg-2.1.18 (but with systemd handling of gpg-agent
disabled), it doesn't work. If I run the decryption
command from within the screen/bash session, it works,
and the only gpg-agent process is the one created by
the above commands:

  gpg-agent --homedir /etc/store/.gnupg --allow-preset-passphrase \
    --default-cache-ttl 3600 --max-cache-ttl 3600 --daemon -- \
    /bin/bash --login

But as soon as the first host connects via ssh (and
tries to run gpg), there is a new gpg-agent process as
well as the one above:

  gpg-agent --homedir /etc/store/.gnupg --use-standard-socket --daemon

And the decryption no longer works from the ssh
connection or from the screen/sudo/gpg-agent/bash
session.

I would have thought that, now that the use of the
standard socket is mandatory, this wouldn't happen. It
seems as though, when the ssh connection ran gpg, it
ignored the existing gpg-agent and started a new
gpg-agent which took over the standard socket. Or maybe
not: there are several standard sockets, including what
looks like an ssh-specific one:

 0 srwx------ 1 thing thing 0 Dec 18 14:23 S.gpg-agent
 0 srwx------ 1 thing thing 0 Dec 18 14:23 S.gpg-agent.browser
 0 srwx------ 1 thing thing 0 Dec 18 14:23 S.gpg-agent.extra
 0 srwx------ 1 thing thing 0 Dec 18 14:23 S.gpg-agent.ssh

On the ubuntu16 host where this is working, there is
only the S.gpg-agent socket.

Previously, with gpg-agent-2.0.x, I would tell
gpg-agent to write its environment variables to a file
that the incoming ssh connection could use to connect
to that gpg-agent. Now that's impossible and it seems
that gpg is starting a separate gpg-agent with a
separate socket for the incoming ssh connection.
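
One way to test that guess might be to compare the socket path that
gpg expects in each context. The following is only a sketch: it
assumes the installed gpgconf prints an "agent-socket" line in its
--list-dirs output (some 2.1.x versions may not), and the function
name is made up.

```shell
# Hypothetical helper: read `gpgconf --list-dirs` output on stdin and
# print the value of the "agent-socket" entry, if there is one.
# Note: gpgconf percent-escapes some characters (such as colons) in
# these values; this sketch does not undo that.
agent_socket_path()
{
  sed -n 's/^agent-socket://p'
}

# Possible usage, run once inside the screen session and once from the
# incoming ssh connection (commented out because it needs gnupg):
#   GNUPGHOME=/etc/thing/.gnupg gpgconf --list-dirs | agent_socket_path
```

If the two contexts print different paths, that would suggest that
the ssh connection's gpg is looking for the agent somewhere other
than where the prepared one is listening.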

Can anyone help me to get this situation working on the
debian9 host?

Would this work?

  ln -s S.gpg-agent S.gpg-agent.ssh

or is that just wishful/deranged thinking?

I'm delighted (i.e. able to stop panicking) that I
managed to get it working on the ubuntu16 host but I
really need to have this working on multiple hosts and
all the others are recently upgraded debian9 hosts
where it doesn't work. And eventually, the ubuntu host
will no doubt get a version of gpg that behaves like
the one on the debian9 host.

I really really need to get this working.

Any help would be greatly appreciated.

Question 2
----------

There is another thing that I don't understand but
would like to. I'd like to be able to tell, before running
gpg, whether or not gpg-agent currently has a cached
passphrase. I found a method on the internet that
became this:

  gpg_userid="user@domain.org"
  gpg_cache_id="`gpg2 --fingerprint --with-keygrip $gpg_userid | \
    grep '^ ' | tail -1 | sed -e 's/^.*= *//'`"
  echo "GET_PASSPHRASE --no-ask $gpg_cache_id Error Prompt Desc" | \
    gpg-connect-agent --no-autostart | grep -q OK && echo OK || echo ERR

And it seemed to work ok until I realised that whether
it reported that the passphrase was present or not was
not always related to whether or not gpg would be able
to decrypt unaided. That wasted a lot of my time too. :-)

I set up something like the following shell functions:

  export GPG_TTY="`tty`"

  [ -d /usr/lib/gnupg2 ] && PATH="$PATH:/usr/lib/gnupg2" # debian/ubuntu
  [ -d /opt/local/libexec ] && PATH="$PATH:/opt/local/libexec" # macports

  gpg_userid="user@domain.org"
  gpg_keygrip="`gpg2 --fingerprint --with-keygrip $gpg_userid | \
    grep '^ ' | tail -1 | sed -e 's/^.*= *//'`"

  function gpgcheck()
  {
    echo "GET_PASSPHRASE --no-ask $gpg_keygrip Error Prompt Desc" | \
      gpg-connect-agent --no-autostart | grep -q OK && echo OK || echo ERR
    ps auxwww | grep '[g]pg-agent'
  }

  function gpgstart()
  {
    gpgconf --kill gpg-agent
    gpg-agent --allow-preset-passphrase --default-cache-ttl 3600 \
       --max-cache-ttl 3600 --daemon
    askpass | gpg-preset-passphrase --preset "$gpg_keygrip"
  }

  function gpgstop()
  {
    gpgconf --kill gpg-agent
  }

And sure enough, after gpgstart, gpgcheck would report
that the passphrase was present and gpg could decrypt
unaided but at some later point, gpgcheck would report
that the passphrase wasn't present but gpg could still
decrypt unaided. It would be nice to have an
explanation of this behaviour and it would be nice to
know how to reliably check whether or not gpg-agent has
the passphrase cached. But it's not essential. As long
as I know that I can't trust this method, I know not to
rely on it. But it would be nice to have a method that
I could rely on.
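
A possible alternative check, sketched here without any claim that
it is more reliable, is the agent's KEYINFO command, which takes a
keygrip. The sketch assumes the agent replies with a status line of
the form "S KEYINFO <keygrip> <type> <serialno> <idstr> <cached> ..."
where <cached> is "1" if a passphrase is cached and "-" if not. That
layout may differ between versions, and the function name is made up.

```shell
# Hypothetical helper: read gpg-connect-agent's reply to a KEYINFO
# command on stdin and succeed only if the key is reported as cached.
# Assumption: the status line looks like
#   S KEYINFO <keygrip> <type> <serialno> <idstr> <cached> ...
# with <cached> being "1" when cached and "-" otherwise.
keyinfo_is_cached()
{
  awk '$1 == "S" && $2 == "KEYINFO" { found = ($7 == "1") }
       END { exit(found ? 0 : 1) }'
}

# Possible usage (commented out because it needs a running agent):
#   gpg-connect-agent --no-autostart "KEYINFO $gpg_keygrip" /bye |
#     keyinfo_is_cached && echo OK || echo ERR
```

Unlike the GET_PASSPHRASE method, this asks about the key itself
rather than about a cache id, which might avoid the mismatch
described above.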

This might have something to do with the multiple
standard sockets being used by different processes.

Question 3
----------

I have another use case that I also haven't managed to
get working. This is a new use case that I didn't have
working before migrating to gpg2. The above
gpgstart/gpgcheck/gpgstop functions were created while
trying to get this working.

I use ansible to do things on a small number of
servers. Each server has a different sudo password.
Ansible on its own doesn't cater for this situation but
it's possible to get ansible to run a program to get
sudo passwords for each host. I've set up the "pass"
program to store these passwords in individual
gpg-encrypted files so that ansible can fetch them
automatically.

Since ansible will start up many processes in parallel,
all needing to decrypt a sudo password without my
interaction, a pinentry program can't be used. I need
to preset the passphrase before running ansible but
when I do, it doesn't work. I run gpgstart and enter
the passphrase. Then I run gpgcheck and it reports that
the passphrase is present. Then I run ansible e.g.:

  ansible all -b -m shell -a "echo yes"

However, it seems that as soon as I start ansible, the
gpg-agent loses the passphrase and I'm bombarded with
pinentry-curses processes. It all gets a bit crazy and
at best, my xterm's tty settings are all messed up
(i.e. if I type anything afterwards, it's all
gibberish) and I have to kill the xterm. At worst, my
laptop ends up filled with pinentry-curses processes,
all hammering the CPU, and I have to kill them as well
or force a shutdown.

Just before I start ansible, gpgcheck shows OK. As soon
as I start ansible, gpgcheck (in another xterm) shows
ERR (but the agent is still running). I know I said
that what gpgcheck reports doesn't always reflect gpg's
ability to access the passphrase to decrypt but in this
case (at least soon after gpgstart), it does seem to be
telling the truth.

This is on macos-10.11.6 with macports gpg-2.2.3.

Does anyone have any idea what might be going wrong
here?

An additional gpg-agent process does get automatically
started while this is happening:

  gpg-agent --homedir /Users/me/.gnupg --use-standard-socket --daemon

Which no doubt has something to do with it. But I
don't understand why it refused to use the gpg-agent
process that already existed.

I just tried it again and managed to see this error
message:

  gpg: waiting for lock (held by 20749)

The process with pid 20749 is:

  gpg2 -d --quiet --yes --compress-algo=none --no-encrypt-to \
    --batch --use-agent /Users/me/.password-store/ansible/s2.gpg

That would have been started by "pass".

And eventually I saw: "gpg: decryption failed: No secret key"

Some of ansible's subprocesses will work and some won't
so maybe some are getting the passphrase before it
disappears.

I saw this in gpg-agent's manpage:

  SIGHUP This signal flushes all cached passphrases...

Is it possible that something here is sending gpg-agent
a SIGHUP?

If so, is there a way to prevent that?

Or maybe it has to do with the multiple standard
sockets as well.

Question 4
----------

One last use case. I have a .vimrc config that
automatically decrypts gpg files upon opening and
encrypts them upon writing. With gpg1, I could enter
the passphrase each time I opened an encrypted file and
it was fine. Now that the use of gpg-agent is mandatory
and pinentry programs always get used, I have a
problem. As far as I am aware, no single pinentry
program will work for all of my uses of vim. I use vim
in xterm or Terminal, sometimes locally, sometimes over
ssh. I also use macports MacVim in the mac windowing
system and an X11 gvim in fullscreen X11.

I'd rather not use pinentry-mac because it will take me
out of fullscreen X11 mode if I'm there. And if I'm
logged into the host via ssh from elsewhere I imagine
it probably won't work at all. I don't want to use the
curses pinentry either because while it will work
inside vim in an xterm, it won't work in MacVim or in
an X11 gvim window which is my most common way of using
vim. What I'd really like, is either the ability to not
use gpg-agent (unlikely) or a non-gui, non-curses
pinentry program that just printed a prompt to stdout
and read the passphrase from stdin. That would work in
vim and gvim and MacVim windows whether I am logging in
locally or remotely. Macports won't let me install gpg1
and gpg2 at the same time and I get the impression that
debian doesn't want me installing gpg1 either. It says
it's deprecated which is a great shame.

So if anyone knows of a non-gui non-curses pinentry
program, please let me know (preferably one that
doesn't hammer the CPU). I've had to resort to
presetting a passphrase in a gpg-agent before editing a
gpg-encrypted file which is ok but I'd rather be able
to enter the passphrase from within gvim like I used to.

Thanks in advance,
raf