[gpgme] fork() problem
marcus.brinkmann at ruhr-uni-bochum.de
Wed Feb 21 01:56:07 CET 2007
I want to take a couple of steps back and maybe we can get more light
into these issues.
At Wed, 14 Feb 2007 16:32:03 +0100,
Stephan Menzel <smenzel at gmx-gmbh.de> wrote:
> On Tuesday, 13 February 2007 15:43:56, Stephan Menzel wrote:
> > What I'm afraid of now is that gpgme might fork() in order to do its
> > magic and mess up its own mutexes (or mine) doing so.
> > Does all that say anything to you guys or am I completely mistaken here?
> I have some additional info here.
> I played around a bit: I tried to wrap my own mutex around the whole thing
> and disable its internal locking by linking against libgpgme.so.11 instead
> of libgpgme-pthread.so.11, even though it's a multithreaded app.
Did you serialize *all* calls into GPGME properly, with a memory barrier
(a single global mutex locked around all calls should do the trick)?
> And indeed, I couldn't see any more of those problems. Instead I got several
> crashes, apparently due to failed assertions in the lib. The stacktraces look
> like this:
> #0 0xffffe410 in __kernel_vsyscall ()
> (gdb) backtrace
> #0 0xffffe410 in __kernel_vsyscall ()
> #1 0xb711d885 in raise () from /lib/tls/i686/cmov/libc.so.6
> #2 0xb711f002 in abort () from /lib/tls/i686/cmov/libc.so.6
> #3 0xb7117318 in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> #4 0xb6731e03 in _gpgme_ath_mutex_lock (lock=0x0) at ath.c:71
> #5 0xb6741e2f in _gpgme_sema_cs_enter (s=0xb674db40) at posix-sema.c:48
> #6 0xb673bd3b in _gpgme_engine_info_copy (r_info=0x0) at engine.c:225
> #7 0xb6743070 in gpgme_new (r_ctx=0x0) at gpgme.c:58
> #8 0xb732e9f3 in MyWrapperClass (this=0xb1782768) at MyWrapperClass.cc:187
> It still doesn't crash all the time, though. It mostly works, so I think
> it's some strange race condition.
> Maybe this helps.
I don't trust the "lock=0x0" parameter in the debug output; it is
clearly bogus, which indicates optimization (likely) or a corrupt stack
(less likely). If you look at posix-sema.c's _gpgme_sema_cs_enter()
implementation, you will see that it just adds OFFSETOF(struct
critsect_s, private) to the S input parameter, which is 0xb674db40
above. So I assume that this is what it actually is, because
otherwise you would get a segmentation fault and not an assertion
failure.
It would be important to know whether any other thread is in GPGME at
the same time. If you serialized all calls into GPGME, we would of
course expect that not to be the case.
The non-threaded version of GPGME has some rudimentary error checking:
it makes the same mutex calls as the threaded version, but just checks
whether the locks are taken and released properly. This can catch some
simple bugs where locks are not unlocked when they should be, or are
used after they are destroyed.
The above assertion failure means that an attempt was made to take the
engine_info_lock in engine.c twice without unlocking it in between.
I have looked through all the users of this lock in engine.c, and I
can't see an error in the LOCK/UNLOCK order. We have also never heard
of a similar assertion failure in a single-threaded program. This
can mean that there is a bug in GPGME's engine.c here that we don't
know about, or that your wrapper class fails to properly synchronize
all calls into GPGME. You may want to put a watch point on the memory
value of engine.c's engine_info_lock's PRIVATE member, which is a void
pointer that becomes (void *) 0 when it is unlocked and (void *) 1 when
it is locked. I am not sure that watch points will help find the
problem, though, if a missing memory barrier is the issue here.
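A watch point session could look roughly like this; the symbol and field
names follow engine.c as described above, but the exact gdb syntax for
addressing a file-scoped static is an assumption:

```
(gdb) watch -l 'engine.c'::engine_info_lock.private
(gdb) continue
```

gdb should then stop whenever the PRIVATE member changes value, showing
which thread flipped it between 0 and 1.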
Frankly, aside from "spurious writes to random memory locations"
(accidentally hitting the above PRIVATE member), I cannot
imagine what might cause this assertion failure if all calls into
GPGME are properly serialized. It's a mystery.
Mmh. There is one issue in GPGME which may not be handled properly.
We set a null handler for SIGPIPE at initialization time, and one
should make sure that this is also the case for all GPGME-using
threads. You may want to check whether this could be a potential
cause of trouble in your application. I would expect this to show up
as the process dying from the default SIGPIPE action, however.