[gpgme] fork() problem
marcus.brinkmann at ruhr-uni-bochum.de
Wed Feb 21 23:26:33 CET 2007
At Wed, 21 Feb 2007 11:42:02 +0100,
Stephan Menzel <smenzel at gmx-gmbh.de> wrote:
> Hi Marcus,
> On Wednesday, 21 February 2007 at 01:56:07, Marcus Brinkmann wrote:
> > Did you serialize *all* calls into GPGME properly, with memory barrier
> > (a single global mutex locked around all calls should do the trick)?
> Yes and no.
> I just double-checked it once again to make sure. I think I can say that it's
> impossible for two calls into the lib to happen simultaneously. It's all
> protected by a scoped_lock around each of the objects.
> So I don't have global mutexes. The wrapper objects themselves are boost_mutexes
> and lock within themselves, which I prefer.
> However, it is theoretically possible for subsequent calls to happen to
> different contexts (objects for me). Meaning something like this
> GPGObject a(); // does initialization routines using gpgme_new()
> GPGObject b();
> That would mean context switches to the engine. I think this is not too likely,
> but possible. Btw, all coredumps I got seemed to happen within gpgme_new().
> None happened within the actual verify. But I was instantiating a bit
> unnecessarily at times, so this could be explained. I'm not any more, though.
I don't know what a scope lock or a boost mutex is. I am also
confused, as it seems to me you are now talking again about your
original implementation, and not what I understood to be a second
version which serialized all calls into GPGME.
I am referring to this one:
"I played around a bit and I tried to wrap my own mutex around the whole thing
and disable its internal locking by linking against libgpgme.so.11 instead
of libgpgme-pthread.so.11, even though it's a multithreaded app."
Clearly, you cannot expect this to work with concurrent access even
to different objects. GPGME has internal locking, and that needs to
work if there is *any* concurrency. If you disable internal locking
by linking to libgpgme.so.11 rather than libgpgme-pthread.so.11, then
you have to serialize all calls into GPGME properly.
> > > #0 0xffffe410 in __kernel_vsyscall ()
> > > (gdb) backtrace
> > > #0 0xffffe410 in __kernel_vsyscall ()
> > > #1 0xb711d885 in raise () from /lib/tls/i686/cmov/libc.so.6
> > > #2 0xb711f002 in abort () from /lib/tls/i686/cmov/libc.so.6
> > > #3 0xb7117318 in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> > > #4 0xb6731e03 in _gpgme_ath_mutex_lock (lock=0x0) at ath.c:71
> > > #5 0xb6741e2f in _gpgme_sema_cs_enter (s=0xb674db40) at posix-sema.c:48
> > > #6 0xb673bd3b in _gpgme_engine_info_copy (r_info=0x0) at engine.c:225
> > > #7 0xb6743070 in gpgme_new (r_ctx=0x0) at gpgme.c:58
> > > #8 0xb732e9f3 in MyWrapperClass (this=0xb1782768) at
> > > MyWrapperClass.cc:187
> > >
> > > It still doesn't crash all the time though. It mostly works so I think
> > > it's some strange race condition.
> > > Maybe this helps.
> > I don't trust the "lock=0x0" parameter in the debug output; it is
> > clearly bogus, which indicates optimization (likely) or a corrupt stack
> > (less likely).
> Of course. I often get stuff like this and it's never to be trusted. We
> use -O2 btw.
> > above. So I assume that this is what it actually is, because
> > otherwise you would get a segmentation fault and not an assertion
> > failure.
> Yes. I didn't get any of those. All crashes I noticed were signal 6 (SIGABRT).
> > The non-threaded version of GPGME has some rudimentary error checking:
> > It makes the same mutex calls as the threaded version, but just checks
> > if the locks are taken and released properly. This can catch some
> > simple bugs where locks are not unlocked when they should be or used
> > after they are destroyed.
> This is ath.c right?
> > The above assertion failure means that an attempt was made to take the
> > engine_info_lock in engine.c twice without unlocking it in between.
> I patched around a bit and at times had a version running with this mutex
> removed altogether. I tried to rely on my own mutex instead. I tried this and
> linking against the non-pthread version.
> The result was that I didn't get those crashes around this mutex
> (engine_info_lock) anymore but different ones. I just looked, but don't have
> the stacktraces anymore :-(
> > Frankly, aside from "spurious writes to random memory locations"
> > (and accidentally hitting the above PRIVATE member), I cannot
> > imagine what might cause this assertion failure if all calls into
> > GPGME are properly serialized. It's a mystery.
> It sure is to me.
> I looked briefly into it too, but I came to the same conclusion. However, since
> we valgrinded the daemon a lot, I just trust that we don't have any messing
> around in its memory like you describe. Given our use cases that would be quite
> disastrous, and I really think we would have noticed it already. Segfault
> crashes would have to result.
Did you valgrind the version using GPGME that produced the failures?
Again, I am not sure I can follow which version you mean by any of
your references. We cannot hope to make progress while following
several variants at the same time.
> > Mmh. There is one issue in GPGME which may not be handled properly.
> > We set a null handler for SIGPIPE at initialization time, and one
> > should make sure that this is also the case for all GPGME-using
> > threads. You may want to check if this could be a potential cause for
> > trouble in your application. I would expect this to show up
> > differently though.
> Could I, with my limited time, do anything to prove that right or wrong?
You could look at how your application installs and uses signal
handlers for the various threads. But I think it is unlikely that
this is relevant to the primary concerns we have here.