[gpgme] fork() problem
Stephan Menzel
smenzel at gmx-gmbh.de
Wed Feb 21 11:42:02 CET 2007
Hi Marcus,
On Wednesday, 21 February 2007 01:56:07, Marcus Brinkmann wrote:
> Did you serialize *all* calls into GPGME properly, with memory barrier
> (a single global mutex locked around all calls should do the trick)?
Yes and no.
I just double-checked once again to make sure. I think I can say that it's
impossible for two calls into the lib to happen simultaneously. It's all
protected by a scoped_lock around each of the objects.
So I don't have a global mutex. The wrapper objects themselves hold boost
mutexes and lock within themselves, which I prefer.
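To make that concrete, here is a minimal sketch of the per-object locking just described (std::mutex standing in for boost::mutex, std::lock_guard for boost's scoped_lock; the class and method names are illustrative, not the actual wrapper code):

```cpp
#include <mutex>

// Sketch of per-object locking: each wrapper owns its own mutex, so only
// calls on the *same* object are serialized. GPGWrapper is a hypothetical
// stand-in for the real wrapper class.
class GPGWrapper {
public:
    int verify()
    {
        std::lock_guard<std::mutex> guard(mutex_);  // locks *this* object only
        return 0;  // placeholder for the real gpgme_op_verify() call
    }

private:
    std::mutex mutex_;  // one mutex per wrapper object, no global lock
};
```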
However, it is theoretically possible for subsequent calls to happen to
different contexts (objects for me). Meaning something like this
(simplified):
GPGObject a;  // does initialization routines using gpgme_new()
GPGObject b;
a.verify();
b.verify();
That would mean context switches in the engine. I think this is not too likely,
but possible. Btw, all the coredumps I got seemed to happen within gpgme_new(),
none within the actual verify. But I was instantiating a bit unnecessarily at
times, so that could explain it. I'm not doing that anymore, though.
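By contrast, the single global mutex Marcus suggests would serialize even calls on different contexts. A minimal sketch, with fake_gpgme_call as a placeholder for any gpgme_* function:

```cpp
#include <mutex>

// Sketch only: one global mutex held around *every* call into GPGME, so
// calls on different contexts (a, then b) can never interleave.
static std::mutex gpgme_global_lock;

static int fake_gpgme_call(int x) { return x; }  // stand-in for a gpgme_* call

int serialized_gpgme_call(int arg)
{
    std::lock_guard<std::mutex> guard(gpgme_global_lock);  // held for the whole call
    return fake_gpgme_call(arg);
}
```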
> > #0 0xffffe410 in __kernel_vsyscall ()
> > (gdb) backtrace
> > #0 0xffffe410 in __kernel_vsyscall ()
> > #1 0xb711d885 in raise () from /lib/tls/i686/cmov/libc.so.6
> > #2 0xb711f002 in abort () from /lib/tls/i686/cmov/libc.so.6
> > #3 0xb7117318 in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> > #4 0xb6731e03 in _gpgme_ath_mutex_lock (lock=0x0) at ath.c:71
> > #5 0xb6741e2f in _gpgme_sema_cs_enter (s=0xb674db40) at posix-sema.c:48
> > #6 0xb673bd3b in _gpgme_engine_info_copy (r_info=0x0) at engine.c:225
> > #7 0xb6743070 in gpgme_new (r_ctx=0x0) at gpgme.c:58
> > #8 0xb732e9f3 in MyWrapperClass (this=0xb1782768) at
> > MyWrapperClass.cc:187
> >
> > It still doesn't crash all the time though. It mostly works so I think
> > it's some strange race condition.
> > Maybe this helps.
>
> I don't trust the "lock=0x0" parameter in the debug output, it is
> clearly bogus which indicates optimization (likely) or a corrupt stack
> (less likely).
Of course. I often get output like this and it's never to be trusted. We
build with -O2, btw.
> above. So I assume that this is what it actually is, because
> otherwise you would get a segmentation fault and not an assertion
> failure.
Yes. I didn't get any of those. All the crashes I noticed were SIGABRT (signal 6).
> The non-threaded version of GPGME has some rudimentary error checking:
> It makes the same mutex calls as the threaded version, but just checks
> if the locks are taken and released properly. This can catch some
> simple bugs where locks are not unlocked when they should be or used
> after they are destroyed.
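A rough sketch of what such a check might look like (hypothetical, not the actual ath.c code): the non-threaded stubs keep a flag instead of a real lock and assert on double-lock or double-unlock.

```cpp
#include <cassert>

// Hypothetical debug mutex in the spirit of the non-threaded checks
// described above: no real locking, just state assertions.
struct debug_mutex { bool locked = false; };

void debug_lock(debug_mutex *m)
{
    assert(!m->locked && "mutex taken twice without unlock");  // the failure mode in the trace
    m->locked = true;
}

void debug_unlock(debug_mutex *m)
{
    assert(m->locked && "mutex unlocked but never locked");
    m->locked = false;
}
```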
This is ath.c right?
> The above assertion failure means that it was attempted to take the
> engine_info_lock in engine.c twice without unlocking it inbetween.
I patched around a bit and at times had a version running with this mutex
removed altogether, relying on my own mutex instead. I tried this while
linking against the non-pthread version.
The result was that I didn't get those crashes around this mutex
(engine_info_lock) anymore, but different ones. I just looked, but I don't
have the stacktraces anymore :-(
> Frankly, aside from a "spurious writes to random memory locations"
> (and accidentally hitting the above PRIVATE member), I can not
> imagine what might cause this assertion failure if all calls into
> GPGME are properly serialized. It's a mystery.
It sure is to me.
I looked briefly into it too, but I came to the same conclusion. However, since
we have run the daemon under valgrind a lot, I trust that we don't have any
stray writes into its memory like you describe. Given our use cases that would
be quite disastrous, and I really think we would have noticed it by now; it
would have to result in segfault crashes.
> Mmh. There is one issue in GPGME which may not be handled properly.
> We set a null handler for SIGPIPE at initialization time, and one
> should make sure that this is also the case for all GPGME-using
> threads. You may want to check if this could be a potential cause for
> trouble in your application. I would expect this to show up
> differently though.
Is there anything I could do, with my limited time, to prove that right or wrong?
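One quick experiment (a sketch under the assumption that a missing SIGPIPE handler is the culprit): explicitly ignore SIGPIPE process-wide before any GPGME use, then see whether the crash behaviour changes. The function name here is illustrative.

```cpp
#include <signal.h>

// Sketch: install SIG_IGN for SIGPIPE before touching GPGME. With the
// signal ignored, writes to broken pipes return EPIPE instead of killing
// the process.
bool ignore_sigpipe()
{
    struct sigaction sa {};
    sa.sa_handler = SIG_IGN;       // "null handler" as Marcus describes
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, nullptr) == 0;  // true on success
}
```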
Many thanks and greetings,
Stephan
More information about the Gnupg-devel mailing list