[gpgme] fork() problem
marcus.brinkmann at ruhr-uni-bochum.de
Wed Feb 21 23:26:33 CET 2007
At Wed, 21 Feb 2007 11:42:02 +0100,
Stephan Menzel <smenzel at gmx-gmbh.de> wrote:
> Hi Marcus,
> On Wednesday, 21 February 2007 at 01:56:07, Marcus Brinkmann wrote:
> > Did you serialize *all* calls into GPGME properly, with memory barrier
> > (a single global mutex locked around all calls should do the trick)?
> Yes and no.
> I just double-checked it once again to make sure. I think I can say that it's
> impossible for two calls into the lib to happen simultaneously. It's all
> protected by a scoped_lock around each of the objects.
> So I don't have global mutexes. The wrapper objects themselves are boost_mutexes
> and lock within themselves, which I prefer.
> However, it is theoretically possible for subsequent calls to happen to
> different contexts (objects for me). Meaning something like this
> GPGObject a(); // does initialization routines using gpgme_new()
> GPGObject b();
> That would mean context switches to the engine. I think this is not too likely,
> but possible. Btw, all coredumps I got seemed to happen within gpgme_new().
> None happened within the actual verify. But I was instantiating a bit
> unnecessarily at times, so this could be explained. I'm not any more, though.
I don't know what a scope lock or a boost mutex is. I am also
confused, as it seems to me you are now talking again about your
original implementation, and not what I understood to be a second
version which serialized all calls into GPGME.
I am referring to this one:
"I played around a bit and I tried to wrap my own mutex around the whole thing
and disable its internal locking by linking against libgpgme.so.11 instead
of libgpgme-pthread.so.11, even though it's a multithreaded app."
Clearly, you cannot expect this to work with concurrent access even
to different objects. GPGME has internal locking, and that needs to
work if there is *any* concurrency. If you disable internal locking
by linking to libgpgme.so.11 rather than libgpgme-pthread.so.11, then
you have to serialize all calls into GPGME properly.
> > > #0 0xffffe410 in __kernel_vsyscall ()
> > > (gdb) backtrace
> > > #0 0xffffe410 in __kernel_vsyscall ()
> > > #1 0xb711d885 in raise () from /lib/tls/i686/cmov/libc.so.6
> > > #2 0xb711f002 in abort () from /lib/tls/i686/cmov/libc.so.6
> > > #3 0xb7117318 in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> > > #4 0xb6731e03 in _gpgme_ath_mutex_lock (lock=0x0) at ath.c:71
> > > #5 0xb6741e2f in _gpgme_sema_cs_enter (s=0xb674db40) at posix-sema.c:48
> > > #6 0xb673bd3b in _gpgme_engine_info_copy (r_info=0x0) at engine.c:225
> > > #7 0xb6743070 in gpgme_new (r_ctx=0x0) at gpgme.c:58
> > > #8 0xb732e9f3 in MyWrapperClass (this=0xb1782768) at
> > > MyWrapperClass.cc:187
> > >
> > > It still doesn't crash all the time though. It mostly works so I think
> > > it's some strange race condition.
> > > Maybe this helps.
> > I don't trust the "lock=0x0" parameter in the debug output; it is
> > clearly bogus, which indicates optimization (likely) or a corrupt stack
> > (less likely).
> Of course. I often get stuff like this and it's never to be trusted. We
> use -O2 btw.
> > above. So I assume that this is what it actually is, because
> > otherwise you would get a segmentation fault and not an assertion
> > failure.
> Yes. I didn't get any of those. All crashes I noticed were signal 6 (SIGABRT).
> > The non-threaded version of GPGME has some rudimentary error checking:
> > It makes the same mutex calls as the threaded version, but just checks
> > if the locks are taken and released properly. This can catch some
> > simple bugs where locks are not unlocked when they should be or used
> > after they are destroyed.
> This is ath.c right?
> > The above assertion failure means that an attempt was made to take the
> > engine_info_lock in engine.c twice without unlocking it in between.
> I patched around a bit and at times had a version running with this mutex
> removed altogether. I tried to rely on my own mutex instead. I tried this and
> linking against the non-pthread version.
> The result was that I didn't get those crashes around this mutex
> (engine_info_lock) anymore but different ones. I just looked, but don't have
> the stacktraces anymore :-(
> > Frankly, aside from "spurious writes to random memory locations"
> > (and accidentally hitting the above PRIVATE member), I cannot
> > imagine what might cause this assertion failure if all calls into
> > GPGME are properly serialized. It's a mystery.
> It sure is to me.
> I looked briefly into it too, but I came to the same conclusion. However, since
> we valgrinded the daemon a lot, I just trust that we don't have any messing
> around in its memory like you describe. Given our use cases that would be quite
> disastrous, and I really think we would have noticed it already. Segfault
> crashes would have to result.
Did you valgrind the version using GPGME that produced the failures?
Again, I am not sure I can follow which version you mean by any of
your references. We cannot hope to make progress while following
several variants at the same time.
> > Mmh. There is one issue in GPGME which may not be handled properly.
> > We set a null handler for SIGPIPE at initialization time, and one
> > should make sure that this is also the case for all GPGME-using
> > threads. You may want to check if this could be a potential cause for
> > trouble in your application. I would expect this to show up
> > differently though.
> Could I, with my limited time, do anything to prove that right or wrong?
You could look at how your application installs and uses signal
handlers for the various threads. But I think it is unlikely that
this is relevant to the primary concerns we have here.