[GPGME PATCH] core: Restore get_max_fds optimization on Linux
Daniel Kahn Gillmor
dkg at fifthhorseman.net
Tue Sep 19 17:42:22 CEST 2017
On Sat 2017-09-16 04:17:19 +0100, Colin Watson wrote:
> On Sat, Sep 16, 2017 at 04:16:44AM +0100, Colin Watson wrote:
>> * src/posix-io.c (get_max_fds): Restore Linux optimization, this time
>> using open/getdents/close rather than opendir/readdir/closedir.
>> opendir/readdir/closedir may allocate/free memory, and aren't required
>> to do so in an async-signal-safe way. On the other hand, opening
>> /proc/self/fd directly and iterating over it using getdents is safe.
>> (getdents is not strictly speaking documented to be async-signal-safe
>> because it's not in POSIX. However, the Linux implementation is
>> essentially just a souped-up read. Python >= 3.2.3 makes the same
> Incidentally, on my system, this reduces the time required for "make -C
> tests/gpg -j4 check" from 42 seconds to 6 seconds. How dramatic this
> difference is will of course depend on the value of the hard limit for
> RLIMIT_NOFILE ("ulimit -Hn"), which on my system is 1048576 by default,
> although I haven't bothered to hunt down where that default is set; with
> a more traditional hard limit of 4096 the difference is negligible.
I support adoption of this patch in GPGME upstream. It addresses a
pretty severe performance degradation in GPGME on Linux platforms that
was introduced between version 1.8.0 and 1.9.0.
Colin and I most likely use similar systems, since I've got 1Mi as the
RLIMIT_NOFILE hard limit as well.
This patch looks good to me, since it reduces the number of spurious
close() calls invoked at every _gpgme_io_spawn by about 5 orders of
magnitude (from ~1M to ~10).
In particular, on my Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz running
Linux 4.12, my system consumes over 60ms doing 1Mi spurious close()
calls (about half in the kernel, half in userland), as compared with 2ms
or 3ms for doing basic process initialization itself. That's a lot of
unnecessary overhead!
Another data point:
I ran "notmuch show --decrypt --format=json" on a thread of 16 encrypted
messages, with session keys already cached, built against gmime 3.0,
which uses gpgme.
Without this patch, the above command took ~4.5s wall-clock time. With
this patch, the same command took ~1.6s. (FWIW, I still think 1.6s is
too long for a 16-message thread, especially with no asymmetric crypto
involved, but I'm happy with the improvement.)
It might be nice to have a clearer comment above get_max_fds()
indicating that the goal is to return the value of the highest open file
descriptor, and that on systems where that information is not available,
it falls back to the highest possible file descriptor value.
This particular patch won't help for processes which do something like
getrlimit(RLIMIT_NOFILE, &rl) || dup2(0, rl.rlim_max-1);
but it will help the overwhelming majority of processes.
If we find that such processes exist, we can optimize this further, but
this looks like an unmitigated win to me.
> I noticed this because some tests of the launchpad.net codebase that
> exercise GPGME were timing out in my test container. I thought about
> capping our hard limit for RLIMIT_NOFILE to something more reasonable,
> and that might still make sense, but I think there's more global benefit
> in improving GPGME.
Agreed. I've included this in Debian for our Linux-based architectures
in gpgme 1.9.0-5. I think it should really be adopted by upstream,
though. It'd be great if there were something comparable for the BSD and
Hurd ports too, but that can come later.
Thanks for finding this and proposing the improvement, Colin!