running all the tests

Mon Aug 7 14:36:34 CEST 2017

hello GnuPG folks--

This morning i noticed b0112dbca91e720a4ff622ad0e88d99eba56203a on master:

    tests: Do not run all tests unless in maintainer mode.

    * configure.ac: Leak the maintainer mode flag into 'config.h'.
    * tests/gpgscm/ffi.c: Pass it into the scheme environment.
    * tests/openpgp/all-tests.scm: Only run tests against non-default
    configurations (keyring, extended-key-format) in maintainer mode.
    --

    Werner is concerned that the tests do take up too much time and asked
    me to reduce the runtime of the tests for normal users.

    Signed-off-by: Justus Winter <justus at g10code.com>

I sympathize with Werner's concern: the test suite does currently take a
long time.

But I'm a little worried about this resolution, because it means we'll
be less likely to get alerts about failures in many places (including the
Debian build daemon network, which tests over a dozen platforms but does
not build with --enable-maintainer-mode).

I took a look at the timing of the test suite with maintainer mode
(--enable-maintainer-mode) enabled.  I built all the binaries, and then
i ran all the tests while timing them.  I ran this on a machine with 4
cores of Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz.  Here's what i saw:

$ make -j4 check && time make -j4 check
[ ... all tests pass ... ]
real	7m7.513s
user	1m56.696s
sys	0m14.821s
$ python3 -c 'print(1-(60+56.696+14.821)/(7*60+7.513))'
0.6923672496508878
$ 

So 69% of the time that the user is waiting for the test suite, the test
suite is not using any CPU.
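
The same arithmetic can be written as a small reusable helper.  This is
only a sketch: the function names parse_time and idle_fraction are made
up for illustration, and it assumes durations in the "7m7.513s" form
that the shell's time builtin printed above.  Note that user and sys are
summed over every process, so on a multi-core machine this figure is a
lower bound on how much of the wall-clock time was spent waiting.

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '7m7.513s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def idle_fraction(real, user, sys):
    """Fraction of wall-clock time not accounted for by CPU time."""
    wall = parse_time(real)
    cpu = parse_time(user) + parse_time(sys)
    return 1 - cpu / wall


# Figures taken from the run quoted above.
print(round(idle_fraction("7m7.513s", "1m56.696s", "0m14.821s"), 4))
```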

If we could profile what the test suite is waiting on and fix those
waits, then we could shave off over 2/3rds of the time without losing
any test coverage.

It also looked to me like the test suite wasn't running jobs in
parallel.  If the test suite could be parallelized (i don't know whether
that's possible) then on most modern computers it could run
significantly faster as well.

Do we know what the test suite is waiting on?  If the waits are in the
test suite, fixing them would let us run more tests in more places.  If
the waits are in GnuPG itself, then fixing them would give all GnuPG
users lower latency.  Either one seems like a good win.

Is there any prospect of doing that profiling work instead of just
disabling the tests?
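
As a rough starting point for that kind of profiling, one could compare
wall-clock time with CPU time for each test command separately, to see
which ones spend most of their time waiting.  The sketch below is an
assumption-laden illustration, not part of the GnuPG test harness: the
measure helper is hypothetical, it only works on Unix-like systems, and
the command for running a single test is left as a stand-in (a child
that just sleeps).

```python
import resource
import subprocess
import sys
import time


def measure(argv):
    """Run argv; return (wall, cpu) seconds for it and its reaped children."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time is user plus system time accumulated by the child processes.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Stand-in for one test: a child that sleeps and uses almost no CPU.
    wall, cpu = measure([sys.executable, "-c", "import time; time.sleep(0.5)"])
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s waiting={1 - cpu / wall:.0%}")
```

Tests whose waiting share is high are the ones worth looking at first.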

I think the expanded and more extensive test suite has been one of the
major improvements in the modern branch of GnuPG and i'd hate to lose
its reach and coverage.

Regards,

        --dkg