running all the tests

Daniel Kahn Gillmor dkg at fifthhorseman.net
Mon Aug 7 23:12:54 CEST 2017


On Mon 2017-08-07 20:32:11 +0200, Werner Koch wrote:
> Let's get some facts: My development box is an X220 with 8 GiB RAM, an
> SSD, and a i5-2410M CPU @ 2.30GHz.

Mine is also an X220 with an SSD, but with 16 GiB of RAM and only a
marginally faster CPU (i5-2450M @ 2.50GHz).

> configure run not timed, "make clean" between tests, no other large
> jobs running.

Did you run "make check" before you ran "time make -j3 check"?  If not,
that would account for your higher CPU usage -- you'd be measuring
building the code (both the production binaries and the test binaries),
not just running the tests.  Since this is a discussion about the cost
of the test suite, I figure we should focus on the test suite
specifically -- we don't expect anything else in the build to change as
a result of this proposal.
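
To make that concrete, here's a minimal sketch of the separation I have
in mind (same idea as the full recipe further down):

    make -j3               # build the production binaries (untimed)
    make -j3 check         # also builds the test binaries (untimed)
    time make -j3 check    # now the timing covers only running the tests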

> Sure, that is pretty fast.  But a quick wc shows 90 gpgscm processes and
> three dozen gpg-agents.  I'd call that a resource hog - which might be
> acceptable for a developer but I doubt that build daemons like that.

The build daemons have seen so much worse than this that you don't even
want to know about it.  Try looking at the thunderbird build, or at
libreoffice, if you're into that sort of thing ;)

This is the work that build daemons are made to do.  If it means better
coverage for an important suite like GnuPG, I don't think anyone would
begrudge their use.

I'm pretty curious about the difference between our timings, though,
and I'd like to understand where it comes from.

To rule out the effect of the disk, I've re-run my test in a tmpfs
(with swap disabled) in a newly empty directory.  Starting from
scratch, that looks like:

    git clone https://dev.gnupg.org/source/gnupg.git
    cd gnupg
    ./autogen.sh
    ./configure --sysconfdir=/etc --enable-maintainer-mode
    make
    make check
    time make check TESTFLAGS=--parallel
    time make check

(I realized that the -jN argument to make is irrelevant here;
TESTFLAGS=--parallel will try to consume all of your CPUs if you let
it, regardless of how you set -j.)

During most of the parallel run, my CPU was pegged -- 0 for id, wa, and
st in vmstat, with all of the time spent in us and sy.
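
For reference, that observation came from watching something like this
while the tests ran (us/sy/id/wa/st are vmstat's user, system, idle,
I/O-wait, and steal columns):

    # print system-wide CPU usage once per second during "make check"
    vmstat 1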

The timings I saw were:

parallel:
real	1m11.491s
user	2m6.226s
sys	0m8.157s

On the other hand, the non-parallel run left my quad-core CPU between
75% and 94% idle.

non-parallel:
real	7m10.321s
user	2m0.721s
sys	0m16.063s

Note that "time" claims both runs used about 135 seconds of CPU time.

However, I suspect that gpg-agent is daemonizing itself, and therefore
escaping the purview of the shell's "time" builtin.  A large chunk of
the CPU consumed by the tests is probably spent in gpg-agent itself,
though I'm not sure.
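
One rough way to check that suspicion (my own sketch, not something
this thread has validated) is to compare system-wide busy time from
/proc/stat before and after the run; that counts the daemonized
gpg-agents too, assuming the machine is otherwise idle:

    # busy jiffies = user + nice + system + irq + softirq + steal
    read_cpu() { awk '/^cpu /{print $2+$3+$4+$7+$8+$9}' /proc/stat; }
    before=$(read_cpu)
    make check TESTFLAGS=--parallel
    after=$(read_cpu)
    # /proc/stat counts in clock ticks; divide by CLK_TCK for seconds
    echo "busy CPU time: $(( (after - before) / $(getconf CLK_TCK) ))s"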

But the wall clock numbers don't lie, and neither do the CPU idle
statistics.  If the concern is that the tests are too slow, it sounds
like we should just encourage the use of TESTFLAGS=--parallel rather
than disabling the tests.  Alternatively, we could figure out why the
non-parallel run doesn't at least peg *one* of the CPU cores
continuously.  That would probably improve the latency users
experience as well :)
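
For example (a hypothetical debian/rules override, just to illustrate
-- nothing this thread has settled on), a package build could opt in
with something like:

    # pass TESTFLAGS through to the "make check" that dh_auto_test runs
    # (the recipe line below is tab-indented, as make requires)
    override_dh_auto_test:
    	dh_auto_test -- TESTFLAGS=--parallel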

Werner, I'm surprised that the stats from your tests were so different
from mine.  Can you try the approach I've outlined above and report
back?  I'm running Debian testing/unstable.

happy hacking,

       --dkg