benchmarking security tokens speed

Thu Sep 7 19:27:48 CEST 2017

On Sat Aug 26 08:34:25 UTC 2017, Szczepan Zalega | Nitrokey wrote:
> Hi!

Hi!

[Note that I didn't see your reply at first because you didn't CC me - I
forgot to mention I wasn't subscribed to the mailing lists...]

> Nice initiative! It is good you have a script already. The more it is
> automated the better.
> To make the tests reproducible please give environment details: OS name,
> bits and version, GPG version and from hardware side firmware versions
> of used devices.

Yeah, here are the details:

 * Debian 9 ("stretch"/stable) amd64
 * Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
 * GnuPG 2.1.18-6 (from the stable Debian package)
 * Nitrokey PRO 0.8 (latest)
 * FST-01, running Gnuk version 1.2.5 (latest)
 * Yubikey NEO 3.4.3 (not upgradable?)
 * Yubikey 4 4.2.6 (not upgradable?)

Good enough? :)

> I would also remove the `pv` from pipeline since it does its own
> buffering and could influence the test results. The tests should be done
> on ramdisk (/dev/shm etc) to exclude disk access sharing with OS - with
> so small times this is a necessity.

I rewrote the shell scripts in Python; the new script shells out to gpg
and drops the output. We still use an on-disk file, but it's small enough
(16 bytes, i.e. one AES-128 block) to not be a significant overhead - it
will end up in the VM cache soon enough, and there's an extra run before
the real benchmark to make sure that's the case.
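
For illustration, here is a minimal sketch of that kind of harness. It
is not the published script: the function name, the `blob.gpg` file name
and the exact gpg flags chosen are assumptions made for this example.
It runs a command a fixed number of times, discards its output, and
skips the warm-up run(s) when collecting timings.

```python
import subprocess
import time


def time_command(cmd, runs=100, warmup=1):
    """Run cmd repeatedly; return the wall-clock duration of each run in seconds."""
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        # Drop stdout/stderr so pipe or terminal buffering does not skew timings.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        elapsed = time.perf_counter() - start
        if i >= warmup:  # warm-up runs only prime the cache and are not recorded
            samples.append(elapsed)
    return samples


# Hypothetical invocation: "blob.gpg" stands in for the small encrypted file.
GPG_DECRYPT = ["gpg", "--batch", "--quiet", "--decrypt", "blob.gpg"]
# samples = time_command(GPG_DECRYPT, runs=100)
```

Because the same wrapper is used for every token, whatever overhead it
adds is the same across devices.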

> Why not using PKCS#11 directly (and measure real RSA speed of the
> device, since AES is done in CPU anyway) instead of blackboxing the
> GPG?

Because that's harder to implement and there's only so much time I can
spend implementing new software that is basically throwaway once I'm
done making those graphs.

Besides, gpg is surprisingly fast, I must say. We can see the overhead
due to the gpg calls in the "CPU" column in the graph - it's basically
insignificant compared to the communication and decryption time from the
card, at this stage.

One piece of feedback I heard from another reviewer was that I should
communicate directly with gnupg-agent, for what it's worth. I could also
skip the Linux kernel USB stack to speed things up, for example. This
misses the point: there is always some overhead in the various tests,
even if it's only the Python interpreter or the CPU pipelines and caches.
The point is that this overhead is kept constant across tests for the
multiple devices so we can have comparable data points.

> How many runs have you done for each device?

100

> Have you removed the outliers?

No, but the graphs show the standard deviation of the samples, which
seems to be negligible. Only in the case of the FST-01 doing RSA-4096
operations is it visible at all.
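
As a rough sketch of how one could check this on a list of timings
(this is not taken from the published script; the function names and
the 3-sigma threshold are assumptions for the example), the mean,
sample standard deviation and any far-off samples can be computed with
the standard library:

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # requires at least two samples
    return mean, stdev


def outliers(samples, k=3.0):
    """Return the samples lying more than k standard deviations from the mean."""
    mean, stdev = summarize(samples)
    return [s for s in samples if abs(s - mean) > k * stdev]
```

If `outliers()` comes back empty or nearly so, removing outliers would
not change the graphs much.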

> You can also try to compare the results with another benchmarking tool,
> like graphene-cli [1]. They have some test results already but I cannot
> find it right now.
> 
> [1] https://github.com/PeculiarVentures/graphene-cli

Well crap - I didn't even know there *was* such a tool. It's good to
know, but that thing looks much harder to use than my current
configuration. Results from the README file do look comparable to the
performance of the Yubikey 4, however, so I'm confident that I can
continue with my current approach.

Unless, of course, someone provides patches to skip GnuPG and talk
directly to PKCS#11 or whatever acronym you feel is better suited to
produce relevant metrics. ;)

I published the script and updated benchmarks here, now with Nitrokey
results:

https://gitlab.com/anarcat/crypto-bench/

This will be the subject of an upcoming article about OpenPGP best
practices, including offline certification key storage and, of course,
crypto token magic. Any recommendations or reviewers would obviously be
welcome so that I don't look too much like a dork. ;)

Thanks for the feedback everyone!

A.

-- 
Antoine Beaupré
LWN.net