trustdb locking

NIIBE Yutaka gniibe at
Mon Jun 13 04:34:52 CEST 2016


We have this bug in 1.4, 2.0, and 2.1.  My intention is to fix it
in 2.1 and backport the fix to 1.4 and 2.0.

On 06/09/2016 05:12 PM, NIIBE Yutaka wrote:
> In the issue 1675, we handle trustdb locking:
> I had identified a race condition for creation of trustdb.gpg.  This
> was fixed last year.  However, the problem of trustdb corruption has
> not gone yet.
>     The serialization of newly creating hash table in the function
>     create_hashtable (tdbio.c).  <--- I think this is the issue now.
>     When two processes race for the position of end of file by lseek
>     (db_fd, 0, SEEK_END), it might result corrupted trustdb.  A
>     process which comes later will also create a record for hash table
>     at the end of file at later time, but the block will be
>     overwritten by another process which comes first.

While this is a race condition (WRITE vs. WRITE), I think I have
finally identified another race condition (WRITE vs. READ) which can
reproduce the same result as issue1675.
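
To illustrate the WRITE vs. WRITE case quoted above, here is a minimal
sketch.  It is not the code in tdbio.c: the names put_record and
simulate_append_race, the temporary file, and the tag bytes are all
hypothetical, and the two "processes" are simulated by two file
descriptors in one process.  Both writers look up the end of file with
lseek (fd, 0, SEEK_END) before either of them writes, so both get the
same offset and one record is lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_LEN 40  /* Assumed record size (the 40-byte records of this report). */

/* Write one record whose first byte is TAG at OFFSET.  Returns 0 on success. */
static int
put_record (int fd, off_t offset, unsigned char tag)
{
  unsigned char buf[REC_LEN];

  memset (buf, 0, sizeof buf);
  buf[0] = tag;
  if (lseek (fd, offset, SEEK_SET) == (off_t)-1)
    return -1;
  return write (fd, buf, sizeof buf) == (ssize_t)sizeof buf ? 0 : -1;
}

/* Simulate writers A and B which both ask for the end of file before
   either of them appends.  Returns the final file size, or -1 on
   error.  The tag found in the contested slot is stored at SURVIVOR. */
static long
simulate_append_race (unsigned char *survivor)
{
  char path[] = "/tmp/append-race-XXXXXX";  /* Hypothetical scratch file. */
  int fd_a = mkstemp (path);
  int fd_b = fd_a == -1 ? -1 : open (path, O_RDWR);
  long size = -1;

  assert (survivor);
  if (fd_b != -1 && !put_record (fd_a, 0, 1))  /* Existing 40-byte file. */
    {
      off_t end_a = lseek (fd_a, 0, SEEK_END);  /* A sees offset 40.     */
      off_t end_b = lseek (fd_b, 0, SEEK_END);  /* B sees offset 40 too. */

      if (end_a != (off_t)-1 && end_b != (off_t)-1
          && !put_record (fd_a, end_a, 'A')
          && !put_record (fd_b, end_b, 'B')   /* Overwrites A's record. */
          && pread (fd_a, survivor, 1, end_a) == 1)
        size = (long)lseek (fd_a, 0, SEEK_END);
    }
  if (fd_b != -1)
    close (fd_b);
  if (fd_a != -1)
    {
      close (fd_a);
      unlink (path);
    }
  return size;
}
```

Calling simulate_append_race leaves a file of two records rather than
three, and the slot at offset 40 holds the record of whichever writer
came last.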

In 2.1, I managed to reproduce it with the following steps.

(0) We will use the files SHA256SUMS.gpg and SHA256SUMS to verify the
    signature, as the report of issue1675 does.  Get the files from:
    (Any signature works; here we use the same files as the report.)

(1) Make a temporary homedir environment.

    $ mkdir /tmp/gpghome; chmod og-rwx /tmp/gpghome

(2) Prepare a key for verification (any key works; here we use the
    same key as the report).

    $ gpg2-1-12 --homedir=/tmp/gpghome --recv-key 40976EAF437D05B5

    Then, we have pubring.kbx and a 1200-byte trustdb.gpg.
    Here, trustdb.gpg is composed of the version record and the hash
    table.

    To reproduce the bug, we set up an artificial situation.

    $ rm /tmp/gpghome/trustdb.gpg
    $ mv /tmp/gpghome/pubring.kbx /tmp/gpghome/pubring.kbx.bak

(3) Let GPG create a trustdb.gpg which contains only the VERSION
    record, by running the -k command with no keys available.

    $ gpg2-1-12 --homedir=/tmp/gpghome -k

    Then, restore the public key

    $ mv  /tmp/gpghome/pubring.kbx.bak /tmp/gpghome/pubring.kbx

(4) Now, we have a 40-byte trustdb.gpg, which is the VERSION record
    only.  ls -l /tmp/gpghome shows something like:

    -rw-r--r-- 1 gniibe gniibe 10059 Jun 13 11:04 pubring.kbx
    -rw------- 1 gniibe gniibe    40 Jun 13 11:06 trustdb.gpg

(5) In this situation, invoke GPG under GDB.

    $ gdb /path-to/gpg2-1-12

    Then, set a breakpoint at write_cache_item, and run GPG.

    (gdb) break write_cache_item
    (gdb) run --homedir=/tmp/gpghome --verify SHA256SUMS.gpg SHA256SUMS

    Then, we see:

    Breakpoint 1, write_cache_item (r=r@entry=0xbf380)
        at ../../gnupg/g10/tdbio.c:201
    201	  if (lseek (db_fd, r->recno * TRUST_RECORD_LEN, SEEK_SET) == -1)
    (gdb) list
    196	write_cache_item (CACHE_CTRL r)
    197	{
    198	  gpg_error_t err;
    199	  int n;
    201	  if (lseek (db_fd, r->recno * TRUST_RECORD_LEN, SEEK_SET) == -1)
    202	    {
    203	      err = gpg_error_from_syserror ();
    204	      log_error (_("trustdb rec %lu: lseek failed: %s\n"),
    205	                 r->recno, strerror (errno));
    (gdb) print *r
    $67 = {next = 0xbf348, flags = {used = 1, dirty = 1}, recno = 0,
      data = "\001gpg\003\003\001\005\001\002\000\000W^\025\032", '\000' <repeats 23 times>, "\001"}

    The version record is being updated now.  Let GDB continue the
    execution of GPG.

    (gdb) cont

    Breakpoint 1, write_cache_item (r=r@entry=0xbf348)
        at ../../gnupg/g10/tdbio.c:201
    201	  if (lseek (db_fd, r->recno * TRUST_RECORD_LEN, SEEK_SET) == -1)
    (gdb) print *r
    $69 = {next = 0xbf310, flags = {used = 1, dirty = 1}, recno = 29,
      data = "\n", '\000' <repeats 38 times>}

    After the update of the version record, GPG is now writing to the
    hash table.

    Note the order in which data is written to disk: the version
    record is updated first, then the hash table.  Letting it go one
    step further makes this clearer.

    (gdb) cont

    Breakpoint 3, write_cache_item (r=r@entry=0xbf310)
        at ../../gnupg/g10/tdbio.c:201
    201	  if (lseek (db_fd, r->recno * TRUST_RECORD_LEN, SEEK_SET) == -1)
    (gdb) print *r
    $71 = {next = 0xc7f00, flags = {used = 1, dirty = 1}, recno = 28,
      data = "\n", '\000' <repeats 38 times>}

    The RECNO is 0 for the version record, then 1...29 for the hash
    table.  The writes go from 29 down to 1.

    Here (after the write to the version record), we keep the GPG
    process stopped, and we invoke another GPG command.

(6) In another terminal, invoke GPG; it reports the error.

    $ gpg2-1-12 --homedir=/tmp/gpghome --verify SHA256SUMS.gpg SHA256SUMS
    gpg: Signature made Tue 24 Apr 2012 04:52:09 AM JST using DSA key ID
    gpg: 12: read expected rec type 10, got 0
    gpg: lookup_hashtable failed: Trust DB error
    gpg: trustdb: searching trust record failed: Trust DB error
    gpg: Error: The trustdb is corrupted.
    gpg: You may try to re-create the trustdb using the commands:
    gpg:   cd ~/.gnupg
    gpg:   gpg --export-ownertrust > otrust.tmp
    gpg:   rm trustdb.gpg
    gpg:   gpg --import-ownertrust < otrust.tmp
    gpg: If that does not work, please consult the manual

It fails because the hash table is currently being written by the GPG
process under GDB (holding the write lock).
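
The mechanism can be replayed outside of GnuPG with the following
sketch.  It is only an illustration under assumptions: the 40-byte
record length is taken from the file sizes above, the type bytes 1 and
10 are read off the GDB dumps ("\001" and "\n"), and the helper names
and the scratch file are hypothetical.  It does not reproduce the
actual read path of tdbio.c.  The writer stores record 0, then the
hash table records from 29 downwards; once record 29 is on disk the
file is 1200 bytes long, but the records not yet written are holes
which read back as zeros.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_LEN      40  /* Assumed record size (40-byte version-only file). */
#define TYPE_VERSION  1  /* First byte of record 0 in the GDB dump.          */
#define TYPE_HTBL    10  /* First byte ("\n") of the hash table records.     */
#define LAST_HTBL    29  /* Highest hash table record number seen above.     */

/* Store a record of TYPE at record number RECNO.  Returns 0 on success. */
static int
write_record (int fd, unsigned long recno, unsigned char type)
{
  unsigned char buf[REC_LEN];

  memset (buf, 0, sizeof buf);
  buf[0] = type;
  return pwrite (fd, buf, sizeof buf, (off_t)recno * REC_LEN)
         == (ssize_t)sizeof buf ? 0 : -1;
}

/* Return the type byte of record RECNO, or -1 if it cannot be read. */
static int
read_record_type (int fd, unsigned long recno)
{
  unsigned char buf[REC_LEN];

  if (pread (fd, buf, sizeof buf, (off_t)recno * REC_LEN)
      != (ssize_t)sizeof buf)
    return -1;
  return buf[0];
}

/* Replay a writer which is stopped after the version record and
   N_WRITTEN hash table records (29, 28, ...) have reached the disk,
   then return the type a reader finds at RECNO. */
static int
type_seen_by_reader (unsigned int n_written, unsigned long recno)
{
  char path[] = "/tmp/trustdb-sketch-XXXXXX";  /* Hypothetical scratch file. */
  int fd = mkstemp (path);
  int type = -1;
  int ok;
  unsigned int i;

  assert (n_written <= LAST_HTBL);
  if (fd == -1)
    return -1;
  ok = !write_record (fd, 0, TYPE_VERSION);
  for (i = 0; ok && i < n_written; i++)
    ok = !write_record (fd, LAST_HTBL - i, TYPE_HTBL);
  if (ok)
    type = read_record_type (fd, recno);
  close (fd);
  unlink (path);
  return type;
}
```

With the writer stopped after record 29 only, type_seen_by_reader (1, 12)
returns 0 where a reader expects 10, which corresponds to the message
"12: read expected rec type 10, got 0" above.  Once all 29 hash table
records are written, the same call with n_written = 29 returns 10.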

