full-text v. regular expression userid searches (was: Re: [svn] GnuPG - r3867 - trunk/keyserver)

Jason Harris jharris at widomaker.com
Fri Aug 19 15:49:19 CEST 2005


On Fri, Aug 19, 2005 at 12:35:22AM -0400, David Shaw wrote:
> On Fri, Aug 19, 2005 at 12:24:05AM -0400, Jason Harris wrote:

> > Supporting (e.g., POSIX) Regular Expression searches would be interesting,
> > both in GPG and (HKP) keyserver keyrings, but searching the 2545113+
> > userids (on 2209793+ keys) on (well-synchronized) keyservers could be
> > unacceptably slow.  The raw text is 99425942+ bytes (94.8+MB).
> 
> I don't know that it would be unusably slow: searching in a database
> is a very well researched problem and there are ways to speed things
> up. I seem to recall the old PGP LDAP server did quite well in
> searching, and it had quite a large number of keys to search through.
> Even though it wasn't synchronized with the HKP world, it was pretty
> up to date (being the default server for PGP).

It may have, but as I just mentioned, it (keyserver-legacy.pgp.com)
failed a partial-word search that should have matched several keys,
so it might only be matching from the beginning of words now.

Anyway, pks and SKS use Berkeley DB, which only allows partial matching
or range searching in sorted (Btree) dbs/tables from the beginning of
words/keys (via DBcursor->c_get (..., DB_SET_RANGE), in case anyone
wants to try it).  This is good for finding "david" as the first
available word after the (currently) non-existent "davic," for
example, but can't be used for other types of partial-word searches.
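
To illustrate the limitation, here is a minimal sketch in Python rather
than against Berkeley DB itself.  A sorted list stands in for the Btree,
and bisect_left plays the role of positioning a cursor with DB_SET_RANGE
(smallest key greater than or equal to the search key).  The word list
and the function names set_range and prefix_search are made up for
illustration; they are not part of pks, SKS, or Berkeley DB.

```python
from bisect import bisect_left

# Stand-in for a sorted (Btree) word index; the words are made up.
WORDS = sorted(["dave", "david", "davidson", "davis", "harris", "shaw"])


def set_range(words, key):
    """Return the index of the smallest word >= key, or None if none exists.

    This only imitates what positioning a cursor with DB_SET_RANGE does.
    """
    i = bisect_left(words, key)
    return i if i < len(words) else None


def prefix_search(words, prefix):
    """Collect every word starting with prefix by walking forward
    from the position that set_range() finds."""
    i = set_range(words, prefix)
    matches = []
    while i is not None and i < len(words) and words[i].startswith(prefix):
        matches.append(words[i])
        i += 1
    return matches
```

Here set_range(WORDS, "davic") lands on "david", and
prefix_search(WORDS, "davi") walks forward from that position.  There is
no comparable starting position for finding "vid" inside "david", so
that kind of search would have to scan every key.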

If you're thinking of a particular SQL feature, fine, but you'd have
to run CKS, onak, or OpenPKSD to get SQL.  (Also, please provide URLs
for any SQL/database feature(s) you are referring to.)

> Not that I'm suggesting regex searches.  I think they're overkill for
> the problem at hand.  Even LDAP doesn't do full regex.

Well, allowing "anchoring" of the (full-word) searches with ^ and $
sounds like it would be a good start.
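
One possible reading of such anchoring, sketched in Python: ^ ties the
word to the start of the userid and $ ties it to the end, while an
unanchored term matches any full word.  The tokenization, the function
names, and the example.org address are assumptions for illustration,
not how any existing keyserver behaves.

```python
import re


def split_words(userid):
    """Lowercase a userid and split it into words (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", userid.lower())


def anchored_match(term, userid):
    """Match one full word, optionally anchored with ^ (first word)
    and/or $ (last word)."""
    at_start = term.startswith("^")
    at_end = term.endswith("$")
    word = term.strip("^$").lower()
    words = split_words(userid)
    if not words:
        return False
    if at_start and words[0] != word:
        return False
    if at_end and words[-1] != word:
        return False
    # Unanchored terms still have to equal a whole word, not a fragment.
    return word in words
```

With the userid "David Shaw <dshaw@example.org>", the terms "^david",
"shaw", and "org$" would match under these assumptions, while "^shaw"
and the partial word "sha" would not.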

> Which legacy LDAP server are you testing with?  PGP.com's old server
> is gone, and I think horowitz is currently broken: any search returns
> no responses.

keyserver-legacy.pgp.com.

-- 
Jason Harris           |  NIC:  JH329, PGP:  This _is_ PGP-signed, isn't it?
jharris at widomaker.com _|_ web:  http://keyserver.kjsl.com/~jharris/
          Got photons?   (TM), (C) 2004


More information about the Gnupg-devel mailing list