IBM to Provide Security w/o Sacrificing Privacy Using Hash Functions

Thu May 26 16:34:07 CEST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 25 May 2005, Alex L. Mauer wrote:
> Florian Weimer wrote:
> > * Sean C.:
> >>The I.B.M. software would convert data on a person into a string of seemingly
> >>random characters, using a technique known as a one-way hash function. No
> >>names, addresses or Social Security numbers, for example, would be embedded
> >>within the character string.
> > For most applications, this is just a speed bump because the search
> > space is rather small.  It's even worse for the no-fly list because
> > you have to apply some data reduction first (think SOUNDEX): a lot of
> > the names on them have varying transliteration.
>
> Can you expand on this?
>
> How could the Name/address/ssn be retrieved from a hash of the same?

Organization A knows a name and the hash it calculated from it.
Organization B knows a name and the hash it calculated from it.  If the
hashes match, either organization can ask the other for the plaintext
corresponding to the ordinal of the hash record that matched, to verify
the hit.  Now A and B share the plaintext.  The plaintext is not recovered
from the hash; it's requested from the entity which has it, using the hash
to find it.
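
The exchange described above can be sketched roughly as follows.  This is
only an illustration under assumptions of mine: SHA-256 stands in for
whatever one-way function the I.B.M. software really uses, the names are
made up, and the helper names (digest, matching_ordinals) are hypothetical.

```python
import hashlib

def digest(name: str) -> str:
    """One-way hash of a lightly normalized name (SHA-256 is an assumption)."""
    return hashlib.sha256(name.strip().lower().encode("utf-8")).hexdigest()

def matching_ordinals(published_hashes, own_hashes):
    """Ordinals in the other party's published list whose hash we also hold."""
    own = set(own_hashes)
    return [i for i, h in enumerate(published_hashes) if h in own]

# Each organization keeps its plaintext private and publishes only hashes.
org_a = ["Alice Example", "Yuri Ivanov", "Mark Smith"]   # made-up records
org_b = ["Bob Sample", "Mark Smith", "Yuriy Ivanov"]     # made-up records

a_hashes = [digest(n) for n in org_a]
b_hashes = [digest(n) for n in org_b]

# B compares A's published hashes against its own and notes which ordinals hit.
hits_in_a = matching_ordinals(a_hashes, b_hashes)

# B then asks A for the plaintext at those ordinals to verify the hit.
# Only A can answer, because only A holds the plaintext behind its hashes.
confirmed = [org_a[i] for i in hits_in_a]
print(hits_in_a, confirmed)
```

Note that "Yuri Ivanov" and "Yuriy Ivanov" do not match here, since the
hashes of two different spellings have nothing in common.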

The whole point of using a hash is to make it extremely unlikely that
either party could recover the plaintext unilaterally.  It's like having a
vault with two different locks, and giving the keys to two different
people, to make abuse more difficult by requiring collusion for a
successful penetration.

> How would data reduction be necessary?  Couldn't everything be
> represented in Unicode?  Of course, that doesn't solve the
> transliteration problem, but then again it's no different than the
> status quo in that respect ("Alex Mauer" != "Aleks Mauer")

It's worse than that.  I don't know of anybody who spells his name
"Aleks", but both "Yuri" and "Yuriy" are in use, not to mention (usually
from another part of the world) "Uri".  Likewise both "Mark" and "Marc"
are common.  It doesn't have to be an error to be a false mismatch.

If I understand correctly what Soundex (for example) does, it should be
possible to compare hashes of Soundex-coded strings in order to reduce the
incidence of false mismatches.
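
A minimal sketch of that idea, assuming the common American Soundex rules
and SHA-256 as the hash (both are my assumptions, not anything stated about
the I.B.M. software); soundex and coded_hash are hypothetical helper names.

```python
import hashlib

# Letter-to-digit table for American Soundex; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name: str) -> str:
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":      # h and w do not separate equal codes
            prev = code
    return (out + "000")[:4]    # pad with zeros, truncate to four characters

def coded_hash(name: str) -> str:
    """Hash the Soundex code instead of the raw spelling."""
    return hashlib.sha256(soundex(name).encode("ascii")).hexdigest()

for a, b in (("Yuri", "Yuriy"), ("Mark", "Marc"), ("Yuri", "Uri")):
    print(a, b, soundex(a), soundex(b), coded_hash(a) == coded_hash(b))
```

With these rules "Yuri"/"Yuriy" and "Mark"/"Marc" hash alike, but "Uri"
still differs because Soundex keeps the first letter.  The coding also
shrinks the space of possible inputs, which bears on the search-space
concern quoted earlier.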

- -- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Open-source executable:  $0.00.  Source:  $0.00  Control:  priceless!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: pgpenvelope 2.10.2 - http://pgpenvelope.sourceforge.net/

iD8DBQFCld5is/NR4JuTKG8RAqsqAKCXvFZw/mOM8GgknyYoUjSGl9CQWACfd19L
j0DKGl/aUDNSQbJPKifORzQ=
=Ebbn
-----END PGP SIGNATURE-----