[SUGGESTION NEEDED] A request for suggestion on furthering the discussion over ElGamal

Tue Oct 17 05:42:33 CEST 2017

[Discussion of this problem] The original paper of El Gamal does not
mention this problem. (Same does the first paper for RSA)

In 1998, Professor Dan Boneh (IACR fellow, Stanford) published a paper
"The Decisional Diffie-Hellman Problem"
http://crypto.stanford.edu/~dabo/pubs/papers/DDH.pdf    on the page
9-10, it mentions the necessity of using DDH for ElGamal. The leakage
mentioned above is from that paper.

The Wikipedia (https://en.wikipedia.org/wiki/ElGamal_encryption)
talked about the DDH assumption and CDH assumption. DDH assumptions
(https://en.wikipedia.org/wiki/Decisional_Diffie%E2%80%93Hellman_assumption)
does not hold for $Z_p^*$ because it is not prime-order (the order is
$p-1$).

The formal treatment for DDH is to:
(1) Generate a safe prime $p=2q+1$.
(2) Find a generator $g$ for $Z_p^*$.
(3) Compute $h=g^2$. This is a QR. Use this as the generator for ElGamal.

This makes sure that $g^{xk}$ and $g^k$ are always QR, removing the
first part of the leakage.

To ensure the message leaks nothing, we need to make $m$ as a QR. This
is not trivial. Because $m$ is generally binary string. They are free
and not controlled. It might not be a QR or be a QR.

In CRYPTO'98, the highest conference of cryptography, Ronald Cramer
and Victor Shoup (IACR fellow) proposed a solution
(http://people.csail.mit.edu/alinush/6.857-spring-2015/papers/cramer-shoup.pdf):
(1) Given the message $m$ in $Z_p^*$.
(2) Compute the encoding $e=m^2$.
(3) Decoding is done by computing the unique square root of $e$ (by
powering $(q+1)/2$)

Then the ciphertexts under ElGamal now are all QR. The leakage from
that is fixed.

More than that, such a construction of ElGamal is proved to be
semantic secure if the DDH assumption holds.

[Other materials]

Lecture note of Public Key Encryption at UC Berkeley on Graduate
Cryptography (CS276). Instructor Alessandro Chiesa, one of the
inventors of zerocash and then zCash. (Page 2-3)
https://128.32.189.73/~alexch/docs/CS276-F2015/lecture-14.pdf

Lecture note of CS276 at UC Berkeley. Instructor Luca Trevisan, a
professor in theory computer science. (Page 3-4)
https://people.eecs.berkeley.edu/~luca/cs276/lecture17.pdf

Lecture note of CSE207 in UCSD. Instructor Mihir Bellare, an inventor
or RSA-OAEP and HMAC. (last paper)
http://cseweb.ucsd.edu/~mihir/cse207/w-asym.pdf

Lecture note of CS4380 in Cornell, Instructor Rafeal Pass. (Page 105-106)
http://www.cs.cornell.edu/courses/cs4830/2010fa/lecnotes.pdf

These materials all support the idea above.

[Proposed Solution] This is totally not invented by me. But the idea
merged of these notes and papers. But as I said, there is no standard
on how to fix ElGamal, because cryptography world originally has no
so-called standard-making organization.

[1] We choose the generator as a QR. $h=g^2$
[2] We encode the message by squaring it, and decode it back by
finding the unique square root (i.e. powering (q+1)/2). $m2=m^2$.
$m=(m_2)^((q+1)/2)$.

[Not Compatible] I agree with Niibe that we may be able to make it
coexisting. But it turns out that we need to make it clear in the
"key".

Because a ciphertext generated by the old version cannot be decoded
back (the unique square root is nonsense). And new ciphertexts cannot
be recovered by the old system because of the lack of decoding.

[My Suggestion]
 - Add ElGamal-DDH independent to ElGamal.
 - Add a comment in original ElGamal that "it is not semantic secure
so cannot be used for applications besides hybrid encryption".

The hybrid encryption use case is okay because CDH can hold. This is
proved by Dan Boneh in that year 2000 paper.

=========== END. Should I make a slide?

On Mon, Oct 16, 2017 at 7:46 PM, Weikeng Chen <w.k at berkeley.edu> wrote:
> Thank Niibe.
>
> Actually, my teacher Alessandro (an assistant professor of
> cryptography and theory) and I (a Ph.D. student on applied
> cryptography) have a brief discussion after the class. As far as we
> see, the current ElGamal implementation does not achieve semantic
> security. Niibe mentions that the high-level application which uses
> ElGamal in black-box may be okay because it uses for hybrid
> encryption.
>
> I agree because if ElGamal is not used for direct data encryption, but
> only in the hybrid encryption, then knowing one bit of the encryption
> key does not help to break the symmetric encryption ---- However, it
> is annoying for cryptographers. Because we require semantic security
> on all other cryptographic primitives, and the rest are implemented
> generally correct. And in non-hybrid encryption cases, it is
> problematic.
>
> So if the scope of using ElGamal only for hybrid encryptions, it
> should be totally okay. But what I add is:
> (1) If it is used to load message directly, it significantly leaks
> information of that message.
> (2) Due to the usage of ElGamal in homomorphic encryption, there are
> use cases where we cannot use hybrid encryption, so at this case, the
> messages are loaded directly.
> (3) gcrypt as the model of many cryptographic libraries is treated (at
> least people do) as a standard. It is part of the source of the
> massive problem in implementing ElGamal.
>
> Let me add some words on the semantic security.
>
>
> [Quick Recap of ElGamal] Assume the ElGamal private key is $x$, and
> the public key is $g^x$ where $g$ is a generator of $Z_p^*$.
>
> The encryption of message $m$ where $k$ is selected randomly:
>    -   First part of the ciphertext:               m g^{xk}
>    -   Second part of the ciphertext:           g^k
>
>
> [Semantic Security] This is the general security requirement for all
> symmetric and asymmetric ciphers. The scheme can ensure that a
> computer run under a polynomial function of the security parameter
> cannot learn anything information from the ciphertexts except the
> length with a non-negligible possibility in respect to the security
> parameter.
>
> Or, tweaking a sufficiently large security parameter (like 2048 bit
> for RSA today), nobody except supermen can learn anything except the
> length.
>
> The non-textbook version of RSA (with random padding) achieves
> semantic security. The symmetric ciphers under some encryption modes
>
>
> [Test Example] Assume we have $m_0$ and $m_1$. They are different.
>
>
> [Are They Quadratic Residues?] Half of the all possible $m$ accepted
> by ElGamal are Quadratic Residues, and those remaining are not
> Quadratic Residues. (So it leaks information worth of 1 bit). A number
> $m$ is QR. Then there is a value $x$ so that $m=x^2 mod p$. If a
> number $m$ is not QR, then such $x$ does not exist.
>
>
> [Possibility that $m_0$ and $m_1$ different in QR] The possibility
> that $m_0$ and $m_1$ are, not both QR and not both non-QR, are 1/2. In
> this case, we can distinguish $m_0$ and $m_1$ by whether or not QR.
>
>
> [What happens after encryption?] Let us play a game: assume I know
> whether or not your private key $x$ is odd or even (this is a
> reasonable assumption because the same asymmetric key is used
> repeatedly). Without loss of generality, I assume $x$ is odd, and
> $g^x$ is non-QR.
>
> You encrypt a message $m$. The ciphertexts are $c=(m g^{xk}, g^k)$.
>
> If $g^k$ is QR, then $g^{xk}$ is also a QR. If $g^k$ is non-QR, then
> $g^{xk}$ is also a non-QR.
>
> If $m g^{xk}$ is a QR, $g^k$ is QR                 =>       $m$ is a QR
> If $m g^{xk}$ is a non-QR,  $g^k$ is QR          =>       $m$ is a non-QR
> If $m g^{xk}$ is a QR, $g^k$ is non-QR           =>       $m$ is a non-QR
> If $m g^{xk}$ is a non-QR,  $g^k$ is non-QR    =>       $m$ is a QR
>
> You see. I can distinguish whether $m$ is QR or not even if it is
> encrypted -- the ciphertext leaks one-bit information about the
> underlying message.
>
> Another case. Let us assume $x$ is even and $g^x$ is QR. You encrypt a
> message $m$. The ciphertexts are $c=(m g^{xk}, g^k)$.
>
> Then if $m g^{xk}$ is QR, $m$ is QR.
> If $m g^{xk}$ is non-QR, $m$ is non-QR.
>
> It is even easier to guess this one-bit information of the plaintext
> from the ciphertext.
>
>
> [Summary] The implementation that uses $g$ as the generator of $Z_p^*$
> and uses no specific encoding of $m$, the ciphertext leaks one bit
> about the underlying plaintext.
>
>
> ================ First part ended. Next email will continue.
>
> On Mon, Oct 16, 2017 at 7:03 PM, NIIBE Yutaka <gniibe at fsij.org> wrote:
>> Hello,
>>
>> Weikeng Chen <w.k at berkeley.edu> wrote:
>>> I am going to continue my post last month on the ElGamal in gcrypt. I
>>> am sorry to say gcrypt is the beginning of many wrong implementations
>>> of ElGamal. Some Github repo owners use this to show their
>>> implementation is correct. It uses a safe prime but the generator is
>>> still wrong. This makes the ElGamal not semantic secure.
>>
>> Historically speaking, libgcrypt was factored out from GnuPG.  So, its
>> ElGamal API and the implementation are primarily intended to be used by
>> GnuPG.  In this context, it is perfectly OK.  I mean, "semantic
>> security" of the crypto system doesn't matter to the security of the
>> application.
>>
>>> Niibe has mentioned that ElGamal used for hybrid encryption in general
>>> (which would be secure), but we two agree that we should improve.
>>
>> I agreed that it make sense to improve the generator.  I don't think it
>> is mandatory.  I didn't mean introducing new or drastic change which may
>> impact existing usages.
>>
>>> Some programmers may use ElGamal for homomorphic encryption and
>>> computation over encrypted data.
>>
>> Interesting.  In that case, I wonder if current API of libgcrypt is
>> enough or good.  (I guess that... I don't think it's enough.)
>>
>>> (1) Showing the current ElGamal implementation is not secure would be
>>> easy, as Wikipedia and some 17 years ago papers have some information.
>>> If we consult expert cryptographers in the field. They can clearly say
>>> the current implementation is insecure. --------- Do you think an
>>> experiment or proof-of-concept will be necessary?
>>
>> If possible, please use careful wording.
>>
>> The API of crypto library which implements crypto system lacking
>> "semantic security" property does not automatically mean "insecure".
>> For a library, its usage by an application matters.
>>
>>> (2) Providing a corrected version is not trivial -- it is not just
>>> making the generator correct. We also need to embed the message (a
>>> binary string) into an element in a specific subgroup (and recovered
>>> later). There are many papers, and many years ago, to say some ideas
>>> about this encoding. One of that is my proposed solution.
>>
>> You are talking about new crypto protocol, using ElGamal crypto system,
>> don't you?
>>
>>> However, there is no official example. Even the original paper by El
>>> Gamal is not secure under this concern. Besides gcrypt, ---------- how
>>> to find a gold standard for ElGamal implementations? OpenSSL
>>> implements DH (also wrong but not a big deal in the real world,
>>> because it is for DH key exchange not ElGamal) ------ and one thing
>>> for sure, how to make it easy to follow in order to persuade a
>>> sufficiently large community to accept this change?
>>
>> It seems for me that you have some confusions among:
>>
>>         (1) ElGamal crypto system
>>         (2) crypto protocol which uses ElGamal crypto system
>>             --- e.g. OpenPGP
>>         (3) an implementation, in this case, partial implementation of (1)
>>             --- e.g. libcrypt
>>         (4) an application of (3) which implements (2)
>>             --- e.g. GnuPG
>>
>> I would admit that libgcrypt API is not enough to provide full features
>> of ElGamal crypto system, for example, as you pointed out, it doesn't
>> provide "semantically secure" use case.  In other words, I could say,
>> libgcrypt's API for ElGamal just offers things needed for OpenPGP
>> protocol.
>>
>> Actually, I think that "ElGamal crypto system" can refer different
>> crypto systems.  I think that your interpretation would be too modern,
>> and I'm afraid we can't find any existing (free software) implementation
>> which offers full features you'd like.
>>
>>> (3) The solution will be incompatible with existing applications
>>> running ElGamal in gcrypt -- their ciphertexts will be undecryptable.
>>> ----- What is your opinion to deal with this?
>>
>> If you are introducing new crypto protocol, old and new protocol can
>> co-exist, if designed carefully.
>>
>>
>>> My doubt on ElGamal encryption is a result of my cryptographic class
>>> CS276 at Berkeley. This page contains some reading
>>> (http://people.eecs.berkeley.edu/~alexch/classes/CS276-F2015.html).
>>> LEC NOTE 14 and PROB SET 4 have further information.
>>
>> I think that you can ask your teacher.  That would be better.
>> --
>
>
>
> --
>
> Weikeng Chen @ 795 Soda Hall

-- 

Weikeng Chen @ 795 Soda Hall