[SUGGESTION NEEDED] A request for suggestion on furthering the discussion over ElGamal

Tue Oct 17 05:58:47 CEST 2017

And thank everybody who spends time reading these two long emails.

Also a question for confirmation. Upper applications in GNUPG only use
ElGamal for hybrid encryptions? Then it would be totally fine at this
moment.

The reason I came here is the PyCrypto.
https://github.com/Legrandin/pycryptodome/issues/90

The author seems to misunderstand the code (which passed from the
previous PyCrypto). Part of the reason is that they use libgcrypt to
SELF-TEST the implementation of ElGamal... That is why I came here --
gcrypt is the mother of many things, and this is the crucial first
step.
https://github.com/Legrandin/pycryptodome/blob/master/lib/Crypto/SelfTest/PublicKey/test_ElGamal.py

Almost all ElGamal implementations in Github have a gap with the
cryptography's perspective. (But yes, they are consistent with the
original paper of El Gamal...)

If El Gamal is as famous as RSA, I think we will consider Encoding, as
we do in RSA -- the original paper version of RSA is also not semantic
secure.

On Mon, Oct 16, 2017 at 8:42 PM, Weikeng Chen <w.k at berkeley.edu> wrote:
> [Discussion of this problem] The original paper of El Gamal does not
> mention this problem. (Same does the first paper for RSA)
>
> In 1998, Professor Dan Boneh (IACR fellow, Stanford) published a paper
> "The Decisional Diffie-Hellman Problem"
> http://crypto.stanford.edu/~dabo/pubs/papers/DDH.pdf    on the page
> 9-10, it mentions the necessity of using DDH for ElGamal. The leakage
> mentioned above is from that paper.
>
> The Wikipedia (https://en.wikipedia.org/wiki/ElGamal_encryption)
> talked about the DDH assumption and CDH assumption. DDH assumptions
> (https://en.wikipedia.org/wiki/Decisional_Diffie%E2%80%93Hellman_assumption)
> does not hold for $Z_p^*$ because it is not prime-order (the order is
> $p-1$).
>
> The formal treatment for DDH is to:
> (1) Generate a safe prime $p=2q+1$.
> (2) Find a generator $g$ for $Z_p^*$.
> (3) Compute $h=g^2$. This is a QR. Use this as the generator for ElGamal.
>
> This makes sure that $g^{xk}$ and $g^k$ are always QR, removing the
> first part of the leakage.
>
>
> To ensure the message leaks nothing, we need to make $m$ as a QR. This
> is not trivial. Because $m$ is generally binary string. They are free
> and not controlled. It might not be a QR or be a QR.
>
> In CRYPTO'98, the highest conference of cryptography, Ronald Cramer
> and Victor Shoup (IACR fellow) proposed a solution
> (http://people.csail.mit.edu/alinush/6.857-spring-2015/papers/cramer-shoup.pdf):
> (1) Given the message $m$ in $Z_p^*$.
> (2) Compute the encoding $e=m^2$.
> (3) Decoding is done by computing the unique square root of $e$ (by
> powering $(q+1)/2$)
>
> Then the ciphertexts under ElGamal now are all QR. The leakage from
> that is fixed.
>
> More than that, such a construction of ElGamal is proved to be
> semantic secure if the DDH assumption holds.
>
>
> [Other materials]
>
> Lecture note of Public Key Encryption at UC Berkeley on Graduate
> Cryptography (CS276). Instructor Alessandro Chiesa, one of the
> inventors of zerocash and then zCash. (Page 2-3)
> https://128.32.189.73/~alexch/docs/CS276-F2015/lecture-14.pdf
>
> Lecture note of CS276 at UC Berkeley. Instructor Luca Trevisan, a
> professor in theory computer science. (Page 3-4)
> https://people.eecs.berkeley.edu/~luca/cs276/lecture17.pdf
>
> Lecture note of CSE207 in UCSD. Instructor Mihir Bellare, an inventor
> or RSA-OAEP and HMAC. (last paper)
> http://cseweb.ucsd.edu/~mihir/cse207/w-asym.pdf
>
> Lecture note of CS4380 in Cornell, Instructor Rafeal Pass. (Page 105-106)
> http://www.cs.cornell.edu/courses/cs4830/2010fa/lecnotes.pdf
>
> These materials all support the idea above.
>
>
> [Proposed Solution] This is totally not invented by me. But the idea
> merged of these notes and papers. But as I said, there is no standard
> on how to fix ElGamal, because cryptography world originally has no
> so-called standard-making organization.
>
> [1] We choose the generator as a QR. $h=g^2$
> [2] We encode the message by squaring it, and decode it back by
> finding the unique square root (i.e. powering (q+1)/2). $m2=m^2$.
> $m=(m_2)^((q+1)/2)$.
>
>
>
> [Not Compatible] I agree with Niibe that we may be able to make it
> coexisting. But it turns out that we need to make it clear in the
> "key".
>
> Because a ciphertext generated by the old version cannot be decoded
> back (the unique square root is nonsense). And new ciphertexts cannot
> be recovered by the old system because of the lack of decoding.
>
>
> [My Suggestion]
>  - Add ElGamal-DDH independent to ElGamal.
>  - Add a comment in original ElGamal that "it is not semantic secure
> so cannot be used for applications besides hybrid encryption".
>
> The hybrid encryption use case is okay because CDH can hold. This is
> proved by Dan Boneh in that year 2000 paper.
>
>
> =========== END. Should I make a slide?
>
>
>
> On Mon, Oct 16, 2017 at 7:46 PM, Weikeng Chen <w.k at berkeley.edu> wrote:
>> Thank Niibe.
>>
>> Actually, my teacher Alessandro (an assistant professor of
>> cryptography and theory) and I (a Ph.D. student on applied
>> cryptography) have a brief discussion after the class. As far as we
>> see, the current ElGamal implementation does not achieve semantic
>> security. Niibe mentions that the high-level application which uses
>> ElGamal in black-box may be okay because it uses for hybrid
>> encryption.
>>
>> I agree because if ElGamal is not used for direct data encryption, but
>> only in the hybrid encryption, then knowing one bit of the encryption
>> key does not help to break the symmetric encryption ---- However, it
>> is annoying for cryptographers. Because we require semantic security
>> on all other cryptographic primitives, and the rest are implemented
>> generally correct. And in non-hybrid encryption cases, it is
>> problematic.
>>
>> So if the scope of using ElGamal only for hybrid encryptions, it
>> should be totally okay. But what I add is:
>> (1) If it is used to load message directly, it significantly leaks
>> information of that message.
>> (2) Due to the usage of ElGamal in homomorphic encryption, there are
>> use cases where we cannot use hybrid encryption, so at this case, the
>> messages are loaded directly.
>> (3) gcrypt as the model of many cryptographic libraries is treated (at
>> least people do) as a standard. It is part of the source of the
>> massive problem in implementing ElGamal.
>>
>> Let me add some words on the semantic security.
>>
>>
>> [Quick Recap of ElGamal] Assume the ElGamal private key is $x$, and
>> the public key is $g^x$ where $g$ is a generator of $Z_p^*$.
>>
>> The encryption of message $m$ where $k$ is selected randomly:
>>    -   First part of the ciphertext:               m g^{xk}
>>    -   Second part of the ciphertext:           g^k
>>
>>
>> [Semantic Security] This is the general security requirement for all
>> symmetric and asymmetric ciphers. The scheme can ensure that a
>> computer run under a polynomial function of the security parameter
>> cannot learn anything information from the ciphertexts except the
>> length with a non-negligible possibility in respect to the security
>> parameter.
>>
>> Or, tweaking a sufficiently large security parameter (like 2048 bit
>> for RSA today), nobody except supermen can learn anything except the
>> length.
>>
>> The non-textbook version of RSA (with random padding) achieves
>> semantic security. The symmetric ciphers under some encryption modes
>>
>>
>> [Test Example] Assume we have $m_0$ and $m_1$. They are different.
>>
>>
>> [Are They Quadratic Residues?] Half of the all possible $m$ accepted
>> by ElGamal are Quadratic Residues, and those remaining are not
>> Quadratic Residues. (So it leaks information worth of 1 bit). A number
>> $m$ is QR. Then there is a value $x$ so that $m=x^2 mod p$. If a
>> number $m$ is not QR, then such $x$ does not exist.
>>
>>
>> [Possibility that $m_0$ and $m_1$ different in QR] The possibility
>> that $m_0$ and $m_1$ are, not both QR and not both non-QR, are 1/2. In
>> this case, we can distinguish $m_0$ and $m_1$ by whether or not QR.
>>
>>
>> [What happens after encryption?] Let us play a game: assume I know
>> whether or not your private key $x$ is odd or even (this is a
>> reasonable assumption because the same asymmetric key is used
>> repeatedly). Without loss of generality, I assume $x$ is odd, and
>> $g^x$ is non-QR.
>>
>> You encrypt a message $m$. The ciphertexts are $c=(m g^{xk}, g^k)$.
>>
>> If $g^k$ is QR, then $g^{xk}$ is also a QR. If $g^k$ is non-QR, then
>> $g^{xk}$ is also a non-QR.
>>
>> If $m g^{xk}$ is a QR, $g^k$ is QR                 =>       $m$ is a QR
>> If $m g^{xk}$ is a non-QR,  $g^k$ is QR          =>       $m$ is a non-QR
>> If $m g^{xk}$ is a QR, $g^k$ is non-QR           =>       $m$ is a non-QR
>> If $m g^{xk}$ is a non-QR,  $g^k$ is non-QR    =>       $m$ is a QR
>>
>> You see. I can distinguish whether $m$ is QR or not even if it is
>> encrypted -- the ciphertext leaks one-bit information about the
>> underlying message.
>>
>> Another case. Let us assume $x$ is even and $g^x$ is QR. You encrypt a
>> message $m$. The ciphertexts are $c=(m g^{xk}, g^k)$.
>>
>> Then if $m g^{xk}$ is QR, $m$ is QR.
>> If $m g^{xk}$ is non-QR, $m$ is non-QR.
>>
>> It is even easier to guess this one-bit information of the plaintext
>> from the ciphertext.
>>
>>
>> [Summary] The implementation that uses $g$ as the generator of $Z_p^*$
>> and uses no specific encoding of $m$, the ciphertext leaks one bit
>> about the underlying plaintext.
>>
>>
>> ================ First part ended. Next email will continue.
>>
>> On Mon, Oct 16, 2017 at 7:03 PM, NIIBE Yutaka <gniibe at fsij.org> wrote:
>>> Hello,
>>>
>>> Weikeng Chen <w.k at berkeley.edu> wrote:
>>>> I am going to continue my post last month on the ElGamal in gcrypt. I
>>>> am sorry to say gcrypt is the beginning of many wrong implementations
>>>> of ElGamal. Some Github repo owners use this to show their
>>>> implementation is correct. It uses a safe prime but the generator is
>>>> still wrong. This makes the ElGamal not semantic secure.
>>>
>>> Historically speaking, libgcrypt was factored out from GnuPG.  So, its
>>> ElGamal API and the implementation are primarily intended to be used by
>>> GnuPG.  In this context, it is perfectly OK.  I mean, "semantic
>>> security" of the crypto system doesn't matter to the security of the
>>> application.
>>>
>>>> Niibe has mentioned that ElGamal used for hybrid encryption in general
>>>> (which would be secure), but we two agree that we should improve.
>>>
>>> I agreed that it make sense to improve the generator.  I don't think it
>>> is mandatory.  I didn't mean introducing new or drastic change which may
>>> impact existing usages.
>>>
>>>> Some programmers may use ElGamal for homomorphic encryption and
>>>> computation over encrypted data.
>>>
>>> Interesting.  In that case, I wonder if current API of libgcrypt is
>>> enough or good.  (I guess that... I don't think it's enough.)
>>>
>>>> (1) Showing the current ElGamal implementation is not secure would be
>>>> easy, as Wikipedia and some 17 years ago papers have some information.
>>>> If we consult expert cryptographers in the field. They can clearly say
>>>> the current implementation is insecure. --------- Do you think an
>>>> experiment or proof-of-concept will be necessary?
>>>
>>> If possible, please use careful wording.
>>>
>>> The API of crypto library which implements crypto system lacking
>>> "semantic security" property does not automatically mean "insecure".
>>> For a library, its usage by an application matters.
>>>
>>>> (2) Providing a corrected version is not trivial -- it is not just
>>>> making the generator correct. We also need to embed the message (a
>>>> binary string) into an element in a specific subgroup (and recovered
>>>> later). There are many papers, and many years ago, to say some ideas
>>>> about this encoding. One of that is my proposed solution.
>>>
>>> You are talking about new crypto protocol, using ElGamal crypto system,
>>> don't you?
>>>
>>>> However, there is no official example. Even the original paper by El
>>>> Gamal is not secure under this concern. Besides gcrypt, ---------- how
>>>> to find a gold standard for ElGamal implementations? OpenSSL
>>>> implements DH (also wrong but not a big deal in the real world,
>>>> because it is for DH key exchange not ElGamal) ------ and one thing
>>>> for sure, how to make it easy to follow in order to persuade a
>>>> sufficiently large community to accept this change?
>>>
>>> It seems for me that you have some confusions among:
>>>
>>>         (1) ElGamal crypto system
>>>         (2) crypto protocol which uses ElGamal crypto system
>>>             --- e.g. OpenPGP
>>>         (3) an implementation, in this case, partial implementation of (1)
>>>             --- e.g. libcrypt
>>>         (4) an application of (3) which implements (2)
>>>             --- e.g. GnuPG
>>>
>>> I would admit that libgcrypt API is not enough to provide full features
>>> of ElGamal crypto system, for example, as you pointed out, it doesn't
>>> provide "semantically secure" use case.  In other words, I could say,
>>> libgcrypt's API for ElGamal just offers things needed for OpenPGP
>>> protocol.
>>>
>>> Actually, I think that "ElGamal crypto system" can refer different
>>> crypto systems.  I think that your interpretation would be too modern,
>>> and I'm afraid we can't find any existing (free software) implementation
>>> which offers full features you'd like.
>>>
>>>> (3) The solution will be incompatible with existing applications
>>>> running ElGamal in gcrypt -- their ciphertexts will be undecryptable.
>>>> ----- What is your opinion to deal with this?
>>>
>>> If you are introducing new crypto protocol, old and new protocol can
>>> co-exist, if designed carefully.
>>>
>>>
>>>> My doubt on ElGamal encryption is a result of my cryptographic class
>>>> CS276 at Berkeley. This page contains some reading
>>>> (http://people.eecs.berkeley.edu/~alexch/classes/CS276-F2015.html).
>>>> LEC NOTE 14 and PROB SET 4 have further information.
>>>
>>> I think that you can ask your teacher.  That would be better.
>>> --
>>
>>
>>
>> --
>>
>> Weikeng Chen @ 795 Soda Hall
>
>
>
> --
>
> Weikeng Chen @ 795 Soda Hall

-- 

Weikeng Chen @ 795 Soda Hall