From dtsen at us.ibm.com Mon Mar 2 02:37:32 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 2 Mar 2026 01:37:32 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: <87h5r3r50l.fsf@jacob.g10code.de>
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID:

Hi Werner,

For some reason I couldn't display your message earlier; I can see it now.

I don't have a good performance comparison format for ML-KEM, but here are the raw performance numbers for ML-DSA.

Thanks.
-Danny

[15:47] danny at ltcden12-lp1 mldsa-ntt_tests % ./perf_mldsa_ntt_opt
=== Optimized assembly NTT test
cpu_time_used (sec)=0.046582 loops=100000
-->ops / sec = 2146751.964278
=== Original C NTT test
cpu_time_used (sec)=0.229215 loops=100000
-->ops / sec = 436271.622712
-->Optimized improvement over original = 3.920678
-->Optimized speed over original faster = 4.920678
=== Optimized Assembly Inverse NTT test
cpu_time_used (sec)=0.052021 loops=100000
-->ops / sec = 1922300.609369
=== Original C Inverse NTT test
cpu_time_used (sec)=0.270790 loops=100000
-->ops / sec = 369289.855608
-->Optimized improvement over original = 4.205398
-->Optimized speed over original faster = 5.205398

________________________________
From: Werner Koch
Sent: Thursday, February 26, 2026 9:47 PM
To: Danny Tsen via Gcrypt-devel
Cc: Danny Tsen
Subject: [EXTERNAL] Re: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for

On Thu, 26 Feb 2026 10:23, Danny Tsen said:

> I don't have benchmark for libgcrypt. I do have my own testing
> performance number on NTT operation. That probably not what you are

I just noticed that we do have support for MLKEM and MLDSA in our
./bench-slope. We should change that to make it easier to run
benchmarks. I was actually looking only for a rough figure on how
much performance you gain with your patches.
Salam-Shalom,

   Werner

--
The pioneers of a warless world are the youth that
refuse military service.             - A. Einstein

From dtsen at us.ibm.com Mon Mar 2 03:19:29 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 2 Mar 2026 02:19:29 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To:
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID:

Hi Werner,

I made some modifications for the ML-KEM format. Here are the raw performance numbers for the ML-KEM NTT. Hope this helps.

Thanks.
-Danny

[16:33] danny at ltcden12-lp1 mlkem-ipcri % ./perf_mlkem_test
=== Optimized assembly NTT test
cpu_time_used (sec)=0.016707 loops=100000
-->ops / sec = 5985515.053570
=== Original C NTT test
cpu_time_used (sec)=0.107232 loops=100000
-->ops / sec = 932557.445539
-->Optimized improvement over original = 5.418388
-->Optimized speed over original faster = 6.418388
=== Optimized Assembly Inverse NTT test
cpu_time_used (sec)=0.031500 loops=100000
-->ops / sec = 3174603.174603
=== Original C Inverse NTT test
cpu_time_used (sec)=0.138457 loops=100000
-->ops / sec = 722245.895838
-->Optimized improvement over original = 3.395460
-->Optimized speed over original faster = 4.395460

From gniibe at fsij.org Wed Mar 18 06:55:51 2026
From: gniibe at fsij.org (NIIBE Yutaka)
Date: Wed, 18 Mar 2026 14:55:51 +0900
Subject: [PATCH] cipher:rsa: Fix the dead-code of stronger_key_check.
Message-ID: <9a4f7395d4c784d3faba93d4baa97c2d9b5b185f.1773813207.git.gniibe@fsij.org>

* cipher/rsa.c (check_secret_key): Rename from stronger_key_check
to be enabled with ENABLE_STRONGER_CHECK.

--

GnuPG-bug-id: 8171
Signed-off-by: NIIBE Yutaka
---
 cipher/rsa.c | 52 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 18 deletions(-)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-cipher-rsa-Fix-the-dead-code-of-stronger_key_check.patch
Type: text/x-patch
Size: 2990 bytes
Desc: not available
URL:

From dtsen at us.ibm.com Mon Mar 23 15:17:55 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 23 Mar 2026 14:17:55 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To:
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID:

Hi Werner,

It's been a while since my last response. Do you have any more comments or questions? And what is the status of this patch?

Thanks.
-Danny

From gniibe at fsij.org Tue Mar 24 03:05:42 2026
From: gniibe at fsij.org (NIIBE Yutaka)
Date: Tue, 24 Mar 2026 11:05:42 +0900
Subject: [PATCH] cipher:rsa: Fix the dead-code of stronger_key_check.
In-Reply-To: <9a4f7395d4c784d3faba93d4baa97c2d9b5b185f.1773813207.git.gniibe@fsij.org>
References: <9a4f7395d4c784d3faba93d4baa97c2d9b5b185f.1773813207.git.gniibe@fsij.org>
Message-ID: <87eclavv61.fsf@haruna.fsij.org>

Hello,

NIIBE Yutaka wrote:
> * cipher/rsa.c (check_secret_key): Rename from stronger_key_check
> to be enabled with ENABLE_STRONGER_CHECK.

I pushed this change to master.

Let us see whether ENABLE_STRONGER_CHECK has any larger negative
impact. It would not be good to enable this macro by default now, in a
minor release.

From wk at gnupg.org Tue Mar 24 09:22:27 2026
From: wk at gnupg.org (Werner Koch)
Date: Tue, 24 Mar 2026 09:22:27 +0100
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: (Danny Tsen via Gcrypt-devel's message of "Mon, 23 Mar 2026 14:17:55 +0000")
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID: <874im5d4cc.fsf@jacob.g10code.de>

On Mon, 23 Mar 2026 14:17, Danny Tsen said:

> It's been a while since my last response. Do you have more comments
> or questions? And what's the status of this patch?

I understand the background, and from my POV it is okay to apply the
patch. However, Gniibe needs to check how your patch can be integrated
into our system, which is directly based on the reference code.

Shalom-Salam,

   Werner

-------------- next part --------------
A non-text attachment was scrubbed...
Name: openpgp-digital-signature.asc
Type: application/pgp-signature
Size: 284 bytes
Desc: not available
URL:

From gniibe at fsij.org Fri Mar 27 07:21:39 2026
From: gniibe at fsij.org (NIIBE Yutaka)
Date: Fri, 27 Mar 2026 15:21:39 +0900
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: <20260224002753.151873-1-dtsen@us.ibm.com>
References: <20260224002753.151873-1-dtsen@us.ibm.com>
Message-ID: <87tsu1bxn0.fsf@haruna.fsij.org>

Hello,

Sorry for the late reply.

Danny Tsen wrote:
> Added optimized (i)NTT algorithm support for ppc64le (Power 8 and
> above). Defined ENABLE_PPC_DILITHIUM and ENABLE_PPC_KYBER for
> dilithium (ML-DSA) and kyber (ML-KEM) NTT and inverse NTT.

Thank you for your work. The approach of optimizing the NTT functions
looks good.

Let me start a discussion about Kyber. Then, we can apply the result
to Dilithium.

I wonder if we can go a bit further, so that we avoid duplicating the
ZETA constants between the NTT implementation and kyber-common.c.

I'm considering factoring the following five functions out of
kyber-common.c:

    void _gcry_poly_ntt(poly *r);
    void _gcry_poly_invntt_tomont(poly *r);
    void _gcry_poly_reduce(poly *r);
    void _gcry_poly_tomont(poly *r);
    void _gcry_poly_basemul_montgomery(poly *r, const poly *a, const poly *b);

into, say, kyber-common-generic.c, and providing architecture-specific
kyber-common--.S for the optimized version(s). This way, the NTT
functions are covered and ZETA is placed inside kyber-common-*.

What do you think?

I'll try this with the optimized AVX2 implementation in the reference
code.

  https://www.pq-crystals.org/kyber/

From dtsen at us.ibm.com Mon Mar 30 14:28:24 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 30 Mar 2026 12:28:24 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: <87tsu1bxn0.fsf@haruna.fsij.org>
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87tsu1bxn0.fsf@haruna.fsij.org>
Message-ID:

Hi,

That seems like a good idea.
Let me know when the framework is available, and I can follow it.

Thanks.
-Danny