### Re: [Cfrg] Curve selection revisited

Robert Ransom <rransom.8774@gmail.com> Tue, 29 July 2014 11:14 UTC

Return-Path: <rransom.8774@gmail.com>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (ietfa.amsl.com [127.0.0.1])
by ietfa.amsl.com (Postfix) with ESMTP id CDA981B27BC
for <cfrg@ietfa.amsl.com>; Tue, 29 Jul 2014 04:14:21 -0700 (PDT)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: -1.75

X-Spam-Level:

X-Spam-Status: No, score=-1.75 tagged_above=-999 required=5
tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25,
FREEMAIL_FROM=0.001, SPF_PASS=-0.001] autolearn=no

Received: from mail.ietf.org ([4.31.198.44])
by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id j6be-UwNVrR3 for <cfrg@ietfa.amsl.com>;
Tue, 29 Jul 2014 04:14:17 -0700 (PDT)

Received: from mail-qa0-x230.google.com (mail-qa0-x230.google.com
[IPv6:2607:f8b0:400d:c00::230])
(using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
(No client certificate requested)
by ietfa.amsl.com (Postfix) with ESMTPS id E88A11B27BB
for <cfrg@irtf.org>; Tue, 29 Jul 2014 04:14:16 -0700 (PDT)

Received: by mail-qa0-f48.google.com with SMTP id m5so9255513qaj.35
for <cfrg@irtf.org>; Tue, 29 Jul 2014 04:14:16 -0700 (PDT)

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to
:cc:content-type:content-transfer-encoding;
bh=ToAQxgq93EYAF2DBQuynz+VxfUHFQ+/Jn1ElbF4A13o=;
b=KICrNCLeaeXROubf4KaaVLOSBqM2ASCRG/g83GTqtmxfrQePKtGNStCWMzklyYDH0h
RoY13tswdMDswXexeR5LiIxIDBMw7gljYRfSojf8kEeYljwgM1FvduMinIRyalC2wpBV
rSBxZKUrRC62suKy40muhJ6Q3ogDHSI3jRZHv+vkFgFdITDPGsyCXoJWqH9isBwtaySH
EiCLpFi/+1jc2CU2BftkPsrK8xf7I5+LjJ4pfvO9ogTHY2RszroJ7nKElLOOTsXq8KiL
cszQtKa+AtV6G42AzXkGhJUOKSeWC9rKBnGKFr5SUeI4biRw1lWNqWz1BYl8IG5gzmsA
9JaQ==

MIME-Version: 1.0

X-Received: by 10.229.231.74 with SMTP id jp10mr2286567qcb.29.1406632455890;
Tue, 29 Jul 2014 04:14:15 -0700 (PDT)

Received: by 10.140.86.135 with HTTP; Tue, 29 Jul 2014 04:14:15 -0700 (PDT)

In-Reply-To: <53D420B3.10707@brainhub.org>

References: <CA+Vbu7xroa68=HOZtbf=oz7kK2EeUv_z1okpnjxHPR0ZtHD5cA@mail.gmail.com>
<CFF7E184.28E9F%kenny.paterson@rhul.ac.uk>
<53D2781B.8030605@sbcglobal.net>
<CACsn0ckqFigWoH2+OOEHSd2VWPp8y6=m8H5OsFRyjXmjK7+m4w@mail.gmail.com>
<CABqy+srxMNuG0AaQd0SaegHvZWgbW762EQq+iAHL_fbu6sOJJQ@mail.gmail.com>
<53D420B3.10707@brainhub.org>

Date: Tue, 29 Jul 2014 04:14:15 -0700

Message-ID: <CABqy+so6JcL3drjXuiQfLhm-LPMOJuS9ES5Hyb1UQRhi-gV2jA@mail.gmail.com>

From: Robert Ransom <rransom.8774@gmail.com>

To: Andrey Jivsov <crypto@brainhub.org>

Content-Type: text/plain; charset=UTF-8

Content-Transfer-Encoding: quoted-printable

Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/ineI42CtwjeMF8HbBdm0bz1iqno

Cc: cfrg@irtf.org

Subject: Re: [Cfrg] Curve selection revisited

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.15

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>,
<mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>,
<mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Tue, 29 Jul 2014 11:14:22 -0000

On 7/26/14, Andrey Jivsov <crypto@brainhub.org> wrote: > On 07/25/2014 07:25 PM, Robert Ransom wrote: >> The exact value of c determines whether a portable or vectorized >> implementation can delay carries until after reduction. For example, >> after curve25519-donna performs a multiply or squaring, each of 19 >> 64-bit limbs has absolute value less than 14*2^54. When >> freduce_degree reduces a field element to 10 limbs without carrying, >> the result has absolute value less than (c+1)*14*2^54 = 280*2^54, or a >> bit more than 2^62. In the Microsoft field 2^255 - 765, or even 2^256 >> - 189, this would not be possible. >> >> Carries between limbs can be performed more efficiently than in >> curve25519-donna -- curve25519-donna goes to extra effort to keep >> limbs signed. In the Montgomery ladder, this optimization does not >> add any extra carry steps, because the Montgomery-ladder formulas >> never perform more than two add/subtract operations between a >> multiply/square operation. With Edwards-curve operations, this >> optimization may require an additional carry step in the doubling >> formula. >> >> The exact value of s determines whether carries can be vectorized >> within a coordinate-field element. For example, 2^251 - 9 can be >> represented using one 26-bit limb (in the least significant position) >> and 9 25-bit limbs, but this interferes with the vectorization >> strategy described in the neoncrypto paper (by Dr. Bernstein et al.). >> Since c=9 is so small, this can be worked around by representing >> coordinates modulo 2^252 - 18 instead, but this trick is rarely >> possible without running into the bound on c. > The main argument for p = 2^255-19 v.s. p = 1^256-189 is that the C=19 > is smaller than C=189 and this helps to "manage" the carries. Actually, the main argument for p = 2^255 - 19 is that Curve25519 is the de-facto standard curve for high-performance ECC in a group of near-256-bit order. Part of the reason for that is that implementers have, in practice, been able to rely on the small value of what you denote as “C” for that field in optimizing their software, so no one has felt a need to switch to a different field of that form for performance reasons. (Mike Hamburg introduced ‘special Montgomery primes’ and showed evidence that they can provide better performance in some implementations than the Curve25519 coordinate field in <https://eprint.iacr.org/2012/309.pdf>, but that's a different form of prime.) > Let's assume that an F(p) integer is represented as n limbs (words). > > 256 bit Montgomery curve will have on the order of (5 mul + 4 squares) > and about the same number F(p) additions per one scalar bit. > > Therefore, we can talk about one unreduced F(p) multiplication + F(p) > reduction (where C plays its role) + F(p) addition. With this in mind > the contribution of C can be estimated within this group. > > F(p) multiplications (counting squares here as well) dominate the > performance of EC operations: > * the result of the multiplication is 2*n limbs and the complexity of > this operation is n^2 limb multiplications and O(n^2) additions. > * the 2*n limbs must be reduced (mod p) to n limbs immediately after > multiplication > * there is on average ~1 addition/subtraction of n limbs per each > multiplication due to EC formula > > Each multiplication needs degree reduction and settling of accumulated > carries before the next multiplication. One can only postpone settling > of carries while he does additions (prior to the next multiplciation). > > We have about n^2 limb additions, plus n limb additions due to EC > formular, plus we must perform n additions of the form <some_limb>*C > (for modp reduction). > > Assuming n=10 limbs, the entire contribution of n <some_limb>*C is bound > by ~10% of all the additions. > > We are not talking about C=0, however, but rather about |C|=4 bits v.s. > |C| = 8 bits difference. > > A rough estimate here is 5% penalty. As Dr. Bernstein has pointed out, a larger value of C requires 3*n carries instead of n carries (see <https://moderncrypto.org/mail-archive/curves/2014/000237.html>), and carries have high latency (see section 4, subsection ‘Reduction’ of naclcrypto). > One reason why I think this is a good estimate is that there may be a > benefit of more fine-tuned control of carries. The highest accumulation > of the carries happens in the inner limbs of the 2*n integer (imagine > paper-and-pencil multiplication of two n-limb numbers: the stack is the > highest in the middle). Therefore, if we have the need to control > overflow within, say, multiplication, perhaps it's possible to > accomplish this within a subset of n limbs. So you expect greater implementation effort and complexity to reduce the added cost, but not eliminate it? > ( BTW, I don't see a clear argument why can't any F(p) be vectorized > (some more efficiently than others). The vectorization using x86 SSE2 is > known to speeds up RSA operations for generic F(p)s. ) As I said, the vectorization strategy used in the neoncrypto paper is incompatible with 251-bit moduli. That strategy requires pairing together limbs of the same size, so that a vectorized shift operation can be used to carry out of both limbs at the same time. In RSA, where (in your notation) C is necessarily huge, there is no performance penalty from making C a few bits larger, so it's easy to lengthen the modulus and make every limb the same size, so it's easy to vectorize carries across any pair of limbs. But that's not very relevant to the primes used for fast ECC. Do you see any benefit of p = 2^256 - 189 over p = 2^255 - 19 that you believe offsets the costs of switching to a different coordinate field? Robert Ransom

- [Cfrg] Curve selection revisited Benjamin Black
- Re: [Cfrg] Curve selection revisited Yoav Nir
- Re: [Cfrg] Curve selection revisited Paterson, Kenny
- Re: [Cfrg] Curve selection revisited David Jacobson
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Ilari Liusvaara
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Michael Jenkins
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Stephen Farrell
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Mike Hamburg
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Russ Housley
- Re: [Cfrg] Curve selection revisited Salz, Rich
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Salz, Rich