Skip to main content

Module endosclmul

Module endosclmul 

Source
Expand description

This module implements the EndoMul gate for short Weierstrass curve endomorphism-optimized variable base scalar multiplication.

§Purpose

Compute [scalar] * base_point where the scalar is given as bits and the base point is a curve point. This is the core operation for EC-based cryptography in-circuit.

§Notation

  • T: The fixed base point for the full scalar multiplication
  • P: The running accumulator point (changes row by row)

§Inputs (per row)

  • (x_T, y_T): Base point T being multiplied (columns 0, 1)
  • (x_P, y_P): Current accumulator point P (columns 4, 5)
  • n: Current accumulated scalar value (column 6)
  • b1, b2, b3, b4: Four scalar bits for this row (columns 11-14)

§Outputs (in next row)

  • (x_S, y_S): Updated accumulator point after processing 4 bits (cols 4,5)
  • n': Updated accumulated scalar: n' = 16*n + 8*b1 + 4*b2 + 2*b3 + b4

§Endomorphism-optimized scalar multiplication

For curves of the form y^2 = x^3 + b (like Pallas and Vesta), there exists an efficient endomorphism phi defined by:

phi(x, y) = (endo * x, y)

where endo (also called xi) is a primitive cube root of unity in the base field. This works because (endo * x)^3 = endo^3 * x^3 = x^3, so the point remains on the curve.

This endomorphism corresponds to scalar multiplication by lambda:

phi(T) = [lambda]T

where lambda is a primitive cube root of unity in the scalar field.

§How the optimization works

The key insight is that we can compute P + phi(T) or P - phi(T) almost as cheaply as P + T or P - T, because applying phi only requires multiplying the x-coordinate by endo.

For a 2-bit window (b1, b2), we can encode 4 different point operations:

b1b2Point Q added to accumulator P
00-T
01T
10-phi(T)
11phi(T)

This is achieved by:

  • xq = (1 + (endo - 1) * b1) * x_T = x_T if b1=0, or endo * x_T if b1=1
  • yq = (2 * b2 - 1) * y_T = -y_T if b2=0, or y_T if b2=1

So (xq, yq) represents one of {T, -T, phi(T), -phi(T)} based on (b1, b2).

§Why phi(T)? The GLV optimization

When we want to compute [k]T for a large scalar k, a standard variable-base method uses roughly one double-and-add update per scalar bit (~256 updates for a 256-bit scalar). The GLV method (Gallant-Lambert-Vanstone) cuts this roughly in half.

The key insight is that any scalar k can be decomposed as:

k = k1 + k2 * lambda (mod r)

where k1, k2 are roughly half the bit-length of k (about 128 bits each). Since phi(T) = [lambda]T, we can rewrite:

[k]T = [k1]T + [k2][lambda]T = [k1]T + [k2]phi(T)

Now instead of one 256-bit scalar multiplication, we have two 128-bit scalar multiplications: [k1]T and [k2]phi(T). But we can do even better by computing both simultaneously using a multi-scalar multiplication approach.

In each step, we process one bit from k1 and one bit from k2 together. The 2-bit encoding (b1, b2) selects which combination of T and phi(T) to add:

  • b1 selects between T (b1=0) and phi(T) (b1=1)
  • b2 selects the sign: negative (b2=0) or positive (b2=1)
b1b2Point added
00-T
01T
10-phi(T)
11phi(T)

The negative points come from the y-coordinate formula: yq = (2*b2 - 1)*y_T. When b2=0, we get -y_T, which negates the point (since -P = (x, -y) on elliptic curves). We need both positive and negative points to encode the scalar using a signed digit representation. With 2 bits we represent 4 distinct values {-1, +1} x {T, phi(T)}, which is more expressive than just {0, 1} x {T, phi(T)}. This signed representation is part of what makes the GLV method efficient - it allows the scalar decomposition to use both positive and negative contributions.

This interleaves the bits of k1 and k2, processing one bit of each per accumulator update. Since k1 and k2 are ~128 bits, we need only ~128 updates instead of ~256, halving the circuit size.

The gate processes 4 bits per row (two consecutive accumulator updates), so a 128-bit scalar requires 32 rows of EndoMul gates.

§Protocol fit

In Kimchi/Snarky terminology, this gate enforces the EC_endoscale point-update constraint (the point side of endomorphism-optimized scaling rounds). In Pickles terms, this corresponds to endomorphism-optimized point updates used with scalar challenges.

This gate is used to implement recursive verifier IPA/bulletproof point-folding logic efficiently (GLV endomorphism optimization), including repeated accumulator updates of the form A <- (A + Q) + A.

Typical protocol usage:

  • Wrap proofs: used as part of normal wrap-circuit verification of step proofs (part of the wrap recursion flow).
  • Step proofs (recursive setting): used when the step circuit verifies previous proofs (e.g. max_proofs_verified > 0).
  • Non-recursive step circuits: not inherently required; if no recursive verification gadget is instantiated, this gate need not be active.

§Usage

To compute [scalar] * base for a 128-bit scalar:

  1. Set up gates: Create 32 consecutive EndoMul gates (128 bits / 4 bits per row), followed by one Zero gate. The Zero gate is required because each EndoMul gate reads the accumulator from the next row.

  2. Compute initial accumulator: To avoid point-at-infinity edge cases, initialize the accumulator as acc0 = 2 * (T + phi(T)) where T is the base point and phi is the endomorphism.

  3. Prepare scalar bits: Convert the scalar to bits in MSB-first order (most significant bit at index 0).

  4. Generate witness: Call gen_witness with the witness array, starting row, endo coefficient, base point coordinates, MSB-first bits, and initial accumulator. The function returns the final accumulated point and the reconstructed scalar value.

See kimchi/src/tests/endomul.rs for a complete example.

§Invariants

The following invariants must be respected:

  1. Bit count: bits.len() must be a multiple of 4.

  2. Bit order: Bits must be in MSB-first order (most significant bit at index 0).

  3. Gate chain: For n bits, you need n/4 consecutive EndoMul gates, followed by a Zero gate (or any gate that doesn’t constrain the EndoMul output columns). The Zero gate is needed because EndoMul reads from the next row.

  4. Initial accumulator: acc0 must not be the point at infinity. The standard initialization is acc0 = 2 * (T + phi(T)) where T is the base point. This ensures the accumulator never hits the point at infinity during computation.

  5. Endo coefficient: The endo parameter must be the correct cube root of unity for the curve, obtained via endos::<Curve>().

  6. Base point consistency: The base point (x_T, y_T) must be the same across all rows of a single scalar multiplication.

  7. Scalar value verification: The EndoMul gate only constrains the row-to-row relationship n' = 16*n + 8*b1 + 4*b2 + 2*b3 + b4. It does not constrain the initial or final value of n. The calling circuit must add external constraints. To enforce:

    • Initial n = 0 at the first EndoMul row
    • Final n = k where k is the expected scalar value

§References

Structs§

EndoMulResult
The result of performing an endomorphism-optimized scalar multiplication.
EndosclMul
Implementation of the EndosclMul gate.

Functions§

gen_witness
Generates the witness values for a series of EndoMul gates.