Expand description
This module implements the EndoMul gate for short Weierstrass curve endomorphism-optimized variable base scalar multiplication.
§Purpose
Compute [scalar] * base_point where the scalar is given as bits and the
base point is a curve point. This is the core operation for EC-based
cryptography in-circuit.
§Notation
T: The fixed base point for the full scalar multiplicationP: The running accumulator point (changes row by row)
§Inputs (per row)
(x_T, y_T): Base point T being multiplied (columns 0, 1)(x_P, y_P): Current accumulator point P (columns 4, 5)n: Current accumulated scalar value (column 6)b1, b2, b3, b4: Four scalar bits for this row (columns 11-14)
§Outputs (in next row)
(x_S, y_S): Updated accumulator point after processing 4 bits (cols 4,5)n': Updated accumulated scalar:n' = 16*n + 8*b1 + 4*b2 + 2*b3 + b4
§Endomorphism-optimized scalar multiplication
For curves of the form y^2 = x^3 + b (like Pallas and Vesta), there exists an efficient endomorphism phi defined by:
phi(x, y) = (endo * x, y)
where endo (also called xi) is a primitive cube root of unity in the base
field. This works because (endo * x)^3 = endo^3 * x^3 = x^3, so the point
remains on the curve.
This endomorphism corresponds to scalar multiplication by lambda:
phi(T) = [lambda]T
where lambda is a primitive cube root of unity in the scalar field.
§How the optimization works
The key insight is that we can compute P + phi(T) or P - phi(T) almost
as cheaply as P + T or P - T, because applying phi only requires
multiplying the x-coordinate by endo.
For a 2-bit window (b1, b2), we can encode 4 different point operations:
| b1 | b2 | Point Q added to accumulator P |
|---|---|---|
| 0 | 0 | -T |
| 0 | 1 | T |
| 1 | 0 | -phi(T) |
| 1 | 1 | phi(T) |
This is achieved by:
xq = (1 + (endo - 1) * b1) * x_T= x_T if b1=0, or endo * x_T if b1=1yq = (2 * b2 - 1) * y_T= -y_T if b2=0, or y_T if b2=1
So (xq, yq) represents one of {T, -T, phi(T), -phi(T)} based on (b1, b2).
§Why phi(T)? The GLV optimization
When we want to compute [k]T for a large scalar k, a standard
variable-base method uses roughly one double-and-add update per scalar bit
(~256 updates for a 256-bit scalar). The GLV method
(Gallant-Lambert-Vanstone) cuts this roughly in half.
The key insight is that any scalar k can be decomposed as:
k = k1 + k2 * lambda (mod r)
where k1, k2 are roughly half the bit-length of k (about 128 bits each).
Since phi(T) = [lambda]T, we can rewrite:
[k]T = [k1]T + [k2][lambda]T = [k1]T + [k2]phi(T)
Now instead of one 256-bit scalar multiplication, we have two 128-bit scalar
multiplications: [k1]T and [k2]phi(T). But we can do even better by
computing both simultaneously using a multi-scalar multiplication
approach.
In each step, we process one bit from k1 and one bit from k2 together. The 2-bit encoding (b1, b2) selects which combination of T and phi(T) to add:
- b1 selects between T (b1=0) and phi(T) (b1=1)
- b2 selects the sign: negative (b2=0) or positive (b2=1)
| b1 | b2 | Point added |
|---|---|---|
| 0 | 0 | -T |
| 0 | 1 | T |
| 1 | 0 | -phi(T) |
| 1 | 1 | phi(T) |
The negative points come from the y-coordinate formula: yq = (2*b2 - 1)*y_T.
When b2=0, we get -y_T, which negates the point (since -P = (x, -y) on
elliptic curves). We need both positive and negative points to encode the
scalar using a signed digit representation. With 2 bits we represent 4
distinct values {-1, +1} x {T, phi(T)}, which is more expressive than just
{0, 1} x {T, phi(T)}. This signed representation is part of what makes the
GLV method efficient - it allows the scalar decomposition to use both
positive and negative contributions.
This interleaves the bits of k1 and k2, processing one bit of each per accumulator update. Since k1 and k2 are ~128 bits, we need only ~128 updates instead of ~256, halving the circuit size.
The gate processes 4 bits per row (two consecutive accumulator updates), so a 128-bit scalar requires 32 rows of EndoMul gates.
§Protocol fit
In Kimchi/Snarky terminology, this gate enforces the EC_endoscale point-update constraint (the point side of endomorphism-optimized scaling rounds). In Pickles terms, this corresponds to endomorphism-optimized point updates used with scalar challenges.
This gate is used to implement recursive verifier IPA/bulletproof
point-folding logic efficiently (GLV endomorphism optimization), including
repeated accumulator updates of the form A <- (A + Q) + A.
Typical protocol usage:
- Wrap proofs: used as part of normal wrap-circuit verification of step proofs (part of the wrap recursion flow).
- Step proofs (recursive setting): used when the step circuit verifies
previous proofs (e.g.
max_proofs_verified > 0). - Non-recursive step circuits: not inherently required; if no recursive verification gadget is instantiated, this gate need not be active.
§Usage
To compute [scalar] * base for a 128-bit scalar:
-
Set up gates: Create 32 consecutive EndoMul gates (128 bits / 4 bits per row), followed by one Zero gate. The Zero gate is required because each EndoMul gate reads the accumulator from the next row.
-
Compute initial accumulator: To avoid point-at-infinity edge cases, initialize the accumulator as
acc0 = 2 * (T + phi(T))where T is the base point and phi is the endomorphism. -
Prepare scalar bits: Convert the scalar to bits in MSB-first order (most significant bit at index 0).
-
Generate witness: Call
gen_witnesswith the witness array, starting row, endo coefficient, base point coordinates, MSB-first bits, and initial accumulator. The function returns the final accumulated point and the reconstructed scalar value.
See kimchi/src/tests/endomul.rs for a complete example.
§Invariants
The following invariants must be respected:
-
Bit count:
bits.len()must be a multiple of 4. -
Bit order: Bits must be in MSB-first order (most significant bit at index 0).
-
Gate chain: For
nbits, you needn/4consecutive EndoMul gates, followed by a Zero gate (or any gate that doesn’t constrain the EndoMul output columns). The Zero gate is needed because EndoMul reads from the next row. -
Initial accumulator:
acc0must not be the point at infinity. The standard initialization isacc0 = 2 * (T + phi(T))where T is the base point. This ensures the accumulator never hits the point at infinity during computation. -
Endo coefficient: The
endoparameter must be the correct cube root of unity for the curve, obtained viaendos::<Curve>(). -
Base point consistency: The base point
(x_T, y_T)must be the same across all rows of a single scalar multiplication. -
Scalar value verification: The EndoMul gate only constrains the row-to-row relationship
n' = 16*n + 8*b1 + 4*b2 + 2*b3 + b4. It does not constrain the initial or final value ofn. The calling circuit must add external constraints. To enforce:- Initial
n = 0at the first EndoMul row - Final
n = kwherekis the expected scalar value
- Initial
§References
- Halo paper, Section 6.2: https://eprint.iacr.org/2019/1021
- GLV method: https://www.iacr.org/archive/crypto2001/21390189.pdf
Structs§
- Endo
MulResult - The result of performing an endomorphism-optimized scalar multiplication.
- Endoscl
Mul - Implementation of the
EndosclMulgate.
Functions§
- gen_
witness - Generates the witness values for a series of EndoMul gates.