Kimchi
 This document specifies kimchi, a zeroknowledge proof system that’s a variant of PLONK.
 This document does not specify how circuits are created or executed, but only how to convert a circuit and its execution into a proof.
Table of content:
 Overview
 Dependencies
 Constraints
 Setup
 Proof Construction & Verification
 Optimizations
 Security Considerations
Overview
There are three main algorithms to kimchi:
 Setup: takes a circuit and produces a prover index, and a verifier index.
 Proof creation: takes the prover index, and the execution trace of the circuit to produce a proof.
 Proof verification: takes the verifier index and a proof to verify.
As part of these algorithms, a number of tables are created (and then converted into polynomials) to create a proof.
Tables used to describe a circuit
The following tables are created to describe the circuit:
Gates. A circuit is described by a series of gates, that we list in a table. The columns of the tables list the gates, while the rows are the length of the circuit. For each row, only a single gate can take a value $1$ while all other gates take the value $0$.
row  Generic  Poseidon  CompleteAdd  VarBaseMul  EndoMul  EndoMulScalar  ChaCha0  ChaCha1  ChaCha2  ChaChaFinal 

0  1  0  0  0  0  0  0  0  0  0 
1  0  1  0  0  0  0  0  0  0  0 
Coefficients. The coefficient table has 15 columns, and is used to tweak the gates. Currently, only the Generic and the Poseidon gates use it (refer to their own sections to see how). All other gates set their values to $0$.
row  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

0  /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Wiring (or Permutation, or sigmas). For gates to take the outputs of other gates as inputs, we use a wiring table to wire registers together.
To learn about registers, see the next section.
It is defined at every row, but only for the first $7$ registers.
Each cell specifies a (row, column)
tuple that it should be wired to. Cells that are not connected to another cell are wired to themselves.
Note that if three or more registers are wired together, they must form a cycle.
For example, if register (0, 4)
is wired to both registers (80, 6)
and (90, 0)
then you would have the following table:
row  0  1  2  3  4  5  6 

0  0,0  0,1  0,2  0,3  80,6  0,5  0,6 
…  
80  80,0  80,1  80,2  80,3  80,4  80,5  90,0 
…  
90  0,4  90,1  90,2  90,3  90,4  90,5  90,6 
The lookup feature is currently optional, as it can add some overhead to the protocol. In the case where you would want to use lookups, the following tables would be needed:
Lookup Tables. The different lookup tables that are used in the circuit. For example, the XOR lookup table:
l  r  o 

1  0  1 
0  1  1 
1  1  0 
0  0  0 
Lookup selectors. A lookup selector is used to perform a number of queries in different lookup tables. Any gate can advertise its use of a lookup selector (so a lookup selector can be associated to several gates), and on which rows they want to use them (current and/or next). In cases where a gate need to use lookups in its current row only, and is the only one performing a specific combination of queries, then its gate selector can be used in place of a lookup selector. As with gates, lookup selectors (including gates used as lookup selectors) are mutually exclusives (only one can be used on a given row).
We currently have two lookup selectors:
row  ChaChaQuery  ChaChaFinalQuery 

0  0  0 
1  1  0 
Where each apply 4 queries. A query is a table describing which lookup table it queries, and the linear combination of the witness to use in the query. For example, the following table describes a query into the XOR table made out of linear combinations of registers (checking that $r_{0}⊕r_{2}=2⋅r_{1}$):
table_id  l  r  o 

XOR  1, r0  1, r2  2, r1 
Tables produced during proof creation
The following tables are created by the prover at runtime:
Registers (or Witness). Registers are also defined at every row, and are split into two types: the IO registers from $0$ to $6$ usually contain input or output of the gates (note that a gate can output a value on the next row as well).
I/O registers can be wired to each other (they’ll be forced to have the same value), no matter what row they’re on (for example, the register at row:0, col:4
can be wired to the register at row:80, col:6
).
The rest of the registers, $7$ through $14$, are called advice registers as they can store values that useful only for the row’s active gate.
Think of them as intermediary or temporary values needed in the computation when the prover executes a circuit.
row  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

0  /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Wiring (Permutation) trace. You can think of the permutation trace as an extra register that is used to enforce the wiring specified in the wiring table. It is a single column that applies on all the rows as well, which the prover computes as part of a proof.
row  pt 

0  / 
Queries trace. These are the actual values made by queries, calculated by the prover at runtime, and used to construct the proof.
Table trace. Represents the concatenation of all the lookup tables, combined into a single column at runtime by both the prover and the verifier.
Sorted trace. Represents the processed (see the lookup section) concatenation of the queries trace and the table trace. It is produced at runtime by the prover. The sorted trace is long enough that it is split in several columns.
Lookup (aggregation, or permutation) trace. This is a one column table that is similar to the wiring (permutation) trace we talked above. It is produced at runtime by the prover.
Dependencies
To specify kimchi, we rely on a number of primitives that are specified outside of this specification. In this section we list these specifications, as well as the interfaces we make use of in this specification.
Polynomial Commitments
Refer to the specification on polynomial commitments. We make use of the following functions from that specification:
PolyCom.non_hiding_commit(poly) > PolyCom::NonHidingCommitment
PolyCom.commit(poly) > PolyCom::HidingCommitment
PolyCom.evaluation_proof(poly, commitment, point) > EvaluationProof
PolyCom.verify(commitment, point, evaluation, evaluation_proof) > bool
Poseidon hash function
Refer to the specification on Poseidon. We make use of the following functions from that specification:
Poseidon.init(params) > FqSponge
Poseidon.update(field_elem)
Poseidon.finalize() > FieldElem
specify the following functions on top:
Poseidon.produce_challenge()
(TODO: uses the endomorphism)Poseidon.to_fr_sponge() > state_of_fq_sponge_before_eval, FrSponge
With the current parameters:
 SBox alpha:
7
 Width:
3
 Rate:
2
 Full rounds:
55
 Round constants:
fp_kimchi
,fq_kimchi
 MDS matrix:
fp_kimchi
,fq_kimchi
Pasta
Kimchi is made to work on cycles of curves, so the protocol switch between two fields Fq and Fr, where Fq represents the base field and Fr represents the scalar field.
See the Pasta curves specification.
Constraints
Kimchi enforces the correct execution of a circuit by creating a number of constraints and combining them together. In this section, we describe all the constraints that make up the main polynomial $f$ once combined.
We define the following functions:
combine_constraints(range_alpha, constraints)
, which takes a range of contiguous powers of alpha and a number of constraints. It returns the sum of all the constraints, where each constraint has been multiplied by a power of alpha. In other words it returns: $i∑ α_{i}⋅constraint_{i}$
The different ranges of alpha are described as follows:
 gates. Offset starts at 0 and 21 powers of $α$ are used
 Permutation. Offset starts at 21 and 3 powers of $α$ are used
As gates are mutually exclusive (a single gate is used on each row), we can reuse the same range of powers of alpha across all the gates.
TODO: linearization
Permutation
The permutation constraints are the following 4 constraints:
The two sides of the coin (with $shift_{0}=1$):
$ z(x)⋅zkpm(x)⋅α_{PERM0}⋅(w_{0}(x)+β⋅shift_{0}x+γ)⋅(w_{1}(x)+β⋅shift_{1}x+γ)⋅(w_{2}(x)+β⋅shift_{2}x+γ)⋅(w_{3}(x)+β⋅shift_{3}x+γ)⋅(w_{4}(x)+β⋅shift_{4}x+γ)⋅(w_{5}(x)+β⋅shift_{5}x+γ)⋅(w_{6}(x)+β⋅shift_{6}x+γ) $
and
$ −1⋅z(xω)⋅zkpm(x)⋅α_{PERM0}⋅(w_{0}(x)+β⋅σ_{0}(x)+γ)⋅(w_{1}(x)+β⋅σ_{1}(x)+γ)⋅(w_{2}(x)+β⋅σ_{2}(x)+γ)⋅(w_{3}(x)+β⋅σ_{3}(x)+γ)⋅(w_{4}(x)+β⋅σ_{4}(x)+γ)⋅(w_{5}(x)+β⋅σ_{5}(x)+γ)⋅(w_{6}(x)+β⋅σ_{6}(x)+γ)⋅ $
the initialization of the accumulator:
$(z(x)−1)L_{1}(x)α_{PERM1}$
and the accumulator’s final value:
$(z(x)−1)L_{n−k}(x)α_{PERM2}$
You can read more about why it looks like that in this post.
The quotient contribution of the permutation is split into two parts $perm$ and $bnd$. They will be used by the prover.
$perm(x)= a_{PERM0}⋅zkpl(x)⋅[z(x)⋅(w_{0}(x)+γ+x⋅β⋅shift_{0})⋅(w_{1}(x)+γ+x⋅β⋅shift_{1})⋅(w_{2}(x)+γ+x⋅β⋅shift_{2})⋅(w_{3}(x)+γ+x⋅β⋅shift_{3})⋅(w_{4}(x)+γ+x⋅β⋅shift_{4})⋅(w_{5}(x)+γ+x⋅β⋅shift_{5})⋅(w_{6}(x)+γ+x⋅β⋅shift_{6})⋅−z(x⋅w)⋅(w_{0}(x)+γ+σ_{0}⋅β)⋅(w_{1}(x)+γ+σ_{1}⋅β)⋅(w_{2}(x)+γ+σ_{2}⋅β)⋅(w_{3}(x)+γ+σ_{3}⋅β)⋅(w_{4}(x)+γ+σ_{4}⋅β)⋅(w_{5}(x)+γ+σ_{5}⋅β)⋅(w_{6}(x)+γ+σ_{6}⋅β)⋅] $
and bnd
:
$bnd(x)=a_{PERM1}⋅x−1z(x)−1 +a_{PERM2}⋅x−sid[n−k]z(x)−1 $
The linearization:
$scalar⋅σ_{6}(x)$
where $scalar$ is computed as:
$z(ζω)βα_{PERM0}zkpl(ζ)⋅(γ+βσ_{0}(ζ)+w_{0}(ζ))⋅(γ+βσ_{1}(ζ)+w_{1}(ζ))⋅(γ+βσ_{2}(ζ)+w_{2}(ζ))⋅(γ+βσ_{3}(ζ)+w_{3}(ζ))⋅(γ+βσ_{4}(ζ)+w_{4}(ζ))⋅(γ+βσ_{5}(ζ)+w_{5}(ζ))⋅ $
To compute the permutation aggregation polynomial, the prover interpolates the polynomial that has the following evaluations. The first evaluation represents the initial value of the accumulator: $z(g_{0})=1$ For $i=0,⋅,n−4$, where $n$ is the size of the domain, evaluations are computed as:
$z(g_{i+1})=z_{1}/z_{2}$
with
$z_{1}= (w_{0}(g_{i}+sid(g_{i})⋅beta⋅shift_{0}+γ)⋅(w_{1}(g_{i})+sid(g_{i})⋅beta⋅shift_{1}+γ)⋅(w_{2}(g_{i})+sid(g_{i})⋅beta⋅shift_{2}+γ)⋅(w_{3}(g_{i})+sid(g_{i})⋅beta⋅shift_{3}+γ)⋅(w_{4}(g_{i})+sid(g_{i})⋅beta⋅shift_{4}+γ)⋅(w_{5}(g_{i})+sid(g_{i})⋅beta⋅shift_{5}+γ)⋅(w_{6}(g_{i})+sid(g_{i})⋅beta⋅shift_{6}+γ) $
and
$z_{2}= (w_{0}(g_{i})+σ_{0}⋅beta+γ)⋅(w_{1}(g_{i})+σ_{1}⋅beta+γ)⋅(w_{2}(g_{i})+σ_{2}⋅beta+γ)⋅(w_{3}(g_{i})+σ_{3}⋅beta+γ)⋅(w_{4}(g_{i})+σ_{4}⋅beta+γ)⋅(w_{5}(g_{i})+σ_{5}⋅beta+γ)⋅(w_{6}(g_{i})+σ_{6}⋅beta+γ) $
If computed correctly, we should have $z(g_{n−3})=1$.
Finally, randomize the last EVAL_POINTS
evaluations $z(g_{n−2})$ and $z(g_{n−1})$,
in order to add zeroknowledge to the protocol.
Lookup
Lookups in kimchi allows you to check if a single value, or a series of values, are part of a table. The first case is useful to check for checking if a value belongs to a range (from 0 to 1,000, for example), whereas the second case is useful to check truth tables (for example, checking that three values can be found in the rows of an XOR table) or write and read from a memory vector (where one column is an index, and the other is the value stored at that index).
Similarly to the generic gate, each values taking part in a lookup can be scaled with a fixed field element.
The lookup functionality is an optin feature of kimchi that can be used by custom gates. From the user’s perspective, not using any gates that make use of lookups means that the feature will be disabled and there will be no overhead to the protocol.
Refer to the lookup RFC for an overview of the lookup feature.
In this section, we describe the tables kimchi supports, as well as the different lookup selectors (and their associated queries)
The Lookup Tables
Kimchi currently supports a single lookup table:
/// The table ID associated with the XOR lookup table.
pub const XOR_TABLE_ID: i32 = 0;
/// The range check table ID.
pub const RANGE_CHECK_TABLE_ID: i32 = 1;
XOR. The lookup table for 4bit xor.
Note that it is constructed so that (0, 0, 0)
is the last position in the table.
This is because tables are extended to the full size of a column (essentially)
by padding them with their final value. And, having the value (0, 0, 0)
here means
that when we commit to this table and use the dummy value in the lookup_sorted
columns, those entries that have the dummy value of
$0=0+j∗0+j_{2}∗0$
will translate into a scalar multiplication by 0, which is free.
The Lookup Selectors
ChaChaSelector. Performs 4 queries to the XOR lookup table.
l  r  o    l  r  o    l  r  o    l  r  o 

1, r3  1, r7  1, r11    1, r4  1, r8  1, r12    1, r5  1, r9  1, r13    1, r6  1, r10  1, r14 
ChaChaFinalSelector. Performs 4 different queries to the XOR lookup table. (TODO: specify the layout)
Producing the sorted table as the prover
Because of our ZKrows, we can’t do the trick in the plookup paper of wrapping around to enforce consistency between the sorted lookup columns.
Instead, we arrange the LookupSorted table into columns in a snakeshape.
Like so,
_ _
    
    
_ _ 
or, imagining the full sorted array is [ s0, ..., s8 ]
, like
s0 s4 s4 s8
s1 s3 s5 s7
s2 s2 s6 s6
So the direction (“increasing” or “decreasing” (relative to LookupTable) is
if i % 2 = 0 { Increasing } else { Decreasing }
Then, for each i < max_lookups_per_row
, if i % 2 = 0
, we enforce that the
last element of LookupSorted(i) = last element of LookupSorted(i + 1)
,
and if i % 2 = 1
, we enforce that
the first element of LookupSorted(i) = first element of LookupSorted(i + 1)
.
Gates
A circuit is described as a series of gates. In this section we describe the different gates currently supported by kimchi, the constraints associated to them, and the way the register table, coefficient table, and permutation can be used in conjunction.
TODO: for each gate describe how to create it?
Double Generic Gate
The double generic gate contains two generic gates.
A generic gate is simply the 2fan in gate specified in the vanilla PLONK protocol that allows us to do operations like:
 addition of two registers (into an output register)
 or multiplication of two registers
 equality of a register with a constant
More generally, the generic gate controls the coefficients $c_{i}$ in the equation:
$c_{0}⋅l+c_{1}⋅r+c_{2}⋅o+c_{3}⋅(l×r)+c_{4}$
The layout of the gate is the following:
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

l1  r1  o1  l2  r2  o2 
where l1, r1, and o1 (resp. l2, r2, o2) are the left, right, and output registers of the first (resp. second) generic gate.
The selectors are stored in the coefficient table as:
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

l1  r1  o1  m1  c1  l2  r2  o2  m2  c2 
with m1 (resp. m2) the mul selector for the first (resp. second) gate, and c1 (resp. c2) the constant selector for the first (resp. second) gate.
The constraints:
 $w_{0}⋅c_{0}+w_{1}⋅c_{1}+w_{2}⋅c_{2}+w_{0}⋅w_{1}⋅c_{3}+c_{4}$
 $w_{3}⋅c_{5}+w_{4}⋅c_{6}+w_{5}⋅c_{7}+w_{3}w_{4}c_{8}+c_{9}$
where the $c_{i}$ are the coefficients.
Poseidon
The poseidon gate encodes 5 rounds of the poseidon permutation.
A state is represents by 3 field elements. For example,
the first state is represented by (s0, s0, s0)
,
and the next state, after permutation, is represented by (s1, s1, s1)
.
Below is how we store each state in the register table:
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

s0  s0  s0  s4  s4  s4  s1  s1  s1  s2  s2  s2  s3  s3  s3 
s5  s5  s5 
The last state is stored on the next row. This last state is either used:
 with another Poseidon gate on that next row, representing the next 5 rounds.
 or with a Zero gate, and a permutation to use the output elsewhere in the circuit.
 or with another gate expecting an input of 3 field elements in its first registers.
As some of the poseidon hash variants might not use $5k$ rounds (for some $k$), the result of the 4th round is stored directly after the initial state. This makes that state accessible to the permutation.
We define $M_{r,c}$ as the MDS matrix at row $r$ and column $c$.
We define the Sbox operation as $w_{S}$ for $S$ the SPONGE_BOX
constant.
We store the 15 round constants $r_{i}$ required for the 5 rounds (3 per round) in the coefficient table:
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

r0  r1  r2  r3  r4  r5  r6  r7  r8  r9  r10  r11  r12  r13  r14 
The initial state, stored in the first three registers, are not constrained. The following 4 states (of 3 field elements), including 1 in the next row, are constrained to represent the 5 rounds of permutation. Each of the associated 15 registers is associated to a constraint, calculated as:
first round:
 $w_{6}−[r_{0}+(M_{0,0}w_{0}+M_{0,1}w_{1}+M_{0,2}w_{2})]$
 $w_{7}−[r_{1}+(M_{1,0}w_{0}+M_{1,1}w_{1}+M_{1,2}w_{2})]$
 $w_{8}−[r_{2}+(M_{2,0}w_{0}+M_{2,1}w_{1}+M_{2,2}w_{2})]$
second round:
 $w_{9}−[r_{3}+(M_{0,0}w_{6}+M_{0,1}w_{7}+M_{0,2}w_{8})]$
 $w_{10}−[r_{4}+(M_{1,0}w_{6}+M_{1,1}w_{7}+M_{1,2}w_{8})]$
 $w_{11}−[r_{5}+(M_{2,0}w_{6}+M_{2,1}w_{7}+M_{2,2}w_{8})]$
third round:
 $w_{12}−[r_{6}+(M_{0,0}w_{9}+M_{0,1}w_{10}+M_{0,2}w_{11})]$
 $w_{13}−[r_{7}+(M_{1,0}w_{9}+M_{1,1}w_{10}+M_{1,2}w_{11})]$
 $w_{14}−[r_{8}+(M_{2,0}w_{9}+M_{2,1}w_{10}+M_{2,2}w_{11})]$
fourth round:
 $w_{3}−[r_{9}+(M_{0,0}w_{12}+M_{0,1}w_{13}+M_{0,2}w_{14})]$
 $w_{4}−[r_{10}+(M_{1,0}w_{12}+M_{1,1}w_{13}+M_{1,2}w_{14})]$
 $w_{5}−[r_{11}+(M_{2,0}w_{12}+M_{2,1}w_{13}+M_{2,2}w_{14})]$
fifth round:
 $w_{0,next}−[r_{12}+(M_{0,0}w_{3}+M_{0,1}w_{4}+M_{0,2}w_{5})]$
 $w_{1,next}−[r_{13}+(M_{1,0}w_{3}+M_{1,1}w_{4}+M_{1,2}w_{5})]$
 $w_{2,next}−[r_{14}+(M_{2,0}w_{3}+M_{2,1}w_{4}+M_{2,2}w_{5})]$
where $w_{i,next}$ is the polynomial $w_{i}(ωx)$ which points to the next row.
Chacha
There are four chacha constraint types, corresponding to the four lines in each quarter round.
a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<= 8;
c += d; b ^= c; b <<<= 7;
or, written without mutation, (and where +
is mod $2_{3}2$),
a' = a + b ; d' = (d ⊕ a') <<< 16;
c' = c + d'; b' = (b ⊕ c') <<< 12;
a'' = a' + b'; d'' = (d' ⊕ a') <<< 8;
c'' = c' + d''; b'' = (c'' ⊕ b') <<< 7;
We lay each line as two rows.
Each line has the form
x += z; y ^= x; y <<<= k
or without mutation,
x' = x + z; y' = (y ⊕ x') <<< k
which we abbreviate as
L(x, x’, y, y’, z, k)
In general, such a line will be laid out as the two rows
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

x  y  z  (y^x’)_0  (y^x’)_1  (y^x’)_2  (y^x’)_3  (x+z)_0  (x+z)_1  (x+z)_2  (x+z)_3  y_0  y_1  y_2  y_3 
x’  y’  (x+z)_8  (y^x’)_4  (y^x’)_5  (y^x’)_6  (y^x’)_7  (x+z)_4  (x+z)_5  (x+z)_6  (x+z)_7  y_4  y_5  y_6  y_7 
where A_i indicates the i^th nybble (fourbit chunk) of the value A.
$(x+z)_{8}$ is special, since we know it is actually at most 1 bit (representing the overflow bit of x + z).
So the first line L(a, a', d, d', b, 8)
for example becomes the two rows
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

a  d  b  (d^a’)_0  (d^a’)_1  (d^a’)_2  (d^a’)_3  (a+b)_0  (a+b)_1  (a+b)_2  (a+b)_3  d_0  d_1  d_2  d_3 
a’  d’  (a+b)_8  (d^a’)_4  (d^a’)_5  (d^a’)_6  (d^a’)_7  (a+b)_4  (a+b)_5  (a+b)_6  (a+b)_7  d_4  d_5  d_6  d_7 
along with the equations
 $(a+b)_{8}=(a+b)_{8}$ (booleanity check)
 $a_{′}=∑_{i=0}(2_{4})_{i}(a+b)_{i}$
 $a+b=2_{32}(a+b)_{8}+a_{′}$
 $d=∑_{i=0}(2_{4})_{i}d_{i}$
 $d_{′}=∑_{i=0}(2_{4})_{(i+4)mod8}(a+b)_{i}$
The $(i+4)mod8$ rotates the nybbles left by 4, which means bitrotating by $4×4=16$ as desired.
The final line is a bit more complicated as we have to rotate by 7, which is not a multiple of 4. We accomplish this as follows.
Let’s say we want to rotate the nybbles $A_{0},⋯,A_{7}$ left by 7. First we’ll rotate left by 4 to get
$A_{7},A_{0},A_{1},⋯,A_{6}$
Rename these as $B_{0},⋯,B_{7}$
We now want to leftrotate each $B_{i}$ by 3.
Let $b_{i}$ be the low bit of $B_{i}$. Then, the low 3 bits of $B_{i}$ are $(B_{i}−b_{i})/2$.
The result will thus be
 $2_{3}b_{0}+(B_{7}−b_{7})/2$
 $2_{3}b_{1}+(B_{0}−b_{0})/2$
 $2_{3}b_{2}+(B_{1}−b_{1})/2$
 $⋯$
 $2_{3}b_{7}+(B_{6}−b_{6})/2$
or rewriting in terms of our original nybbles $A_{i}$,
 $2_{3}a_{7}+(A_{6}−a_{6})/2$
 $2_{3}a_{0}+(A_{7}−a_{7})/2$
 $2_{3}a_{1}+(A_{0}−a_{0})/2$
 $2_{3}a_{2}+(A_{1}−a_{1})/2$
 $2_{3}a_{3}+(A_{2}−a_{2})/2$
 $2_{3}a_{4}+(A_{3}−a_{3})/2$
 $2_{3}a_{5}+(A_{4}−a_{4})/2$
 $2_{3}a_{6}+(A_{5}−a_{5})/2$
For neatness, letting $(x,y,z)=(c_{′},b_{′},d_{′′})$, the first 2 rows for the final line will be:
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

x  y  z  (y^x’)_0  (y^x’)_1  (y^x’)_2  (y^x’)_3  (x+z)_0  (x+z)_1  (x+z)_2  (x+z)_3  y_0  y_1  y_2  y_3 
x’  _  (x+z)_8  (y^x’)_4  (y^x’)_5  (y^x’)_6  (y^x’)_7  (x+z)_4  (x+z)_5  (x+z)_6  (x+z)_7  y_4  y_5  y_6  y_7 
but then we also need to perform the bitrotate by 1.
For this we’ll add an additional 2 rows. It’s probably possible to do it with just 1, but I think we’d have to change our plookup setup somehow, or maybe expand the number of columns, or allow access to the previous row.
Let $lo(n)$ be the low bit of the nybble n. The 2 rows will be
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14 

y’  (y^x’)_0  (y^x’)_1  (y^x’)_2  (y^x’)_3  lo((y^x’)_0)  lo((y^x’)_1)  lo((y^x’)_2)  lo((y^x’)_3)  
_  (y^x’)_4  (y^x’)_5  (y^x’)_6  (y^x’)_7  lo((y^x’)_4)  lo((y^x’)_5)  lo((y^x’)_6)  lo((y^x’)_7) 
On each of them we’ll do the plookups
((cols[1]  cols[5])/2, (cols[1]  cols[5])/2, 0) in XOR
((cols[2]  cols[6])/2, (cols[2]  cols[6])/2, 0) in XOR
((cols[3]  cols[7])/2, (cols[3]  cols[7])/2, 0) in XOR
((cols[4]  cols[8])/2, (cols[4]  cols[8])/2, 0) in XOR
which checks that $(y_{x_{′}})_{i}−lo((y_{x_{′}})_{i})$ is a nybble, which guarantees that the low bit is computed correctly.
There is no need to check nybbleness of (y^x’)_i because those will be constrained to be equal to the copies of those values from previous rows, which have already been constrained for nybbleness (by the lookup in the XOR table).
And we’ll check that y’ is the sum of the shifted nybbles.
Elliptic Curve Addition
The layout is
0  1  2  3  4  5  6  7  8  9  10 

x1  y1  x2  y2  x3  y3  inf  same_x  s  inf_z  x21_inv 
where
(x1, y1), (x2, y2)
are the inputs and(x3, y3)
the output.inf
is a boolean that is true iff the result (x3, y3) is the point at infinity.
The rest of the values are inaccessible from the permutation argument, but
same_x
is a boolean that is true iffx1 == x2
.
The following constraints are generated:
constraint 1:
 $x_{0}=w_{2}−w_{0}$
 $(w_{10}⋅x_{0}−F(1)−w_{7})$
constraint 2:
 $x_{0}=w_{2}−w_{0}$
 $w_{7}⋅x_{0}$
constraint 3:
 $x_{0}=w_{2}−w_{0}$
 $x_{1}=w_{3}−w_{1}$
 $x_{2}=w_{0}⋅w_{0}$
 $w_{7}⋅(2⋅w_{8}⋅w_{1}−2⋅x_{2}−x_{2})+(F(1)−w_{7})⋅(x_{0}⋅w_{8}−x_{1})$
constraint 4:
 $w_{0}+w_{2}+w_{4}−w_{8}⋅w_{8}$
constraint 5:
 $w_{8}⋅(w_{0}−w_{4})−w_{1}−w_{5}$
constraint 6:
 $x_{1}=w_{3}−w_{1}$
 $x_{1}⋅(w_{7}−w_{6})$
constraint 7:
 $x_{1}=w_{3}−w_{1}$
 $x_{1}⋅w_{9}−w_{6}$
Endo Scalar
We give constraints for the endomul scalar computation.
Each row corresponds to 8 iterations of the inner loop in “Algorithm 2” on page 29 of the Halo paper.
The state of the algorithm that’s updated across iterations of the loop is (a, b)
.
It’s clear from that description of the algorithm that an iteration of the loop can
be written as
(a, b, i) >
( 2 * a + c_func(r_{2 * i}, r_{2 * i + 1}),
2 * b + d_func(r_{2 * i}, r_{2 * i + 1}) )
for some functions c_func
and d_func
. If one works out what these functions are on
every input (thinking of a twobit input as a number in ${0,1,2,3}$), one finds they
are given by
c_func(x)
, defined by
c_func(0) = 0
c_func(1) = 0
c_func(2) = 1
c_func(3) = 1
d_func(x)
, defined by
d_func(0) = 1
d_func(1) = 1
d_func(2) = 0
d_func(3) = 0
One can then interpolate to find polynomials that implement these functions on ${0,1,2,3}$.
You can use sage
, as
R = PolynomialRing(QQ, 'x')
c_func = R.lagrange_polynomial([(0, 0), (1, 0), (2, 1), (3, 1)])
d_func = R.lagrange_polynomial([(0, 1), (1, 1), (2, 0), (3, 0)])
Then, c_func
is given by
2/3 * x^3  5/2 * x^2 + 11/6 * x
and d_func
is given by
2/3 * x^3  7/2 * x^2 + 29/6 * x  1 <=> c_func + (x^2 + 3x  1)
We lay it out the witness as
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  Type 

n0  n8  a0  b0  a8  b8  x0  x1  x2  x3  x4  x5  x6  x7  ENDO 
where each xi
is a twobit “crumb”.
We also use a polynomial to check that each xi
is indeed in ${0,1,2,3}$,
which can be done by checking that each $x_{i}$ is a root of the polyunomial below:
crumb(x)
= x (x  1) (x  2) (x  3)
= x^4  6*x^3 + 11*x^2  6*x
= x *(x^3  6*x^2 + 11*x  6)
Each iteration performs the following computations
 Update $n$: $n_{i+1}=2⋅n_{i}+x_{i}$
 Update $a$: $a_{i+1}=2⋅a_{i}+c_{i}$
 Update $b$: $b_{i+1}=2⋅b_{i}+d_{i}$
Then, after the 8 iterations, we compute expected values of the above operations as:
expected_n8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * n0 + x0) + x1 ) + x2 ) + x3 ) + x4 ) + x5 ) + x6 ) + x7
expected_a8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * a0 + c0) + c1 ) + c2 ) + c3 ) + c4 ) + c5 ) + c6 ) + c7
expected_b8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * b0 + d0) + d1 ) + d2 ) + d3 ) + d4 ) + d5 ) + d6 ) + d7
Putting together all of the above, these are the 11 constraints for this gate
 Checking values after the 8 iterations:
 Constrain $n$:
0 = expected_n8  n8
 Constrain $a$:
0 = expected_a8  a8
 Constrain $b$:
0 = expected_b8  b8
 Constrain $n$:
 Checking the crumbs, meaning each $x$ is indeed in the range ${0,1,2,3}$:
 Constrain $x_{0}$:
0 = x0 * ( x0^3  6 * x0^2 + 11 * x0  6 )
 Constrain $x_{1}$:
0 = x1 * ( x1^3  6 * x1^2 + 11 * x1  6 )
 Constrain $x_{2}$:
0 = x2 * ( x2^3  6 * x2^2 + 11 * x2  6 )
 Constrain $x_{3}$:
0 = x3 * ( x3^3  6 * x3^2 + 11 * x3  6 )
 Constrain $x_{4}$:
0 = x4 * ( x4^3  6 * x4^2 + 11 * x4  6 )
 Constrain $x_{5}$:
0 = x5 * ( x5^3  6 * x5^2 + 11 * x5  6 )
 Constrain $x_{6}$:
0 = x6 * ( x6^3  6 * x6^2 + 11 * x6  6 )
 Constrain $x_{7}$:
0 = x7 * ( x7^3  6 * x7^2 + 11 * x7  6 )
 Constrain $x_{0}$:
Endo Scalar Multiplication
We implement custom gate constraints for short Weierstrass curve endomorphism optimised variable base scalar multiplication.
Given a finite field $F_{q}$ of order $q$, if the order is not a multiple of 2 nor 3, then an elliptic curve over $F_{q}$ in short Weierstrass form is represented by the set of points $(x,y)$ that satisfy the following equation with $a,b∈F_{q}$ and $4a_{3}+27b_{2}=_{F_{q}}0$: $E(F_{q}):y_{2}=x_{3}+ax+b$ If $P=(x_{p},y_{p})$ and $T=(x_{t},y_{t})$ are two points in the curve $E(F_{q})$, the goal of this operation is to perform the operation $2P±T$ efficiently as $(P±T)+P$.
S = (P + (b ? T : −T)) + P
The same algorithm can be used to perform other scalar multiplications, meaning it is not restricted to the case $2⋅P$, but it can be used for any arbitrary $k⋅P$. This is done by decomposing the scalar $k$ into its binary representation. Moreover, for every step, there will be a onebit constraint meant to differentiate between addition and subtraction for the operation $(P±T)+P$:
In particular, the constraints of this gate take care of 4 bits of the scalar within a single EVBSM row. When the scalar is longer (which will usually be the case), multiple EVBSM rows will be concatenated.
Row  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  Type 

i  xT  yT  Ø  Ø  xP  yP  n  xR  yR  s1  s3  b1  b2  b3  b4  EVBSM 
i+1  =  =  xS  yS  n’  xR’  yR’  s1’  s3’  b1’  b2’  b3’  b4’  EVBSM 
The layout of this gate (and the next row) allows for this chained behavior where the output point of the current row $S$ gets accumulated as one of the inputs of the following row, becoming $P$ in the next constraints. Similarly, the scalar is decomposed into binary form and $n$ ($n_{′}$ respectively) will store the current accumulated value and the next one for the check.
For readability, we define the following variables for the constraints:
endo
$:=$EndoCoefficient
xq1
$:=(1+($endo
$−1)⋅b_{1})⋅x_{t}$xq2
$:=(1+($endo
$−1)⋅b_{3})⋅x_{t}$yq1
$:=(2⋅b_{2}−1)⋅y_{t}$yq2
$:=(2⋅b_{4}−1)⋅y_{t}$
These are the 11 constraints that correspond to each EVBSM gate, which take care of 4 bits of the scalar within a single EVBSM row:
 First block:
(xq1  xp) * s1 = yq1  yp
(2 * xp – s1^2 + xq1) * ((xp – xr) * s1 + yr + yp) = (xp – xr) * 2 * yp
(yr + yp)^2 = (xp – xr)^2 * (s1^2 – xq1 + xr)
 Second block:
(xq2  xr) * s3 = yq2  yr
(2*xr – s3^2 + xq2) * ((xr – xs) * s3 + ys + yr) = (xr – xs) * 2 * yr
(ys + yr)^2 = (xr – xs)^2 * (s3^2 – xq2 + xs)
 Booleanity checks:
 Bit flag $b_{1}$:
0 = b1 * (b1  1)
 Bit flag $b_{2}$:
0 = b2 * (b2  1)
 Bit flag $b_{3}$:
0 = b3 * (b3  1)
 Bit flag $b_{4}$:
0 = b4 * (b4  1)
 Bit flag $b_{1}$:
 Binary decomposition:
 Accumulated scalar:
n_next = 16 * n + 8 * b1 + 4 * b2 + 2 * b3 + b4
 Accumulated scalar:
The constraints above are derived from the following EC Affine arithmetic equations:
 (1) => $(x_{q_{1}}−x_{p})⋅s_{1}=y_{q_{1}}−y_{p}$
 (2&3) => $(x_{p}–x_{r})⋅s_{2}=y_{r}+y_{p}$
 (2) => $(2⋅x_{p}+x_{q_{1}}–s_{1})⋅(s_{1}+s_{2})=2⋅y_{p}$
 <=> $(2⋅x_{p}–s_{1}+x_{q_{1}})⋅((x_{p}–x_{r})⋅s_{1}+y_{r}+y_{p})=(x_{p}–x_{r})⋅2⋅y_{p}$
 (3) => $s_{1}−s_{2}=x_{q_{1}}−x_{r}$
 <=> $(y_{r}+y_{p})_{2}=(x_{p}–x_{r})_{2}⋅(s_{1}–x_{q_{1}}+x_{r})$
 (4) => $(x_{q_{2}}−x_{r})⋅s_{3}=y_{q_{2}}−y_{r}$
 (5&6) => $(x_{r}–x_{s})⋅s_{4}=y_{s}+y_{r}$
 (5) => $(2⋅x_{r}+x_{q_{2}}–s_{3})⋅(s_{3}+s_{4})=2⋅y_{r}$
 <=> $(2⋅x_{r}–s_{3}+x_{q_{2}})⋅((x_{r}–x_{s})⋅s_{3}+y_{s}+y_{r})=(x_{r}–x_{s})⋅2⋅y_{r}$
 (6) => $s_{3}–s_{4}=x_{q_{2}}−x_{s}$
 <=> $(y_{s}+y_{r})_{2}=(x_{r}–x_{s})_{2}⋅(s_{3}–x_{q_{2}}+x_{s})$
Defining $s_{2}$ and $s_{4}$ as
 $s_{2}:=2∗x_{P}+x_{T}−s_{1}2⋅y_{P} −s_{1}$
 $s_{4}:=2∗x_{R}+x_{T}−s_{3}2⋅y_{R} −s_{3}$
Gives the following equations when substituting the values of $s_{2}$ and $s_{4}$:
(xq1  xp) * s1 = (2 * b1  1) * yt  yp
(2 * xp – s1^2 + xq1) * ((xp – xr) * s1 + yr + yp) = (xp – xr) * 2 * yp
(yr + yp)^2 = (xp – xr)^2 * (s1^2 – xq1 + xr)
(xq2  xr) * s3 = (2 * b2  1) * yt  yr
(2 * xr – s3^2 + xq2) * ((xr – xs) * s3 + ys + yr) = (xr – xs) * 2 * yr
(ys + yr)^2 = (xr – xs)^2 * (s3^2 – xq2 + xs)
Scalar Multiplication
We implement custom Plonk constraints for short Weierstrass curve variable base scalar multiplication.
Given a finite field $F_{q}$ of order $q$, if the order is not a multiple of 2 nor 3, then an elliptic curve over $F_{q}$ in short Weierstrass form is represented by the set of points $(x,y)$ that satisfy the following equation with $a,b∈F_{q}$ and $4a_{3}+27b_{2}=_{F_{q}}0$: $E(F_{q}):y_{2}=x_{3}+ax+b$ If $P=(x_{p},y_{p})$ and $Q=(x_{q},y_{q})$ are two points in the curve $E(F_{q})$, the algorithm we represent here computes the operation $2P+Q$ (point doubling and point addition) as $(P+Q)+Q$.
The original algorithm that is being used can be found in the Section 3.1 of https://arxiv.org/pdf/math/0208038.pdf, which can perform the above operation using 1 multiplication, 2 squarings and 2 divisions (one more squaring) if $P=Q$), thanks to the fact that computing the $Y$coordinate of the intermediate addition is not required. This is more efficient to the standard algorithm that requires 1 more multiplication, 3 squarings in total and 2 divisions.
Moreover, this algorithm can be applied not only to the operation $2P+Q$, but any other scalar multiplication $kP$. This can be done by expressing the scalar $k$ in biwise form and performing a doubleandadd approach. Nonetheless, this requires conditionals to differentiate $2P$ from $2P+Q$. For that reason, we will implement the following pseudocode from https://github.com/zcash/zcash/issues/3924 (where instead, they give a variant of the above efficient algorithm for Montgomery curves $b⋅y_{2}=x_{3}+a⋅x_{2}+x$).
Acc := [2]T
for i = n1 ... 0:
Q := (r_i == 1) ? T : T
Acc := Acc + (Q + Acc)
return (d_0 == 0) ? Q  P : Q
The layout of the witness requires 2 rows.
The ith row will be a VBSM
gate whereas the next row will be a ZERO
gate.
Row  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  Type 

i  xT  yT  x0  y0  n  n’  x1  y1  x2  y2  x3  y3  x4  y4  VBSM  
i+1  x5  y5  b0  b1  b2  b3  b4  s0  s1  s2  s3  s4  ZERO 
The gate constraints take care of 5 bits of the scalar multiplication.
Each single bit consists of 4 constraints.
There is one additional constraint imposed on the final number.
Thus, the VarBaseMul
gate argument requires 21 constraints.
For every bit, there will be one constraint meant to differentiate between addition and subtraction for the operation $(P±T)+P$:
S = (P + (b ? T : −T)) + P
We follow this criteria:
 If the bit is positive, the sign should be a subtraction
 If the bit is negative, the sign should be an addition
Then, paraphrasing the above, we will represent this behavior as:
S = (P  (2 * b  1) * T ) + P
Let us call Input
the point with coordinates (xI, yI)
and
Target
is the point being added with coordinates (xT, yT)
.
Then Output
will be the point with coordinates (xO, yO)
resulting from O = ( I ± T ) + I
Do not confuse our Output
point (xO, yO)
with the point at infinity that is normally represented as $O$.
In each step of the algorithm, we consider the following elliptic curves affine arithmetic equations:
 $s_{1}:=x_{i}−x_{t}y_{i}−(2⋅b−1)⋅y_{t} $
 $s_{2}:=2∗x_{i}+x_{t}−s_{1}2⋅y_{i} −s_{1}$
 $x_{o}:=x_{t}+s_{2}−s_{1}$
 $y_{o}:=s_{2}⋅(x_{i}−x_{o})−y_{i}$
For readability, we define the following 3 variables
in such a way that $s_{2}$ can be expressed as u / t
:
rx
$:=s_{1}−x_{i}−x_{t}$t
$:=x_{i}−$rx
$⟺2⋅x_{i}−s_{1}+x_{t}$u
$:=2⋅y_{i}−$t
$⋅s_{1}⟺2⋅y_{i}−s_{1}⋅(2⋅x_{i}−s_{1}+x_{t})$
Next, for each bit in the algorithm, we create the following 4 constraints that derive from the above:
 Booleanity check on the bit $b$:
0 = b * b  b
 Constrain $s_{1}$:
(xI  xT) * s1 = yI – (2b  1) * yT
 Constrain
Output
$X$coordinate $x_{o}$ and $s_{2}$:0 = u^2  t^2 * (xO  xT + s1^2)
 Constrain
Output
$Y$coordinate $y_{o}$ and $s_{2}$:0 = (yO + yI) * t  (xI  xO) * u
When applied to the 5 bits, the value of the Target
point (xT, yT)
is maintained,
whereas the values for the Input
and Output
points form the chain:
[(x0, y0) > (x1, y1) > (x2, y2) > (x3, y3) > (x4, y4) > (x5, y5)]
Similarly, 5 different s0..s4
are required, just like the 5 bits b0..b4
.
Finally, the additional constraint makes sure that the scalar is being correctly expressed into its binary form (using the doubleandadd decomposition) as: $n_{′}=2_{5}⋅n+2_{4}⋅b_{0}+2_{3}⋅b_{1}+2_{2}⋅b_{2}+2_{1}⋅b_{3}+b_{4}$ This equation is translated as the constraint:
 Binary decomposition:
0 = n'  (b4 + 2 * (b3 + 2 * (b2 + 2 * (b1 + 2 * (b0 + 2*n)))))
Range Check
The range check gadget is comprised of three circuit gates (RangeCheck0
, RangeCheck1
and Zero
) and can perform range checks on three values of up to 88 bits: $v_{0},v_{1}$ and $v_{2}$.
Optionally, RangeCheck0
can be used on its own to perform 64bit range checks by
constraining witness cells 12 to zero.
Byteorder:
 Each cell value is in littleendian byte order
 Limbs are mapped to columns in bigendian order (i.e. the lowest columns contain the highest bits)
 We also have the highest bits covered by copy constraints and plookups, so that we can copy the highest two constraints to zero and get a 64bit lookup, which are envisioned to be a common case
The values are decomposed into limbs as follows.
L
is a 12bit lookup (or copy) limb,C
is a 2bit “crumb” limb (we call half a nybble a crumb).
<6> <8>
v0 = L L L L L L C C C C C C C C
v1 = L L L L L L C C C C C C C C
<2> <4> <18>
v2 = C C L L L L C C C C C C C C C C C C C C C C C C
Witness structure:
Row  Contents 

0  $v_{0}$ 
1  $v_{1}$ 
2  $v_{2}$ 
3  $v_{0},v_{1},v_{2}$ 
 The first 2 rows contain $v_{0}$ and $v_{1}$ and their respective decompositions into 12bit and 2bit limbs
 The 3rd row contains $v_{2}$ and part of its decomposition: four 12bit limbs and the 1st 10 crumbs
 The final row contains $v_{0}$’s and $v_{1}$’s 5th and 6th 12bit limbs as well as the remaining 10 crumbs of $v_{2}$
Because we are constrained to 4 lookups per row, we are forced to postpone some lookups of v0 and v1 to the final row.
Constraints:
For efficiency, the limbs are constrained differently according to their type.
 12bit limbs are constrained with plookups
 2bit crumbs are constrained with degree4 constraints $x(x−1)(x−2)(x−3)$
Layout:
This is how the three 88bit inputs $v_{0},v_{1}$ and $v_{2}$ are layed out and constrained.
vipj
is the jth 12bit limb of value $v_{i}$vicj
is the jth 2bit crumb limb of value $v_{i}$
Gates  RangeCheck0  RangeCheck0  RangeCheck1  Zero 

Rows  0  1  2  3 
Cols  
0  v0  v1  v2  0 
MS:1  copy v0p0  copy v1p0  crumb v2c0  crumb v2c10 
2  copy v0p1  copy v1p1  crumb v2c1  crumb v2c11 
3  plookup v0p2  plookup v1p2  plookup v2p0  plookup v0p0 
4  plookup v0p3  plookup v1p3  plookup v2p1  plookup v0p1 
5  plookup v0p4  plookup v1p4  plookup v2p2  plookup v1p0 
6  plookup v0p5  plookup v1p5  plookup v2p3  plookup v1p1 
7  crumb v0c0  crumb v1c0  crumb v2c2  crumb v2c12 
8  crumb v0c1  crumb v1c1  crumb v2c3  crumb v2c13 
9  crumb v0c2  crumb v1c2  crumb v2c4  crumb v2c14 
10  crumb v0c3  crumb v1c3  crumb v2c5  crumb v2c15 
11  crumb v0c4  crumb v1c4  crumb v2c6  crumb v2c16 
12  crumb v0c5  crumb v1c5  crumb v2c7  crumb v2c17 
13  crumb v0p6  crumb v1c6  crumb v2c8  crumb v2c18 
LS:14  crumb v0p7  crumb v1c7  crumb v2c9  crumb v2c19 
The 12bit chunks are constrained with plookups and the 2bit crumbs are constrained with degree4 constraints of the form $x(x−1)(x−2)(x−3)$.
Note that copy denotes a plookup that is deferred to the 4th gate (i.e. Zero
).
This is because of the limitation that we have at most 4 lookups per row.
The copies are constrained using the permutation argument.
Gate types:
Different rows are constrained using different CircuitGate
types
Row  CircuitGate  Purpose 

0  RangeCheck0  Partially constrain $v_{0}$ 
1  RangeCheck0  Partially constrain $v_{1}$ 
2  RangeCheck1  Fully constrain $v_{2}$ (and trigger plookups constraints on row 3) 
3  Zero  Complete the constraining of $v_{0}$ and $v_{1}$ using lookups 
Each CircuitGate type corresponds to a unique polynomial and thus is assigned its own unique powers of alpha
 This circuit gate is used to partially constrain values $v_{0}$ and $v_{1}$
 Optionally, it can be used on its own as a single 64bit range check by constraining columns 1 and 2 to zero
 The rest of $v_{0}$ and $v_{1}$ are constrained by the lookups in the
Zero
gate row  This gate operates on the
Curr
row
It uses three different types of constraints
 copy  copy to another cell (12bits)
 plookup  plookup (12bits)
 crumb  degree4 constraint (2bits)
Given value v
the layout looks like this
Column  Curr 

0  v 
1  copy vp0 
2  copy vp1 
3  plookup vp2 
4  plookup vp3 
5  plookup vp4 
6  plookup vp5 
7  crumb vc0 
8  crumb vc1 
9  crumb vc2 
10  crumb vc3 
11  crumb vc4 
12  crumb vc5 
13  crumb vc6 
14  crumb vc7 
where the notation vpi
and vci
defined in the “Layout” section above.
RangeCheck1
 Range check constraints
 This circuit gate is used to fully constrain $v_{2}$
 It operates on the
Curr
andNext
rows
It uses two different types of constraints
 plookup  plookup (12bits)
 crumb  degree4 constraint (2bits)
Given value v2
the layout looks like this
Column  Curr  Next 

0  v2  (ignored) 
1  crumb v2c0  crumb v2c10 
2  crumb v2c1  crumb v2c11 
3  plookup v2p0  (ignored) 
4  plookup v2p1  (ignored) 
5  plookup v2p2  (ignored) 
6  plookup v2p3  (ignored) 
7  crumb v2c2  crumb v2c12 
8  crumb v2c3  crumb v2c13 
9  crumb v2c4  crumb v2c14 
10  crumb v2c5  crumb v2c15 
11  crumb v2c6  crumb v2c16 
12  crumb v2c7  crumb v2c17 
13  crumb v2c8  crumb v2c18 
14  crumb v2c9  crumb v2c19 
where the notation v2ci
and v2pi
defined in the “Layout” section above.
Setup
In this section we specify the setup that goes into creating two indexes from a circuit:
 A prover index, necessary for the prover to to create proofs.
 A verifier index, necessary for the verifier to verify proofs.
The circuit creation part is not specified in this document. It might be specified in a separate document, or we might want to specify how to create the circuit description tables.
As such, the transformation of a circuit into these two indexes can be seen as a compilation step. Note that the prover still needs access to the original circuit to create proofs, as they need to execute it to create the witness (register table).
Common Index
In this section we describe data that both the prover and the verifier index share.
URS
(Uniform Reference String) The URS is a set of parameters that is generated once, and shared between the prover and the verifier.
It is used for polynomial commitments, so refer to the polycommitment specification for more details.
Kimchi currently generates the URS based on the circuit, and attach it to the index. So each circuit can potentially be accompanied with a different URS. On the other hand, Mina reuses the same URS for multiple circuits (see zkapps for more details).
Domain
. A domain large enough to contain the circuit and the zeroknowledge rows (used to provide zeroknowledge to the protocol). Specifically, the smallest subgroup in our field that has order greater or equal to n + ZK_ROWS
, with n
is the number of gates in the circuit.
TODO: what if the domain is larger than the URS?
Note that in this specification we always assume that the first element of a domain is $1$.
Shifts
. As part of the permutation, we need to create PERMUTS
shifts.
To do that, the following logic is followed (in pseudo code):
(TODO: move shift creation within the permutation section?)
shifts[0] = 1 # first shift is identity
for i in 0..7: # generate 7 shifts
i = 7
shift, i = sample(domain, i)
while shifts.contains(shift) do:
shift, i = sample(domain, i)
shift[i] = shift
def sample(domain, i):
i += 1
shift = Field(Blake2b512(to_be_bytes(i)))
while is_not_quadratic_non_residue(shift)  domain.contains(shift):
i += 1
shift = Field(Blake2b512(to_be_bytes(i)))
return shift, i
Public
. This variable simply contains the number of public inputs. (TODO: actually, it’s not contained in the verifier index)
The compilation steps to create the common index are as follow:
 If the circuit is less than 2 gates, abort.
 Create a domain for the circuit. That is,
compute the smallest subgroup of the field that
has order greater or equal to
n + ZK_ROWS
elements.  Pad the circuit: add zero gates to reach the domain size.
 sample the
PERMUTS
shifts.
Lookup Index
If lookup is used, the following values are added to the common index:
LookupSelectors
. The list of lookup selectors used. In practice, this tells you which lookup tables are used.
TableIds
. This is a list of table ids used by the Lookup gate.
MaxJointSize
. This is the maximum number of columns appearing in the lookup tables used by the lookup selectors. For example, the XOR lookup has 3 columns.
To create the index, follow these steps:

If no lookup is used in the circuit, do not create a lookup index

Get the lookup selectors and lookup tables (TODO: how?)

Concatenate runtime lookup tables with the ones used by gates

Get the highest number of columns
max_table_width
that a lookup table can have. 
Create the concatenated table of all the fixed lookup tables. It will be of height the size of the domain, and of width the maximum width of any of the lookup tables. In addition, create an additional column to store all the tables’ table IDs.
For example, if you have a table with ID 0
1 2 3 5 6 7 0 0 0 and another table with ID 1
8 9 the concatenated table in a domain of size 5 looks like this:
1 2 3 5 6 7 0 0 0 8 9 0 0 0 0 with the table id vector:
table id 0 0 0 1 0 To do this, for each table:
 Update the corresponding entries in a table id vector (of size the domain as well) with the table ID of the table.
 Copy the entries from the table to new rows in the corresponding columns of the concatenated table.
 Fill in any unused columns with 0 (to match the dummy value)

Pad the end of the concatened table with the dummy value.

Pad the end of the table id vector with 0s.

precompute polynomial and evaluation form for the look up tables

precompute polynomial and evaluation form for the table IDs, only if a table with an ID different from zero was used.
Prover Index
Both the prover and the verifier index, besides the common parts described above, are made out of precomputations which can be used to speed up the protocol. These precomputations are optimizations, in the context of normal proofs, but they are necessary for recursion.
pub struct ProverIndex<G: KimchiCurve> {
/// constraints system polynomials
#[serde(bound = "ConstraintSystem<G::ScalarField>: Serialize + DeserializeOwned")]
pub cs: ConstraintSystem<G::ScalarField>,
/// The symbolic linearization of our circuit, which can compile to concrete types once certain values are learned in the protocol.
#[serde(skip)]
pub linearization: Linearization<Vec<PolishToken<G::ScalarField>>>,
/// The mapping between powers of alpha and constraints
#[serde(skip)]
pub powers_of_alpha: Alphas<G::ScalarField>,
/// polynomial commitment keys
#[serde(skip)]
pub srs: Arc<SRS<G>>,
/// maximal size of polynomial section
pub max_poly_size: usize,
/// maximal size of the quotient polynomial according to the supported constraints
pub max_quot_size: usize,
/// The verifier index corresponding to this prover index
#[serde(skip)]
pub verifier_index: Option<VerifierIndex<G>>,
/// The verifier index digest corresponding to this prover index
#[serde_as(as = "Option<o1_utils::serialization::SerdeAs>")]
pub verifier_index_digest: Option<G::BaseField>,
}
Verifier Index
Same as the prover index, we have a number of precomputations as part of the verifier index.
#[serde_as]
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct LookupVerifierIndex<G: CommitmentCurve> {
pub lookup_used: LookupsUsed,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_table: Vec<PolyComm<G>>,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_selectors: LookupSelectors<PolyComm<G>>,
/// Table IDs for the lookup values.
/// This may be `None` if all lookups originate from table 0.
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub table_ids: Option<PolyComm<G>>,
/// The maximum joint size of any joint lookup in a constraint in `kinds`. This can be computed from `kinds`.
pub max_joint_size: u32,
/// An optional selector polynomial for runtime tables
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub runtime_tables_selector: Option<PolyComm<G>>,
}
#[serde_as]
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct VerifierIndex<G: KimchiCurve> {
/// evaluation domain
#[serde_as(as = "o1_utils::serialization::SerdeAs")]
pub domain: D<G::ScalarField>,
/// maximal size of polynomial section
pub max_poly_size: usize,
/// maximal size of the quotient polynomial according to the supported constraints
pub max_quot_size: usize,
/// polynomial commitment keys
#[serde(skip)]
pub srs: OnceCell<Arc<SRS<G>>>,
/// number of public inputs
pub public: usize,
/// number of previous evaluation challenges, for recursive proving
pub prev_challenges: usize,
// index polynomial commitments
/// permutation commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub sigma_comm: [PolyComm<G>; PERMUTS],
/// coefficient commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub coefficients_comm: [PolyComm<G>; COLUMNS],
/// coefficient commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub generic_comm: PolyComm<G>,
// poseidon polynomial commitments
/// poseidon constraint selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub psm_comm: PolyComm<G>,
// ECC arithmetic polynomial commitments
/// EC addition selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub complete_add_comm: PolyComm<G>,
/// EC variable base scalar multiplication selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub mul_comm: PolyComm<G>,
/// endoscalar multiplication selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub emul_comm: PolyComm<G>,
/// endoscalar multiplication scalar computation selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub endomul_scalar_comm: PolyComm<G>,
/// Chacha polynomial commitments
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub chacha_comm: Option<[PolyComm<G>; 4]>,
// Range check gates polynomial commitments
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub range_check_comm: Option<[PolyComm<G>; range_check::gadget::GATE_COUNT]>,
/// wire coordinate shifts
#[serde_as(as = "[o1_utils::serialization::SerdeAs; PERMUTS]")]
pub shift: [G::ScalarField; PERMUTS],
/// zeroknowledge polynomial
#[serde(skip)]
pub zkpm: OnceCell<DensePolynomial<G::ScalarField>>,
// TODO(mimoo): isn't this redundant with domain.d1.group_gen ?
/// domain offset for zeroknowledge
#[serde(skip)]
pub w: OnceCell<G::ScalarField>,
/// endoscalar coefficient
#[serde(skip)]
pub endo: G::ScalarField,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_index: Option<LookupVerifierIndex<G>>,
#[serde(skip)]
pub linearization: Linearization<Vec<PolishToken<G::ScalarField>>>,
/// The mapping between powers of alpha and constraints
#[serde(skip)]
pub powers_of_alpha: Alphas<G::ScalarField>,
}
Proof Construction & Verification
Originally, kimchi is based on an interactive protocol that was transformed into a noninteractive one using the FiatShamir transform. For this reason, it can be useful to visualize the highlevel interactive protocol before the transformation:
sequenceDiagram participant Prover participant Verifier Note over Prover,Verifier: Prover produces commitments to secret polynomials Prover>>Verifier: public input & witness commitment Verifier>>Prover: beta & gamma Prover>>Verifier: permutation commitment opt lookup Prover>>Verifier: sorted Prover>>Verifier: aggreg end Note over Prover,Verifier: Prover produces commitment to quotient polynomial Verifier>>Prover: alpha Prover>>Verifier: quotient commitment Note over Prover,Verifier: Verifier produces an evaluation point Verifier>>Prover: zeta Note over Prover,Verifier: Prover provides helper evaluations Prover>>Verifier: the generic selector gen(zeta) & gen(zeta * omega) Prover>>Verifier: the poseidon selector pos(zeta) & pos(zeta * omega) Prover>>Verifier: negated public input p(zeta) & p(zeta * omega) Note over Prover,Verifier: Prover provides needed evaluations for the linearization Note over Verifier: change of verifier (change of sponge) Prover>>Verifier: permutation poly z(zeta) & z(zeta * omega) Prover>>Verifier: the 15 registers w_i(zeta) & w_i(zeta * omega) Prover>>Verifier: the 6 sigmas s_i(zeta) & s_i(zeta * omega) Prover>>Verifier: ft(zeta * omega) opt lookup Prover>>Verifier: sorted(zeta) & sorted(zeta * omega) Prover>>Verifier: aggreg(zeta) & aggreg(zeta * omega) Prover>>Verifier: table(zeta) & table(zeta * omega) end Note over Prover,Verifier: Batch verification of evaluation proofs Verifier>>Prover: u, v Note over Verifier: change of verifier (change of sponge) Prover>>Verifier: aggregated evaluation proof (involves more interaction)
The FiatShamir transform simulates the verifier messages via a hash function that hashes the transcript of the protocol so far before outputing verifier messages. You can find these operations under the proof creation and proof verification algorithms as absorption and squeezing of values with the sponge.
Proof Structure
A proof consists of the following data structures:
/// Evaluations of lookup polynomials
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
#[serde(bound(
serialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::SerializeAs<Field>",
deserialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::DeserializeAs<'de, Field>"
))]
pub struct LookupEvaluations<Field> {
/// sorted lookup table polynomial
#[serde_as(as = "Vec<Vec<o1_utils::serialization::SerdeAs>>")]
pub sorted: Vec<Field>,
/// lookup aggregation polynomial
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub aggreg: Field,
// TODO: May be possible to optimize this away?
/// lookup table polynomial
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub table: Field,
/// Optionally, a runtime table polynomial.
#[serde_as(as = "Option<Vec<o1_utils::serialization::SerdeAs>>")]
pub runtime: Option<Field>,
}
// TODO: this should really be vectors here, perhaps create another type for chunked evaluations?
/// Polynomial evaluations contained in a `ProverProof`.
///  **Chunked evaluations** `Field` is instantiated with vectors with a length that equals the length of the chunk
///  **Non chunked evaluations** `Field` is instantiated with a field, so they are singlesized#[serde_as]
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
#[serde(bound(
serialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::SerializeAs<Field>",
deserialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::DeserializeAs<'de, Field>"
))]
pub struct ProofEvaluations<Field> {
/// witness polynomials
#[serde_as(as = "[Vec<o1_utils::serialization::SerdeAs>; COLUMNS]")]
pub w: [Field; COLUMNS],
/// permutation polynomial
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub z: Field,
/// permutation polynomials
/// (PERMUTS1 evaluations because the last permutation is only used in commitment form)
#[serde_as(as = "[Vec<o1_utils::serialization::SerdeAs>; PERMUTS  1]")]
pub s: [Field; PERMUTS  1],
/// lookuprelated evaluations
pub lookup: Option<LookupEvaluations<Field>>,
/// evaluation of the generic selector polynomial
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub generic_selector: Field,
/// evaluation of the poseidon selector polynomial
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub poseidon_selector: Field,
}
/// Commitments linked to the lookup feature
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct LookupCommitments<G: AffineCurve> {
/// Commitments to the sorted lookup table polynomial (may have chunks)
pub sorted: Vec<PolyComm<G>>,
/// Commitment to the lookup aggregation polynomial
pub aggreg: PolyComm<G>,
/// Optional commitment to concatenated runtime tables
pub runtime: Option<PolyComm<G>>,
}
/// All the commitments that the prover creates as part of the proof.
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct ProverCommitments<G: AffineCurve> {
/// The commitments to the witness (execution trace)
pub w_comm: [PolyComm<G>; COLUMNS],
/// The commitment to the permutation polynomial
pub z_comm: PolyComm<G>,
/// The commitment to the quotient polynomial
pub t_comm: PolyComm<G>,
/// Commitments related to the lookup argument
pub lookup: Option<LookupCommitments<G>>,
}
/// The proof that the prover creates from a [ProverIndex](super::prover_index::ProverIndex) and a `witness`.
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct ProverProof<G: AffineCurve> {
/// All the polynomial commitments required in the proof
pub commitments: ProverCommitments<G>,
/// batched commitment opening proof
pub proof: OpeningProof<G>,
/// Two evaluations over a number of committed polynomials
// TODO(mimoo): that really should be a type Evals { z: PE, zw: PE }
pub evals: [ProofEvaluations<Vec<G::ScalarField>>; 2],
/// Required evaluation for [Maller's optimization](https://o1labs.github.io/minabook/crypto/plonk/maller_15.html#theevaluationofl)
#[serde_as(as = "o1_utils::serialization::SerdeAs")]
pub ft_eval1: G::ScalarField,
/// The public input
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub public: Vec<G::ScalarField>,
/// The challenges underlying the optional polynomials folded into the proof
pub prev_challenges: Vec<RecursionChallenge<G>>,
}
/// A struct to store the challenges inside a `ProverProof`
#[serde_as]
#[derive(Clone, Deserialize, Serialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct RecursionChallenge<G>
where
G: AffineCurve,
{
/// Vector of scalar field elements
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub chals: Vec<G::ScalarField>,
/// Polynomial commitment
pub comm: PolyComm<G>,
}
The following sections specify how a prover creates a proof, and how a verifier validates a number of proofs.
Proof Creation
To create a proof, the prover expects:
 A prover index, containing a representation of the circuit (and optionaly precomputed values to be used in the proof creation).
 The (filled) registers table, representing parts of the execution trace of the circuit.
The following constants are set:
EVAL_POINTS = 2
. This is the number of points that the prover has to evaluate their polynomials at. ($ζ$ and $ζω$ where $ζ$ will be deterministically generated.)ZK_ROWS = 3
. This is the number of rows that will be randomized to provide zeroknowledgeness. Note that it only needs to be greater or equal to the number of evaluations (2) in the protocol. Yet, it contains one extra row to take into account the last constraint (final value of the permutation accumulator). (TODO: treat the final constraint separately so that ZK_ROWS = 2)
The prover then follows the following steps to create the proof:

Ensure we have room in the witness for the zeroknowledge rows. We currently expect the witness not to be of the same length as the domain, but instead be of the length of the (smaller) circuit. If we cannot add
ZK_ROWS
rows to the columns of the witness before reaching the size of the domain, abort. 
Pad the witness columns with Zero gates to make them the same length as the domain. Then, randomize the last
ZK_ROWS
of each columns. 
Setup the FqSponge.

Absorb the digest of the VerifierIndex.

Absorb the commitments of the previous challenges with the Fqsponge.

Compute the negated public input polynomial as the polynomial that evaluates to $−p_{i}$ for the first
public_input_size
values of the domain, and $0$ for the rest. 
Commit (nonhiding) to the negated public input polynomial.

Absorb the commitment to the public polynomial with the FqSponge.
Note: unlike the original PLONK protocol, the prover also provides evaluations of the public polynomial to help the verifier circuit. This is why we need to absorb the commitment to the public polynomial at this point.

Commit to the witness columns by creating
COLUMNS
hidding commitments.Note: since the witness is in evaluation form, we can use the
commit_evaluation
optimization. 
Absorb the witness commitments with the FqSponge.

Compute the witness polynomials by interpolating each
COLUMNS
of the witness. TODO: why not do this first, and then commit? Why commit from evaluation directly? 
If using lookup:
 if using runtime table:
 check that all the provided runtime tables have length and IDs that match the runtime table configuration of the index we expect the given runtime tables to be sorted as configured, this makes it easier afterwards
 calculate the contribution to the second column of the lookup table (the runtime vector)
 If queries involve a lookup table with multiple columns then squeeze the FqSponge to obtain the joint combiner challenge $j_{′}$, otherwise set the joint combiner challenge $j_{′}$ to $0$.
 Derive the scalar joint combiner $j$ from $j_{′}$ using the endomorphism (TOOD: specify)
 If multiple lookup tables are involved,
set the
table_id_combiner
as the $j_{i}$ with $i$ the maximum width of any used table. Essentially, this is to add a last column of table ids to the concatenated lookup tables.  Compute the dummy lookup value as the combination of the last entry of the XOR table (so
(0, 0, 0)
). Warning: This assumes that we always use the XOR table when using lookups.  Compute the lookup table values as the combination of the lookup table entries.
 Compute the sorted evaluations.
 Randomize the last
EVALS
rows in each of the sorted polynomials in order to add zeroknowledge to the protocol.  Commit each of the sorted polynomials.
 Absorb each commitments to the sorted polynomials.
 if using runtime table:

Sample $β$ with the FqSponge.

Sample $γ$ with the FqSponge.

If using lookup:
 Compute the lookup aggregation polynomial.
 Commit to the aggregation polynomial.
 Absorb the commitment to the aggregation polynomial with the FqSponge.

Compute the permutation aggregation polynomial $z$.

Commit (hidding) to the permutation aggregation polynomial $z$.

Absorb the permutation aggregation polynomial $z$ with the FqSponge.

Sample $α_{′}$ with the FqSponge.

Derive $α$ from $α_{′}$ using the endomorphism (TODO: details)

TODO: instantiate alpha?

Compute the quotient polynomial (the $t$ in $f=Z_{H}⋅t$). The quotient polynomial is computed by adding all these polynomials together:
 the combined constraints for all the gates
 the combined constraints for the permutation
 TODO: lookup
 the negated public polynomial and by then dividing the resulting polynomial with the vanishing polynomial $Z_{H}$. TODO: specify the split of the permutation polynomial into perm and bnd?

commit (hiding) to the quotient polynomial $t$ TODO: specify the dummies

Absorb the the commitment of the quotient polynomial with the FqSponge.

Sample $ζ_{′}$ with the FqSponge.

Derive $ζ$ from $ζ_{′}$ using the endomorphism (TODO: specify)

If lookup is used, evaluate the following polynomials at $ζ$ and $ζω$:
 the aggregation polynomial
 the sorted polynomials
 the table polynonial

Chunk evaluate the following polynomials at both $ζ$ and $ζω$:
 $s_{i}$
 $w_{i}$
 $z$
 lookup (TODO)
 generic selector
 poseidon selector
By “chunk evaluate” we mean that the evaluation of each polynomial can potentially be a vector of values. This is because the index’s
max_poly_size
parameter dictates the maximum size of a polynomial in the protocol. If a polynomial $f$ exceeds this size, it must be split into several polynomials like so: $f(x)=f_{0}(x)+x_{n}f_{1}(x)+x_{2n}f_{2}(x)+⋯$And the evaluation of such a polynomial is the following list for $x∈ζ,ζω$:
$(f_{0}(x),f_{1}(x),f_{2}(x),…)$
TODO: do we want to specify more on that? It seems unecessary except for the t polynomial (or if for some reason someone sets that to a low value)

Evaluate the same polynomials without chunking them (so that each polynomial should correspond to a single value this time).

Compute the ft polynomial. This is to implement Maller’s optimization.

construct the blinding part of the ft polynomial commitment see https://o1labs.github.io/minabook/crypto/plonk/maller_15.html#evaluationproofandblindingfactors

Evaluate the ft polynomial at $ζω$ only.

Setup the FrSponge

Squeeze the Fqsponge and absorb the result with the FrSponge.

Absorb the previous recursion challenges.

Compute evaluations for the previous recursion challenges.

Evaluate the negated public polynomial (if present) at $ζ$ and $ζω$.

Absorb the unique evaluation of ft: $ft(ζω)$.

Absorb all the polynomial evaluations in $ζ$ and $ζω$:
 the public polynomial
 z
 generic selector
 poseidon selector
 the 15 register/witness
 6 sigmas evaluations (the last one is not evaluated)

Sample $v_{′}$ with the FrSponge

Derive $v$ from $v_{′}$ using the endomorphism (TODO: specify)

Sample $u_{′}$ with the FrSponge

Derive $u$ from $u_{′}$ using the endomorphism (TODO: specify)

Create a list of all polynomials that will require evaluations (and evaluation proofs) in the protocol. First, include the previous challenges, in case we are in a recursive prover.

Then, include:
 the negated public polynomial
 the ft polynomial
 the permutation aggregation polynomial z polynomial
 the generic selector
 the poseidon selector
 the 15 registers/witness columns
 the 6 sigmas
 optionally, the runtime table

if using lookup:
 add the lookup sorted polynomials
 add the lookup aggreg polynomial
 add the combined table polynomial
 if present, add the runtime table polynomial

Create an aggregated evaluation proof for all of these polynomials at $ζ$ and $ζω$ using $u$ and $v$.
Proof Verification
TODO: we talk about batch verification, but is there an actual batch operation? It seems like we’re just verifying an aggregated opening proof
We define two helper algorithms below, used in the batch verification of proofs.
FiatShamir argument
We run the following algorithm:

Setup the FqSponge.

Absorb the digest of the VerifierIndex.

Absorb the commitments of the previous challenges with the Fqsponge.

Absorb the commitment of the public input polynomial with the FqSponge.

Absorb the commitments to the registers / witness columns with the FqSponge.

If lookup is used:
 If it involves queries to a multiplecolumn lookup table, then squeeze the FqSponge to obtain the joint combiner challenge $j_{′}$, otherwise set the joint combiner challenge $j_{′}$ to $0$.
 Derive the scalar joint combiner challenge $j$ from $j_{′}$ using the endomorphism. (TODO: specify endomorphism)
 absorb the commitments to the sorted polynomials.

Sample $β$ with the FqSponge.

Sample $γ$ with the FqSponge.

If using lookup, absorb the commitment to the aggregation lookup polynomial.

Absorb the commitment to the permutation trace with the FqSponge.

Sample $α_{′}$ with the FqSponge.

Derive $α$ from $α_{′}$ using the endomorphism (TODO: details).

Enforce that the length of the $t$ commitment is of size
PERMUTS
. 
Absorb the commitment to the quotient polynomial $t$ into the argument.

Sample $ζ_{′}$ with the FqSponge.

Derive $ζ$ from $ζ_{′}$ using the endomorphism (TODO: specify).

Setup the FrSponge.

Squeeze the Fqsponge and absorb the result with the FrSponge.

Compute evaluations for the previous recursion challenges.

Absorb the previous recursion challenges.

Evaluate the negated public polynomial (if present) at $ζ$ and $ζω$.
NOTE: this works only in the case when the poly segment size is not smaller than that of the domain.

Absorb the unique evaluation of ft: $ft(ζω)$.

Absorb all the polynomial evaluations in $ζ$ and $ζω$:
 the public polynomial
 z
 generic selector
 poseidon selector
 the 15 register/witness
 6 sigmas evaluations (the last one is not evaluated)

Sample $v_{′}$ with the FrSponge.

Derive $v$ from $v_{′}$ using the endomorphism (TODO: specify).

Sample $u_{′}$ with the FrSponge.

Derive $u$ from $u_{′}$ using the endomorphism (TODO: specify).

Create a list of all polynomials that have an evaluation proof.

Compute the evaluation of $ft(ζ)$.
Partial verification
For every proof we want to verify, we defer the proof opening to the very end. This allows us to potentially batch verify a number of partially verified proofs. Essentially, this steps verifies that $f(ζ)=t(ζ)∗Z_{H}(ζ)$.
 Commit to the negated public input polynomial.
 Run the FiatShamir argument.
 Combine the chunked polynomials’ evaluations (TODO: most likely only the quotient polynomial is chunked) with the right powers of $ζ_{n}$ and $(ζ∗ω)_{n}$.
 Compute the commitment to the linearized polynomial $f$. To do this, add the constraints of all of the gates, of the permutation, and optionally of the lookup. (See the separate sections in the constraints section.) Any polynomial should be replaced by its associated commitment, contained in the verifier index or in the proof, unless a polynomial has its evaluation provided by the proof in which case the evaluation should be used in place of the commitment.
 Compute the (chuncked) commitment of $ft$ (see Maller’s optimization).
 List the polynomial commitments, and their associated evaluations,
that are associated to the aggregated evaluation proof in the proof:
 recursion
 public input commitment
 ft commitment (chunks of it)
 permutation commitment
 index commitments that use the coefficients
 witness commitments
 sigma commitments
 lookup commitments
Batch verification of proofs
Below, we define the steps to verify a number of proofs (each associated to a verifier index). You can, of course, use it to verify a single proof.
 If there’s no proof to verify, the proof validates trivially.
 Ensure that all the proof’s verifier index have a URS of the same length. (TODO: do they have to be the same URS though? should we check for that?)
 Validate each proof separately following the partial verification steps.
 Use the
PolyCom.verify
to verify the partially evaluated proofs.
Optimizations
commit_evaluation
: TODO
Security Considerations
TODO