Introduction
This page hosts documentations and specifications for some of the cryptographic algorithms of Mina. For the Rust documentation, see here.
Note that this book is a work in progress, not necessarily reflecting the current state of Mina.
Authors: Izaak Meckler, Vanishree Rao, Matthew Ryan, Anaïs Querol, Joseph Spadavecchia, David Wong
In memory of Vitaly Zelov.
Terminology
This document is intended primarily to communicate mathematical ideas to programmers with some experience of math.
To that end, we will often be ambivalent about the difference between sets and types (in whatever programming language, but usually I am imagining some idealized form of Rust or OCaml). So for example we may write

$A$ is a type

$A$ is a set

A
is a type
and these should all be interpreted the same way, assuming it makes sense in the context. Similarly, we may write either of the following, and potentially mean the same thing:

$a:A$

$a∈A$

a : A
We use
 $→$ for function types
 $↦$ for defining functions
Also, we usually assume functions are computed by a program (in whatever sense of “program” makes sense in the given context). So if we say “let $f:A→B$”, we typically mean that

$A$ and $B$ are types in some programming language which makes sense in the context (or maybe that they are sets, again depending on context)

$f$ is actually a program computing a function of the type $A→B$ (again, in whatever sense of program makes sense in context)
Groups
Groups are an algebraic structure which

are used in most widely deployed signature schemes

are used in most widely deployed zkSNARKs

generalize the notion of a collection of symmetries of an object

generalize certain aspects of numbers
I will not touch on the last two too much.
First let’s do a quick definition and then some examples.
Definition. A group is a set $G$ with a binary operation $∗:G×G→G$, an identity $e∈G$ for that operation, and a inverse map $()_{−1}:G→G$ such that

$∗$ is associative: $x∗(y∗z)=(x∗y)∗z$ for all $x,y,z$.

$e$ is an identity for $∗$. I.e., $e∗x=x∗e=x$ for all $x$.

$()_{−1}$ is an inverse map for $∗$ with identity $e$. I.e., $x∗x_{−1}=x_{−1}∗x=e$ for all $x$.
So basically, an invertible binary operation. Definition in hand, we can see some examples:

The integers $Z$ with $+$, $0$ as identity, and negation as inverse.

For any natural number $n$, the integers mod $n$. This can be thought of as the set of numbers ${0,…,n−1}$ with the operation being $+$ followed by taking remainder mod $n$. It can also be thought of as the group of rotations of an $n$gon.

For any set $X$, the set of invertible functions $X→X$ (a.k.a permutations) with function composition as the operation, the identity function as the identity, and function inverse as the inverse operation.

The set of translations of a plane $R×R$ with composition of translations (since the composition of translations is a translation). The identity is translating by $(0,0)$ and inverse is reversing the translation. This is equivalent to the group $R×R$ with coordinatewise addition as the operation and coordinatewise negation as the inverse.

The set of rotations of a sphere with composition as the operation, doing nothing as the identity, and performing the rotation in reverse as the inverse map.

For any field $F$, the set $F_{×}:=F∖{0}$, with field multiplication as the operation, $1$ as the identity, and $x↦1/x$ as the inverse map. This is called the group of units of $F$.
Sometimes, instead of $x∗y$ we simply write $xy$ when $∗$ is clear from context.
Abelian groups
An abelian group is a group in which $∗$ is commutative, meaning $x∗y=y∗x$. Typically, in that case, we write the group operation as $+$ instead and we write the identity as $0$.
Some examples are

the integers with addition

the integers mod $n$ with addition

the group of units of a field $F$

the real numbers with addition

the rational numbers with addition

the real numbers with multiplication

the rational numbers with multiplication

vectors of integers of some fixed dimension with coordinatewise addition

The set of polynomials over a ring $R$, $R[x]$ with addition
Abelian groups are equivalently described as what are called $Z$modules. A $Z$module is like a vector space, but with the integers instead of a field.
Basically a $Z$module (or equivalently an Abelian group) is a structure where you can add elements and where it makes sense to multiply a group element by an integer.
If $G$ is an abelian group, we can define this “multiplication by an integer” as follows. If $n≥0$, then for $g∈G$, we can define $n⋅g$ by
$n⋅g:=ng+⋯+g $
and if $n<0$, then define $n⋅g:=∣n∣⋅(−g)$. Or equivalently.,
$n⋅g=∣n∣(−g)+⋯+(−g) =− ∣n∣g+⋯+g $
This is the general sense of what is called scalarmultiplication or sometimes exponentiation in cryptography.
Cyclic groups
A cyclic group $G$ is a special kind of abelian group. It is an abelian group generated by a single element $g∈G$. That is, a cyclic group $G$ (generated by $g∈G$) is one in which for every $h∈G$ we have $h=ng$ for some $n∈Z$.
Groups in cryptography
Many cryptographic protocols are defined generically with respect to any abelian group that has a “hard discrete log”.
Let

$G$ be a cyclic group

$g$ a generator of $G$

$μ$ a probability distribution on $G$

$A$ a set of algorithms of type $G→Z$ with runtime and memory usage bounds. In other words, a set of tuples $(P,t,m)$ where $P$ is a program of type $G→Z$, $t$ is a bound on the time that program can be run, and $m$ is a bound on the amount of memory that program can use.
In practice you fix this to be something like, the set of all computations that can be run for less than 1 trillion $.

$ε∈[0,1]$ a probability, usually taken to be something close to 0 like $1/2_{128}$
Then we can define the $(G,g,μ,A,ε)$computational discretelog assumption which says:
For any $(P,t,m)∈A$, if you sample h from G according to $μ$, then the probability that $P(h)⋅h=g$ is smaller than $ε$, assuming $P$ successfully runs within the given resource bounds.
Sometimes this is called the computational Diffie–Helman assumption. Basically what it’s saying is that for a group element $h$ sampled randomly, you can’t practically compute how many times you have to add $g$ to itself to get $h$.
Another really important assumption is the norelation assumption (TODO: name).
Basically what this says is that randomly selected group elements have no efficiently computable linear relations between them. Formally, letting $G$ be a group and $μ$ a probability distribution on $G$, and $A$ a set of programs (with resource bounds) that take in a list of group elements as inputs and outputs a list of integers of the same length.
Then the $(G,μ,A,N,ε)$norelation assumption says for all $(P,t,m)∈A$, for any $n≤N$, if you sample $g_{1},…,g_{n}$ according to $G$, letting $[e_{1},…,e_{n}]=P([g_{1},…,g_{n}])$, it is not the case that
$e_{1}⋅g_{1}+⋯+e_{n}⋅g_{n}=0$
except with probability $ε$ (assuming program $P$ runs in time $t$ with memory $m$).
Generic group model
Elliptic curves
Now, what are some concrete groups that we can safely make the norelation or computational discrete log hardness assumptions about?
Well, the most efficiently implementable groups that people have come up with – and that we believe satisfy the above assumptions for $A$ being the class of realistic computations and $ε$ being something like $1/2_{128}$ – are elliptic curves over finite fields.
Giving a complete definition of what an elliptic curve is requires a lot of math, and is not very useful from the point of view of cryptography. So we will give a definition that is not complete, but more useful.
An elliptic curve $E$ over a field $F$ is a set of the form
${(x,y)∈F×F∣y_{2}=x_{3}+ax+b}$
for some $a,b∈F$, plus an additional point $O$ which is called the point at infinity. Basically it’s a set of pairs satisfying some equation of that form. The data of the elliptic curve itself is just the field $F$ together with the constants $a$ and $b$.
What is special about elliptic curves is that there is a way of giving them a group structure with simple to compute operations that we believe satisfy the hardness assumptions described above.
Group negation $(−):E→E$ is defined by
$−(x_{0},y_{0})=(x_{0},−y_{0})$
so we just negate the $y$ coordinate.
The identity for the group is $O$, the point at infinity. For that point we may also therefore write it as $0$.
Group addition is more complicated to define, so we will not, but here is what’s worth knowing about how to compute $(x_{0},y_{0})+(x_{1},y_{1})$

There are three cases

$x_{0}=x_{1}$

$x_{0}=x_{1}$ and $y_{0}=y_{1}$

$x_{0}=x_{1}$ but $y_{0}=y_{1}$. In this case it turns out $y_{0}=−y_{1}$ and so the two points are inverse, and we return $O$
In cases 1 and 2, the algorithm to compute the result just performs a simple sequence of some field operations (multiplications, additions, divisions, subtractions) with the input values. In other words, there is a simple arithmetic formula for computing the result.

Elliptic curves in Rust
Elliptic curves in Sage
Serializing curve points
Given a curve point $(x,y)$ we know $y_{2}=x_{3}+ax+b$ and thus $y$ is one of the two square roots of $x_{3}+ax+b$. If $y$ is a given square root, the other square root is $−y$ since $y_{2}=(−y)_{2}$. So, if we know $x$, then we almost know the whole curve point: we just need a single bit to tell us which of the two possible values (i.e., the two square roots of $x_{3}+ax+b$) is the y coordinate.
Here is how this is commonly done over prime order fields $F_{p}$, assuming $p=2$. Remember that we represent elements of $F_{p}$ as integers in the range $[0,p−1]$. In this representation, for a field element $y$, $−y$ is the integer $p−y$ (or $0$ if $y=0$). Thus, if $y$ is odd, then $−y$ is even (since $p$ is odd and an odd minus an odd is even). Similarly, if $y$ is even, then $−y$ is odd (since an odd minus an even is odd).
So, for any $y$, unless $y$ is 0, $y$ and $−y$ have opposite parities. Parity can easily be computed: it’s just the least significant bit. Thus, we have an easy way of encoding a curve point $(x,y)$. Send

$x$ in its entirety

The least significant bit $b_{0}$ of $y$
Given this, we can reconstruct $y$ as follows.

Compute any square root $Y$ of $x_{3}+ax+b$

If the least significant bit of $Y$ is equal to $b_{0}$, then $y=Y$, otherwise, $y=−Y$
In the case of Mina, our field elements require 255 bits to store. Thus, we can encode a curve point in $255+1=256$ bits, or 32 bytes. At the moment this optimized encoding is not implemented for messages on the network. It would be a welcome improvement.
Projective form / projective coordinates
The above form of representing curve points as pairs – which is called affine form – is sometimes suboptimal from an efficiency perspective. There are several other forms in which curve points can be represented, which are called projective forms.
The simple projective form represents a curve $E$ as above as the set
${(X,Y,Z)∈F_{3}∣(Y/Z)_{2}=(X/Z)_{3}+a(X/Z)+b}$
If you think about it, this is saying that $(X/Z,Y/Z)$ is a point on the original curve in affine form. In other words, in projective form we let the first two coordinates get scaled by some arbitrary scaling factor $Z$, but we keep track of it as the third coordinate.
To be clear, this means curve points have many different representations. If $(x,y,z)$ is a curve point in projective coordinates, and $s$ is any element of $F$, then $(sx,sy,sz)$ is another representation of the same curve point.
This means curve points require more space to store, but it makes the group operation much more efficient to compute, as we can avoid having to do any field divisions, which are expensive.
Jacobian form / Jacobian coordinates
There is another form, which is also sometimes called a projective form, which is known as the jacobian form. There, a curve $E$ would be represented as
${(X,Y,Z)∈F_{3}∣(X/Z_{2},Y/Z_{3})∈E}$
so the triple $(X,Y,Z)$ corresponds to the affine point $(X/Z_{2},Y/Z_{3})$. These are the fastest for computing group addition on a normal computer.
Take aways

Use affine coordinates when the cost of division doesn’t matter and saving space is important. Specific contexts to keep in mind:
 Working with elliptic curve points inside a SNARK circuit

Use Jacobian coordinates for computations on normal computers, or other circumstances where space usage is not that costly and division is expensive.

Check out this website for the formulas for implementing the group operation.
More things to know
 When cryptographers talk about “groups”, they usually mean a “computational group”, which is a group equipped with efficient algorithms for computing the group operation and inversion. This is different because a group in the mathematical sense need not have its operations be computable at all.
Exercises

Implement a type
JacobianPoint<F:Field>
and functions for computing the group operation 
Familiarize yourself with the types and traits in
ark_ec
. Specifically, todo

Implement
fn decompress<F: SquareRootField>(c: (F, bool)) > (F, F)
Rings
A ring is like a field, but where elements may not be invertible. Basically, it’s a structure where we can

add

multiply

subtract
but not necessarily divide. If you know what polynomials are already, you can think of it as the minimal necessary structure for polynomials to make sense. That is, if $R$ is a ring, then we can define the set of polynomials $R[x]$ (basically arithmetic expressions in the variable $x$) and think of any polynomial $f∈R[x]$ giving rise to a function $R→R$ defined by substituing in for $x$ in $f$ and computing using $+$ and $⋅$ as defined in $R$.
So, in full, a ring $R$ is a set equipped with

$(+):R×R→R$

$(⋅):R×R→R$

$(−):R→R$

$0∈R$

$1∈R$
such that

$(+,0,−)$ gives the structure of an abelian group

$(⋅)$ is associative and commutative with identity $1$

$+$ distributes over $⋅$. I.e., $x⋅(y+z)=x⋅y+x⋅z$ for all $x,y,z$.
Intro
The purpose of this document is to give the reader mathematical, cryptographic, and programming context sufficient to become an effective practictioner of zeroknowledge proofs and ZKSNARKs specifically.
Some fundamental mathematical objects
In this section we’ll discuss the fundamental objects used in the construction of most ZKSNARKs in use today. These objects are used extensively and without ceremony, so it’s important to understand them very well.
If you find you don’t understand something at first, don’t worry: practice is often the best teacher and in using these objects you will find they will become like the bicycle you’ve had for years: like an extension of yourself that you use without a thought.
Fields
A field is a generalization of the concept of a number. To be precise, a field is a set $F$ equipped with

An element $0∈F$

An element $1∈F$

A function $mul:F×F→F$

A function $add:F×F→F$

A function $sub:F×F→F$

A function $div:F×(F∖{0})→F$
Note that the second argument to $div$ cannot be $0$. We write these functions in the traditional infix notation writing

$xy$ or $x⋅y$ for $mul(x,y)$

$x+y$ for $add(x,y)$

$x−y$ for $sub(x,y)$

$yx $ for $div(x,y)$
and we also write $x_{−1}$ for $div(1,x)$ and $−x$ for $sub(0,x)$.
Moreover all these elements and functions must obey all the usual laws of arithmetic, such as

$x+(y+z)=(x+y)+z$

$x+y=y+x$

$x+(−x)=0$

$x+0=x$

$x(yz)=(xy)z$

$xy=yx$

$yx =z$ if and only if $x=zy$, assuming $y=0$.

$1⋅x=x$

$x(y+z)=xy+xz$
In short, $F$ should be an abelian group over $+$ with $0$ as identity and $sub(0,−)$ as inverse, $F∖{0}$ should be an abelian group over $⋅$ with $1$ as identity and $div(1,−)$ as inverse, and addition should distribute over multiplication. If you don’t know what an abelian group is, don’t worry, we will cover it later.
The point of defining a field is that we can algebraically manipulate elements of a field the same way we do with ordinary numbers, adding, multiplying, subtracting, and dividing them without worrying about rounding, underflows, overflows, etc.
In Rust, we use the trait
Field
to represent types that are fields. So, if we haveT : Field
then values of typeT
can be multiplied, subtracted, divided, etc.
Examples
The most familiar examples of fields are the real numbers $R$ and the rational numbers $Q$ (ratios of integers). Some readers may also be friends with the complex numbers $C$ which are also a field.
The fields that we use in ZKPs, however, are different. They are finite fields. A finite field is a field with finitely many elements. This is in distinction to the fields $Q,R$, and $C$, which all have infinitely many elements.
What are finite fields like?
In this section we’ll try to figure out from first principles what a finite field should look like. If you just want to know the answer, feel free to skip to the next section.
Let’s explore what a finite field can possibly be like. Say $F$ is a finite field. If $n$ is a natural number like $4$ or $87$ we can imagine it as an element of $F$ by writing $n1+⋯+1 $.
Since $F$ is finite it must be the case that we can find two distinct natural numbers $n<m$ which are the same when interpreted as elements of $F$.
Then, $m−n=0$, which means the $F$ element $m−n1+⋯+1 $ is $0$. Now that we have established that if you add $1$ to itself enough times you get $0$, let $p$ be the least natural number such that if you add $1$ to itself $p$ times you get $0$.
Now let’s show that $p$ is prime. Suppose it’s not, and $p=ab$. Then since $p=0$ in $F$, $ab=0$. It is a fact about fields (exercise) that if $ab=0$ then either $a$ is 0 or $b$ is 0. Either way, we would then get a natural number smaller than $p$ which is equal to $0$ in $F$, which is not possible since $p$ is the smallest such number. So we have demonstrated that $p$ is prime.
The smallest number of times you have to add $1$ to itself to get 0 is called the characteristic of the field $F$. So, we have established that every finite field has a characteristic and it is prime.
This gives us a big hint as to what finite fields should look like.
Prime order fields
With the above in hand, we are ready to define the simplest finite fields, which are fields of prime order (also called prime order fields).
Let $p$ be a prime. Then $F_{p}$ (pronounced “eff pee” or “the field of order p”) is defined as the field whose elements are the set of natural numbers ${0,1,…,p−1}$.

$0$ is defined to be $0$

$1$ is defined to be $1$

$add(x,y)=(x+y)modp$

$sub(x,y)=(x−y)modp$

$mul(x,y)=(x⋅y)modp$

$div(x,y)=(x⋅y_{p−2})modp$
Basically, you just do arithmetic operations normally, except you take the remainder after dividing by $p$. This is with the exception of division which has a funny definition. Actually, a more efficient algorithm is used in practice to calculate division, but the above is the most succinct way of writing it down.
If you want, you can try to prove that the above definition of division makes the required equations hold, but we will not do that here.
Let’s work out a few examples.
2 is a prime, so there is a field $F_{2}$ whose elements are ${0,1}$. The only surprising equation we have is
 $1+1=0$
Addition is XOR and multiplication is AND.
Let’s do a more interesting example. 5 is a prime, so there is a field whose elements are ${0,1,2,3,4}$. We have
$21 =1⋅2_{5−2}=2_{3}=8=3 $
where the last equality follows because everything is mod 5.
We can confirm that 3 is in fact $21 $ by multiplying 3 and 2 and checking the result is 1.
$21 ⋅2=3⋅2=6=1$
so that checks out.
In cryptography, we typically work with much larger finite fields. There are two ways to get a large finite field.

Pick a large prime $p$ and take the field $F_{p}$. This is what we do in Mina, where we use fields in which $p$ is on the order of $2_{255}$.

Take an extension of a small finite field. We may expand this document to talk about field extensions later, but it does not now.
Algorithmics of prime order fields
For a finite field $F_{p}$ where $p$ fits in $n$ bits (i.e., $p<2_{n}$) we have

Addition, subtraction: $O(n)$

Multiplication $O(n_{2})$

Division $O(n_{2})$ I believe, in practice significantly slower than multiplication.
Actually, on a practical level, it’s more accurate to model the complexity in terms of the number of limbs rather than the number of bits where a limb is 64 bits. Asymptotically it makes no difference but concretely it’s better to think about the number of limbs for the most part.
As a result you can see it’s the smaller $n$ is the better, especially with respect to multiplication, which dominates performance considerations for implementations of zkSNARKs, since they are dominated by elliptic curve operations that consist of field operations.
While still in devolpment, Mina used to use a field of 753 bits or 12 limbs and now uses a field of 255 bits or 4 limbs. As a result, field multiplication became automatically sped up by a factor of $12_{2}/4_{2}=9$, so you can see it’s very useful to try to shrink the field size.
Polynomials
The next object to know about is polynomials. A polynomial is a syntactic object that also defines a function.
Specifically, let $R$ be a field (or more generally it could even be a ring, which is like a field but without necessarily having division, like the integers $Z$). And pick a variable name like $x$. Then $R[x]$ (pronounced, “R adjoin x” or “polynomials in x over R” or “polynomials in x with R coefficients”) is the set of syntactic expressions of the form
$a_{0}+a_{1}x+a_{2}x_{2}+⋯+a_{d}x_{d}$
where each $a_{i}∈R$ and $d$ is any natural number. If we wanted to be very formal about it, we could define $R[x]$ to be the set of lists of elements of $R$ (the lists of coefficients), and the above is just a suggestive way of writing that list.
An important fact about polynomials over $R$ is that they can be interpreted as functions $R→R$. In other words, there is a function $eval:R[x]→(R→R)$ defined by
$eval(a_{0}+a_{1}x+⋯+a_{d}x_{d})=(y:R)↦a_{0}+a_{1}y+⋯+a_{d}y_{d}$
where $x$ is just a variable name without a value assigned to it and $eval$ maps it to a field element $y$ and evaluates the function as the inner product between the list of coefficients and the powers of $y$.
It is important to remember that polynomials are different than the functions that they compute. Polynomials are really just lists of coefficients. Over some fields, there are polynomials $p1$ and $p2$ such that $eval(p1)=eval(p2)$, but $p1=p2$.
For example, consider in $F_{2}[x]$ the polynomials $x$ and $x_{2}$. These map to the same function $F_{2}→F_{2}$ (meaning for $x∈R=F_{2}={0,1}$ it holds that $x=x_{2}$), but are distinct polynomials.
Some definitions and notation
If $f$ is a polynomial in $R[x]$ and $a∈R$, we write $f(a)$ for $eval(f)(a)$
If $f$ is a polynomial, the degree of $f$ is the largest $d$ for which the coefficient of $x_{d}$ is nonzero in $f$. For example, the degree of $x_{3}+2x_{10}$ is 10.
We will use the notation $R[x]_{<d}$ and $R[x]_{≤d}$ for the set of polynomials of degree lessthan and lessthanorequal $d$ respectively.
Polynomials can be added (by adding the coefficients) and multiplied (by carrying out the multiplication formally and collecting like terms to reduce to a list of coefficients). Thus $R[x]$ is a ring.
Fundamental theorem of polynomials
An important fact for zkSNARKs about polynomials is that we can use them to encode arrays of field elements. In other words, there is a way of converting between polynomials of degree $d$ and arrays of field elements of length $d+1$.
This is important for a few reasons

It will allow us to translate statements about arrays of field elements into statements about polynomials, which will let us construct zkSNARKs.

It will allow us to perform certain operations on polynomials more efficiently.
So let’s get to defining and proving this connection between arrays of field elements.
The first thing to understand is that we won’t be talking about arrays directly in this section. Instead, we’ll be talking about functions $A→F$ where $A$ is a finite subset of $F$. The idea is, if $A$ has size $n$, and we put an ordering on $A$, then we can think of a function $A→F$ as the same thing as an immutable array of length $n$, since such a function can be thought of as returning the value of an array at the input position.
With this understanding in hand, we can start describing the “fundamental theorem of polynomials”. If $A⊆F$ has size $d+1$, this theorem will define an isomorphism between functions $A→F$ and $F[x]_{≤d}$, the set of polynomials of degree at most $d$.
Now let’s start defining this isomorphism.
One direction of it is very easy to define. It is none other than the evaluation map, restricted to $A$:
$eval_{A}eval_{A} :F[x]_{≤d}→(A→F)(c_{0}+⋯+c_{d}x_{d})=a↦i≤d∑ c_{i}a_{i} $
We would now like to construct an inverse to this map. What would that mean? It would be a function that takes as input a function $φ:A→F$ (remember, basically an array of length $∣A∣$), and returns a polynomial $f$ which agrees with $φ$ on the set $A$. In other words, we want to construct a polynomial that interpolates between the points $(a,φ(a))$ for $a∈A$.
Our strategy for constructing this polynomial will be straightforward. For each $a∈A$, we will construct a polynomial $f_{a}$ that is equal to $φ(a)$ when evaluated at $a$, and equal to $0$ when evaluated anywhere else in the set $A$.
Then, our final polynomial will be $f:=∑_{a∈A}f_{a}$. Then, when $f$ is evaluated at $a_{0}∈A$, only the $f_{a_{0}}$ term will be nonzero (and equal to $φ(a_{0})$ as desired), all the other terms will disappear.
Constructing the interpolation map requires a lemma.
Lemma. (construction of vanishing polynomials)
Let $S⊆F$. Then there is a polynomial $v_{S}∈F[x]$ of degree $∣S∣$ such that $v_{S}$ evaluates to $0$ on $S$, and is nonzero off of $S$. $v_{S}$ is called the vanishing polynomial of $S$.
Proof. Let $v_{S}=∏_{s∈S}(x−s)$. Clearly $v_{S}$ has degree $∣S∣$. If $t∈S$, then $v_{S}(t)=0$ since $x−t$ is one of the terms in the product defining $v_{S}$, so when we evaluate at $t$, $t−t=0$ is one of the terms. If $t∈/S$, then all the terms in the product are nonzero, and thus $v_{S}(t)$ is nonzero as well. $□$
Now we can define the inverse map. Define
$interp_{A}interp_{A} :(A→F)→F[x]_{≤d}(f)=a∈A∑ v_{A∖{a}}(a)f(a) ⋅v_{A∖{a}} $
Since each $v_{A∖{a}}$ has degree $d$, this polynomial has degree $d$. Now we have, for any $b∈A$,
$eval_{A}(interp_{A}(f))(b) =a∈A∑ v_{A∖{a}}(a)f(a) ⋅v_{A∖{a}}(b)=v_{A∖{b}}(b)f(b) ⋅v_{A∖{b}}(b)+a=b∑ v_{A∖{a}}(a)f(a) ⋅v_{A∖{a}}(b)=f(b)+a=b∑ v_{A∖{a}}(a)f(a) ⋅0=f(b) $
Thus $eval_{A}∘interp_{A}$ is the identity.
So we have successfully devised a way of interpolating a set of $d+1$ points with a polynomial of degree $d$.
What remains to be seen is that these two maps are inverse in the other direction. That is, that for any polynomial $f∈F[x]$, we have
$interp_{A}(eval_{A}(f))=f$
This says that if we interpolate the function that has the values of $f$ on $A$, we get back $f$ itself. This essentially says that there is only one way to interpolate a set of $d+1$ points with a degree $d$ polynomial. So let’s prove it by proving that statement.
Lemma: polynomials that agree on enough points are equal.
Let $F$ be a field and suppose $f,g∈F[x]$ have degree at most $d$. Let $A⊆F$ have size $d+1$.
Suppose for all $a∈A$, $f(a)=g(a)$. Then $f=g$ as polynomials.
Proof. Define $h:=f−g$. Our strategy will be to show that $h$ is the zero polynomial. This would then imply that $f=g$ as polynomials. Note that $h$ vanishes on all of $A$ since $f$ and $g$ are equal on $A$.
To that end, let $a∈A$. Then we can apply the polynomial division algorithm to divide $h$ by $x−a$ and obtain polynomials $q_{a},r_{a}$ such that $h=q_{a}(x−a)+r_{a}$ and $r_{a}$ has degree less than 1. I.e., $r_{a}$ is a constant in $F$.
Now, $0=h(a)=q_{a}(a)⋅0+r_{a}=r_{a}$ so $r_{a}=0$ and thus $h=q_{a}(x−a)$.
Note that $q_{a}$ is 0 on all elements $b∈A$ with $b=a$ since $h(b)=q_{a}(b)(b−a)=0$, but $b−a=0$.
Thus, if we iterate the above, enumerating the elements of $A$ as $a_{0},…,a_{d}$, we find that $h =(x−a_{0})q_{a_{0}}=(x−a_{0})(x−a_{1})q_{a_{1}}=…=(x−a_{0})(x−a_{1})…(x−a_{d})q_{a_{d}} $
Now, if $q_{a_{d}}$ is not the zero polynomial, then $h$ will have degree at least $d+1$ since it would have as a factor the $d+1$ linear terms $(x−a_{i})$. But since $f,g$ both have degree at most $d$ and $h=f−g$, $h$ has degree at most $d$ as well. Thus, $q_{a_{d}}=0$, which means $h=0$ as well.
This gives us the desired conclusion, that $f=g$ as polynomials. $□$
Now we can easily show that interpolating the evaluations of a polynomial yields that polynomial itself. Let $f∈F[x]_{≤d}$. Then $interp_{A}(eval_{A}(f))$ is a polynomial of degree at most $d$ that agrees with $f$ on $A$, a set of size $d+1$. Thus, by the lemma, they are equal as polynomials. So indeed $interp_{A}(eval_{A}(f))=f$ for all $f∈F[x]$.
So far we have proven that $interp_{A}$ and $eval_{A}$ give an isomorphism of sets (i.e., a bijection) between the sets $A→F$ and $F[x]_{≤d}$.
But we can take this theorem a bit further. The set of functions $A→F$ can be given the structure of a ring, where addition and multiplication happen pointwise. I.e., for $f,g:A→F$ we define $f+g:=a↦f(a)+g(a)$ and $f⋅g:=a↦f(a)⋅g(a)$. Then we can strengthen our theorem to say
Fundamental theorem of polynomials (final version)
Let $d∈N$ and let $A⊆F$ with $∣A∣=d+1$. With
$eval_{A}:F[x]_{≤d}→(A→F)interp_{A}:(A→F)→F[x]_{≤d}$
defined as above, these two functions define an isomorphism of rings.
That is is, they are mutually inverse and each one respects addition, subtraction and multiplication.
The fundamental theorem of polynomials is very important when it comes to computing operations on polynomials. As we will see in the next section, the theorem will help us to compute the product of degree $n$ polynomials in time $O(ngn)$, whereas the naive algorithm takes time $O(n_{2})$. To put this in perspective, if $n=2_{16}$, $n_{2}$ is $4096$ times larger than $ngn$ and the gap only gets bigger as $n$ grows.
Schwartz–Zippel lemma
Computer representation
There are three common representations for polynomials used in computer implementations.

Dense coefficient form. A degree $d$ polynomial is represented as a vector of length $d+1$ of all the coefficients. Entry $i$ of the vector corresponds to the coefficient $a_{i}$. This corresponds to the
DensePolynomial
type in arkworks. Sometimes this is also described as writing the polynomial “in the monomial basis”, because it amounts to writing the polynomial as a linear combination of the monomials $x_{i}$. 
Sparse coefficient form. If a polynomial does not have very many nonzero coefficients, the above representation is wasteful. In sparse coefficient form, you represent a polynomial as a vector (or potentially a hashmap) of pairs
(usize, F)
whereF
is the type of coefficients. The polynomial corresponding to the list[(i_0, b_0), ..., (i_n, b_n)]
is $b_{0}x_{i_{0}}+⋯+b_{n}x_{i_{n}}$ 
Evaluation form. We fix an index set $A⊆F$, with $A={a_{0},…,a_{d}}$, and represent a polynomial $f∈F[x]_{≤d}$ as the vector
[f(a_0), ..., f(a_d)]
. By the fundamental theorem of polynomials, this is a valid way of representing the polynomials, since the coefficients form can be obtained by using the $interp_{A}$ function.
The evaluation form is very important. The reason is that multiplying two polynomials in the evaluation form takes time $O(n)$. You just multiply the two vectors entrywise. By contrast, the coefficient forms naively require time $O(n_{2})$ to multiply.
Now, there is a trick. For certain sets $A⊆F$, we can efficiently translate between the dense coefficients form and the evaluation form. That is, for certain $A$, the functions $interp_{A}$ and $eval_{A}$ can be computed more efficiently than $O(n_{2})$.
Multiplying polynomials
The algorithm that allows us to multiply polynomials in $O(ngn)$ is called the CooleyTukey fast Fourier transform, or FFT for short. It was discovered by Gauss 160 years earlier, but then separately rediscovered and publicized by CooleyTukey.
The heart of CooleyTukey FFT is actually about converting between coefficient and evaluation representations, rather than the multiplication itself. Given polynomials $p$ and $q$ in dense coefficient representation, it works like this.
 Convert $p$ and $q$ from coefficient to evaluation form in $O(ngn)$ using CooleyTukey FFT
 Compute $r=p∗q$ in evaluation form by multiplying their points pairwise in $O(n)$
 Convert $r$ back to to coefficient form in $O(ngn)$ using the inverse CooleyTukey FFT
The key observation is that we can choose any $n$ distinct evaluation points to represent any degree $n−1$ polynomial. The CooleyTukey FFT works by selecting points that yield an efficient FFT algorithm. These points are fixed and work for any polynomials of a given degree.
The next section describes the CooleyTukey FFT in detail.
Fast Fourier Transform (FFT)
This section describes how the CooleyTukey fast Fourier transform works. As we learned in the previous section, the key is to select evaluation points that yield an efficient FFT algorithm.
Specifically, say we have $ω∈F$ such that $ω_{n}=1$, and $ω_{r}=1$ for any $0<r<n$.
Put another way, all the values $1,ω,ω_{2},ω_{3},…,ω_{n−1}$ are distinct and $ω_{n}=1$.
Put yet another way, the group generated by $ω$ inside $F_{×}$ (written $⟨ω⟩$) has size $n$.
We call such an $ω$ a primitive $n$th root of unity.
Suppose we have an $ω$ which is a primitive $2_{k}$th root of unity and let $A_{k}={1,ω,…,ω_{2_{k}−1}}$.
The FFT algorithm will let us compute $interp_{A_{k}}$ for this set.
Actually, it is easier to see how it will let us compute the $eval_{A_{k}}$ algorithm efficiently.
We will describe an algorithm $FFT(k,ω,f)$ that takes as input
 $k∈N$
 $ω∈F$ a primitive $2_{k}$th root of unity
 $f∈F[x]_{<2_{k}}$ in dense coefficients form (i.e., as a vector of coefficients of length $n$).
and outputs the vector of evaluations $[f(1),f(ω),f(ω_{2})…,f(ω_{2_{k}−1})]$ and does it in time $O(k2_{k})$ (which is to say, $ngn$ if $n=2_{k}$).
Notice that naively, computing each evaluation $f(ω_{i})$ using the coefficients of $f$ would require time $O(n)$, and so computing all $n$ of them would require time $O(n_{2})$.
The algorithm $FFT(k,ω,f)$ can be defined recursively as follows.
If $k=0$, then $ω$ is a primitive $1$st root of unity, and $f$ is a polynomial of degree $0$. That means $ω=1$ and also $f$ is a constant $c∈F$. So, we can immediately output the array of evaluations $[c]=[f(1)]$.
If $k>0$, then we will split $f$ into two polynomials, recursively call $FFT$ on them, and reconstruct the result from the recursive calls.
To that end, define $f_{0}$ to be the polynomial whose coefficients are all the evenindex coefficients of $f$ and $f_{1}$ the polynomial whose coefficients are all the oddindex coefficients of $f$. In terms of the array representation, this just means splitting out every other entry into two arrays. So that can be done in time $O(n)$.
Write $f=∑_{i<2_{k}}c_{i}x_{i}$, so that $f_{0}=∑_{i<2_{k−1}}c_{2i}x_{i}$ and $f_{1}=∑_{i<2_{k−1}}c_{2i+1}x_{i}$. Then
$f(x) =i<2_{k}∑ c_{i}x_{i}=i<2_{k−1}∑ c_{2i}x_{2i}+i<2_{k−1}∑ c_{2i+1}x_{2i+1}=i<2_{k−1}∑ c_{2i}(x_{2})_{i}+i<2_{k−1}∑ c_{2i+1}x⋅(x_{2})_{i}=i<2_{k−1}∑ c_{2i}(x_{2})_{i}+xi<2_{k−1}∑ c_{2i+1}(x_{2})_{i}=f_{0}(x_{2})+xf_{1}(x_{2}) $
Now, notice that if $ω$ is a $2_{k}$th root of unity, then $ω_{2}$ is a $2_{k−1}$th root of unity. Thus we can recurse with $FFT(k−1,ω_{2},f_{0})$ and similarly for $f_{1}$. Let
$[e_{0,0},…,e_{0,2_{k−1}−1}][e_{1,0},…,e_{1,2_{k−1}−1}] =FFT(k−1,ω_{2},f_{0})=FFT(k−1,ω_{2},f_{1}) $
By assumption $e_{i,j}=f_{i}((ω_{2})_{j})$. So, for any $j$ we have
$f(ω_{j}) =f_{0}((ω_{2})_{j})+ω_{j}f_{1}((ω_{2})_{j}) $
Now, since $j$ may be larger than $2_{k−1}−1$, we need to reduce it mod $2_{k−1}$, relying on the fact that if $τ$ is an $n$th root of unity then $τ_{j}=τ_{jmodn}$ since $τ_{n}=1$. Thus, $(ω_{2})_{j}=(ω_{2})_{jmod2_{k−1}}$ and so we have
$f(ω_{j}) =f_{0}((ω_{2})_{jmod2_{k−1}})+ω_{j}f_{1}((ω_{2})_{jmod2_{k−1}})=e_{0,jmod2_{k−1}}+ω_{j}e_{1,jmod2_{k−1}} $
We can compute the array $W=[1,ω,…,ω_{2_{k}−1}]$ in time $O(n)$ (since each entry is the previous entry times $ω$). Then we can compute each entry of the output in $O(1)$ as
$f(ω_{j}) =e_{0,jmod2_{k−1}}+W[j]⋅e_{1,jmod2_{k−1}} $
There are $n$ such entries, so this takes time $O(n)$.
This concludes the recursive definition of the algorithm $FFT(k,ω,f)$.
Algorithm: computing $eval_{A_{k}}$
 $Inputf=[c_{0},…,c_{2_{k}−1}]$ the coefficients of polynomial $f(x)=∑_{i<2_{k}}c_{i}x_{i}$
 $ComputeW←[1,ω,ω_{2},…,ω_{2_{k}−1}]$
 $FFT(k,ω,f)→[f(1),f(ω),f(ω_{2})…,f(ω_{2_{k}−1})]$
 $ifk==0$
 $returnf$
 $else$
 $Computef_{0}=[c_{0},c_{2},…,c_{2_{k}−2}]$ the even coefficients of $f,$ corresponding to $f_{0}(x)=∑_{i<2_{k−1}}c_{2i}x_{i}$
 $Computef_{1}=[c_{1},c_{3},…,c_{2_{k}−1}]$ the odd coefficients of $f,$ corresponding to $f_{1}(x)=∑_{i<2_{k−1}}c_{2i+1}x_{i}$
 $e_{0}←FFT(k−1,ω_{2},f_{0})$
 $e_{1}←FFT(k−1,ω_{2},f_{1})$
 $forj∈[0,2_{k}−1]$
 $F_{j}←e_{0,jmod2_{k−1}}+W[j]⋅e_{1,jmod2_{k−1}}$
 $returnF$
Now let’s analyze the time complexity. Let $T(n)$ be the complexity on an instance of size $n$ (that is, for $n=2_{k}$).
Looking back at what we have done, we have done
 $O(n)$ for computing $f_{0}$ and $f_{1}$
 two recursive calls, each of size $n/2$
 $O(n)$ for computing the powers of $ω$
 $O(n)$ for combining the results of the recursive calls
In total, this is $O(n)+2T(n/2)$. Solving this recurrence yields $T(n)=O(n)⋅logn=O(ngn)$. Basically, there are $gn$ recursions before we hit the base case, and each step takes time $O(n)$. $□$
Now, in practice there are ways to describe this algorithm nonrecursively that have better concrete performance, but that’s out of scope for this document. Read the code if you are interested.
Using the FFT algorithm to compute $interp_{A_{k}}$
So far we have a fast way to compute $eval_{A_{k}}(f)$ all at once, where $A_{k}$ is the set of powers of a $2_{k}$th root of unity $ω$. For convenience let $n=2_{k}$.
Now we want to go the other way and compute a polynomial given an array of evaluations. Specifically, $n$ evaluations $[f(x_{0}),f(x_{1}),…,f(x_{n−1})]$ uniquely define a degree $n−1$ polynomial. This can be written as a system of $n$ equations
$f(x_{0})f(x_{1})⋮f(x_{n−1}) =c_{0}+c_{1}x_{0}+…+c_{n−1}x_{0}=c_{0}+c_{1}x_{1}+…+c_{n−1}x_{1}=c_{0}+c_{1}x_{n−1}+…+c_{n−1}x_{n−1}, $ which can be rewritten as a matrix vector product. $ f(x_{0})f(x_{1})⋮f(x_{n−1}) = 11⋮1 x_{0}x_{1}⋮x_{n−1} ⋯⋯⋱⋯ x_{0}x_{1}⋮x_{n−1} × c_{0}c_{1}⋮c_{n−1} $ This $n×n$ matrix is a Vandermonde matrix and it just so happens that square Vandermonde matrices are invertible, iff the $x_{i}$ are unique. Since we purposely selected our $x_{i}$ to be the powers of $ω$, a primitive $n$th root of unity, by definition $x_{i}=ω_{i}$ are unique.
Therefore, to compute the polynomial given the corresponding array of evaluations (i.e. interpolation) we can solve for the polynomial’s coefficients using the inverse of the matrix. $ c_{0}c_{1}⋮c_{n−1} = 11⋮1 1ω⋮ω_{n−1} ⋯⋯⋱⋯ 1_{n−1}ω_{n−1}⋮ω_{(n−1)(n−1)} _{−1}× f(1)f(ω)⋮f(ω_{n−1}) $ All we need now is the inverse of this matrix, which is slightly complicated to compute. I’m going to skip it for now, but if you have the details please make a pull request.
Substituting in the inverse matrix we obtain the equation for interpolation. $ c_{0}c_{1}⋮c_{n−1} =n1 11⋮1 1ω_{−1}⋮ω_{−(n−1)} ⋯⋯⋱⋯ 1_{n−1}ω_{−(n−1)}⋮ω_{−(n−1)(n−1)} × f(1)f(ω)⋮f(ω_{n−1}) $ Observe that this equation is nearly identical to the original equation for evaluation, except with the following substitution. $ω_{i}⇒n1 ω_{−1i}$ Consequently and perhaps surprisingly, we can reuse the FFT algorithm $eval_{A_{k}}$ in order to compute the inverse– $interp_{A_{k}}$.
So, suppose we have an array $[a_{0},…,a_{n−1}]$ of field elements (which you can think of as a function $A_{k}→F$) and we want to compute the coefficients of a polynomial $f$ with $f(ω_{i})=a_{i}$.
To this end, define a polynomial $g$ by $g=∑_{j<n}a_{j}x_{j}$. That is, the polynomial whose coefficients are the evaluations in our array that we’re hoping to interpolate.
Now, let $[e_{0},…,e_{n−1}]=FFT(k,ω_{−1},g)$.
That is, we’re going to feed $g$ into the FFT algorithm defined above with $ω_{−1}$ as the $2_{k}$th root of unity. It is not hard to check that if $ω$ is an nth root of unity, so is $ω_{−1}$. Remember: the resulting values are the evaluations of $g$ on the powers of $ω_{−1}$, so $e_{i}=g(ω_{−i})=∑_{j<n}a_{j}ω_{−ij}$.
Now, let $h=∑_{i<n}e_{i}x_{i}$. That is, reinterpret the values $e_{i}$ returned by the FFT as the coefficients of a polynomial. I claim that $h$ is almost the polynomial we are looking for. Let’s calculate what values $h$ takes on at the powers of $ω$.
$h(ω_{s}) =i<n∑ e_{i}ω_{si}=i<n∑ ω_{si}j<n∑ a_{j}ω_{−ij}=i<n∑ j<n∑ a_{j}ω_{si−ij}=j<n∑ a_{j}i<n∑ ω_{i(s−j)} $
Now, let’s examine the quantity $c_{j}:=∑_{i<n}ω_{i(s−j)}$. We claim that if $j=s$, then $c_{j}=n$, and if $j=s$, then $c_{j}=0$. The first claim is clear since
$c_{s}=i<n∑ ω_{i(s−s)}=i<n∑ ω_{0}=i<n∑ 1=n$
For the second claim, we will prove that $ω_{s−j}c_{j}=c_{j}$. This implies that $(1−ω_{s−j})c_{j}=0$. So either $1−ω_{s−j}=0$ or $c_{j}=0$. The former cannot be the case since it implies $ω_{s−j}=1$ which in turn implies $s=j$ which is impossible since we are in the case $j=s$. Thus we have $c_{j}=0$ as desired.
So let’s show that $c_{j}$ is invariant under multiplication by $ω_{s−j}$. Basically, it will come down to the fact that $ω_{n}=ω_{0}$.
$ω_{s−j}c_{j} =ω_{s−j}i<n∑ ω_{i(s−j)}=i<n∑ ω_{i(s−j)+(s−j)}=i<n∑ (ω_{i+1})_{s−j}=(ω_{0+1})_{s−j}+(ω_{1+1})_{s−j}+⋯+(ω_{(n−1)+1})_{s−j}=(ω_{1})_{s−j}+(ω_{2})_{s−j}+⋯+(ω_{n})_{s−j}=(ω_{1})_{s−j}+(ω_{2})_{s−j}+⋯+(ω_{0})_{s−j}=(ω_{0})_{s−j}+(ω_{1})_{s−j}+⋯+(ω_{n−1})_{s−j}=i<n∑ (ω_{i})_{s−j}=c_{j} $
So now we know that
$h(ω_{s}) =j<n∑ a_{j}c_{j}=a_{s}⋅n+j=s∑ a_{j}⋅0=a_{s}⋅n $
So if we define $f=h/n$, then $f(ω_{s})=a_{s}$ for every $s$ as desired. Thus we have our interpolation algorithm, sometimes called an inverse FFT or IFFT:
Algorithm: computing $interp_{A_{k}}$
Input: $[a_{0},…,a_{n−1}]$ the points we want to interpolate and $ω$ a $n$th root of unity.
Interpret the input array as the coefficients of a polynomial $g=∑_{i<n}a_{i}x_{n}$.
Let $[e_{0},…,e_{n}]=FFT(k,ω_{−1},g)$.
Output the polynomial $∑_{i<n}(e_{i}/n)x_{i}$. I.e., in terms of the densecoefficients form, output the vector $[e_{0}/n,…,e_{n−1}/n]$.
Note that this algorithm also takes time $O(ngn)$
Takeaways

Polynomials can be represented as a list of coefficients or a list of evaluations on a set $A$

If the set $A$ is the set of powers of a root of unity, there are time $O(ngn)$ algorithms for converting back and forth between those two representations

In evaluations form, polynomials can be added and multiplied in time $O(n)$
 TODO: caveat about hitting degree
Exercises

Implement types
DensePolynomial<F: FfftField>
andEvaluations<F: FftField>
that wrap aVec<F>
and implement the FFT algorithms described above for converting between them 
Familiarize yourself with the types and functions provided by
ark_poly
Commitments
A “commitment scheme” is a cryptographic scheme that lets us provide a “commitment” to a given piece of data, in such a way that we can later “open” that commitment.
I recommend checking out section 2.4.1 of David’s book RealWorld Cryptography. There are two properties we typically expect of our commitment schemes:

Hiding. A commitment to a piece of data should not reveal anything about that data.

Binding. There should be no ambiguity about what data is committed to. That is, it should not be possible for the committer to open the commitment to a piece of data other than the one they originally committed to.
There are various kinds of commitments that allow opening in different ways, revealing only part of the data, or some function of the data. Sometimes it is even useful to elide the hiding property entirely— socalled nonhiding commitments.
Simple commitments
The simplest kind of a commitment is one in which opening reveals all of the underlying data. Let’s give a simple construction of such a scheme for committing to field elements. Suppose we have

$F_{p}$ a prime order field, with $p$ being rather large (say on the order of $2_{256}$).

A hash function $h:List(F_{p})→F_{p}$.
Then we define
$commit(x,r)=h([x,r])open(c)=(x,r)verify(c,x,r)=true/false$
The argument $r$ of the $commit$ function is data that must only be known to the committer (until the commitment is opened). When a committer wants to commit to a field element $x$, they sample a random “blinder” $r∈F_{p}$ and hash it together with $x$ to form the commitment.
To open the commitment $c$, they simply provide the committed value together with the blinder. Alternatively, if the verifier already knows $x$, they can just provide $r$, i.e. $open(x,c)=r$. Finally, given the commitment and the opening, we can verify whether the input was the value originally committed to using the $verify$ function.
If the hash function is collisionresistant, then this function is binding (because there’s no way the committer could find another preimage of $h([x,r])$).
If the hash function is oneway, then this commitment is also hiding (assuming $x$ is revealed only as part of the opening).
Commitments are often used in protocols between provers and verifiers. The following illustration provides an example with a prover named Peggy and a verifier named Victor. $Peggyinputblinder }→commit commitment→open ⟶⋮⟶⋮ Victorcommitmentinput,blindercommitmentinputblinder }→verify →true/false $
Here Peggy commits to an input using a blinder, obtains the commitment and sends it to Victor. The interlocutors continue their protocol, but eventually to convince Victor of her claims, Peggy must send the opening proof to her earlier commitment. Victor verifies the opening (i.e. the input and blinder) against the commitment. If the verification fails, then Victor knows that Peggy was trying to trick him, otherwise Victor has sufficient assurances that Peggy was telling the truth.
Algebraic and homomorphic commitments
Instead of a cryptographic hash function, we can use elliptic curve scalar multiplication to construct a commitment scheme. Here scalar multiplication is used like a oneway function based on the hardness assumption of the elliptic curve discrete logarithm problem (ECDLP). Suppose we have
 $F_{p}$ a prime order field, with $p$ being large (e.g. something like $2_{256}$).
 Publicly agreed generator point $G$ over an elliptic curve $E(F_{p})$
 Another publicly agreed curve point $H$ for which no one knows the discrete logarithm
$commit(x,r)=xG+rHopen(c)=(x,r)$
where $x∈F_{p}$ is the value being committed to, $r∈F_{p}$ is a random blinding factor and the commitment $c=commit(x,r)$ is a curve point.
These commitments are algebraic (i.e. they do not use a booleanbased cryptographic hash function) and have homomorphic properties: you can add commitments together to form another commitment of the added committed values. For example, if you have commitments $A$ and $B$, you can perform:
$A+B =commit(x_{a},r_{a})+commit(x_{b},r_{b})=x_{a}G+r_{a}H+x_{b}G+r_{b}H=(x_{a}+x_{b})G+(r_{a}+r_{b})H=commit(x_{a}+x_{b},r_{a}+r_{b}) $
In other words, the sum of commitments $A$ and $B$ is equal to the commitment of the sum of the two committed values $x_{a}$ and $x_{b}$ and blinders $r_{a}$ and $r_{b}$. This is possible because in such a scheme scaling is commutative with adding scalars.
As a cryptographic primitive, the ability to find a public curve point $H$ for which no one knows the discrete logarithm may, at first, seem rather mindblowing and powerful.
Actually, it’s as easy as it is awesome to find such a point— simply perform rejection sampling by cryptographically hashing $G$ (or, respectively, the hash output), using the output as the $x$coordinate of a candidate point on $E$ and checking whether it’s valid. The first valid curve point obtained is $H$ and by the hardness assumption of the ECDLP, no one knows it.
Since approximately half of the hash outputs will be valid curve points on $E$, sampling will terminate very quickly. Indeed, as we will see later, this process can be used to sample many public curve points $G_{1},…,G_{n}$ for which the discrete logarithms are unknown; the socalled hash to curve algorithm.
Pedersen commitments
The homomorphic commitment $commit(x,r)=xG+rH$ described above is known as a Pedersen commitment. If you remove the $rH$ term you get a nonhiding commitment, called a Pedersen hash. Both rely on the ECDLP hardness assumption.
This means that, at least theoretically, you might be lucky (or have a quantum computer) and figure out that $H=hG$, which would allow you to find different values $x_{′}$ and $h_{′}$ to open the commitment. We say that pedersen commitments are computationally binding and not unconditionally binding. For example, you could express $c=xG+rH$ alternatively as $c=xG+rhG=(x+rh)G$ and compute a satisfying opening pair $x_{′}=rh$ and $r_{′}=hx $.
On the other hand, Pedersen commitments are unconditionally hiding, as there is no way (even with a magic computer) to reveal what $x$ is without knowing $r$. Lack of perfect binding is the reason why most of the “proofs” we will see later in this book are not referred to as proofs, but instead are referred to as arguments of knowledge (although we may care little about this distinction). Just remember that you need perfect binding to be called a proof.
Interestingly, it is impossible to have a commitment scheme that has both perfect hiding and perfect binding.
To recap, in cryptography the following distinctions are important

Perfect. The property that an algorithm is statistically sound without hardness assumptions, also known as unconditional or statistical soundness.

Computational. The algorithm relies on a hardness assumption or computational limitation for soundness.
Thus, said another way, Pedersen commitments provide perfect hiding and computational binding.
Vector commitments
We can commit to several values $x_{1},⋯,x_{n}$ by sending separate Pedersen commitments to all of these values as such:
$x_{1}G+r_{1}H,⋮x_{n}G+r_{n}H$
But we can instead batch/aggregate all of these commitments together as a single commitment:
$x_{1}G_{1}+⋯+x_{n}G_{n}+rH$
with $G_{1},⋯,G_{n},H$ independent bases with unknown discrete logarithms.
If you represent $x$s and the $G$s as two vectors $x=(x_{1},⋯,x_{n})$ and $G=(G_{1},⋯,G_{n})$, we can quickly write the previous statement as an inner product
$xG+rH$
Vector commitments (sometimes referred to as multicommitments) are a powerful construction because an arbitrarily large vector can be committed with a single curve point.
The naive approach to constructing an opening proof for a length $n$ vector commitment has size $O(n)$. It is simply the tuple $(x_{1},…,x_{n},r)$. As we will see later, opening proofs for vector commitments is an interesting topic and there is a much more efficient algorithm.
Polynomial commitments
To construct SNARKs we use use polynomial commitments. A polynomial commitment scheme for a field $F$ (or it could even be a ring) is a way of committing to a polynomial $f∈F[x]$ to get a commitment $c$, in such a way that for any $α∈F$, you can provide $y=f(α)$, along with an “opening proof” $π$ that proves that the polynomial committed to in $c$ equals $y$ when evaluated at $α$.
In other words, it is a type of commitment $C$, a type of randomness $R$, a type of opening proof $P$ along with algorithms
$commit:F[x]×R→Copen:C×F×(F[x]×R)→F×Pverify:C×(F×P)→Bool$
such that for any $f∈F[x],r∈R,α∈F$ , we have
$c:=commit(f,r)verify(c,open(c,α,(f,r)))=true$
and if $b=f(α)$ then it is not possible to compute $π∈P$ such that
$verify(c,(b,π))=true$
In other words, if $b=f(α)$ then every $π$ which is feasible to compute results in $verify(c,(b,π))=false$.
One thing that’s pretty cool is that because polynomial commitment schemes let you construct zkSNARKs, polynomial commitment schemes imply commitment schemes with arbitrary opening functionality. TODO
Constructing polynomial commitment schemes
All known constructions of polynomial commitment schemes are a bit complicated. The easiest to describe is called the Kate (pronounced “kahTAY”) scheme, also known as “KZG”. It requires a “primeorder group with a pairing”, which is three groups $G_{1},G_{2},G_{T}$ of prime order $p$ (hence, all isomorphic cyclic groups) together with a function $e:G_{1}×G_{2}→G_{T}$ such that for any $a_{1},a_{2}∈Z$, $g_{1}∈G_{1}$, $g_{2}∈G_{2}$, we have
$e(a_{1}⋅g_{1},a_{2}⋅g_{2})=a_{1}a_{2}⋅e(g_{1},g_{2})$
$e$ is called a “pairing” or a “bilinear pairing”. What this lets us do is “multiply in the scalar” but only once.
Fix a degree bound $d$ on the polynomials we would like to be able to commit to. The KZG scheme, will let us commit to polynomials in $F_{p}[x]_{<d}$. As a preliminary, fix generators $g_{1}∈G_{1},g_{2}∈G_{2}$ arbitrarily.
The first thing to know about the KZG scheme is it requires that we randomly sample some group elements to help us. This is the dreaded and much discussed trusted setup. So, anyway, we start by sampling $τ$ at random from $F_{p}$ and computing for $i<d$,
$h_{i}:=(τ_{i})⋅g_{1}w:=τ⋅g_{2}$
And then throw away $τ$. The security depends on no one knowing $τ$, which is sometimes referred to as the toxic waste of the trusted setup. Basically we compute the generator scaled by powers of $τ$ up to the degree bound. We make a security assumption about the groups which says that all anyone can really do with group elements is take linear combinations of them.
Now suppose we have a polynomial $f∈F_{p}[x]_{<d}$ with $f=∑_{i<d}a_{i}x_{i}$ that we would like to commit to. We will describe a version of the scheme that is binding but not hiding, so it may leak information about the polynomial. Now, to commit to $f$, we compute
$c_{f}:=commit(f)=a_{0}⋅h_{0}+a_{1}⋅h_{1}+⋯+a_{d−1}⋅h_{d−1}$
so that $c_{f}∈G_{1}$ and
$c_{f} =i<d∑ a_{i}⋅h_{i}=i<d∑ a_{i}⋅τ_{i}g_{1}=i<d∑ (a_{i}τ_{i})⋅g_{1}=(i<d∑ a_{i}τ_{i})⋅g_{1}=f(τ)⋅g_{1} $
So $c_{f}$ is $g_{1}$ scaled by $f(τ)$ and the fact that $G_{1}$ is an $F_{p}$module (i.e. a vector space whose scalars come from $F_{p}$) means we can compute $f(τ)⋅g_{1}$ from the $h_{i}$ and the coefficients of $f$ without knowing $τ$.
Now how does opening work? Well, say we want to open at a point $a$ to $b=f(a)$. Then the polynomial $f−b$ vanishes at $a$, which means that it is divisible by the polynomial $x−a$ (exercise, use polynomial division and analyze the remainder).
So, the opener can compute the polynomial
$q:=x−af−b $
and commit to it as above to get a commitment $c_{q}$. And $c_{q}$ will be the opening proof. It remains only to describe verification. It works like this
$verify(c_{f},(b,c_{q})):=e(c_{q},w−a⋅g_{2})=_{?}e(c_{f}−b⋅g_{1},g_{2})$
This amounts to checking: “is the polynomial committed to $c_{f}$ equal to the polynomial committed to by $c_{q}$ times $x−a$”?
To see why, remember that $w=τ⋅g_{2}$, and say $c_{q}=s_{q}⋅g_{1}$ and $c_{f}=s_{f}⋅g_{1}$ so we are checking
$e(s_{q}⋅g_{1},(τ−a)⋅g_{2})=_{?}e((s_{f}−b)⋅g_{1},g_{2})$
which by the bilinearity of the pairing is the same as checking
$s_{q}⋅(τ−a)=s_{f}−b$
Bootleproof inner product argument
Polynomial commitments
A polynomial commitment is a scheme that allows you to commit to a polynomial (i.e. to its coefficients). Later, someone can ask you to evaluate the polynomial at some point and give them the result, which you can do as well as provide a proof of correct evaluation.
SchwartzZippel lemma
TODO: move this section where most relevant
Let $f(x)$ be a nonzero polynomial of degree $d$ over a field $F$ of size $n$, then the probability that $f(s)=0$ for a randomly chosen $s$ is at most $nd $.
In a similar fashion, two distinct degree $d$ polynomials $g(X)$ and $h(X)$ can at most intersect in $d$ points. This means that the probability that $g(s)=h(s)$ on a random $s←F$ is $∣F∣d $. This is a direct corollary of the SchwartzZipple lemma, because it is equivalent to the probability that $f(s)=0$ with $f(X)=g(X)−h(X)$
Inner product argument
What is an inner product argument?
The inner product argument is the following construction: given the commitments (for now let’s say the hash) of two vectors $a$ and $b$ of size $n$ and with entries in some field $F$, prove that their inner product $⟨a,b⟩$ is equal to $z$.
There exist different variants of this inner product argument. In some versions, none of the values ($a$, $b$ and $z$) are given, only commitments. In some other version, which is interesting to us and that I will explain here, only $a$ is unknown.
How is that useful?
Inner products arguments are useful for several things, but what we’re using them for in Mina is polynomial commitments. The rest of this post won’t make too much sense if you don’t know what a polynomial commitment is, but briefly: it allows you to commit to a polynomial $f$ and then later prove its evaluation at some point $s$. Check my post on Kate polynomial commitments for more on polynomial commitment schemes.
How does that translate to the inner product argument though? First, let’s see our polynomial $f$ as a vector of coefficients:
$f =(f_{0},⋯,f_{n})such thatf(x)=f_{0}+f_{1}x+f_{2}x_{2}+⋯+f_{n}x_{n}$
Then notice that
$f(s)=⟨f$