Note that this book is a work in progress, not necessarily reflecting the current state of Mina.
Authors: Izaak Meckler, Vanishree Rao, Matthew Ryan, Anaïs Querol, Joseph Spadavecchia, David Wong, Xiang Xie
In memory of Vitaly Zelov.
This document is intended primarily to communicate mathematical ideas to programmers with some experience of math.
To that end, we will often be ambivalent about the difference between sets and types (in whatever programming language, but usually I am imagining some idealized form of Rust or OCaml). So for example we may write
is a type
is a set
Ais a type
and these should all be interpreted the same way, assuming it makes sense in the context. Similarly, we may write either of the following, and potentially mean the same thing:
a : A
- for function types
- for defining functions
Also, we usually assume functions are computed by a program (in whatever sense of “program” makes sense in the given context). So if we say “let ”, we typically mean that
and are types in some programming language which makes sense in the context (or maybe that they are sets, again depending on context)
is actually a program computing a function of the type (again, in whatever sense of program makes sense in context)
Groups are an algebraic structure which
are used in most widely deployed signature schemes
are used in most widely deployed zk-SNARKs
generalize the notion of a collection of symmetries of an object
generalize certain aspects of numbers
I will not touch on the last two too much.
First let’s do a quick definition and then some examples.
Definition. A group is a set with a binary operation , an identity for that operation, and an inverse map such that
is associative: for all .
is an identity for . I.e., for all .
is an inverse map for with identity . I.e., for all .
So basically, an invertible binary operation. Definition in hand, we can see some examples:
The integers with , as identity, and negation as inverse.
For any natural number , the integers mod . This can be thought of as the set of numbers with the operation being followed by taking remainder mod . It can also be thought of as the group of rotations of an -gon.
For any set , the set of invertible functions (a.k.a permutations) with function composition as the operation, the identity function as the identity, and function inverse as the inverse operation.
The set of translations of a plane with composition of translations (since the composition of translations is a translation). The identity is translating by and inverse is reversing the translation. This is equivalent to the group with coordinate-wise addition as the operation and coordinate-wise negation as the inverse.
The set of rotations of a sphere with composition as the operation, doing nothing as the identity, and performing the rotation in reverse as the inverse map.
For any field , the set , with field multiplication as the operation, as the identity, and as the inverse map. This is called the group of units of .
Sometimes, instead of we simply write when is clear from context.
An abelian group is a group in which is commutative, meaning . Typically, in that case, we write the group operation as instead and we write the identity as .
Some examples are
the integers with addition
the integers mod with addition
the group of units of a field
the real numbers with addition
the rational numbers with addition
the real numbers with multiplication
the rational numbers with multiplication
vectors of integers of some fixed dimension with coordinate-wise addition
The set of polynomials over a ring , with addition
Abelian groups are equivalently described as what are called -modules. A -module is like a vector space, but with the integers instead of a field.
Basically a -module (or equivalently an Abelian group) is a structure where you can add elements and where it makes sense to multiply a group element by an integer.
If is an abelian group, we can define this “multiplication by an integer” as follows. If , then for , we can define by
and if , then define . Or equivalently.,
This is the general sense of what is called scalar-multiplication or sometimes exponentiation in cryptography.
A cyclic group is a special kind of abelian group. It is an abelian group generated by a single element . That is, a cyclic group (generated by ) is one in which for every we have for some .
Many cryptographic protocols are defined generically with respect to any abelian group that has a “hard discrete log”.
be a cyclic group
a generator of
a probability distribution on
a set of algorithms of type with runtime and memory usage bounds. In other words, a set of tuples where is a program of type , is a bound on the time that program can be run, and is a bound on the amount of memory that program can use.
In practice you fix this to be something like, the set of all computations that can be run for less than 1 trillion $.
a probability, usually taken to be something close to 0 like
Then we can define the -computational discrete-log assumption which says:
For any , if you sample h from G according to , then the probability that is smaller than , assuming successfully runs within the given resource bounds.
Sometimes this is called the computational Diffie–Helman assumption. Basically what it’s saying is that for a group element sampled randomly, you can’t practically compute how many times you have to add to itself to get .
Another really important assumption is the no-relation assumption (TODO: name).
Basically what this says is that randomly selected group elements have no efficiently computable linear relations between them. Formally, letting be a group and a probability distribution on , and a set of programs (with resource bounds) that take in a list of group elements as inputs and outputs a list of integers of the same length.
Then the -no-relation assumption says for all , for any , if you sample according to , letting , it is not the case that
except with probability (assuming program runs in time with memory ).
Now, what are some concrete groups that we can safely make the no-relation or computational discrete log hardness assumptions about?
Well, the most efficiently implementable groups that people have come up with – and that we believe satisfy the above assumptions for being the class of realistic computations and being something like – are elliptic curves over finite fields.
Giving a complete definition of what an elliptic curve is requires a lot of math, and is not very useful from the point of view of cryptography. So we will give a definition that is not complete, but more useful.
An elliptic curve over a field is a set of the form
for some , plus an additional point which is called the point at infinity. Basically it’s a set of pairs satisfying some equation of that form. The data of the elliptic curve itself is just the field together with the constants and .
What is special about elliptic curves is that there is a way of giving them a group structure with simple to compute operations that we believe satisfy the hardness assumptions described above.
Group negation is defined by
so we just negate the coordinate.
The identity for the group is , the point at infinity. For that point we may also therefore write it as .
Group addition is more complicated to define, so we will not, but here is what’s worth knowing about how to compute
There are three cases
but . In this case it turns out and so the two points are inverse, and we return
In cases 1 and 2, the algorithm to compute the result just performs a simple sequence of some field operations (multiplications, additions, divisions, subtractions) with the input values. In other words, there is a simple arithmetic formula for computing the result.
Given a curve point we know and thus is one of the two square roots of . If is a given square root, the other square root is since . So, if we know , then we almost know the whole curve point: we just need a single bit to tell us which of the two possible values (i.e., the two square roots of ) is the y coordinate.
Here is how this is commonly done over prime order fields , assuming . Remember that we represent elements of as integers in the range . In this representation, for a field element , is the integer (or if ). Thus, if is odd, then is even (since is odd and an odd minus an odd is even). Similarly, if is even, then is odd (since an odd minus an even is odd).
So, for any , unless is 0, and have opposite parities. Parity can easily be computed: it’s just the least significant bit. Thus, we have an easy way of encoding a curve point . Send
in its entirety
The least significant bit of
Given this, we can reconstruct as follows.
Compute any square root of
If the least significant bit of is equal to , then , otherwise,
In the case of Mina, our field elements require 255 bits to store. Thus, we can encode a curve point in bits, or 32 bytes. At the moment this optimized encoding is not implemented for messages on the network. It would be a welcome improvement.
The above form of representing curve points as pairs – which is called affine form – is sometimes sub-optimal from an efficiency perspective. There are several other forms in which curve points can be represented, which are called projective forms.
The simple projective form represents a curve as above as the set
If you think about it, this is saying that is a point on the original curve in affine form. In other words, in projective form we let the first two coordinates get scaled by some arbitrary scaling factor , but we keep track of it as the third coordinate.
To be clear, this means curve points have many different representations. If is a curve point in projective coordinates, and is any element of , then is another representation of the same curve point.
This means curve points require more space to store, but it makes the group operation much more efficient to compute, as we can avoid having to do any field divisions, which are expensive.
There is another form, which is also sometimes called a projective form, which is known as the jacobian form. There, a curve would be represented as
so the triple corresponds to the affine point . These are the fastest for computing group addition on a normal computer.
Use affine coordinates when the cost of division doesn’t matter and saving space is important. Specific contexts to keep in mind:
- Working with elliptic curve points inside a SNARK circuit
Use Jacobian coordinates for computations on normal computers, or other circumstances where space usage is not that costly and division is expensive.
Check out this website for the formulas for implementing the group operation.
- When cryptographers talk about “groups”, they usually mean a “computational group”, which is a group equipped with efficient algorithms for computing the group operation and inversion. This is different because a group in the mathematical sense need not have its operations be computable at all.
Implement a type
JacobianPoint<F:Field>and functions for computing the group operation
Familiarize yourself with the types and traits in
fn decompress<F: SquareRootField>(c: (F, bool)) -> (F, F)
A ring is like a field, but where elements may not be invertible. Basically, it’s a structure where we can
but not necessarily divide. If you know what polynomials are already, you can think of it as the minimal necessary structure for polynomials to make sense. That is, if is a ring, then we can define the set of polynomials (basically arithmetic expressions in the variable ) and think of any polynomial giving rise to a function defined by substituing in for in and computing using and as defined in .
So, in full, a ring is a set equipped with
gives the structure of an abelian group
is associative and commutative with identity
distributes over . I.e., for all .
The purpose of this document is to give the reader mathematical, cryptographic, and programming context sufficient to become an effective practitioner of zero-knowledge proofs and ZK-SNARKs specifically.
In this section we’ll discuss the fundamental objects used in the construction of most ZK-SNARKs in use today. These objects are used extensively and without ceremony, so it’s important to understand them very well.
If you find you don’t understand something at first, don’t worry: practice is often the best teacher and in using these objects you will find they will become like the bicycle you’ve had for years: like an extension of yourself that you use without a thought.
A field is a generalization of the concept of a number. To be precise, a field is a set equipped with
Note that the second argument to cannot be . We write these functions in the traditional infix notation writing
and we also write for and for .
Moreover all these elements and functions must obey all the usual laws of arithmetic, such as
if and only if , assuming .
In short, should be an abelian group over with as identity and as inverse, should be an abelian group over with as identity and as inverse, and addition should distribute over multiplication. If you don’t know what an abelian group is, don’t worry, we will cover it later.
The point of defining a field is that we can algebraically manipulate elements of a field the same way we do with ordinary numbers, adding, multiplying, subtracting, and dividing them without worrying about rounding, underflows, overflows, etc.
In Rust, we use the trait
Fieldto represent types that are fields. So, if we have
T : Fieldthen values of type
Tcan be multiplied, subtracted, divided, etc.
The most familiar examples of fields are the real numbers and the rational numbers (ratios of integers). Some readers may also be friends with the complex numbers which are also a field.
The fields that we use in ZKPs, however, are different. They are finite fields. A finite field is a field with finitely many elements. This is in distinction to the fields , and , which all have infinitely many elements.
In this section we’ll try to figure out from first principles what a finite field should look like. If you just want to know the answer, feel free to skip to the next section.
Let’s explore what a finite field can possibly be like. Say is a finite field. If is a natural number like or we can imagine it as an element of by writing .
Since is finite it must be the case that we can find two distinct natural numbers which are the same when interpreted as elements of .
Then, , which means the element is . Now that we have established that if you add to itself enough times you get , let be the least natural number such that if you add to itself times you get .
Now let’s show that is prime. Suppose it’s not, and . Then since in , . It is a fact about fields (exercise) that if then either is 0 or is 0. Either way, we would then get a natural number smaller than which is equal to in , which is not possible since is the smallest such number. So we have demonstrated that is prime.
The smallest number of times you have to add to itself to get 0 is called the characteristic of the field . So, we have established that every finite field has a characteristic and it is prime.
This gives us a big hint as to what finite fields should look like.
With the above in hand, we are ready to define the simplest finite fields, which are fields of prime order (also called prime order fields).
Let be a prime. Then (pronounced “eff pee” or “the field of order p”) is defined as the field whose elements are the set of natural numbers .
is defined to be
is defined to be
Basically, you just do arithmetic operations normally, except you take the remainder after dividing by . This is with the exception of division which has a funny definition. Actually, a more efficient algorithm is used in practice to calculate division, but the above is the most succinct way of writing it down.
If you want, you can try to prove that the above definition of division makes the required equations hold, but we will not do that here.
Let’s work out a few examples.
2 is a prime, so there is a field whose elements are . The only surprising equation we have is
Addition is XOR and multiplication is AND.
Let’s do a more interesting example. 5 is a prime, so there is a field whose elements are . We have
where the last equality follows because everything is mod 5.
We can confirm that 3 is in fact by multiplying 3 and 2 and checking the result is 1.
so that checks out.
In cryptography, we typically work with much larger finite fields. There are two ways to get a large finite field.
Pick a large prime and take the field . This is what we do in Mina, where we use fields in which is on the order of .
Take an extension of a small finite field. We may expand this document to talk about field extensions later, but it does not now.
For a finite field where fits in bits (i.e., ) we have
Division I believe, in practice significantly slower than multiplication.
Actually, on a practical level, it’s more accurate to model the complexity in terms of the number of limbs rather than the number of bits where a limb is 64 bits. Asymptotically it makes no difference but concretely it’s better to think about the number of limbs for the most part.
As a result you can see it’s the smaller is the better, especially with respect to multiplication, which dominates performance considerations for implementations of zk-SNARKs, since they are dominated by elliptic curve operations that consist of field operations.
While still in development, Mina used to use a field of 753 bits or 12 limbs and now uses a field of 255 bits or 4 limbs. As a result, field multiplication became automatically sped up by a factor of , so you can see it’s very useful to try to shrink the field size.
The next object to know about is polynomials. A polynomial is a syntactic object that also defines a function.
Specifically, let be a field (or more generally it could even be a ring, which is like a field but without necessarily having division, like the integers ). And pick a variable name like . Then (pronounced, “R adjoin x” or “polynomials in x over R” or “polynomials in x with R coefficients”) is the set of syntactic expressions of the form
where each and is any natural number. If we wanted to be very formal about it, we could define to be the set of lists of elements of (the lists of coefficients), and the above is just a suggestive way of writing that list.
An important fact about polynomials over is that they can be interpreted as functions . In other words, there is a function defined by
where is just a variable name without a value assigned to it and maps it to a field element and evaluates the function as the inner product between the list of coefficients and the powers of .
It is important to remember that polynomials are different than the functions that they compute. Polynomials are really just lists of coefficients. Over some fields, there are polynomials and such that , but .
For example, consider in the polynomials and . These map to the same function (meaning for it holds that ), but are distinct polynomials.
If is a polynomial in and , we write for
If is a polynomial, the degree of is the largest for which the coefficient of is non-zero in . For example, the degree of is 10.
We will use the notation and for the set of polynomials of degree less-than and less-than-or-equal respectively.
Polynomials can be added (by adding the coefficients) and multiplied (by carrying out the multiplication formally and collecting like terms to reduce to a list of coefficients). Thus is a ring.
An important fact for zk-SNARKs about polynomials is that we can use them to encode arrays of field elements. In other words, there is a way of converting between polynomials of degree and arrays of field elements of length .
This is important for a few reasons
It will allow us to translate statements about arrays of field elements into statements about polynomials, which will let us construct zk-SNARKs.
It will allow us to perform certain operations on polynomials more efficiently.
So let’s get to defining and proving this connection between arrays of field elements.
The first thing to understand is that we won’t be talking about arrays directly in this section. Instead, we’ll be talking about functions where is a finite subset of . The idea is, if has size , and we put an ordering on , then we can think of a function as the same thing as an immutable array of length , since such a function can be thought of as returning the value of an array at the input position.
With this understanding in hand, we can start describing the “fundamental theorem of polynomials”. If has size , this theorem will define an isomorphism between functions and , the set of polynomials of degree at most .
Now let’s start defining this isomorphism.
One direction of it is very easy to define. It is none other than the evaluation map, restricted to :
We would now like to construct an inverse to this map. What would that mean? It would be a function that takes as input a function (remember, basically an array of length ), and returns a polynomial which agrees with on the set . In other words, we want to construct a polynomial that interpolates between the points for .
Our strategy for constructing this polynomial will be straightforward. For each , we will construct a polynomial that is equal to when evaluated at , and equal to when evaluated anywhere else in the set .
Then, our final polynomial will be . Then, when is evaluated at , only the term will be non-zero (and equal to as desired), all the other terms will disappear.
Constructing the interpolation map requires a lemma.
Lemma. (construction of vanishing polynomials)
Let . Then there is a polynomial of degree such that evaluates to on , and is non-zero off of . is called the vanishing polynomial of .
Proof. Let . Clearly has degree . If , then since is one of the terms in the product defining , so when we evaluate at , is one of the terms. If , then all the terms in the product are non-zero, and thus is non-zero as well.
Now we can define the inverse map. Define
Since each has degree , this polynomial has degree . Now we have, for any ,
Thus is the identity.
So we have successfully devised a way of interpolating a set of points with a polynomial of degree .
What remains to be seen is that these two maps are inverse in the other direction. That is, that for any polynomial , we have
This says that if we interpolate the function that has the values of on , we get back itself. This essentially says that there is only one way to interpolate a set of points with a degree polynomial. So let’s prove it by proving that statement.
Lemma: polynomials that agree on enough points are equal.
Let be a field and suppose have degree at most . Let have size .
Suppose for all , . Then as polynomials.
Proof. Define . Our strategy will be to show that is the zero polynomial. This would then imply that as polynomials. Note that vanishes on all of since and are equal on .
To that end, let . Then we can apply the polynomial division algorithm to divide by and obtain polynomials such that and has degree less than 1. I.e., is a constant in .
Now, so and thus .
Note that is 0 on all elements with since , but .
Thus, if we iterate the above, enumerating the elements of as , we find that
Now, if is not the zero polynomial, then will have degree at least since it would have as a factor the linear terms . But since both have degree at most and , has degree at most as well. Thus, , which means as well.
This gives us the desired conclusion, that as polynomials.
Now we can easily show that interpolating the evaluations of a polynomial yields that polynomial itself. Let . Then is a polynomial of degree at most that agrees with on , a set of size . Thus, by the lemma, they are equal as polynomials. So indeed for all .
So far we have proven that and give an isomorphism of sets (i.e., a bijection) between the sets and .
But we can take this theorem a bit further. The set of functions can be given the structure of a ring, where addition and multiplication happen pointwise. I.e., for we define and . Then we can strengthen our theorem to say
Fundamental theorem of polynomials (final version)
Let and let with . With
defined as above, these two functions define an isomorphism of rings.
That is is, they are mutually inverse and each one respects addition, subtraction and multiplication.
The fundamental theorem of polynomials is very important when it comes to computing operations on polynomials. As we will see in the next section, the theorem will help us to compute the product of degree polynomials in time , whereas the naive algorithm takes time . To put this in perspective, if , is times larger than and the gap only gets bigger as grows.
There are three common representations for polynomials used in computer implementations.
Dense coefficient form. A degree polynomial is represented as a vector of length of all the coefficients. Entry of the vector corresponds to the coefficient . This corresponds to the
DensePolynomialtype in arkworks. Sometimes this is also described as writing the polynomial “in the monomial basis”, because it amounts to writing the polynomial as a linear combination of the monomials .
Sparse coefficient form. If a polynomial does not have very many non-zero coefficients, the above representation is wasteful. In sparse coefficient form, you represent a polynomial as a vector (or potentially a hash-map) of pairs
Fis the type of coefficients. The polynomial corresponding to the list
[(i_0, b_0), ..., (i_n, b_n)]is
Evaluation form. We fix an index set , with , and represent a polynomial as the vector
[f(a_0), ..., f(a_d)]. By the fundamental theorem of polynomials, this is a valid way of representing the polynomials, since the coefficients form can be obtained by using the function.
The evaluation form is very important. The reason is that multiplying two polynomials in the evaluation form takes time . You just multiply the two vectors entry-wise. By contrast, the coefficient forms naively require time to multiply.
Now, there is a trick. For certain sets , we can efficiently translate between the dense coefficients form and the evaluation form. That is, for certain , the functions and can be computed more efficiently than .
The algorithm that allows us to multiply polynomials in is called the Cooley-Tukey fast Fourier transform, or FFT for short. It was discovered by Gauss 160 years earlier, but then separately rediscovered and publicized by Cooley-Tukey.
The heart of Cooley-Tukey FFT is actually about converting between coefficient and evaluation representations, rather than the multiplication itself. Given polynomials and in dense coefficient representation, it works like this.
- Convert and from coefficient to evaluation form in using Cooley-Tukey FFT
- Compute in evaluation form by multiplying their points pairwise in
- Convert back to to coefficient form in using the inverse Cooley-Tukey FFT
The key observation is that we can choose any distinct evaluation points to represent any degree polynomial. The Cooley-Tukey FFT works by selecting points that yield an efficient FFT algorithm. These points are fixed and work for any polynomials of a given degree.
The next section describes the Cooley-Tukey FFT in detail.
This section describes how the Cooley-Tukey fast Fourier transform works. As we learned in the previous section, the key is to select evaluation points that yield an efficient FFT algorithm.
Specifically, say we have such that , and for any .
Put another way, all the values are distinct and .
Put yet another way, the group generated by inside (written ) has size .
We call such an a primitive -th root of unity.
Suppose we have an which is a primitive th root of unity and let .
The FFT algorithm will let us compute for this set.
Actually, it is easier to see how it will let us compute the algorithm efficiently.
We will describe an algorithm that takes as input
- a primitive th root of unity
- in dense coefficients form (i.e., as a vector of coefficients of length ).
and outputs the vector of evaluations and does it in time (which is to say, if ).
Notice that naively, computing each evaluation using the coefficients of would require time , and so computing all of them would require time .
The algorithm can be defined recursively as follows.
If , then is a primitive st root of unity, and is a polynomial of degree . That means and also is a constant . So, we can immediately output the array of evaluations .
If , then we will split into two polynomials, recursively call on them, and reconstruct the result from the recursive calls.
To that end, define to be the polynomial whose coefficients are all the even-index coefficients of and the polynomial whose coefficients are all the odd-index coefficients of . In terms of the array representation, this just means splitting out every other entry into two arrays. So that can be done in time .
Write , so that and . Then
Now, notice that if is a th root of unity, then is a th root of unity. Thus we can recurse with and similarly for . Let
By assumption . So, for any we have
Now, since may be larger than , we need to reduce it mod , relying on the fact that if is an th root of unity then since . Thus, and so we have
We can compute the array in time (since each entry is the previous entry times ). Then we can compute each entry of the output in as
There are such entries, so this takes time .
This concludes the recursive definition of the algorithm .
- the coefficients of polynomial
- the even coefficients of corresponding to
- the odd coefficients of corresponding to
Now let’s analyze the time complexity. Let be the complexity on an instance of size (that is, for ).
Looking back at what we have done, we have done
- for computing and
- two recursive calls, each of size
- for computing the powers of
- for combining the results of the recursive calls
In total, this is . Solving this recurrence yields . Basically, there are recursions before we hit the base case, and each step takes time .
Now, in practice there are ways to describe this algorithm non-recursively that have better concrete performance, but that’s out of scope for this document. Read the code if you are interested.
So far we have a fast way to compute all at once, where is the set of powers of a th root of unity . For convenience let .
Now we want to go the other way and compute a polynomial given an array of evaluations. Specifically, evaluations uniquely define a degree polynomial. This can be written as a system of equations
which can be rewritten as a matrix vector product. This matrix is a Vandermonde matrix and it just so happens that square Vandermonde matrices are invertible, iff the are unique. Since we purposely selected our to be the powers of , a primitive -th root of unity, by definition are unique.
Therefore, to compute the polynomial given the corresponding array of evaluations (i.e. interpolation) we can solve for the polynomial’s coefficients using the inverse of the matrix. All we need now is the inverse of this matrix, which is slightly complicated to compute. I’m going to skip it for now, but if you have the details please make a pull request.
Substituting in the inverse matrix we obtain the equation for interpolation. Observe that this equation is nearly identical to the original equation for evaluation, except with the following substitution. Consequently and perhaps surprisingly, we can reuse the FFT algorithm in order to compute the inverse– .
So, suppose we have an array of field elements (which you can think of as a function ) and we want to compute the coefficients of a polynomial with .
To this end, define a polynomial by . That is, the polynomial whose coefficients are the evaluations in our array that we’re hoping to interpolate.
Now, let .
That is, we’re going to feed into the FFT algorithm defined above with as the th root of unity. It is not hard to check that if is an n-th root of unity, so is . Remember: the resulting values are the evaluations of on the powers of , so .
Now, let . That is, re-interpret the values returned by the FFT as the coefficients of a polynomial. I claim that is almost the polynomial we are looking for. Let’s calculate what values takes on at the powers of .
Now, let’s examine the quantity . We claim that if , then , and if , then . The first claim is clear since
For the second claim, we will prove that . This implies that . So either or . The former cannot be the case since it implies which in turn implies which is impossible since we are in the case . Thus we have as desired.
So let’s show that is invariant under multiplication by . Basically, it will come down to the fact that .
So now we know that
So if we define , then for every as desired. Thus we have our interpolation algorithm, sometimes called an inverse FFT or IFFT:
Input: the points we want to interpolate and a th root of unity.
Interpret the input array as the coefficients of a polynomial .
Output the polynomial . I.e., in terms of the dense-coefficients form, output the vector .
Note that this algorithm also takes time
Polynomials can be represented as a list of coefficients or a list of evaluations on a set
If the set is the set of powers of a root of unity, there are time algorithms for converting back and forth between those two representations
In evaluations form, polynomials can be added and multiplied in time
- TODO: caveat about hitting degree
Evaluations<F: FftField>that wrap a
Vec<F>and implement the FFT algorithms described above for converting between them
Familiarize yourself with the types and functions provided by
A “commitment scheme” is a cryptographic scheme that lets us provide a “commitment” to a given piece of data, in such a way that we can later “open” that commitment.
I recommend checking out section 2.4.1 of David’s book Real-World Cryptography. There are two properties we typically expect of our commitment schemes:
Hiding. A commitment to a piece of data should not reveal anything about that data.
Binding. There should be no ambiguity about what data is committed to. That is, it should not be possible for the committer to open the commitment to a piece of data other than the one they originally committed to.
There are various kinds of commitments that allow opening in different ways, revealing only part of the data, or some function of the data. Sometimes it is even useful to elide the hiding property entirely— so-called non-hiding commitments.
The simplest kind of a commitment is one in which opening reveals all of the underlying data. Let’s give a simple construction of such a scheme for committing to field elements. Suppose we have
a prime order field, with being rather large (say on the order of ).
A hash function .
Then we define
The argument of the function is data that must only be known to the committer (until the commitment is opened). When a committer wants to commit to a field element , they sample a random “blinder” and hash it together with to form the commitment.
To open the commitment , they simply provide the committed value together with the blinder. Alternatively, if the verifier already knows , they can just provide , i.e. . Finally, given the commitment and the opening, we can verify whether the input was the value originally committed to using the function.
If the hash function is collision-resistant, then this function is binding (because there’s no way the committer could find another preimage of ).
If the hash function is one-way, then this commitment is also hiding (assuming is revealed only as part of the opening).
Commitments are often used in protocols between provers and verifiers. The following illustration provides an example with a prover named Peggy and a verifier named Victor.
Here Peggy commits to an input using a blinder, obtains the commitment and sends it to Victor. The interlocutors continue their protocol, but eventually to convince Victor of her claims, Peggy must send the opening proof to her earlier commitment. Victor verifies the opening (i.e. the input and blinder) against the commitment. If the verification fails, then Victor knows that Peggy was trying to trick him, otherwise Victor has sufficient assurances that Peggy was telling the truth.
Instead of a cryptographic hash function, we can use elliptic curve scalar multiplication to construct a commitment scheme. Here scalar multiplication is used like a one-way function based on the hardness assumption of the elliptic curve discrete logarithm problem (ECDLP). Suppose we have
- a prime order field, with being large (e.g. something like ).
- Publicly agreed generator point over an elliptic curve
- Another publicly agreed curve point for which no one knows the discrete logarithm
where is the value being committed to, is a random blinding factor and the commitment is a curve point.
These commitments are algebraic (i.e. they do not use a boolean-based cryptographic hash function) and have homomorphic properties: you can add commitments together to form another commitment of the added committed values. For example, if you have commitments and , you can perform:
In other words, the sum of commitments and is equal to the commitment of the sum of the two committed values and and blinders and . This is possible because in such a scheme scaling is commutative with adding scalars.
As a cryptographic primitive, the ability to find a public curve point for which no one knows the discrete logarithm may, at first, seem rather mind-blowing and powerful.
Actually, it’s as easy as it is awesome to find such a point— simply perform rejection sampling by cryptographically hashing (or, respectively, the hash output), using the output as the -coordinate of a candidate point on and checking whether it’s valid. The first valid curve point obtained is and by the hardness assumption of the ECDLP, no one knows it.
Since approximately half of the hash outputs will be valid curve points on , sampling will terminate very quickly. Indeed, as we will see later, this process can be used to sample many public curve points for which the discrete logarithms are unknown; the so-called hash to curve algorithm.
The homomorphic commitment described above is known as a Pedersen commitment. If you remove the term you get a non-hiding commitment, called a Pedersen hash. Both rely on the ECDLP hardness assumption.
This means that, at least theoretically, you might be lucky (or have a quantum computer) and figure out that , which would allow you to find different values and to open the commitment. We say that pedersen commitments are computationally binding and not unconditionally binding. For example, you could express alternatively as and compute a satisfying opening pair and .
On the other hand, Pedersen commitments are unconditionally hiding, as there is no way (even with a magic computer) to reveal what is without knowing . Lack of perfect binding is the reason why most of the “proofs” we will see later in this book are not referred to as proofs, but instead are referred to as arguments of knowledge (although we may care little about this distinction). Just remember that you need perfect binding to be called a proof.
Interestingly, it is impossible to have a commitment scheme that has both perfect hiding and perfect binding.
To recap, in cryptography the following distinctions are important
Perfect. The property that an algorithm is statistically sound without hardness assumptions, also known as unconditional or statistical soundness.
Computational. The algorithm relies on a hardness assumption or computational limitation for soundness.
Thus, said another way, Pedersen commitments provide perfect hiding and computational binding.
We can commit to several values by sending separate Pedersen commitments to all of these values as such:
But we can instead batch/aggregate all of these commitments together as a single commitment:
with independent bases with unknown discrete logarithms.
If you represent s and the s as two vectors and , we can quickly write the previous statement as an inner product
Vector commitments (sometimes referred to as multi-commitments) are a powerful construction because an arbitrarily large vector can be committed with a single curve point.
The naive approach to constructing an opening proof for a length vector commitment has size . It is simply the tuple . As we will see later, opening proofs for vector commitments is an interesting topic and there is a much more efficient algorithm.
To construct SNARKs we use polynomial commitments. A polynomial commitment scheme for a field (or it could even be a ring) is a way of committing to a polynomial to get a commitment , in such a way that for any , you can provide , along with an “opening proof” that proves that the polynomial committed to in equals when evaluated at .
In other words, it is a type of commitment , a type of randomness , a type of opening proof along with algorithms
such that for any , we have
and if then it is not possible to compute such that
In other words, if then every which is feasible to compute results in .
One thing that’s pretty cool is that because polynomial commitment schemes let you construct zk-SNARKs, polynomial commitment schemes imply commitment schemes with arbitrary opening functionality. TODO
All known constructions of polynomial commitment schemes are a bit complicated. The easiest to describe is called the Kate (pronounced “kah-TAY”) scheme, also known as “KZG”. It requires a “prime-order group with a pairing”, which is three groups of prime order (hence, all isomorphic cyclic groups) together with a function such that for any , , , we have
is called a “pairing” or a “bilinear pairing”. What this lets us do is “multiply in the scalar” but only once.
Fix a degree bound on the polynomials we would like to be able to commit to. The KZG scheme, will let us commit to polynomials in . As a preliminary, fix generators arbitrarily.
The first thing to know about the KZG scheme is it requires that we randomly sample some group elements to help us. This is the dreaded and much discussed trusted setup. So, anyway, we start by sampling at random from and computing for ,
And then throw away . The security depends on no one knowing , which is sometimes referred to as the toxic waste of the trusted setup. Basically we compute the generator scaled by powers of up to the degree bound. We make a security assumption about the groups which says that all anyone can really do with group elements is take linear combinations of them.
Now suppose we have a polynomial with that we would like to commit to. We will describe a version of the scheme that is binding but not hiding, so it may leak information about the polynomial. Now, to commit to , we compute
so that and
So is scaled by and the fact that is an -module (i.e. a vector space whose scalars come from ) means we can compute from the and the coefficients of without knowing .
Now how does opening work? Well, say we want to open at a point to . Then the polynomial vanishes at , which means that it is divisible by the polynomial (exercise, use polynomial division and analyze the remainder).
So, the opener can compute the polynomial
and commit to it as above to get a commitment . And will be the opening proof. It remains only to describe verification. It works like this
This amounts to checking: “is the polynomial committed to equal to the polynomial committed to by times ”?
To see why, remember that , and say and so we are checking
which by the bilinearity of the pairing is the same as checking
A polynomial commitment is a scheme that allows you to commit to a polynomial (i.e. to its coefficients). Later, someone can ask you to evaluate the polynomial at some point and give them the result, which you can do as well as provide a proof of correct evaluation.
TODO: move this section where most relevant
Let be a non-zero polynomial of degree over a field of size , then the probability that for a randomly chosen is at most .
In a similar fashion, two distinct degree polynomials and can at most intersect in points. This means that the probability that on a random is . This is a direct corollary of the Schwartz-Zipple lemma, because it is equivalent to the probability that with
The inner product argument is the following construction: given the commitments (for now let’s say the hash) of two vectors and of size and with entries in some field , prove that their inner product is equal to .
There exist different variants of this inner product argument. In some versions, none of the values (, and ) are given, only commitments. In some other version, which is interesting to us and that I will explain here, only is unknown.
Inner products arguments are useful for several things, but what we’re using them for in Mina is polynomial commitments. The rest of this post won’t make too much sense if you don’t know what a polynomial commitment is, but briefly: it allows you to commit to a polynomial and then later prove its evaluation at some point . Check my post on Kate polynomial commitments for more on polynomial commitment schemes.
How does that translate to the inner product argument though? First, let’s see our polynomial as a vector of coefficients: