Foreign Field Multiplication
This document is an original RFC explaining how we constrain foreign field multiplication (i.e. nonnative field multiplication) in the Kimchi proof system. For a more recent RFC, see FFmul RFC.
Changelog
Author(s)  Date  Details 

Joseph Spadavecchia and Anais Querol  October 2022  Design of foreign field multiplication in Kimchi 
Overview
This gate constrains
$a⋅b=cmodf$
where $a,b,c∈F_{f}$, a foreign field with modulus $f$, using the native field $F_{n}$ with prime modulus $n$.
Approach
In foreign field multiplication the foreign field modulus $f$ could be bigger or smaller than the native field modulus $n$. When the foreign field modulus is bigger, then we need to emulate foreign field multiplication by splitting the foreign field elements up into limbs that fit into the native field element size. When the foreign modulus is smaller everything can be constrained either in the native field or split up into limbs.
Since our projected use cases are when the foreign field modulus is bigger (more on this below) we optimize our design around this scenario. For this case, not only must we split foreign field elements up into limbs that fit within the native field, but we must also operate in a space bigger than the foreign field. This is because we check the multiplication of two foreign field elements by constraining its quotient and remainder. That is, renaming $c$ to $r$, we must constrain
$a⋅b=q⋅f+r,$
where the maximum size of $q$ and $r$ is $f−1$ and so we have
$a⋅b ≤q(f−1) ⋅f+r(f−1) ≤f_{2}−1. $
Thus, the maximum size of the multiplication is quadratic in the size of foreign field.
Naïve approach
Naïvely, this implies that we have elements of size $f_{2}−1$ that must split them up into limbs of size at most $n−1$. For example, if the foreign field modulus is $256$ and the native field modulus is $255$ bits, then we’d need $g_{2}((2_{256})_{2}−1)≈512$ bits and, thus, require $512/255≈3$ native limbs. However, each limb cannot consume all $255$ bits of the native field element because we need space to perform arithmetic on the limbs themselves while constraining the foreign field multiplication. Therefore, we need to choose a limb size that leaves space for performing these computations.
Later in this document (see the section entitled “Choosing the limb configuration”) we determine the optimal number of limbs that reduces the number of rows and gates required to constrain foreign field multiplication. This results in $ℓ=88$ bits as our optimal limb size. In the section about intermediate products we place some upperbounds on the number of bits required when constraining foreign field multiplication with limbs of size $ℓ$ thereby proving that the computations can fit within the native field size.
Observe that by combining the naïve approach above with a limb size of $88$ bits, we would require $512/88≈6$ limbs for representing foreign field elements. Each limb is stored in a witness cell (a native field element). However, since each limb is necessarily smaller than the native field element size, it must be copied to the rangecheck gate to constrain its value. Since Kimchi only supports 7 copyable witness cells per row, this means that only one foreign field element can be stored per row. This means a single foreign field multiplication would consume at least 4 rows (just for the operands, quotient and remainder). This is not ideal because we want to limit the number of rows for improved performance.
Chinese remainder theorem
Fortunately, we can do much better than this, however, by leveraging the chinese remainder theorem (CRT for short) as we will now show. The idea is to reduce the number of bits required by constraining our multiplication modulo two coprime moduli: $2_{t}$ and $n$. By constraining the multiplication with both moduli the CRT guarantees that the constraints hold with the bigger modulus $2_{t}⋅n$.
The advantage of this approach is that constraining with the native modulus $n$ is very fast, allowing us to speed up our nonnative computations. This practically reduces the costs to constraining with limbs over $2_{t}$, where $t$ is something much smaller than $512$.
For this to work, we must select a value for $t$ that is big enough. That is, we select $t$ such that $2_{t}⋅n>f_{2}−1$. As described later on in this document, by doing so we reduce the number of bits required for our use cases to $264$. With $88$ bit limbs this means that each foreign field element only consumes $3$ witness elements, and that means the foreign field multiplication gate now only consumes $2$ rows. The section entitled “Choosing $t$” describes this in more detail.
Overall approach
Bringing it all together, our approach is to verify that
$a⋅b=q⋅f+r $
over $Z_{+}$. In order to do this efficiently we use the CRT, which means that the equation holds mod $M=2_{t}⋅n$. For the equation to hold over the integers we must also check that each side of the equation is less than $2_{t}⋅n$.
The overall steps are therefore
 Apply CRT to equation (1)
 Check validity with binary modulus $mod2_{t}$
 Check validity with native modulus $modn$
 Check each side of equation (1) is less than $M$
 $a⋅b<2_{t}⋅n$
 $q⋅f+r<2_{t}⋅n$
This then implies that
$a⋅b=cmodf.$
where $c=r$. That is, $a$ multiplied with $b$ is equal to $c$ where $a,b,c∈F_{f}$. There is a lot more to each of these steps. That is the subject of the rest of this document.
Other strategies
Within our overall approach, aside from the CRT, we also use a number of other strategies to reduce the number and degree of constraints.
 Avoiding borrows and carries
 Intermediate product elimination
 Combining multiplications
The rest of this document describes each part in detail.
Parameter selection
This section describes important parameters that we require and how they are computed.
 Native field modulus $n$
 Foreign field modulus $f$
 Binary modulus $2_{t}$
 CRT modulus $2_{t}⋅n$
 Limb size in bits $ℓ$
Choosing $t$
Under the hood, we constrain $a⋅b=q⋅f+rmod2_{t}⋅n$. Since this is a multiplication in the foreign field $f$, the maximum size of $q$ and $r$ are $f−1$, so this means
$a⋅b ≤(f−1)⋅f+(f−1)≤f_{2}−1. $
Therefore, we need the modulus $2_{t}⋅n$ such that
$2_{t}⋅n>f_{2}−1,$
which is the same as saying, given $f$ and $n$, we must select $t$ such that
$2_{t}⋅nt ≥f_{2}≥2g_{2}(f)−g_{2}(n). $
Thus, we have an lower bound on $t$.
Instead of dynamically selecting $t$ for every $n$ and $f$ combination, we fix a $t$ that will work for the different selections of $n$ and $f$ relevant to our use cases.
To guide us, from above we know that
$t_{min}=2g_{2}(f)−g_{2}(n)$
and we know the field moduli for our immediate use cases.
vesta = 2^254 + 45560315531506369815346746415080538113 (255 bits)
pallas = 2^254 + 45560315531419706090280762371685220353 (255 bits)
secp256k1 = 2^256  2^32  2^9  2^8  2^7  2^6  2^4  1 (256 bits)
So we can create a table
$n$  $f$  $t_{min}$ 

vesta  secp256k1  258 
pallas  secp256k1  258 
vesta  pallas  255 
pallas  vesta  255 
and know that to cover our use cases we need $t≥258$.
Next, given our native modulus $n$ and $t$, we can compute the maximum foreign field modulus supported. Actually, we compute the maximum supported bit length of the foreign field modulus $f=2_{m}$.
$2_{t}⋅nt+g_{2}(n) ≥f_{2}≥(2_{m})_{2}=2_{2m}>g_{2}(2_{2m})=2m $
So we get
$m_{max}=2t+g_{2}(n) .$
With $t=258,n=255$ we have
$m_{max} =2258+255 =256.5, $
which is not enough space to handle anything larger than 256 bit moduli. Instead, we will use $t=264$, giving $m_{max}=259$ bits.
The above formula is useful for checking the maximum number of bits supported of the foreign field modulus, but it is not useful for computing the maximum foreign field modulus itself (because $2_{m_{max}}$ is too coarse). For these checks, we can compute our maximum foreign field modulus more precisely with
$max_{mod}=⌊2_{t}⋅n ⌋.$
The max prime foreign field modulus satisfying the above inequality for both Pallas and Vesta is 926336713898529563388567880069503262826888842373627227613104999999999999999607
.
Choosing the limb configuration
Choosing the right limb size and the right number of limbs is a careful balance between the number of constraints (i.e. the polynomial degree) and the witness length (i.e. the number of rows). Because one limiting factor that we have in Kimchi is the 12bit maximum for range check lookups, we could be tempted to introduce 12bit limbs. However, this would mean having many more limbs, which would consume more witness elements and require significantly more rows. It would also increase the polynomial degree by increasing the number of constraints required for the intermediate products (more on this later).
We need to find a balance between the number of limbs and the size of the limbs. The limb configuration is dependent on the value of $t$ and our maximum foreign modulus (as described in the previous section). The larger the maximum foreign modulus, the more witness rows we will require to constrain the computation. In particular, each limb needs to be constrained by the range check gate and, thus, must be in a copyable (i.e. permuteable) witness cell. We have at most 7 copyable cells per row and gates can operate on at most 2 rows, meaning that we have an upperbound of at most 14 limbs per gate (or 7 limbs per row).
As stated above, we want the foreign field modulus to fit in as few rows as possible and we need to constrain operands $a,b$, the quotient $q$ and remainder $r$. Each of these will require cells for each limb. Thus, the number of cells required for these is
$cells=4⋅limbs$
It is highly advantageous for performance to constrain foreign field multiplication with the minimal number of gates. This not only helps limit the number of rows, but also to keep the gate selector polynomial small. Because of all of this, we aim to constrain foreign field multiplication with a single gate (spanning at most $2$ rows). As mentioned above, we have a maximum of 14 permuteable cells per gate, so we can compute the maximum number of limbs that fit within a single gate like this.
$limbs_{max} =⌊cells/4⌋=⌊14/4⌋=3 $
Thus, the maximum number of limbs possible in a single gate configuration is 3.
Using $limbs_{max}=3$ and $t=264$ that covers our use cases (see the previous section), we are finally able to derive our limbs size
$ℓ =limbs_{max}t =264/3=88 $
Therefore, our limb configuration is:
 Limb size $ℓ=88$ bits
 Number of limbs is $3$
Avoiding borrows
When we constrain $a⋅b−q⋅f=rmod2_{t}$ we want to be as efficient as possible.
Observe that the expansion of $a⋅b−q⋅f$ into limbs would also have subtractions between limbs, requiring our constraints to account for borrowing. Dealing with this would create undesired complexity and increase the degree of our constraints.
In order to avoid the possibility of subtractions we instead use $a⋅b+q⋅f_{′}$ where
$f_{′} =−fmod2_{t}=2_{t}−f $
The negated modulus $f_{′}$ becomes part of our gate coefficients and is not constrained because it is publicly auditable.
Using the substitution of the negated modulus, we now must constrain $a⋅b+q⋅f_{′}=rmod2_{t}$.
Observe that $f_{′}<2_{t}$ since $f<2_{t}$ and that $f_{′}>f$ when $f<2_{t−1}$.
Intermediate products
This section explains how we expand our constraints into limbs and then eliminate a number of extra terms.
We must constrain $a⋅b+q⋅f_{′}=rmod2_{t}$ on the limbs, rather than as a whole. As described above, each foreign field element $x$ is split into three 88bit limbs: $x_{0},x_{1},x_{2}$, where $x_{0}$ contains the least significant bits and $x_{2}$ contains the most significant bits and so on.
Expanding the righthand side into limbs we have
$ (a_{0}+a_{1}⋅2_{ℓ}+a_{2}⋅2_{2ℓ})⋅(b_{0}+b_{1}⋅2_{ℓ}+b_{2}⋅2_{2ℓ})+(q_{0}+q_{1}⋅2_{ℓ}+q_{2}⋅2_{2ℓ})⋅(f_{0}+f_{1}⋅2_{ℓ}+f_{2}⋅2_{2ℓ})=a_{0}⋅b_{0}+a_{0}⋅b_{1}⋅2_{ℓ}+a_{0}⋅b_{2}⋅2_{2ℓ}+a_{1}⋅b_{0}⋅2_{ℓ}+a_{1}⋅b_{1}⋅2_{2ℓ}+a_{1}⋅b_{2}⋅2_{3ℓ}+a_{2}⋅b_{0}⋅2_{2ℓ}+a_{2}⋅b_{1}⋅2_{3ℓ}+a_{2}⋅b_{2}⋅2_{4ℓ}+q_{0}⋅f_{0}+q_{0}⋅f_{1}⋅2_{ℓ}+q_{0}⋅f_{2}⋅2_{2ℓ}+q_{1}⋅f_{0}⋅2_{ℓ}+q_{1}⋅f_{1}⋅2_{2ℓ}+q_{1}⋅f_{2}⋅2_{3ℓ}+q_{2}⋅f_{0}⋅2_{2ℓ}+q_{2}⋅f_{1}⋅2_{3ℓ}+q_{2}⋅f_{2}⋅2_{4ℓ}=a_{0}⋅b_{0}+q_{0}⋅f_{0}+2_{ℓ}⋅(a_{0}⋅b_{1}+a_{1}⋅b_{0}+q_{0}⋅f_{1}+q_{1}⋅f_{0})+2_{2ℓ}⋅(a_{0}⋅b_{2}+a_{2}⋅b_{0}+q_{0}⋅f_{2}+q_{2}⋅f_{0}+a_{1}⋅b_{1}+q_{1}⋅f_{1})+2_{3ℓ}⋅(a_{1}⋅b_{2}+a_{2}⋅b_{1}+q_{1}⋅f_{2}+q_{2}⋅f_{1})+2_{4ℓ}⋅(a_{2}⋅b_{2}+q_{2}⋅f_{2}) $
Since $t=3ℓ$, the terms scaled by $2_{3ℓ}$ and $2_{4ℓ}$ are a multiple of the binary modulus and, thus, congruent to zero $mod2_{t}$. They can be eliminated and we don’t need to compute them. So we are left with 3 intermediate products that we call $p_{0},p_{1},p_{2}$:
Term  Scale  Product 

$p_{0}$  $1$  $a_{0}b_{0}+q_{0}f_{0}$ 
$p_{1}$  $2_{ℓ}$  $a_{0}b_{1}+a_{1}b_{0}+q_{0}f_{1}+q_{1}f_{0}$ 
$p_{2}$  $2_{2ℓ}$  $a_{0}b_{2}+a_{2}b_{0}+q_{0}f_{2}+q_{2}f_{0}+a_{1}b_{1}+q_{1}f_{1}$ 
So far, we have introduced these checked computations to our constraints
 Computation of $p_{0},p_{1},p_{2}$
Constraining $mod2_{t}$
Let’s call $p:=ab+qf_{′}mod2_{t}$. Remember that our goal is to constrain that $p−r=0mod2_{t}$ (recall that any more significant bits than the 264th are ignored in $mod2_{t}$). Decomposing that claim into limbs, that means
$2_{2ℓ}(p_{2}−r_{2})+2_{ℓ}(p_{1}−r_{1})+p_{0}−r_{0}=0mod2_{t}. $
We face two challenges
 Since $p_{0},p_{1},p_{2}$ are at least $2_{ℓ}$ bits each, the right side of the equation above does not fit in $F_{n}$
 The subtraction of the remainder’s limbs $r_{0}$ and $r_{1}$ could require borrowing
For the moment, let’s not worry about the possibility of borrows and instead focus on the first problem.
Combining multiplications
The first problem is that our native field is too small to constrain $2_{2ℓ}(p_{2}−r_{2})+2_{ℓ}(p_{1}−r_{1})+p_{0}−r_{0}=0mod2_{t}$. We could split this up by multiplying $a⋅b$ and $q⋅f_{′}$ separately and create constraints that carefully track borrows and carries between limbs. However, a more efficient approach is combined the whole computation together and accumulate all the carries and borrows in order to reduce their number.
The trick is to assume a space large enough to hold the computation, view the outcome in binary and then split it up into chunks that fit in the native modulus.
To this end, it helps to know how many bits these intermediate products require. On the left side of the equation, $p_{0}$ is at most $2ℓ+1$ bits. We can compute this by substituting the maximum possible binary values (all bits set to 1) into $p_{0}=a_{0}b_{0}+q_{0}f_{0}$ like this
$maxbits(p_{0}) =g_{2}(a_{0}(2_{ℓ}−1) b_{0}(2_{ℓ}−1) +q_{0}(2_{ℓ}−1) f_{0}(2_{ℓ}−1) )=g_{2}(2(2_{2ℓ}−2_{ℓ+1}+1))=g_{2}(2_{2ℓ+1}−2_{ℓ+2}+2). $
So $p_{0}$ fits in $2ℓ+1$ bits. Similarly, $p_{1}$ needs at most $2ℓ+2$ bits and $p_{2}$ is at most $2ℓ+3$ bits.
Term  $p_{0}$  $p_{1}$  $p_{2}$ 

Bits  $2ℓ+1$  $2ℓ+2$  $2ℓ+3$ 
The diagram below shows the right hand side of the zerosum equality from equation (2). That is, the value $p−r$. Let’s look at how the different bits of $p_{0},p_{1},p_{2},r_{0},r_{1}$ and $r_{2}$ impact it.
0 L 2L 3L 4L

:
p0: 2L + 1
:
:p1 2L + 2
p10➚ : p11➚
p2 2L + 3
:
r0 :
:
r1
:
r2
\__________________________/ \______________________________/
≈ h0 ≈ h1
Within our native field modulus we can fit up to $2ℓ+δ<g_{2}(n)$ bits, for small values of $δ$ (but sufficient for our case). Thus, we can only constrain approximately half of $p−r$ at a time. In the diagram above the vertical line at 2L bisects $p−r$ into two $≈2ℓ$ bit values: $h_{0}$ and $h_{1}$ (the exact definition of these values follow). Our goal is to constrain $h_{0}$ and $h_{1}$.
Computing the zerosum halves: $h_{0}$ and $h_{1}$
Now we can derive how to compute $h_{0}$ and $h_{1}$ from $p$ and $r$.
The direct approach would be to bisect both $p_{0}$ and $p_{1}$ and then define $h_{0}$ as just the sum of the $2ℓ$ lower bits of $p_{0}$ and $p_{1}$ minus $r_{0}$ and $r_{1}$. Similarly $h_{1}$ would be just the sum of upper bits of $p_{0},p_{1}$ and $p_{2}$ minus $r_{2}$. However, each bisection requires constraints for the decomposition and range checks for the two halves. Thus, we would like to avoid bisections as they are expensive.
Ideally, if our $p$’s lined up on the $2ℓ$ boundary, we would not need to bisect at all. However, we are unlucky and it seems like we must bisect both $p_{0}$ and $p_{1}$. Fortunately, we can at least avoid bisecting $p_{0}$ by allowing it to be summed into $h_{0}$ like this
$h_{0}=p_{0}+2_{ℓ}⋅p_{10}−r_{0}−2_{ℓ}⋅r_{1}$
Note that $h_{0}$ is actually greater than $2ℓ$ bits in length. This may not only be because it contains $p_{0}$ whose length is $2ℓ+1$, but also because adding $p_{10}$ may cause an overflow. The maximum length of $h_{0}$ is computed by substituting in the maximum possible binary value of $2_{ℓ}−1$ for the added terms and $0$ for the subtracted terms of the above equation.
$maxbits(h_{0}) =g_{2}(p_{0}(2_{ℓ}−1)(2_{ℓ}−1)+(2_{ℓ}−1)(2_{ℓ}−1) +2_{ℓ}⋅p_{10}(2_{ℓ}−1) )=g_{2}(2_{2ℓ+1}−2_{ℓ+2}+2+2_{2ℓ}−2_{ℓ})=g_{2}(3⋅2_{2ℓ}−5⋅2_{ℓ}+2) $
which is $2ℓ+2$ bits.
N.b. This computation assumes correct sizes values for $r_{0}$ and $r_{1}$, which we assure by range checks on the limbs.
Next, we compute $h_{1}$ as
$h_{1}=p_{11}+p_{2}−r_{2}$
The maximum size of $h_{1}$ is computed as
$maxbits(h_{1}) =maxbits(p_{11}+p_{2}) $
In order to obtain the maximum value of $p_{11}$, we define $p_{11}:=2_{ℓ}p_{1} $. Since the maximum value of $p_{1}$ was $2_{2ℓ+2}−2_{ℓ+3}+4$, then the maximum value of $p_{11}$ is $2_{ℓ+2}−8$. For $p_{2}$, the maximum value was $6⋅2_{2ℓ}−12⋅2_{ℓ}+6$, and thus:
$maxbits(h_{1}) =log_{2}(p_{11}2_{ℓ+2}−8 +p_{2}6⋅2_{2ℓ}−12⋅2_{ℓ}+6 )=g_{2}(6⋅2_{2ℓ}−8⋅2_{ℓ}−2) $
which is $2ℓ+3$ bits.
Term  $h_{0}$  $h_{1}$ 

Bits  $2ℓ+2$  $2ℓ+3$ 
Thus far we have the following constraints
 Composition of $p_{10}$ and $p_{11}$ result in $p_{1}$
 Range check $p_{11}∈[0,2_{ℓ+2})$
 Range check $p_{10}∈[0,2_{ℓ})$
For the next step we would like to constrain $h_{0}$ and $h_{1}$ to zero. Unfortunately, we are not able to do this!

Firstly, as defined $h_{0}$ may not be zero because we have not bisected it precisely at $2ℓ$ bits, but have allowed it to contain the full $2ℓ+2$ bits. Recall that these two additional bits are because $p_{0}$ is at most $2ℓ+1$ bits long, but also because adding $p_{10}$ increases it to $2ℓ+2$. These two additional bits are not part of the first $2ℓ$ bits of $p−r$ and, thus, are not necessarily zero. That is, they are added from the second $2ℓ$ bits (i.e. $h_{1}$).

Secondly, whilst the highest $ℓ+3$ bits of $p−r$ would wrap to zero $mod2_{t}$, when placed into the smaller $2ℓ+3$ bit $h_{1}$ in the native field, this wrapping does not happen. Thus, $h_{1}$’s $ℓ+3$ highest bits may be nonzero.
We can deal with this nonzero issue by computing carry witness values.
Computing carry witnesses values $v_{0}$ and $v_{1}$
Instead of constraining $h_{0}$ and $h_{1}$ to zero, there must be satisfying witness $v_{0}$ and $v_{1}$ such that the following constraints hold.
 There exists $v_{0}$ such that $h_{0}=v_{0}⋅2_{2ℓ}$
 There exists $v_{1}$ such that $h_{1}=v_{1}⋅2_{ℓ}−v_{0}$
Here $v_{0}$ is the last two bits of $h_{0}$’s $2ℓ+2$ bits, i.e., the result of adding the highest bit of $p_{0}$ and any possible carry bit from the operation of $h_{0}$. Similarly, $v_{1}$ corresponds to the highest $ℓ+3$ bits of $h_{1}$. It looks like this
0 L 2L 3L 4L

:
h0: 2L + 2
: ↖v0
:h1 2L + 3
: \____________/
: v1➚
Remember we only need to prove the first $3ℓ$ bits of $p−r$ are zero, since everything is $mod2_{t}$ and $t=3ℓ$. It may not be clear how this approach proves the $3ℓ$ bits are indeed zero because within $h_{0}$ and $h_{1}$ there are bits that are nonzero. The key observation is that these bits are too high for $mod2_{t}$.
By making the argument with $v_{0}$ and $v_{1}$ we are proving that $h_{0}$ is something where the $2ℓ$ least significant bits are all zeros and that $h_{1}+v_{0}$ is something where the $ℓ$ are also zeros. Any nonzero bits after $3ℓ$ do not matter, since everything is $mod2_{t}$.
All that remains is to range check $v_{0}$ and $v_{1}$
 Range check $v_{0}∈[0,2_{2})$
 Range check $v_{1}=∈[0,2_{ℓ+3})$
Subtractions
When unconstrained, computing $u_{0}=p_{0}+2_{ℓ}⋅p_{10}−(r_{0}+2_{ℓ}⋅r_{1})$ could require borrowing. Fortunately, we have the constraint that the $2ℓ$ least significant bits of $u_{0}$ are 0
(i.e. $u_{0}=2_{2ℓ}⋅v_{0}$), which means borrowing cannot happen.
Borrowing is prevented because when the $2ℓ$ least significant bits of the result are 0
it is not possible for the minuend to be less than the subtrahend. We can prove this by contradiction.
Let
 $x=p_{0}+2_{ℓ}⋅p_{10}$
 $y=r_{0}+2_{ℓ}⋅r_{1}$
 the $2ℓ$ least significant bits of $x−y$ be
0
Suppose that borrowing occurs, that is, that $x<y$. Recall that the length of $x$ is at most $2ℓ+2$ bits. Therefore, since $x<y$ the top two bits of $x$ must be zero and so we have
$x−y=x_{2ℓ}−y,$
where $x_{2ℓ}$ denotes the $2ℓ$ least significant bits of $x$.
Recall also that the length of $y$ is $2ℓ$ bits. We know this because limbs of the result are each constrained to be in $[0,2_{ℓ})$. So the result of this subtraction is $2ℓ$ bits. Since the $2ℓ$ least significant bits of the subtraction are 0
this means that
$x−yx =0=y, $
which is a contradiction with $x<y$.
Costs
Range checks should be the dominant cost, let’s see how many we have.
Range check (3) requires two range checks for $p_{11}=p_{111}⋅2_{ℓ}+p_{110}$
 a) $p_{110}∈[0,2_{ℓ})$
 b) $p_{111}∈[0,2_{2})$
Range check (8) requires two range checks and a decomposition check that is merged in (6).
 a) $v_{10}∈[0,2_{ℓ})$
 b) $v_{11}∈[0,2_{3})$
The range checks on $p_{0},p_{1}$ and $p_{2}$ follow from the range checks on $a,b$ and $q$.
So we have 3.a, 3.b, 4, 7, 8.a, 8.b.
Range check  Gate type(s)  Witness  Rows 

7  $(v_{0}−3)(v_{0}−2)(v_{0}−1)v_{0}$  $v_{0}$  < 1 
3.a  $(p_{111}−3)(p_{111}−2)(p_{111}−1)p_{111}$  $p_{111}$  < 1 
8.b  degree8 constraint or plookup  $v_{11}$  1 
3.b, 4, 8.a  multirangecheck  $p_{10},p_{110},v_{10}$  4 
So we have 1 multirangecheck, 1 singlerangecheck and 2 lowdegree range checks. This consumes just over 5 rows.
Use CRT to constrain $a⋅b−q⋅f−r≡0modn$
Until now we have constrained the equation $mod2_{t}$, but remember that our application of the CRT means that we must also constrain the equation $modn$. We are leveraging the fact that if the identity holds for all moduli in $M={n,2_{t}}$, then it holds for $lcm(M)=2_{t}⋅n=M$.
Thus, we must check $a⋅b−q⋅f−r≡0modn$, which is over $F_{n}$.
This gives us equality $mod2_{t}⋅n$ as long as the divisors are coprime. That is, as long as $gcd(2_{t},n)=1$. Since the native modulus $n$ is prime, this is true.
Thus, to perform this check is simple. We compute
$a_{n}b_{n}q_{n}r_{n}f_{n} =amodn=bmodn=qmodn=rmodn=fmodn $
using our native field modulus with constraints like this
$a_{n}b_{n}q_{n}r_{n}f_{n} =2_{2ℓ}⋅a_{2}+2_{ℓ}⋅a_{1}+a_{0}=2_{2ℓ}⋅b_{2}+2_{ℓ}⋅b_{1}+b_{0}=2_{2ℓ}⋅q_{2}+2_{ℓ}⋅q_{1}+q_{0}=2_{2ℓ}⋅r_{2}+2_{ℓ}⋅r_{1}+r_{0}=2_{2ℓ}⋅f_{2}+2_{ℓ}⋅f_{1}+f_{0} $
and then constrain
$a_{n}⋅b_{n}−q_{n}⋅f_{n}−r_{n}=0modn.$
Note that we do not use the negated foreign field modulus here.
This requires a single constraint of the form
 $a_{n}⋅b_{n}−q_{n}⋅f_{n}=r_{n}$
with all of the terms expanded into the limbs according the above equations. The values $a_{n},b_{n},q_{n},f_{n}$ and $r_{n}$ do not need to be in the witness.
Range check both sides of $a⋅b=q⋅f+r$
Until now we have constrained that equation $a⋅b=q⋅f+r$ holds modulo $2_{t}$ and modulo $n$, so by the CRT it holds modulo $M=2_{t}⋅n$. Remember that, as outlined in the “Overview” section, we must prove our equation over the positive integers, rather than $modM$. By doing so, we assure that our solution is not equal to some $q_{′}⋅M+r_{′}$ where $q_{′},r_{′}∈F_{M}$, in which case $q$ or $r$ would be invalid.
First, let’s consider the right hand side $q⋅f+r$. We have
$q⋅f+r<2_{t}⋅n$
Recall that we have parameterized $2_{t}⋅n≥f_{2}$, so if we can bound $q$ and $r$ such that
$q⋅f+r<f_{2}$
then we have achieved our goal. We know that $q$ must be less than $f$, so that is our first check, leaving us with
$(f−1)⋅f+rr <f_{2}<f_{2}−(f−1)⋅f=f $
Therefore, to check $q⋅f+r<2_{t}⋅n$, we need to check
 $q<f$
 $r<f$
This should come at no surprise, since that is how we parameterized $2_{t}⋅n$ earlier on. Note that by checking $q<f$ we assure correctness, while checking $r<f$ assures our solution is unique (sometimes referred to as canonical).
Next, we must perform the same checks for the left hand side (i.e., $a⋅b<2_{t}⋅n$). Since $a$ and $b$ must be less than the foreign field modulus $f$, this means checking
 $a<f$
 $b<f$
So we have
$a⋅b ≤(f−1)⋅(f−1)=f_{2}−2f+1 $
Since $2_{t}⋅n≥f_{2}$ we have
$ f_{2}−2f+1<f_{2}≤2_{t}⋅n⟹a⋅b<2_{t}⋅n $
Bound checks
To perform the above range checks we use the upper bound check method described in the upper bound check section in Foreign Field Addition.
The upper bound check is as follows. We must constrain $0≤q<f$ over the positive integers, so
$2_{t}≤q2_{t}−f≤q +2_{t}<f+2_{t}+2_{t}−f<2_{t} $
Remember $f_{′}=2_{t}−f$ is our negated foreign field modulus. Thus, we have
$f_{′}≤q +f_{′}<2_{t} $
So to check $q<t$ we just need to compute $q_{′}=q+f_{′}$ and check $f_{′}≤q_{′}<2_{t}$
Observe that
$0≤q⟹f_{′}≤q_{′}$
and that
$q_{′}<2_{t}⟹q<f$
So we only need to check
 $0≤q$
 $q_{′}<2_{t}$
The first check is always true since we are operating over the positive integers and $q∈Z_{+}$. Observe that the second check also constrains that $q<2_{t}$, since $f≤2_{259}<2_{t}$ and thus
$q_{′}q+f_{′}qq ≤2_{t}≤2_{t}≤2_{t}−(2_{t}−f)=f<2_{t} $
Therefore, to constrain $q<f$ we only need constraints for
 $q_{′}=q+f_{′}$
 $q_{′}<2_{t}$
and we don’t require an additional multirangecheck for $q<2_{t}$.
Cost of bound checks
This section analyzes the structure and costs of bounds checks for foreign field addition and foreign field multiplication.
Addition
In our foreign field addition design the operands $a$ and $b$ do not need to be less than $f$. The field overflow bit $o$ for foreign field addition is at most 1. That is, $a+b=o⋅f+r$, where $r$ is allowed to be greater than $f$. Therefore,
$(f+a)+(f+b)=1⋅f+(f+a+b)$
These can be chained along $k$ times as desired. The final result
$r=(kf+⋯+f +a_{1}+b_{1}+⋯a_{k}+b_{k})$
Since the bit length of $r$ increases logarithmically with the number of additions, in Kimchi we must only check that the final $r$ in the chain is less than $f$ to constrain the entire chain.
Security note: In order to defer the $r<f$ check to the end of any chain of additions, it is extremely important to consider the potential impact of wraparound in $F_{n}$. That is, we need to consider whether the addition of a large chain of elements greater than the foreign field modulus could wrap around. If this could happen then the $r<f$ check could fail to detect an invalid witness. Below we will show that this is not possible in Kimchi.
Recall that our foreign field elements are comprised of 3 limbs of 88bits each that are each represented as native field elements in our proof system. In order to wrap around and circumvent the $r<f$ check, the highest limb would need to wrap around. This means that an attacker would need to perform about $k≈n/2_{ℓ}$ additions of elements greater than then foreign field modulus. Since Kimchi’s native moduli (Pallas and Vesta) are 255bits, the attacker would need to provide a witness for about $k≈2_{167}$ additions. This length of witness is greater than Kimchi’s maximum circuit (resp. witness) length. Thus, it is not possible for the attacker to generate a false proof by causing wraparound with a large chain of additions.
In summary, for foreign field addition in Kimchi it is sufficient to only bound check the last result $r_{′}$ in a chain of additions (and subtractions)
 Compute bound $r_{′}=r+f_{′}$ with addition gate (2 rows)
 Range check $r_{′}<2_{t}$ (4 rows)
Multiplication
In foreign field multiplication, the situation is unfortunately different, and we must check that each of $a,b,q$ and $r$ are less than $f$. We cannot adopt the strategy from foreign field addition where the operands are allowed to be greater than $f$ because the bit length of $r$ would increases linearly with the number of multiplications. That is,
$(a_{1}+f)⋅(a_{2}+f)=1⋅f+rf_{2}+(a_{1}+a_{2}−1)⋅f+a_{1}⋅a_{2} $
and after a chain of $k$ multiplication we have
$r=f_{k}+…+a_{1}⋯a_{k}$
where $r>f_{k}$ quickly overflows our CRT modulus $2_{t}⋅n$. For example, assuming our maximum foreign modulus of $f=2_{259}$ and either of Kimchi’s native moduli (i.e. Pallas or Vesta), $f_{k}>2_{t}⋅n$ for $k>2$. That is, an overflow is possible for a chain of greater than 1 foreign field multiplication. Thus, we must check $a,b,q$ and $r$ are less than $f$ for each multiplication.
Fortunately, in many situations the input operands may already be checked either as inputs or as outputs of previous operations, so they may not be required for each multiplication operation.
Thus, the $q_{′}$ and $r_{′}$ checks (or equivalently $q$ and $r$) are our main focus because they must be done for every multiplication.
 Compute bound $q_{′}=q+f_{′}$ with addition gate (2 rows)
 Compute bound $r_{′}=r+f_{′}$ with addition gate (2 rows)
 Range check $q_{′}<2_{t}$ (4 rows)
 Range check $r_{′}<2_{t}$ (4 rows)
This costs 12 rows per multiplication. In a subsequent section, we will reduce it to 8 rows.
2limb decomposition
Due to the limited number of permutable cells per gate, we do not have enough cells for copy constraining $q_{′}$ and $r_{′}$ (or $q$ and $r$) to their respective range check gadgets. To address this issue, we must decompose $q_{′}$ into 2 limbs instead of 3, like so
$q_{′}=q_{01}+2_{2ℓ}⋅q_{2}$
and
$q_{01}=q_{0}+2_{ℓ}⋅q_{1}$
Thus, $q_{′}$ is decomposed into two limbs $q_{01}$ (at most $2ℓ$ bits) and $q_{2}$ (at most $ℓ$ bits).
Note that $q_{′}$ must be range checked by a multirangecheck
gadget. To do this the multirangecheck
gadget must
 Store a copy of the limbs $q_{0},q_{1}$ and $q_{2}$ in its witness
 Range check that they are $ℓ$ bit each
 Constrain that $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$ (this is done with a special mode of the
multirangecheck
gadget)  Have copy constraints for $q_{01}$ and $q_{2}$ as outputs of the
ForeignFieldMul
gate and inputs to themultirangecheck
gadget
Note that since the foreign field multiplication gate computes $q_{′}$ from $q$ which is already in the witness and $q_{01}$ and $q_{2}$ have copy constraints to a multirangecheck
gadget that fully constrains their decomposition from $q_{′}$, then the ForeignFieldMul
gate does not need to store an additional copy of $q_{0}$ and $q_{1}$.
An optimization
Since the $q<f$ and $r<f$ checks must be done for each multiplication it makes sense to integrate them into the foreign field multiplication gate. By doing this we can save 4 rows per multiplication.
Doing this doesn’t require adding a lot more witness data because the operands for the bound computations $q_{′}=q+f_{′}$ and $r_{′}=r+f_{′}$ are already present in the witness of the multiplication gate. We only need to store the bounds $q_{′}$ and $r_{′}$ in permutable witness cells so that they may be copied to multirangecheck gates to check they are each less than $2_{t}$.
To constrain $x+f_{′}=x_{′}$, the equation we use is
$x+2_{t}=o⋅f+x_{′},$
where $x$ is the original value, $o=1$ is the field overflow bit and $x_{′}$ is the remainder and our desired addition result (e.g. the bound). Rearranging things we get
$x+2_{t}−f=x_{′},$
which is just
$x+f_{′}=x_{′},$
Recall from the section “Avoiding borrows” that $f_{′}$ is often larger than $f$. At first this seems like it could be a problem because in multiplication each operation must be less than $f$. However, this is because the maximum size of the multiplication was quadratic in the size of $f$ (we use the CRT, which requires the bound that $a⋅b<2_{t}⋅n$). However, for addition the result is much smaller and we do not require the CRT nor the assumption that the operands are smaller than $f$. Thus, we have plenty of space in $ℓ$bit limbs to perform our addition.
So, the equation we need to constrain is
$x+f_{′}=x_{′}.$
We can expand the left hand side into the 2 limb format in order to obtain 2 intermediate sums
$s_{01}=x_{01}+f_{01}s_{2}=x_{2}+f_{2} $
where $x_{01}$ and $f_{01}$ are defined like this
$x_{01}=x_{0}+2_{ℓ}⋅x_{1}f_{01}=f_{0}+2_{ℓ}⋅f_{1} $
and $x$ and $f_{′}$ are defined like this
$x=x_{01}+2_{2ℓ}⋅x_{2}f_{′}=f_{01}+2_{2ℓ}⋅f_{2} $
Going back to our intermediate sums, the maximum bit length of sum $s_{01}$ is computed from the maximum bit lengths of $x_{01}$ and $f_{01}$
$x_{01}(2_{ℓ}−1)+2_{ℓ}⋅(2_{ℓ}−1) +f_{01}(2_{ℓ}−1)+2_{ℓ}⋅(2_{ℓ}−1) =2_{2ℓ+1}−2,$
which means $s_{01}$ is at most $2ℓ+1$ bits long.
Similarly, since $x_{2}$ and $f_{2}$ are less than $2_{ℓ}$, the max value of $s_{2}$ is
$(2_{ℓ}−1)+(2_{ℓ}−1)=2_{ℓ+1}−2,$
which means $s_{2}$ is at most $ℓ+1$ bits long.
Thus, we must constrain
$s_{01}+2_{2ℓ}⋅s_{2}−x_{01}−2_{2ℓ}⋅x_{2}=0mod2_{t}.$
The accumulation of this into parts looks like this.
0 L 2L 3L=t 4L

:
s01: 2L + 1
: ↖w01
s2: L + 1
: ↖w2
:
x'01
:
x'2
:
\____________________________/
≈ z01 \_____________/
≈ z2
The two parts are computed with
$z_{01}z_{2} =s_{01}−x_{01}=s_{2}−x_{2}. $
Therefore, there are two carry bits $w_{01}$ and $w_{2}$ such that
$z_{01}z_{2}+w_{01} =2_{2ℓ}⋅w_{01}=2_{ℓ}⋅w_{2} $
In this scheme $x_{01},x_{2},w_{01}$ and $w_{2}$ are witness data, whereas $s_{01}$ and $s_{2}$ are formed from a constrained computation of witness data $x_{01},x_{2}$ and constraint system public parameter $f_{′}$. Note that due to carrying, witness $x_{01}$ and $x_{2}$ can be different than the values $s_{01}$ and $s_{2}$ computed from the limbs.
Thus, each bound addition $x+f_{′}$ requires the following witness data
 $x_{01},x_{2}$
 $x_{01},x_{2}$
 $w_{01},w_{2}$
where $f_{′}$ is baked into the gate coefficients. The following constraints are needed
 $2_{2ℓ}⋅w_{01}=s_{01}−x_{01}$
 $2_{ℓ}⋅w_{2}=s_{2}+w_{01}−x_{2}$
 $x_{01}∈[0,2_{2ℓ})$
 $x_{2}∈[0,2_{ℓ})$
 $w_{01}∈[0,2)$
 $w_{2}∈[0,2)$
Due to the limited number of copyable witness cells per gate, we are currently only performing this optimization for $q$.
The witness data is
 $q_{0},q_{1},q_{2}$
 $q_{01},q_{2}$
 $q_{carry01},q_{carry2}$
The checks are
 $q_{0}∈[0,2_{ℓ})$
 $q_{1}∈[0,2_{ℓ})$
 $q_{0}=q_{0}+f_{0}$
 $q_{1}=q_{1}+f_{1}$
 $s_{01}=q_{0}+2_{ℓ}⋅q_{1}$
 $q_{01}∈[0,2_{2ℓ})$
 $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$
 $q_{carry01}∈[0,2)$
 $2_{2ℓ}⋅q_{carry01}=s_{01}−q_{01}$
 $q_{2}∈[0,2_{ℓ})$
 $s_{2}=q_{2}+f_{2}$
 $q_{carry2}∈[0,2)$
 $2_{ℓ}⋅q_{carry2}=s_{2}+w_{01}−q_{2}$
Checks (1)  (5) assure that $s_{01}$ is at most $2ℓ+1$ bits. Whereas checks (10)  (11) assure that $s_{2}$ is at most $ℓ+1$ bits. Altogether they are comprise a single multirangecheck
of $q_{0},q_{1}$ and $q_{2}$. However, as noted above, we do not have enough copyable cells to output $q_{1},q_{2}$ and $q_{3}$ to the multirangecheck
gadget. Therefore, we adopt a strategy where the 2 limbs $q_{01}$ and $q_{2}$ are output to the multirangecheck
gadget where the decomposition of $q_{0}$ and $q_{2}$ into $q_{01}=p_{0}+2_{ℓ}⋅p_{1}$ is constrained and then $q_{0},q_{1}$ and $q_{2}$ are range checked.
Although $q_{1},q_{2}$ and $q_{3}$ are not range checked directly, this is safe because, as shown in the “Bound checks” section, rangechecking that $q_{′}∈[0,2_{t})$ also constrains that $q∈[0,2_{t})$. Therefore, the updated checks are
 $q_{0}∈[0,2_{ℓ})$
multirangecheck
 $q_{1}∈[0,2_{ℓ})$
multirangecheck
 $q_{0}=q_{0}+f_{0}$
ForeignFieldMul
 $q_{1}=q_{1}+f_{1}$
ForeignFieldMul
 $s_{01}=q_{0}+2_{ℓ}⋅q_{1}$
ForeignFieldMul
 $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$
multirangecheck
 $q_{carry01}∈[0,2)$
ForeignFieldMul
 $2_{2ℓ}⋅q_{carry01}=s_{01}−q_{01}$
ForeignFieldMul
 $q_{2}∈[0,2_{ℓ})$
multirangecheck
 $s_{2}=q_{2}+f_{2}$
ForeignFieldMul
 $q_{carry2}∈[0,2)$
ForeignFieldMul
 $2_{ℓ}⋅q_{carry2}=s_{2}+q_{carry01}−q_{2}$
ForeignFieldMul
Note that we don’t need to rangecheck $q_{01}$ is at most $2ℓ+1$ bits because it is already implicitly constrained by the multirangecheck
gadget constraining that $q_{0},q_{1}$ and $q_{2}$ are each at most $ℓ$ bits and that $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$. Furthermore, since constraining the decomposition is already part of the multirangecheck
gadget, we do not need to do it here also.
To simplify things further, we can combine some of these checks. Recall our checked computations for the intermediate sums
$s_{01}s_{2} =q_{01}+f_{01}=q_{2}+f_{2} $
where $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$ and $f_{01}=f_{0}+2_{ℓ}⋅f_{1}$. These do not need to be separate constraints, but are instead part of existing ones.
Checks (10) and (11) can be combined into a single constraint $2_{ℓ}⋅q_{carry2}=(q_{2}+f_{2})+q_{carry01}−q_{2}$. Similarly, checks (3)  (5) and (8) can be combined into $2_{2ℓ}⋅q_{carry01}=q_{01}+f_{01}−q_{01}$ with $q_{01}$ and $f_{01}$ further expanded. The reduced constraints are
 $q_{0}∈[0,2_{ℓ})$
multirangecheck
 $q_{1}∈[0,2_{ℓ})$
multirangecheck
 $q_{01}=q_{0}+2_{ℓ}⋅q_{1}$
multirangecheck
 $q_{carry01}∈[0,2)$
ForeignFieldMul
 $2_{2ℓ}⋅q_{carry01}=s_{01}−q_{01}$
ForeignFieldMul
 $q_{2}∈[0,2_{ℓ})$
multirangecheck
 $q_{carry2}∈[0,2)$
ForeignFieldMul
 $2_{ℓ}⋅q_{carry2}=s_{2}+w_{01}−q_{2}$
ForeignFieldMul
Finally, there is one more optimization that we will exploit. This optimization relies on the observation that for bound addition the second carry bit $q_{carry2}$ is always zero. This this may be obscure, so we will prove it by contradiction. To simplify our work we rename some variables by letting $x_{0}=q_{01}$ and $x_{1}=q_{2}$. Thus, $q_{carry2}$ being nonzero corresponds to a carry in $x_{1}+f_{1}$.
Proof: To get a carry in the highest limbs $x_{1}+f_{1}$ during bound addition, we need
$>2_{ℓ}<x_{1}+ϕ_{0}+f_{1}≤2_{ℓ}−1+ϕ_{0}+f_{1}>$
where $2_{ℓ}−1$ is the maximum possible size of $x_{1}$ (before it overflows) and $ϕ_{0}$ is the overflow bit from the addition of the least significant limbs $x_{0}$ and $f_{0}$. This means
$>2_{ℓ}−ϕ_{0}−f_{1}<x_{1}<2_{ℓ}>$
We cannot allow $x$ to overflow the foreign field, so we also have
$>x_{1}<(f−x_{0})/2_{2ℓ}>$
Thus,
$>2_{ℓ}−ϕ_{0}−f_{1}<(f−x_{0})/2_{2ℓ}=f/2_{2ℓ}−x_{0}/2_{2ℓ}>$
Since $x_{0}/2_{2ℓ}=ϕ_{0}$ we have
$>2_{ℓ}−ϕ_{0}−f_{1}<f/2_{2ℓ}−ϕ_{0}>$
so
$>2_{ℓ}−f_{1}<f/2_{2ℓ}>$
Notice that $f/2_{2ℓ}=f_{1}$. Now we have
$>2_{ℓ}−f_{1}<f_{1}>⟺>f_{1}>2_{ℓ}−f_{1}>$
However, this is a contradiction with the definition of our negated foreign field modulus limb $f_{1}=2_{ℓ}−f_{1}$. $■$
We have proven that $q_{carry2}$ is always zero, so that allows use to simplify our constraints. We now have
 $q$