Foreign Field Addition RFC
This document is meant to explain the foreign field addition gate in Kimchi.
Overview
The goal of this gate is to perform the following operation between a left $a$ and a right $b$ input, to obtain a result $r$ $a+s∗b=rmodf$ where $a,b,r∈F_{f}$ belong to the foreign field of modulus $f$ and we work over a native field $F_{n}$ with modulus $n$, and $s$ is a flag that is either $−1$ or $1$ to indicate whether it is a subtraction or addition gate.
If $f<n$ then we can easily perform the above computation. But in this gate we are interested in the contrary case in which $f>n$. In order to deal with this, we will divide foreign field elements into limbs that fit in our native field. We want to be compatible with the foreign field multiplication gate, and thus the parameters we will be using are the following:
 3 limbs of 88 bits each (for a total of 264 bits)
 $2_{264}>f$
 our foreign field will have 256bit length
 our native field has 255bit length
In other words, using 3 limbs of 88 bits each allows us to represent any foreign field element in the range $[0,2_{264})$ for foreign field addition, but only up to $2_{259}$ for foreign field multiplication. Thus, with the current configuration of our limbs, our foreign field must be smaller than $2_{259}$ (because $2_{264}⋅2_{255}>2_{259}_{2}+2_{259}$, more on this in Foreign Field Multiplication or the original FFmul RFC.
Splitting the addition
Let’s take a closer look at what we have, if we split our variables in limbs (using little endian)
bits 0..............8788...........175176...........263
a = (a0a1a2)
+/
b = (b0b1b2)
=
r = (r0r1r2) mod(f)
We will perform the addition in 3 steps, one per limb. Now, when we split long additions in this way, we must be careful with carry bits between limbs. Also, even if $a$ and $b$ are foreign field elements, it could be the case that $a+b$ is already larger than the modulus (such addition could be at most $2f−2$). But it could also be the case that the subtraction produces an underflow because $a<b$ (with a difference of at most $1−f$). Thus we will have to consider the more general case. That is,
$a+s⋅b=q⋅f+rmod2_{264}$
with a field overflow term $q$ that will be either $0$ (if no underflow nor overflow is produced) or $1$ (if there is overflow with $s=1$) or $−1$ (if there is underflow with $s=−1$). Looking at this in limb form, we have:
bits 0..............8788...........175176...........263
a = (a0a1a2)
+
s = 1  1
·
b = (b0b1b2)
=
q = 1  0  1
·
f = (f0f1f2)
+
r = (r0r1r2)
First, if $a+b$ was larger than $f$, then we will have a field overflow (represented by $q=1$) and we will have to subtract $f$ from the sum $a+b$ to obtain $r$. Whereas the foreign field overflow necessitates an overflow bit $q$ for the foreign field equation above, when $q=1$ there is a corresponding subtraction that may introduce carries (or even borrows) between the limbs. This is because $r=a+b−q⋅fmod2_{264}$. Therefore, in the equations for the limbs we will use a carry flag $c_{i}$ for limb $i$ to represent both carries and borrows. The carry flags can be anything in ${−1,0,1}$, where $c_{i}=−1$ represents a borrow and $c_{i}=1$ represents a carry. Next we see more clearly how this works.
In order to perform this operation in parts, we first take a look at the least significant limb, which is the easiest part. This means we want to know how to compute $r_{0}$. First, if the addition of the bits in $a_{0}$ and $b_{0}$ produce a carry (or borrow) bit, then it should propagate to the second limb. That means one has to subtract $2_{88}$ from $a_{0}+b_{0}$, add $1$ to $a_{1}+b_{1}$ and set the low carry flag $c_{0}$ to 1 (otherwise it is zero). Thus, the equation for the lowest bit is
$a_{0}+s⋅b_{0}=q⋅f_{0}+r_{0}+c_{0}⋅2_{88}$
Or put in another way, this is equivalent to saying that $a_{0}+b_{0}−q⋅f_{0}−r_{0}$ is a multiple of $2_{88}$ (or, the existence of the carry coefficient $c_{0}$).
This kind of equation needs an additional check that the carry coefficient $c_{0}$ is a correct carry value (0
, 1
, or 1
). We will use this idea for the remaining limbs as well.
Looking at the second limb, we first need to observe that the addition of $a_{1}$ and $b_{1}$ can, not only produce a carry bit $c_{1}$, but they may need to take into account the carry bit from the first limb; $c_{0}$. Similarly to the above,
$a_{1}+s⋅b_{1}=q⋅f_{1}+r_{1}+c_{1}⋅2_{88}−c_{0}$
Note that in this case, the carry coefficient from the least significant limb is not multiplied by $2_{88}$, but instead is used as is, since it occupies the least significant position of the second limb. Here, the second carry bit $c_{1}$ is the one being followed by $88$ zeros. Again, we will have to check that $c_{1}$ is a bit.
Finally, for the third limb, we obtain a similar equation. But in this case, we do not have to take into account $c_{0}$ anymore, since it was already considered within $c_{1}$. Here, the most significant carry bit $c_{2}$ should always be a zero so we can ignore it.
$a_{2}+s⋅b_{2}=q⋅f_{2}+r_{2}−c_{1}$
Graphically, this is what is happening:
bits 0..............8788...........175176...........263
a = (a0a1a2)
+
s = 1  1
·
b = (b0b1b2)
> > >
= c_0 c_1 c_2
q = 1 0  1
·
f = (f0f1f2)
+
r = (r0r1r2)
Our witness computation is currently using the BigUint
library, which takes care of all of the intermediate carry limbs itself so we do not have to address all casuistries ourselves (such as propagating the low carry flag to the high limb in case the middle limb is all zeros).
Upper bound check
Last but not least, we should perform some range checks to make sure that the result $r$ is contained in $F_{f}$. This is important because there could be other values of the result which still fit in $<2_{264}$ but are larger than $f$, and we must make sure that the final result is the minimum one (that we will be referring to as $r_{min}$ in the following).
Ideally, we would like to reuse some gates that we already have. In particular, we can perform range checks for $0≤X<2_{3ℓ}=2_{3⋅88}$. But we want to check that $0≤r_{min}<f$. The way we can tweak this gate to behave as we want, is the following. First, the above inequality is equivalent to saying that $−f≤r_{min}−f<0$. Then we add $2_{264}$ on both sides to obtain $2_{264}−f≤r_{min}−f+2_{264}<2_{264}$. Thus, all there is left to check is that a given bound value is indeed computed correctly. Meaning, that an upperbound term $u$ is correctly obtained as $r_{min}+2_{264}−f$. Even if we could apply this check for all intermediate results of foreign field additions, it is sufficient to apply it only once at the end of the computations.
This is very similar to a foreign field addition, but simpler: the field overflow bit will always be $1$, the sign is positive $1$, and the right input is formed by the limbs $(0,0,2_{88})$ standing for the number $2_{264}$. There could be intermediate limb carry bits $k_{0}$ and $k_{1}$ as above. Observe that, because the sum is meant to be $<2_{264}$, then the carry bit for the most significant limb should always be zero $k_{2}=0$, so we do not use it. Happily, we can apply the addition gate again to perform the addition limbwise, by selecting the following parameters:
$a_{0}a_{1}a_{2}b_{0}b_{1}b_{2}sq ======== r_{min_{0}},r_{min_{1}},r_{min_{2}},0,0,2_{88},11 $
Calling $u$ the upper bound term, the equation $r_{min}+2_{264}−f$ can be expressed as $r_{min}+2_{264}=1∗f+u$. Finally, we perform a range check on the sum $u$, and we would know that $r_{min}<f$.
bits 0..............8788...........175176...........263
r = (r0r1r2)
+
g = (g0g1g2)
> >
= k_0 k_1
f
+
u = (u0u1u2)
Following the steps above, and representing this equation in limb form, we have:
$u_{0}u_{1}u_{2} =r_{0}+0−f_{0}−k_{0}⋅2_{88}=r_{1}+0−f_{1}−k_{1}⋅2_{88}+k_{0}=r_{2}+2_{88}−f_{2}+k_{1} $
But now we also have to check that $0≤r$. But this is implicitly covered by $r$ being a field element of at most 264 bits (range check).
Subtractions
Mathematically speaking, a subtraction within a field is no more than an addition over that field. Negative elements are not different from “positive” elements in finite fields (or in any modular arithmetic). Our witness computation code computes negative sums by adding the modulus to the result. To give a general example, the element $−e$ within a field $F_{m}$ of order $m$ and $e<m$ is nothing but $m−e$. Nonetheless, for arbitrarily sized elements (not just those smaller than the modulus), the actual field element could be any $c⋅m−e$, for any multiple $c⋅m$ of the modulus. Thus, representing negative elements directly as “absolute” field elements may incur in additional computations involving multiplications and thus would result in a less efficient mechanism.
Instead, our gate encodes subtractions and additions directly within the sign term that is multiplying the right input. This way, there is no need to check that the negated value is performed correctly (which would require an additional row for a potential FFNeg
gate).
Optimization
So far, one can recompute the result as follows:
bits 0..............8788...........175176...........263
r = (r0r1r2)
=
a = (a0a1a2)
+
s = 1  1
·
b = (b0b1b2)

q = 1 0  1
·
f = (f0f1f2)
> > >
c_0 c_1 0
Inspired by the halving approach in foreign field multiplication, an optimized version of the above gate results in a reduction by 2 in the number of constraints and by 1 in the number of witness cells needed. The main idea is to condense the claims about the low and middle limbs in one single larger limb of 176 bits, which fit in our native field. This way, we can get rid of the low carry flag, its corresponding carry check, and the three result checks become just two.
bits 0..............8788...........175176...........263
r = (r0 r1r2)
=
a = (a0 a1a2)
+
s = 1  1
·
b = (b0 b1b2)

q = 1 0  1
·
f = (f0 f1f2)
> >
c 0
These are the new equations:
$r_{bot}r_{top} =(a_{0}+2_{88}⋅a_{1})+s⋅(b_{0}+2_{88}⋅b_{1})−q⋅(f_{0}+2_{88}⋅f_{1})−c⋅2_{176}=a_{2}+s⋅b_{2}−q⋅f_{2}+c $
with $r_{top}=r_{2}$ and $c=c_{1}$.
Gadget
The full foreign field addition/subtraction gadget will be composed of:
 $1$ public input row containing the value $1$;
 $n$ rows with
ForeignFieldAdd
gate type: for the actual $n$ chained additions or subtractions;
 $1$
ForeignFieldAdd
row for the bound addition;  $1$
Zero
row for the final bound addition.  $1$
RangeCheck
gadget for the first left input of the chain $a_{1}:=a$;  Then, $n$ of the following set of rows:
 $1$
RangeCheck
gadget for the $i$th right input $b_{i}$;  $1$
RangeCheck
gadget for the $i$th result which will correspond to the $(i+1)$th left input of the chain $r_{i}=a_{i+1}$.
 $1$
 $1$ final
RangeCheck
gadget for the bound check $u$.
A total of 20 rows with 15 columns in Kimchi for 1 addition. All ranges below are inclusive.
Row(s)  Gate type(s)  Witness 

0  PublicInput  $1$ 
1..n  ForeignFieldAdd  +/ 
n+1  ForeignFieldAdd  bound 
n+2  Zero  bound 
n+3..n+6  multirangecheck  left 
n+7+8i..n+10+8i  multirangecheck  right 
n+11+8i..n+14+8i  multirangecheck  $r$ 
9n+7..9n+10  multirangecheck  $u$ 
This mechanism can chain foreign field additions together. Initially, there are $n$ foreign field addition gates, followed by a foreign field addition gate for the bound addition (whose current row corresponds to the next row of the last foreign field addition gate), and an auxiliary Zero
row that holds the upper bound. At the end, an initial left input range check is performed, which is followed by a $n$ pairs of range check gates for the right input and intermediate result (which become the left input for the next iteration). After the chained inputs checks, a final range check on the bound takes place.
For example, chaining the following set of 3 instructions would result in a full gadget with 37 rows:
$add(add(add(a,b),c),d)$
Row(s)  Gate type(s)  Witness 

0  PublicInput  $1$ 
1  ForeignFieldAdd  $a+b$ 
2  ForeignFieldAdd  $(a+b)+c$ 
3  ForeignFieldAdd  $((a+b)+c)+d$ 
4  ForeignFieldAdd  bound 
5  Zero  bound 
6..9  multirangecheck  $a$ 
10..13  multirangecheck  $b$ 
14..17  multirangecheck  $a+b$ 
18..21  multirangecheck  $c$ 
22..25  multirangecheck  $a+b+c$ 
26..29  multirangecheck  $d$ 
30..33  multirangecheck  $a+b+c+d$ 
34..37  multirangecheck  bound 
Nonetheless, such an exhaustive set of checks are not necessary for completeness nor soundness. In particular, only the very final range check for the bound is required. Thus, a shorter gadget that is equally valid and takes $(8∗n+4)$ fewer rows could be possible if we can assume that the inputs of each addition are correct foreign field elements. It would follow the next layout (with inclusive ranges):
Row(s)  Gate type(s)  Witness 

0  public input row for soundness of bound overflow flag  $1$ 
1..n  ForeignFieldAdd  
n+1  ForeignFieldAdd  
n+2  Zero  
n+3..n+6  multirangecheck for bound  $u$ 
Otherwise, we would need range checks for each new input of the chain, but none for intermediate results; implying $4⋅n$ fewer rows.
Row(s)  Gate type(s)  Witness 

0  public input row for soundness of bound overflow flag  $1$ 
1..n  ForeignFieldAdd  $a_{i}+b_{i}$ 
n+1  ForeignFieldAdd  
n+2  Zero  
n+3..n+6  multirangecheck for first left input  $a_{1}$ 
n+7+4i..n+10+4i  multirangecheck for $i$th right input  $b_{i}$ 
5n+7..5n+10  multirangecheck for bound  $u$ 
For more details see the Bound Addition section in Foreign Field Multiplication or the original Foreign Field Multiplication RFC.
Layout
For the full mode of tests of this gate, we need to perform 4 range checks to assert that the limbs of $a,b,r,u$ have a correct size, meaning they fit in $2_{88}$ (and thus, rangechecking $a,b,r,u$ for $2_{264}$). Because each of these elements is split into 3 limbs, we will have to use 3 copy constraints between the RangeCheck
gates and the ForeignFieldAdd
rows (per element). That amounts to 12 copy constraints. Recalling that Kimchi only allows for the first 7 columns of each row to host a copy constraint, we necessarily have to use 2 rows for the actual addition gate. The layout of these two rows is the following:
Curr  Next  … Final  

Column  ForeignFieldAdd  ForeignFieldAdd  Zero 
0  $a_{0}$ (copy)  $r_{0}$ (copy)  $u_{0}$ (copy) 
1  $a_{1}$ (copy)  $r_{1}$ (copy)  $u_{1}$ (copy) 
2  $a_{2}$ (copy)  $r_{2}$ (copy)  $u_{2}$ (copy) 
3  $b_{0}$ (copy)  
4  $b_{1}$ (copy)  
5  $b_{2}$ (copy)  
6  $q$  
7  $c$  
8  
9  
10  
11  
12  
13  
14 
Constraints
So far, we have pointed out the following sets of constraints:
Main addition
 $0=a_{0}+b_{0}−r_{0}−q⋅f_{0}−2_{88}⋅c_{0}$
 $0=a_{1}+b_{1}−r_{1}−q⋅f_{1}+c_{0}−2_{88}⋅c_{1}$
 $0=a_{2}+b_{2}−r_{2}−q⋅f_{2}+c_{1}$
Carry checks
 $0=c_{0}⋅(c_{0}+1)⋅(c_{0}−1)$
 $0=c_{1}⋅(c_{0}+1)⋅(c_{1}−1)$
Sign checks
TODO: decide if we really want to keep this check or leave it implicit as it is a coefficient value
 $0=(s+1)⋅(s−1)$
Overflow checks
 $0=q⋅(q−s)$
Optimizations
When we use this gate as part of a larger chained gadget, we should optimize the number of range check rows to avoid redundant checks. In particular, if the result of an addition becomes one input of another addition, there is no need to check twice that the limbs of that term have the right length.
The sign is now part of the coefficients of the gate. This will allow us to remove the sign check constraint and release one witness cell element. But more importantly, it brings soundness to the gate as it is now possible to wire the $1$ public input to the overflow witness of the final bound check of every addition chain.