This document is meant to explain the foreign field addition gate in Kimchi.

## Overview

The goal of this gate is to perform the following operation between a left and a right input, to obtain a result where belong to the foreign field of modulus and we work over a native field of modulus , and is a flag indicating whether it is a subtraction or addition gate.

If then we can easily perform the above computation natively since no overflows happen. But in this gate we are interested in the contrary case in which . In order to deal with this, we will divide foreign field elements into limbs that fit in our native field. We want to be compatible with the foreign field multiplication gate, and thus the parameters we will be using are the following:

• 3 limbs of 88 bits each (for a total of 264 bits)
• So , concretely:
• The modulus of the foreign field is 256 bit
• The modulus of our the native field is 255 bit

In other words, using 3 limbs of 88 bits each allows us to represent any foreign field element in the range for foreign field addition, but only up to for foreign field multiplication. Thus, with the current configuration of our limbs, our foreign field must be smaller than (because , more on this in Foreign Field Multiplication or the original FFmul RFC.

Let’s take a closer look at what we have, if we split our variables in limbs (using little endian)

bits  0..............87|88...........175|176...........263

a  =  (-------a0-------|-------a1-------|-------a2-------)
+/-
b  =  (-------b0-------|-------b1-------|-------b2-------)
=
r  =  (-------r0-------|-------r1-------|-------r2-------)  mod(f)


We will perform the addition in 3 steps, one per limb. Now, when we split long additions in this way, we must be careful with carry bits between limbs. Also, even if and are foreign field elements, it could be the case that is already larger than the modulus (in this case could be at most ). But it could also be the case that the subtraction produces an underflow because (with a difference of at most ). Thus we will have to consider the more general case. That is,

with a field overflow term that will be either (if no underflow nor overflow is produced), (if there is overflow with ) or (if there is underflow with ). Looking at this in limb form, we have:

bits  0..............87|88...........175|176...........263

a  =  (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b  =  (-------b0-------|-------b1-------|-------b2-------)
=
q  =  -1 | 0 | 1
·
f  =  (-------f0-------|-------f1-------|-------f2-------)
+
r  =  (-------r0-------|-------r1-------|-------r2-------)


First, if is larger than , then we will have a field overflow (represented by ) and thus will have to subtract from the sum to obtain . Whereas the foreign field overflow necessitates an overflow bit for the foreign field equation above, when there is a corresponding subtraction that may introduce carries (or even borrows) between the limbs. This is because . Therefore, in the equations for the limbs we will use a carry flag for limb to represent both carries and borrows. The carry flags are in , where represents a borrow and represents a carry. Next we explain how this works.

In order to perform this operation in parts, we first take a look at the least significant limb, which is the easiest part. This means we want to know how to compute . First, if the addition of the bits in and produce a carry (or borrow) bit, then it should propagate to the second limb. That means one has to subtract from , add to and set the low carry flag to 1 (otherwise it is zero). Thus, the equation for the lowest bit is

Or put in another way, this is equivalent to saying that is a multiple of (or, the existence of the carry coefficient ).

This kind of equation needs an additional check that the carry coefficient is a correct carry value (belonging to ). We will use this idea for the remaining limbs as well.

Looking at the second limb, we first need to observe that the addition of and can, not only produce a carry bit , but they may need to take into account the carry bit from the first limb; . Similarly to the above,

Note that in this case, the carry coefficient from the least significant limb is not multiplied by , but instead is used as is, since it occupies the least significant position of the second limb. Here, the second carry bit is the one being followed by zeros. Again, we will have to check that is a bit.

Finally, for the third limb, we obtain a similar equation. But in this case, we do not have to take into account anymore, since it was already considered within . Here, the most significant carry bit should always be a zero so we can ignore it.

Graphically, this is what is happening:

bits  0..............87|88...........175|176...........263

a  =  (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b  =  (-------b0-------|-------b1-------|-------b2-------)
>                >                >
=                     c_0              c_1              c_2
q  =  -1 | 0 | 1
·
f  =  (-------f0-------|-------f1-------|-------f2-------)
+
r  =  (-------r0-------|-------r1-------|-------r2-------)


Our witness computation is currently using the BigUint library, which takes care of all of the intermediate carry limbs itself so we do not have to address all casuistries ourselves (such as propagating the low carry flag to the high limb in case the middle limb is all zeros).

### Upper bound check for foreign field membership

Last but not least, we should perform some range checks to make sure that the result is contained in . This is important because there could be other values of the result which still fit in but are larger than , and we must make sure that the final result is the minimum one (that we will be referring to as in the following).

Ideally, we would like to reuse some gates that we already have. In particular, we can perform range checks for . But we want to check that , which is a smaller range. The way we can tweak the existing range check gate to behave as we want, is as follows.

First, the above inequality is equivalent to . Add on both sides to obtain . Thus, we can perform the standard check together with the , which together will imply as intended. All there is left to check is that a given upperbound term is correctly obtained; calling it , we want to enforce .

The following computation is very similar to a foreign field addition (), but simpler. The field overflow bit will always be , the main operation sign is positive (we are doing addition, not subtraction), and the right input , call it , is represented by the limbs . There could be intermediate limb carry bits and as in the general case FF addition protocol. Observe that, because the sum is meant to be , the carry bit for the most significant limb should always be zero , so this condition is enforced implicitly by omitting from the equations. Happily, we can apply the addition gate again to perform the addition limb-wise, by selecting the following parameters:

Calling the upper bound term, the equation can be expressed as . Finally, we perform a range check on the sum , and we would know that .

bits  0..............87|88...........175|176...........263

r  =  (-------r0-------|-------r1-------|-------r2-------)
+
g  =  (-------g0-------|-------g1-------|-------g2-------)
>                >
=                     k_0              k_1
f
+
u  =  (-------u0-------|-------u1-------|-------u2-------)


Following the steps above, and representing this equation in limb form, we have:

But now we also have to check that . But this is implicitly covered by being a field element of at most 264 bits (range check).

When we have a chain of additions with , we could apply the field membership check naively to every intermediate , however it is sufficient to apply it only once at the end of the computations for , and keep intermediate , in a “raw” form. Generally speaking, treating intermediate values lazily helps to save a lot of computation in many different FF addition and multiplication scenarios.

### Subtractions

Mathematically speaking, a subtraction within a field is no more than an addition over that field. Negative elements are not different from “positive” elements in finite fields (or in any modular arithmetic). Our witness computation code computes negative sums by adding the modulus to the result. To give a general example, the element within a field of order and is nothing but . Nonetheless, for arbitrarily sized elements (not just those smaller than the modulus), the actual field element could be any , for any multiple of the modulus. Thus, representing negative elements directly as “absolute” field elements may incur in additional computations involving multiplications and thus would result in a less efficient mechanism.

Instead, our gate encodes subtractions and additions directly within the sign term that is multiplying the right input. This way, there is no need to check that the negated value is performed correctly (which would require an additional row for a potential FFNeg gate).

### Optimization

So far, one can recompute the result as follows:

bits  0..............87|88...........175|176...........263
r  =  (-------r0-------|-------r1-------|-------r2-------)
=
a  =  (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b  =  (-------b0-------|-------b1-------|-------b2-------)
-
q  =  -1 | 0 | 1
·
f  =  (-------f0-------|-------f1-------|-------f2-------)
>                >                >
c_0              c_1               0


Inspired by the halving approach in foreign field multiplication, an optimized version of the above gate results in a reduction by 2 in the number of constraints and by 1 in the number of witness cells needed. The main idea is to condense the claims about the low and middle limbs in one single larger limb of 176 bits, which fit in our native field. This way, we can get rid of the low carry flag, its corresponding carry check, and the three result checks become just two.

bits  0..............87|88...........175|176...........263
r  =  (-------r0------- -------r1-------|-------r2-------)
=
a  =  (-------a0------- -------a1-------|-------a2-------)
+
s = 1 | -1
·
b  =  (-------b0------- -------b1-------|-------b2-------)
-
q  =  -1 | 0 | 1
·
f  =  (-------f0------- -------f1-------|-------f2-------)
>                >
c                0


These are the new equations:

with and .

• public input row containing the value ;
• rows with ForeignFieldAdd gate type:
• for the actual chained additions or subtractions;
• ForeignFieldAdd row for the bound addition;
• Zero row for the final bound addition.
• RangeCheck gadget for the first left input of the chain ;
• Then, of the following set of rows:
• RangeCheck gadget for the -th right input ;
• RangeCheck gadget for the -th result which will correspond to the -th left input of the chain .
• final RangeCheck gadget for the bound check .

A total of 20 rows with 15 columns in Kimchi for 1 addition. All ranges below are inclusive.

Row(s)Gate type(s)Witness
0PublicInput
1..nForeignFieldAdd+/-
n+1ForeignFieldAddbound
n+2Zerobound
n+3..n+6multi-range-checkleft
n+7+8i..n+10+8imulti-range-checkright
n+11+8i..n+14+8imulti-range-check
9n+7..9n+10multi-range-check

This mechanism can chain foreign field additions together. Initially, there are foreign field addition gates, followed by a foreign field addition gate for the bound addition (whose current row corresponds to the next row of the last foreign field addition gate), and an auxiliary Zero row that holds the upper bound. At the end, an initial left input range check is performed, which is followed by a pairs of range check gates for the right input and intermediate result (which become the left input for the next iteration). After the chained inputs checks, a final range check on the bound takes place.

For example, chaining the following set of 3 instructions would result in a full gadget with 37 rows:

Row(s)Gate type(s)Witness
0PublicInput
1ForeignFieldAdd
2ForeignFieldAdd
3ForeignFieldAdd
4ForeignFieldAddbound
5Zerobound
6..9multi-range-check
10..13multi-range-check
14..17multi-range-check
18..21multi-range-check
22..25multi-range-check
26..29multi-range-check
30..33multi-range-check
34..37multi-range-checkbound

Nonetheless, such an exhaustive set of checks are not necessary for completeness nor soundness. In particular, only the very final range check for the bound is required. Thus, a shorter gadget that is equally valid and takes fewer rows could be possible if we can assume that the inputs of each addition are correct foreign field elements. It would follow the next layout (with inclusive ranges):

Row(s)Gate type(s)Witness
0public input row for soundness of bound overflow flag
1..nForeignFieldAdd
n+1ForeignFieldAdd
n+2Zero
n+3..n+6multi-range-check for bound

Otherwise, we would need range checks for each new input of the chain, but none for intermediate results; implying fewer rows.

Row(s)Gate type(s)Witness
0public input row for soundness of bound overflow flag
1..nForeignFieldAdd
n+1ForeignFieldAdd
n+2Zero
n+3..n+6multi-range-check for first left input
n+7+4i..n+10+4imulti-range-check for -th right input
5n+7..5n+10multi-range-check for bound

For more details see the Bound Addition section in Foreign Field Multiplication or the original Foreign Field Multiplication RFC.

### Layout

For the full mode of tests of this gate, we need to perform 4 range checks to assert that the limbs of have a correct size, meaning they fit in (and thus, range-checking for ). Because each of these elements is split into 3 limbs, we will have to use 3 copy constraints between the RangeCheck gates and the ForeignFieldAdd rows (per element). That amounts to 12 copy constraints. Recalling that Kimchi only allows for the first 7 columns of each row to host a copy constraint, we necessarily have to use 2 rows for the actual addition gate. The layout of these two rows is the following:

CurrNext… Final
ColumnForeignFieldAddForeignFieldAddZero
0 (copy) (copy) (copy)
1 (copy) (copy) (copy)
2 (copy) (copy) (copy)
3 (copy)
4 (copy)
5 (copy)
6
7
8
9
10
11
12
13
14

### Constraints

So far, we have pointed out the following sets of constraints:

#### Sign checks

TODO: decide if we really want to keep this check or leave it implicit as it is a coefficient value

## Optimizations

When we use this gate as part of a larger chained gadget, we should optimize the number of range check rows to avoid redundant checks. In particular, if the result of an addition becomes one input of another addition, there is no need to check twice that the limbs of that term have the right length.

The sign is now part of the coefficients of the gate. This will allow us to remove the sign check constraint and release one witness cell element. But more importantly, it brings soundness to the gate as it is now possible to wire the public input to the overflow witness of the final bound check of every addition chain.