Fast Fourier Transform (FFT)
This section describes how the CooleyTukey fast Fourier transform works. As we learned in the previous section, the key is to select evaluation points that yield an efficient FFT algorithm.
Specifically, say we have $ω∈F$ such that $ω_{n}=1$, and $ω_{r}=1$ for any $0<r<n$.
Put another way, all the values $1,ω,ω_{2},ω_{3},…,ω_{n−1}$ are distinct and $ω_{n}=1$.
Put yet another way, the group generated by $ω$ inside $F_{×}$ (written $⟨ω⟩$) has size $n$.
We call such an $ω$ a primitive $n$th root of unity.
Suppose we have an $ω$ which is a primitive $2_{k}$th root of unity and let $A_{k}={1,ω,…,ω_{2_{k}−1}}$.
The FFT algorithm will let us compute $interp_{A_{k}}$ for this set.
Actually, it is easier to see how it will let us compute the $eval_{A_{k}}$ algorithm efficiently.
We will describe an algorithm $FFT(k,ω,f)$ that takes as input
 $k∈N$
 $ω∈F$ a primitive $2_{k}$th root of unity
 $f∈F[x]_{<2_{k}}$ in dense coefficients form (i.e., as a vector of coefficients of length $n$).
and outputs the vector of evaluations $[f(1),f(ω),f(ω_{2})…,f(ω_{2_{k}−1})]$ and does it in time $O(k2_{k})$ (which is to say, $ngn$ if $n=2_{k}$).
Notice that naively, computing each evaluation $f(ω_{i})$ using the coefficients of $f$ would require time $O(n)$, and so computing all $n$ of them would require time $O(n_{2})$.
The algorithm $FFT(k,ω,f)$ can be defined recursively as follows.
If $k=0$, then $ω$ is a primitive $1$st root of unity, and $f$ is a polynomial of degree $0$. That means $ω=1$ and also $f$ is a constant $c∈F$. So, we can immediately output the array of evaluations $[c]=[f(1)]$.
If $k>0$, then we will split $f$ into two polynomials, recursively call $FFT$ on them, and reconstruct the result from the recursive calls.
To that end, define $f_{0}$ to be the polynomial whose coefficients are all the evenindex coefficients of $f$ and $f_{1}$ the polynomial whose coefficients are all the oddindex coefficients of $f$. In terms of the array representation, this just means splitting out every other entry into two arrays. So that can be done in time $O(n)$.
Write $f=∑_{i<2_{k}}c_{i}x_{i}$, so that $f_{0}=∑_{i<2_{k−1}}c_{2i}x_{i}$ and $f_{1}=∑_{i<2_{k−1}}c_{2i+1}x_{i}$. Then
$f(x) =i<2_{k}∑ c_{i}x_{i}=i<2_{k−1}∑ c_{2i}x_{2i}+i<2_{k−1}∑ c_{2i+1}x_{2i+1}=i<2_{k−1}∑ c_{2i}(x_{2})_{i}+i<2_{k−1}∑ c_{2i+1}x⋅(x_{2})_{i}=i<2_{k−1}∑ c_{2i}(x_{2})_{i}+xi<2_{k−1}∑ c_{2i+1}(x_{2})_{i}=f_{0}(x_{2})+xf_{1}(x_{2}) $
Now, notice that if $ω$ is a $2_{k}$th root of unity, then $ω_{2}$ is a $2_{k−1}$th root of unity. Thus we can recurse with $FFT(k−1,ω_{2},f_{0})$ and similarly for $f_{1}$. Let
$[e_{0,0},…,e_{0,2_{k−1}−1}][e_{1,0},…,e_{1,2_{k−1}−1}] =FFT(k−1,ω_{2},f_{0})=FFT(k−1,ω_{2},f_{1}) $
By assumption $e_{i,j}=f_{i}((ω_{2})_{j})$. So, for any $j$ we have
$f(ω_{j}) =f_{0}((ω_{2})_{j})+ω_{j}f_{1}((ω_{2})_{j}) $
Now, since $j$ may be larger than $2_{k−1}−1$, we need to reduce it mod $2_{k−1}$, relying on the fact that if $τ$ is an $n$th root of unity then $τ_{j}=τ_{jmodn}$ since $τ_{n}=1$. Thus, $(ω_{2})_{j}=(ω_{2})_{jmod2_{k−1}}$ and so we have
$f(ω_{j}) =f_{0}((ω_{2})_{jmod2_{k−1}})+ω_{j}f_{1}((ω_{2})_{jmod2_{k−1}})=e_{0,jmod2_{k−1}}+ω_{j}e_{1,jmod2_{k−1}} $
We can compute the array $W=[1,ω,…,ω_{2_{k}−1}]$ in time $O(n)$ (since each entry is the previous entry times $ω$). Then we can compute each entry of the output in $O(1)$ as
$f(ω_{j}) =e_{0,jmod2_{k−1}}+W[j]⋅e_{1,jmod2_{k−1}} $
There are $n$ such entries, so this takes time $O(n)$.
This concludes the recursive definition of the algorithm $FFT(k,ω,f)$.
Algorithm: computing $eval_{A_{k}}$
 $Inputf=[c_{0},…,c_{2_{k}−1}]$ the coefficients of polynomial $f(x)=∑_{i<2_{k}}c_{i}x_{i}$
 $ComputeW←[1,ω,ω_{2},…,ω_{2_{k}−1}]$
 $FFT(k,ω,f)→[f(1),f(ω),f(ω_{2})…,f(ω_{2_{k}−1})]$
 $ifk==0$
 $returnf$
 $else$
 $Computef_{0}=[c_{0},c_{2},…,c_{2_{k}−2}]$ the even coefficients of $f,$ corresponding to $f_{0}(x)=∑_{i<2_{k−1}}c_{2i}x_{i}$
 $Computef_{1}=[c_{1},c_{3},…,c_{2_{k}−1}]$ the odd coefficients of $f,$ corresponding to $f_{1}(x)=∑_{i<2_{k−1}}c_{2i+1}x_{i}$
 $e_{0}←FFT(k−1,ω_{2},f_{0})$
 $e_{1}←FFT(k−1,ω_{2},f_{1})$
 $forj∈[0,2_{k}−1]$
 $F_{j}←e_{0,jmod2_{k−1}}+W[j]⋅e_{1,jmod2_{k−1}}$
 $returnF$
Now let’s analyze the time complexity. Let $T(n)$ be the complexity on an instance of size $n$ (that is, for $n=2_{k}$).
Looking back at what we have done, we have done
 $O(n)$ for computing $f_{0}$ and $f_{1}$
 two recursive calls, each of size $n/2$
 $O(n)$ for computing the powers of $ω$
 $O(n)$ for combining the results of the recursive calls
In total, this is $O(n)+2T(n/2)$. Solving this recurrence yields $T(n)=O(n)⋅logn=O(ngn)$. Basically, there are $gn$ recursions before we hit the base case, and each step takes time $O(n)$. $□$
Now, in practice there are ways to describe this algorithm nonrecursively that have better concrete performance, but that’s out of scope for this document. Read the code if you are interested.
Using the FFT algorithm to compute $interp_{A_{k}}$
So far we have a fast way to compute $eval_{A_{k}}(f)$ all at once, where $A_{k}$ is the set of powers of a $2_{k}$th root of unity $ω$. For convenience let $n=2_{k}$.
Now we want to go the other way and compute a polynomial given an array of evaluations. Specifically, $n$ evaluations $[f(x_{0}),f(x_{1}),…,f(x_{n−1})]$ uniquely define a degree $n−1$ polynomial. This can be written as a system of $n$ equations
$f(x_{0})f(x_{1})⋮f(x_{n−1}) =c_{0}+c_{1}x_{0}+…+c_{n−1}x_{0}=c_{0}+c_{1}x_{1}+…+c_{n−1}x_{1}=c_{0}+c_{1}x_{n−1}+…+c_{n−1}x_{n−1}, $ which can be rewritten as a matrix vector product. $ f(x_{0})f(x_{1})⋮f(x_{n−1}) = 11⋮1 x_{0}x_{1}⋮x_{n−1} ⋯⋯⋱⋯ x_{0}x_{1}⋮x_{n−1} × c_{0}c_{1}⋮c_{n−1} $ This $n×n$ matrix is a Vandermonde matrix and it just so happens that square Vandermonde matrices are invertible, iff the $x_{i}$ are unique. Since we purposely selected our $x_{i}$ to be the powers of $ω$, a primitive $n$th root of unity, by definition $x_{i}=ω_{i}$ are unique.
Therefore, to compute the polynomial given the corresponding array of evaluations (i.e. interpolation) we can solve for the polynomial’s coefficients using the inverse of the matrix. $ c_{0}c_{1}⋮c_{n−1} = 11⋮1 1ω⋮ω_{n−1} ⋯⋯⋱⋯ 1_{n−1}ω_{n−1}⋮ω_{(n−1)(n−1)} _{−1}× f(1)f(ω)⋮f(ω_{n−1}) $ All we need now is the inverse of this matrix, which is slightly complicated to compute. I’m going to skip it for now, but if you have the details please make a pull request.
Substituting in the inverse matrix we obtain the equation for interpolation. $ c_{0}c_{1}⋮c_{n−1} =n1 11⋮1 1ω_{−1}⋮ω_{−(n−1)} ⋯⋯⋱⋯ 1_{n−1}ω_{−(n−1)}⋮ω_{−(n−1)(n−1)} × f(1)f(ω)⋮f(ω_{n−1}) $ Observe that this equation is nearly identical to the original equation for evaluation, except with the following substitution. $ω_{i}⇒n1 ω_{−1i}$ Consequently and perhaps surprisingly, we can reuse the FFT algorithm $eval_{A_{k}}$ in order to compute the inverse– $interp_{A_{k}}$.
So, suppose we have an array $[a_{0},…,a_{n−1}]$ of field elements (which you can think of as a function $A_{k}→F$) and we want to compute the coefficients of a polynomial $f$ with $f(ω_{i})=a_{i}$.
To this end, define a polynomial $g$ by $g=∑_{j<n}a_{j}x_{j}$. That is, the polynomial whose coefficients are the evaluations in our array that we’re hoping to interpolate.
Now, let $[e_{0},…,e_{n−1}]=FFT(k,ω_{−1},g)$.
That is, we’re going to feed $g$ into the FFT algorithm defined above with $ω_{−1}$ as the $2_{k}$th root of unity. It is not hard to check that if $ω$ is an nth root of unity, so is $ω_{−1}$. Remember: the resulting values are the evaluations of $g$ on the powers of $ω_{−1}$, so $e_{i}=g(ω_{−i})=∑_{j<n}a_{j}ω_{−ij}$.
Now, let $h=∑_{i<n}e_{i}x_{i}$. That is, reinterpret the values $e_{i}$ returned by the FFT as the coefficients of a polynomial. I claim that $h$ is almost the polynomial we are looking for. Let’s calculate what values $h$ takes on at the powers of $ω$.
$h(ω_{s}) =i<n∑ e_{i}ω_{si}=i<n∑ ω_{si}j<n∑ a_{j}ω_{−ij}=i<n∑ j<n∑ a_{j}ω_{si−ij}=j<n∑ a_{j}i<n∑ ω_{i(s−j)} $
Now, let’s examine the quantity $c_{j}:=∑_{i<n}ω_{i(s−j)}$. We claim that if $j=s$, then $c_{j}=n$, and if $j=s$, then $c_{j}=0$. The first claim is clear since
$c_{s}=i<n∑ ω_{i(s−s)}=i<n∑ ω_{0}=i<n∑ 1=n$
For the second claim, we will prove that $ω_{s−j}c_{j}=c_{j}$. This implies that $(1−ω_{s−j})c_{j}=0$. So either $1−ω_{s−j}=0$ or $c_{j}=0$. The former cannot be the case since it implies $ω_{s−j}=1$ which in turn implies $s=j$ which is impossible since we are in the case $j=s$. Thus we have $c_{j}=0$ as desired.
So let’s show that $c_{j}$ is invariant under multiplication by $ω_{s−j}$. Basically, it will come down to the fact that $ω_{n}=ω_{0}$.
$ω_{s−j}c_{j} =ω_{s−j}i<n∑ ω_{i(s−j)}=i<n∑ ω_{i(s−j)+(s−j)}=i<n∑ (ω_{i+1})_{s−j}=(ω_{0+1})_{s−j}+(ω_{1+1})_{s−j}+⋯+(ω_{(n−1)+1})_{s−j}=(ω_{1})_{s−j}+(ω_{2})_{s−j}+⋯+(ω_{n})_{s−j}=(ω_{1})_{s−j}+(ω_{2})_{s−j}+⋯+(ω_{0})_{s−j}=(ω_{0})_{s−j}+(ω_{1})_{s−j}+⋯+(ω_{n−1})_{s−j}=i<n∑ (ω_{i})_{s−j}=c_{j} $
So now we know that
$h(ω_{s}) =j<n∑ a_{j}c_{j}=a_{s}⋅n+j=s∑ a_{j}⋅0=a_{s}⋅n $
So if we define $f=h/n$, then $f(ω_{s})=a_{s}$ for every $s$ as desired. Thus we have our interpolation algorithm, sometimes called an inverse FFT or IFFT:
Algorithm: computing $interp_{A_{k}}$
Input: $[a_{0},…,a_{n−1}]$ the points we want to interpolate and $ω$ a $n$th root of unity.
Interpret the input array as the coefficients of a polynomial $g=∑_{i<n}a_{i}x_{n}$.
Let $[e_{0},…,e_{n}]=FFT(k,ω_{−1},g)$.
Output the polynomial $∑_{i<n}(e_{i}/n)x_{i}$. I.e., in terms of the densecoefficients form, output the vector $[e_{0}/n,…,e_{n−1}/n]$.
Note that this algorithm also takes time $O(ngn)$
Takeaways

Polynomials can be represented as a list of coefficients or a list of evaluations on a set $A$

If the set $A$ is the set of powers of a root of unity, there are time $O(ngn)$ algorithms for converting back and forth between those two representations

In evaluations form, polynomials can be added and multiplied in time $O(n)$
 TODO: caveat about hitting degree
Exercises

Implement types
DensePolynomial<F: FfftField>
andEvaluations<F: FftField>
that wrap aVec<F>
and implement the FFT algorithms described above for converting between them 
Familiarize yourself with the types and functions provided by
ark_poly