I have been studying this topic in an Algorithms textbook.
The clever use of the complex roots of unity seems to work mathematically. However, I do not understand how one could actually represent this in a computer.
I can think of two things:
Use the real/imaginary decomposition to represent the complex numbers. But this means using floats, which means I open up my algorithm to numerical error, and I would lose precision even if I only want to multiply two polynomials with integer coefficients.
Represent exp(i 2π/n) symbolically, say as ω. But then every intermediate value is some expression in ω (essentially a polynomial in ω), and if I have to keep it in this form, I'd be doing polynomial multiplication in ω again, taking us back to square one.
I'd really like to see an implementation of this algorithm in a familiar programming language.
Indeed, as you identify, the roots of unity are typically not nice numbers that can be stored exactly in a computer. However, the numerical error is small in practice, so if you know the output should be integers, rounding usually produces the right result.
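For illustration, here is a minimal sketch of that floating-point approach: a hand-rolled recursive radix-2 FFT over std::complex<double>, with the final coefficients rounded back to integers. The function names and the example polynomials are arbitrary; this shows the idea, not tuned code.

```cpp
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT; the length of a must be a power of two.
// invert = true computes the inverse transform (including the 1/n scaling).
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; 2 * i < n; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double ang = 2 * std::acos(-1.0) / n * (invert ? -1 : 1);
    cd w(1), wn(std::cos(ang), std::sin(ang));
    for (std::size_t i = 0; 2 * i < n; ++i) {
        a[i] = even[i] + w * odd[i];
        a[i + n / 2] = even[i] - w * odd[i];
        if (invert) {              // dividing by 2 at every level gives 1/n overall
            a[i] /= 2;
            a[i + n / 2] /= 2;
        }
        w *= wn;
    }
}

// Multiply polynomials with integer coefficients; round the result back to integers.
std::vector<long long> multiply(const std::vector<long long>& A,
                                const std::vector<long long>& B) {
    std::vector<cd> fa(A.begin(), A.end()), fb(B.begin(), B.end());
    std::size_t n = 1;
    while (n < A.size() + B.size()) n <<= 1;
    fa.resize(n);
    fb.resize(n);
    fft(fa, false);                                        // evaluate at the roots of unity
    fft(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];    // pointwise products
    fft(fa, true);                                         // interpolate back to coefficients
    std::vector<long long> result(n);
    for (std::size_t i = 0; i < n; ++i) result[i] = std::llround(fa[i].real());
    return result;
}

int main() {
    // (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3 (trailing zeros are padding)
    for (long long c : multiply({1, 2, 3}, {4, 5})) std::printf("%lld ", c);
    std::printf("\n");
}
```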
If you don't want to (or cannot) rely on that, an exact option is the Number Theoretic Transform. It substitutes the roots of unity in the complex plane with roots of unity in a finite field ℤ/pℤ where p is a suitable prime. p has to be large enough for all the necessary roots to exist, and the efficiency is affected by properties of p. If you choose a Fermat prime then the roots of unity have convenient forms and there is a trick to do reduction modulo p more efficiently than usual. That is all exact integer arithmetic and the values stay small, so there is no problem implementing it in a computer.
That technique is used in the Schönhage–Strassen algorithm so you can look up the specifics there.
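As a rough sketch of the NTT idea (not the specific Schönhage–Strassen construction), here is the same recursion carried out in ℤ/pℤ. I use p = 998244353 with primitive root 3, a commonly used NTT-friendly prime rather than a Fermat prime; everything is exact integer arithmetic.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// p = 998244353 = 119 * 2^23 + 1 has 2^23-th roots of unity mod p; g = 3 is a
// primitive root. (A Fermat prime such as 2^16 + 1 works the same way.)
constexpr uint64_t MOD = 998244353, G = 3;

uint64_t power(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= MOD;
    while (exp) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Number Theoretic Transform: same recursion as the complex FFT, but every
// arithmetic operation happens in Z/pZ, so everything stays an exact integer.
void ntt(std::vector<uint64_t>& a, bool invert) {
    const std::size_t n = a.size();
    if (n == 1) return;
    std::vector<uint64_t> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; 2 * i < n; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    ntt(even, invert);
    ntt(odd, invert);
    uint64_t wn = power(G, (MOD - 1) / n);   // primitive n-th root of unity mod p
    if (invert) wn = power(wn, MOD - 2);     // its inverse, via Fermat's little theorem
    uint64_t w = 1;
    for (std::size_t i = 0; 2 * i < n; ++i) {
        const uint64_t t = w * odd[i] % MOD;
        a[i] = (even[i] + t) % MOD;
        a[i + n / 2] = (even[i] + MOD - t) % MOD;
        w = w * wn % MOD;
    }
}

// Exact polynomial multiplication, valid as long as coefficients stay below p.
std::vector<uint64_t> multiply(const std::vector<uint64_t>& A,
                               const std::vector<uint64_t>& B) {
    std::vector<uint64_t> fa(A), fb(B);
    std::size_t n = 1;
    while (n < A.size() + B.size()) n <<= 1;
    fa.resize(n);
    fb.resize(n);
    ntt(fa, false);
    ntt(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] = fa[i] * fb[i] % MOD;
    ntt(fa, true);
    const uint64_t n_inv = power(n, MOD - 2);   // undo the factor of n from the transform
    for (auto& x : fa) x = x * n_inv % MOD;
    return fa;
}

int main() {
    // (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3, exactly.
    for (uint64_t c : multiply({1, 2, 3}, {4, 5}))
        std::printf("%llu ", (unsigned long long)c);
    std::printf("\n");
}
```

The result is exact as long as the true coefficients stay below p; for larger values you would pick a bigger prime, or combine several primes with the Chinese remainder theorem.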
The Google microbenchmark library supports estimating the complexity of an algorithm, but everything is expressed by telling the framework what the size of your N is. I'm curious what the best way is to represent O(M+N) algorithms in this framework. In my test I iterate over a Cartesian product of M and N values.
Do I call SetComplexityN with M+N (and for O(MN) algorithms I assume SetComplexityN would similarly be M*N)? If I wanted to hard-code the complexity of the algorithm (vs. doing a best fit), does benchmark::oN then map to O(M+N) and benchmark::oNSquared to O(MN)?
It's not something we've yet considered in the library.
If you set the complexity to M+N and use oN then the fitting curve used for the minimal least square calculation will be linear in M+N.
However, if you set the complexity to M*N and use oNSquared then we'll try to fit to pow(M*N, 2) which is likely not what you want, so I think still using oN would be appropriate.
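To make that concrete, here is a hedged sketch of how this could be wired up; BM_Merge, the merge workload, and the argument ranges are made-up placeholders, and the relevant parts are SetComplexityN(M + N) together with Complexity(benchmark::oN).

```cpp
#include <algorithm>
#include <numeric>
#include <vector>
#include <benchmark/benchmark.h>

// Hypothetical O(M+N) workload: merging two sorted vectors of sizes M and N.
static void BM_Merge(benchmark::State& state) {
  const size_t M = static_cast<size_t>(state.range(0));
  const size_t N = static_cast<size_t>(state.range(1));
  std::vector<int> a(M), b(N), out(M + N);
  std::iota(a.begin(), a.end(), 0);   // already-sorted input data
  std::iota(b.begin(), b.end(), 0);
  for (auto _ : state) {
    std::merge(a.begin(), a.end(), b.begin(), b.end(), out.begin());
    benchmark::DoNotOptimize(out.data());
  }
  // Report M+N as the "N" that the complexity fit is computed against.
  state.SetComplexityN(state.range(0) + state.range(1));
}
// Cartesian product of M and N values; fit the timings against a linear curve.
BENCHMARK(BM_Merge)
    ->Ranges({{1 << 10, 1 << 18}, {1 << 10, 1 << 18}})
    ->Complexity(benchmark::oN);
BENCHMARK_MAIN();
```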
In my free time I'm preparing for interview questions like: implement multiplication of numbers represented as arrays of digits. Obviously I'm forced to write it from scratch in a language like Python or Java, so an answer like "use GMP" is not acceptable (as mentioned here: Understanding Schönhage-Strassen algorithm (huge integer multiplication)).
For exactly which ranges of sizes of those two numbers (i.e. numbers of digits) should I choose
School grade algorithm
Karatsuba algorithm
Toom-Cook
Schönhage–Strassen algorithm?
Is Schönhage–Strassen, at O(n log n log log n), always a good solution? Wikipedia mentions that Schönhage–Strassen is advisable for numbers beyond 2^2^15 to 2^2^17. What should I do when one number is ridiculously huge (e.g. 10,000 to 40,000 decimal digits) but the second consists of just a couple of digits?
Do all four of those algorithms parallelize easily?
You can browse the GNU Multiple Precision Arithmetic Library's source and see their thresholds for switching between algorithms.
More pragmatically, you should just profile your implementation of the algorithms. GMP puts a lot of effort into optimizing, so their algorithms will have different constant factors than yours. The difference could easily move the thresholds around by an order of magnitude. Find out where the times cross as input size increases for your code, and set the thresholds correspondingly.
I think all of the algorithms are amenable to parallelization, since they're mostly made up of divide-and-conquer passes. But keep in mind that parallelizing is another thing that will move the thresholds around quite a lot.
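As an illustration of the threshold idea, here is a rough Karatsuba sketch on little-endian digit arrays with a schoolbook fallback. The threshold of 32 and the helper names are placeholders you would tune and adapt, and carry propagation into base-10 digits is left out.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

using digits = std::vector<long long>;   // little-endian digit/coefficient array

// Schoolbook multiplication: O(len(a) * len(b)).
digits schoolbook(const digits& a, const digits& b) {
    digits r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            r[i + j] += a[i] * b[j];
    return r;
}

// Karatsuba with a schoolbook fallback below an arbitrary placeholder threshold;
// in practice you would tune the threshold by profiling, as described above.
digits karatsuba(digits a, digits b) {
    const std::size_t THRESHOLD = 32;
    std::size_t n = std::max(a.size(), b.size());
    if (n <= THRESHOLD) return schoolbook(a, b);
    if (n % 2 != 0) ++n;                       // the split point needs an even length
    a.resize(n, 0);
    b.resize(n, 0);
    const std::size_t h = n / 2;
    digits a0(a.begin(), a.begin() + h), a1(a.begin() + h, a.end());
    digits b0(b.begin(), b.begin() + h), b1(b.begin() + h, b.end());
    digits p0 = karatsuba(a0, b0);             // low  * low
    digits p2 = karatsuba(a1, b1);             // high * high
    digits sa(h), sb(h);
    for (std::size_t i = 0; i < h; ++i) {
        sa[i] = a0[i] + a1[i];
        sb[i] = b0[i] + b1[i];
    }
    digits p1 = karatsuba(sa, sb);             // (low + high) * (low + high)
    p0.resize(2 * h); p1.resize(2 * h); p2.resize(2 * h);   // true products fit in 2h slots
    digits r(2 * n, 0);
    for (std::size_t i = 0; i < 2 * h; ++i) {
        r[i]     += p0[i];
        r[i + h] += p1[i] - p0[i] - p2[i];     // middle term, one multiplication saved
        r[i + n] += p2[i];
    }
    return r;                                  // carries into base-10 digits still needed
}

int main() {
    // 123 * 45 = 5535; digits are least-significant first, carries are omitted here.
    for (long long d : karatsuba({3, 2, 1}, {5, 4})) std::printf("%lld ", d);
    std::printf("\n");
}
```

Profiling this against the schoolbook routine, as suggested above, is how you would pick a real threshold.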
Which programs/algorithms change the representation of their data structure at runtime in order to obtain better performance?
Context:
Data structures "define" how real-world concepts are structured and represented in computer memory. For different kinds of computations a different data structure should/can be used to achieve acceptable performance (e.g., linked-list versus array implementation).
Self-adaptive (cf. self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self-balancing trees). These changes are internal, i.e., they depend on the data. Moreover, these changes are anticipated by design.
Other algorithms can benefit from an external change of representation. In matrix multiplication, for instance, it is a well-known performance trick to transpose "the second matrix" (so that caches are used more efficiently). This actually changes the matrix representation from row-major to column-major order. Because A is not the same as Transposed(A), the second matrix is transposed again after the multiplication to keep the program semantically correct.
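A minimal sketch of that trick, with a square row-major layout chosen just for illustration (here the second matrix is passed by value, so the caller's copy never needs to be transposed back):

```cpp
#include <cstdio>
#include <utility>
#include <vector>

// Multiply two n x n row-major matrices, but first change B's representation to
// its transpose so the inner loop walks both operands contiguously in memory.
std::vector<double> multiply(const std::vector<double>& A,
                             std::vector<double> B, std::size_t n) {
    // Change of representation: B becomes column-major (i.e., B^T in row-major).
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            std::swap(B[i * n + j], B[j * n + i]);

    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[j * n + k];   // both accesses are sequential
            C[i * n + j] = sum;
        }
    return C;
}

int main() {
    const std::size_t n = 2;
    std::vector<double> A = {1, 2, 3, 4}, B = {5, 6, 7, 8};
    for (double v : multiply(A, B, n)) std::printf("%g ", v);   // expected: 19 22 43 50
    std::printf("\n");
}
```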
A second example is using a linked list at program start-up to populate "the data structure" and changing to an array-based implementation once the content of the list becomes "stable".
I am looking for programmers who have had similar experiences with other example programs where an external change of representation is performed in their application in order to obtain better performance. That is, where the representation (chosen implementation) of a data structure is changed at runtime as an explicit part of the program.
The pattern of transforming the input representation in order to enable a more efficient algorithm comes up in many situations. I would go as far as to say this is an important way to think about designing efficient algorithms in general. Some examples that come to mind:
HeapSort. It works by transforming your original input list into a binary heap (probably a min-heap), and then repeatedly calling the remove-min function to get the list elements in sorted order. Asymptotically, it is tied for the fastest comparison-based sorting algorithm.
Finding duplicates in a list. Without changing the input list, this will take O(n^2) time. But if you can sort the list, or store the elements in a hash table or Bloom filter, you can find all the duplicates in O(n log n) time or better (see the sketch after this list).
Solving a linear program. A linear program (LP) is a certain kind of optimization problem with many applications in economics and elsewhere. One of the most important techniques in solving LPs is duality, which means converting your original LP into what is called the "dual", and then solving the dual. Depending on your situation, solving the dual problem may be much easier than solving the original ("primal") LP. This book chapter starts with a nice example of primal/dual LPs.
Multiplying very large integers or polynomials. The fastest known method is using the FFT; see here or here for some nice descriptions. The gist of the idea is to convert from the usual representation of your polynomial (a list of coefficients) to an evaluation basis (a list of evaluations of that polynomial at certain carefully-chosen points). The evaluation basis makes multiplication trivial - you can just multiply each pair of evaluations. Now you have the product polynomial in an evaluation basis, and you interpolate (opposite of evaluation) to get back the coefficients, like you wanted. The Fast Fourier Transform (FFT) is a very efficient way of doing the evaluation and interpolation steps, and the whole thing can be much faster than working with the coefficients directly.
Longest common substring. If you want to find the longest substring that appears in a bunch of text documents, one of the fastest ways is to create a suffix tree from each one, then merge them together and find the deepest common node.
Linear algebra. Various matrix computations are performed most efficiently by converting your original matrix into a canonical form such as Hermite normal form or computing a QR factorization. These alternate representations of the matrix make standard things such as finding the inverse, determinant, or eigenvalues much faster to compute.
There are certainly many examples besides these, but I was trying to come up with some variety.
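To make one of these concrete, here is a small sketch of the duplicate-finding example: the change of representation is simply streaming the list into a hash set. The function name and the sample input are arbitrary.

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

// Change of representation: put the elements into a hash set instead of
// comparing every pair. Expected O(n) overall, versus O(n^2) on the raw list.
std::vector<int> findDuplicates(const std::vector<int>& input) {
    std::unordered_set<int> seen, reported;
    std::vector<int> duplicates;
    for (int x : input) {
        if (seen.count(x) && !reported.count(x)) {
            duplicates.push_back(x);
            reported.insert(x);                 // report each duplicate value once
        }
        seen.insert(x);
    }
    return duplicates;
}

int main() {
    for (int x : findDuplicates({3, 1, 4, 1, 5, 9, 2, 6, 5, 3})) std::printf("%d ", x);
    std::printf("\n");   // prints: 1 5 3
}
```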
Imagine having any two functions. You need to find the intersections of those functions. You definitely don't want to try all x values to check for f(x) == g(x).
Normally in math, you create simultaneous equations derived from f(x) == g(x). But I see no way to implement such equations in any programming language.
So, once more, what I am looking for:
A conceptual algorithm for solving equations.
The same for simultaneous and quadratic equations.
I believe there should be some workaround using derivatives, but I've only recently learned the concept of a derivative at school and I have no idea how to use it in this case.
That is a much harder problem than you would imagine. A good place to start for learning about these things is the Newton-Raphson method, which gives numerical approximations to equations of the form h(x) = 0. (When you set h(x) = g(x) - f(x), this provides solutions for the problem you are asking about.)
Exact, algebraic solving of equations (as implemented in Mathematica, for example) is even more difficult: you basically have to recreate everything you would do in your head when solving an equation on a piece of paper.
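A minimal sketch of Newton-Raphson applied to h(x) = g(x) - f(x), with the derivative approximated by a central difference; the example functions, starting guess, and tolerances are arbitrary choices.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>

// Newton-Raphson on h(x) = g(x) - f(x); a root of h is an intersection of f and g.
// The derivative is approximated with a central difference.
double intersect(std::function<double(double)> f,
                 std::function<double(double)> g,
                 double x0, int maxIter = 50, double tol = 1e-12) {
    auto h = [&](double x) { return g(x) - f(x); };
    double x = x0;
    for (int i = 0; i < maxIter; ++i) {
        const double eps = 1e-6;
        const double dh = (h(x + eps) - h(x - eps)) / (2 * eps);  // numerical derivative
        if (dh == 0.0) break;                                     // avoid division by zero
        const double next = x - h(x) / dh;                        // Newton step
        if (std::fabs(next - x) < tol) return next;
        x = next;
    }
    return x;   // best approximation found (may not have converged)
}

int main() {
    // Example: intersection of f(x) = x^2 and g(x) = cos(x), starting near x = 1.
    const double x = intersect([](double v) { return v * v; },
                               [](double v) { return std::cos(v); }, 1.0);
    std::printf("x = %.12f, f(x) = %.12f\n", x, x * x);
}
```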
Obviously this problem is not solvable in the general case because you can construct a "function" which is arbitrarily complex. For example, if you have a "function" with 5 trillion terms in it including various transcendental and complex transformations in it, the computer could take years just to compute a single value, much less intersect it with another similar function.
So, first of all you need to define what you mean by a "function". If you mean a polynomial of degree at most 4, then the problem becomes much more straightforward: you combine the terms of the two polynomials and find the roots of the resulting equation, which are the intersections.
If the polynomial has degree 5 or higher (a quintic or worse), then there is no general symbolic solution. In this case the terms are combined and you find the roots by iterative approximation. See Root Finding Algorithms.
If the function involves transcendentals such as sin/cos/log/e^x, etc., you can potentially find the intersection by representing the functions as a series or a continued fraction. You then subtract one series from the other and set the result to zero; solving the resulting equation yields an approximation of the root(s).
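For the "iterative approximation" step, bisection is about the simplest robust sketch, assuming you can bracket a root between two points where g(x) - f(x) changes sign; the example functions and interval here are arbitrary.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>

// Bisection for h(x) = 0 on [lo, hi], assuming h(lo) and h(hi) have opposite
// signs (i.e., a root is bracketed). The interval is halved each iteration.
double bisect(const std::function<double(double)>& h, double lo, double hi,
              int iterations = 100) {
    for (int i = 0; i < iterations; ++i) {
        const double mid = 0.5 * (lo + hi);
        if ((h(lo) < 0) == (h(mid) < 0)) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

int main() {
    // Intersection of sin(x) and e^{-x} on [0, 1]: solve sin(x) - e^{-x} = 0.
    const double x = bisect([](double v) { return std::sin(v) - std::exp(-v); }, 0.0, 1.0);
    std::printf("x = %.12f\n", x);
}
```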
I am just a beginner in computer science. I have learned something about running time, but I can't be sure what I understood is right. So please help me.
So integer factorization is currently not known to be a polynomial-time problem, but primality testing is. Assume the number to be checked is n. Suppose we run a program that, for every number from 1 to sqrt(n), decides whether it divides n, and stores the number if it does. I think this program is polynomial time, isn't it?
One possible way I could be wrong is that a factorization program should find all prime factors, instead of just the first one discovered. So maybe that is the reason.
However, in public-key cryptography, finding a prime factor of a large number is essential to attacking the cryptosystem. Since the large number (the public key) is usually just the product of two primes, finding one prime means finding the other. This should be polynomial time. So why is it difficult or impossible to attack?
Casual descriptions of complexity like "polynomial factoring algorithm" generally refer to the complexity with respect to the size of the input, not the interpretation of the input. So when people say "no known polynomial factoring algorithm", they mean there is no known algorithm for factoring N-bit natural numbers that runs in time polynomial with respect to N. Not polynomial with respect to the number itself, which can be up to 2^N.
The difficulty of factorization is one of those beautiful mathematical problems that's simple to understand and takes you immediately to the edge of human knowledge. To summarize (today's) knowledge on the subject: we don't know why it's hard, not with any degree of proof, and the best methods we have run in more than polynomial time (but also significantly less than exponential time). The result that primality testing is even in P is pretty recent; see the linked Wikipedia page.
The best heuristic explanation I know for the difficulty is that primes are randomly distributed. One of the easier-to-understand results is Dirichlet's theorem. This theorem says that every arithmetic progression whose first term and common difference are coprime contains infinitely many primes; in other words, you can think of primes as being dense with respect to such progressions, meaning you can't avoid running into them. This is the simplest of a rather large collection of such results; in all of them, primes appear in ways very much analogous to random numbers.
The difficulty of factoring is thus analogous to the impossibility of reversing a one-time pad. In a one-time pad, there's a bit we don't know XORed with another bit we don't know. We get zero information about an individual bit from knowing the result of the XOR. Replace "bit" with "prime" and XOR with multiplication, and you have the factoring problem. It's as if you've multiplied two random numbers together, and you get very little information from the product (instead of zero information).
Suppose we run a program that, for every number from 1 to sqrt(n), decides whether it divides n, and stores the number if it does.
Even ignoring that each divisibility test takes longer for bigger numbers, this approach needs about 41% more trial divisions (a factor of √2) every time you add a single binary digit to n, and twice as many for every two digits you add.
I think that is the hallmark of exponential runtime: make n a fixed number of bits longer and the running time multiplies by a constant factor. Here it doubles for every two extra bits, i.e. it grows like 2^(b/2) in the bit length b.
But note that this observation applies only to the algorithm you proposed. It is still unknown if integer factorization is polynomial or not. The cryptographers sure hope that it is not, but there are also alternative algorithms that do not depend on prime factorization being hard (such as elliptic curve cryptography), just in case...
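For concreteness, here is a sketch of the trial-division factorization being discussed. The loop runs roughly sqrt(n) times, which is about 2^(b/2) steps for a b-bit input, so it is polynomial in the value of n but exponential in its length; the example number is arbitrary.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Trial division: about sqrt(n) iterations, i.e. roughly 2^(b/2) for a b-bit n.
// Polynomial in the value of n, but exponential in its bit length.
std::vector<uint64_t> factorize(uint64_t n) {
    std::vector<uint64_t> factors;
    for (uint64_t d = 2; d * d <= n; ++d) {
        while (n % d == 0) {              // divide out each prime factor completely
            factors.push_back(d);
            n /= d;
        }
    }
    if (n > 1) factors.push_back(n);      // whatever remains is prime
    return factors;
}

int main() {
    for (uint64_t f : factorize(600851475143ULL))
        std::printf("%llu ", (unsigned long long)f);   // prints: 71 839 1471 6857
    std::printf("\n");
}
```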