I need to multiply several big long integers as efficiently as possible.
I am trying to implement the Harvey & van der Hoeven 2019 algorithm for integer multiplication, but I am stuck on understanding the definition and mathematics behind it, especially the Agarwal–Cooley algorithm.
Any help to understand this algorithm, like a practical example or some pseudo-code would be highly appreciated.
Remember how Big O notation is defined: f(x) = O(g(x)) means there exist constants C > 0 and x₀ such that |f(x)| ≤ C·g(x) for all x ≥ x₀.
The problem with the Harvey & van der Hoeven (2019) algorithm is that the x₀ involved is quite large. Therefore, for most inputs, their algorithm gives a way to multiply integers inefficiently. For very large numbers, though, it does run in O(n log n) time.
But how big are those numbers? David Harvey, one of the authors, states:
The new algorithm is not really practical in its current form, because the proof given in our paper only works for ludicrously large numbers. Even if each digit was written on a hydrogen atom, there would not be nearly enough room available in the observable universe to write them down.
On the other hand, we are hopeful that with further refinements, the algorithm might become practical for numbers with merely billions or trillions of digits. If so, it may well become an indispensable tool in the computational mathematician's arsenal.
Therefore, if you are serious about your stated goal---multiplying big numbers quickly---this algorithm is not the way you should go about doing it.
If your long integers are less than about 10000 bits and you are using a regular 32- or 64-bit computer, I suggest Karatsuba-Ofman. It can be sped up using parallelism, e.g. multi-threading or a GPU.
If you want to make a custom chip to do it fully in parallel, use 4XY = (X+Y)^2 - (X-Y)^2 and build a Karatsuba-Ofman squarer. That takes less chip area because the squarer has only n input lines instead of 2n.
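For concreteness, here is a minimal Python sketch of Karatsuba-Ofman multiplication (the function name and the 64-bit base-case cutoff are my own illustrative choices; the cutoff would need tuning in practice):

```python
def karatsuba(x, y):
    """Multiply two non-negative integers by Karatsuba-Ofman splitting."""
    # Fall back to the machine multiply for small operands; this cutoff
    # is arbitrary and should be tuned by profiling.
    if x < (1 << 64) or y < (1 << 64):
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # Three recursive multiplications instead of four:
    a = karatsuba(x_hi, y_hi)
    b = karatsuba(x_lo, y_lo)
    c = karatsuba(x_hi + x_lo, y_hi + y_lo) - a - b  # cross terms
    return (a << (2 * half)) + (c << half) + b

assert karatsuba(3**1000, 7**800) == 3**1000 * 7**800
```

On a real 32- or 64-bit machine you would recurse over word arrays rather than Python integers, but the three-multiplications-instead-of-four structure is the same.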
Related
In my free time I'm preparing for interview questions like: implement multiplication of numbers represented as arrays of digits. Obviously I'm forced to write it from scratch in a language like Python or Java, so an answer like "use GMP" is not acceptable (as mentioned here: Understanding Schönhage-Strassen algorithm (huge integer multiplication)).
For exactly which range of sizes of those two numbers (i.e. number of digits) should I choose:
School grade algorithm
Karatsuba algorithm
Toom-Cook
Schönhage–Strassen algorithm?
Is Schönhage–Strassen O(n log n log log n) always a good solution? Wikipedia mentions that Schönhage–Strassen is advisable for numbers beyond 2^2^15 to 2^2^17. What to do when one number is ridiculously huge (e.g. 10,000 to 40,000 decimal digits) but the second consists of just a couple of digits?
Do all four of those algorithms parallelize easily?
You can browse the GNU Multiple Precision Arithmetic Library's source and see their thresholds for switching between algorithms.
More pragmatically, you should just profile your implementation of the algorithms. GMP puts a lot of effort into optimizing, so their algorithms will have different constant factors than yours. The difference could easily move the thresholds around by an order of magnitude. Find out where the times cross as input size increases for your code, and set the thresholds correspondingly.
I think all of the algorithms are amenable to parallelization, since they're mostly made up of divide-and-conquer passes. But keep in mind that parallelizing is another thing that will move the thresholds around quite a lot.
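As a concrete starting point for that profiling, here is a rough Python timing harness (schoolbook_mul and karatsuba are placeholders for your own implementations):

```python
import random
import time

def find_crossover(algo_a, algo_b, sizes, reps=10):
    """Print which of two multiply functions is faster at each bit size."""
    for bits in sizes:
        x, y = random.getrandbits(bits), random.getrandbits(bits)
        times = []
        for algo in (algo_a, algo_b):
            start = time.perf_counter()
            for _ in range(reps):            # repeat to smooth out noise
                algo(x, y)
            times.append(time.perf_counter() - start)
        faster = algo_a if times[0] < times[1] else algo_b
        print(f"{bits:>8} bits: {faster.__name__}")

# e.g. find_crossover(schoolbook_mul, karatsuba, [2**k for k in range(8, 18)])
```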
I need to divide numbers represented as digits in byte arrays with a non-standard number of bytes. It may be 5 bytes or 1 GB or more. Division should be done on the byte-array representation directly, without any conversion to native numbers.
Divide-and-conquer division winds up being a whole lot faster than the schoolbook method for really big integers.
GMP is a state-of-the-art big-number library. For just about everything, it has several implementations of different algorithms that are each tuned for specific operand sizes.
Here is GMP's "division algorithms" documentation. The algorithm descriptions are a little bit terse, but they at least give you something to google when you want to know more.
Brent and Zimmermann's Modern Computer Arithmetic is a good book on the theory and implementation of big-number arithmetic. Probably worth a read if you want to know what's known.
The standard long division algorithm, which is similar to grade-school long division, is Algorithm D described in Knuth 4.3.1. Knuth has an extensive discussion of division in that section of his book. The upshot is that there are faster methods than Algorithm D, but they are not a whole lot faster and they are a lot more complicated.
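To make the byte-array setting from the question concrete, here is a minimal Python sketch of long division on big-endian base-256 digit lists. It is a simplified per-byte schoolbook variant (each quotient byte found by binary search), not Knuth's Algorithm D with its quotient estimation; all helper names are my own:

```python
def divmod_bytes(num, den):
    """Long division on big-endian base-256 digit lists.

    Returns (quotient, remainder) as byte lists without converting the
    operands to native integers. Assumes den is nonzero with no leading
    zeros. Simplified schoolbook method, not Knuth's Algorithm D.
    """
    def cmp(a, b):                      # numeric compare of byte lists
        a = a[next((i for i, d in enumerate(a) if d), len(a)):]
        b = b[next((i for i, d in enumerate(b) if d), len(b)):]
        if len(a) != len(b):
            return -1 if len(a) < len(b) else 1
        return (a > b) - (a < b)

    def sub(a, b):                      # a - b, assuming a >= b numerically
        if len(b) < len(a):
            b = [0] * (len(a) - len(b)) + b
        else:
            b = b[len(b) - len(a):]     # the extra leading digits are zero
        out, borrow = [], 0
        for x, y in zip(reversed(a), reversed(b)):
            d = x - y - borrow
            borrow = 1 if d < 0 else 0
            out.append(d + 256 if d < 0 else d)
        out.reverse()
        return out

    def mul_small(a, k):                # a * k for a single digit 0..255
        out, carry = [], 0
        for x in reversed(a):
            carry, digit = divmod(x * k + carry, 256)
            out.append(digit)
        if carry:
            out.append(carry)
        out.reverse()
        return out

    quotient, remainder = [], []
    for byte in num:                    # bring down one byte at a time
        remainder.append(byte)
        lo, hi = 0, 255                 # largest q with q*den <= remainder
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if cmp(mul_small(den, mid), remainder) <= 0:
                lo = mid
            else:
                hi = mid - 1
        quotient.append(lo)
        remainder = sub(remainder, mul_small(den, lo))
    return quotient, remainder

# 1000000 / 999 = 1001 remainder 1, in bytes: [15, 66, 64] / [3, 231]
print(divmod_bytes([15, 66, 64], [3, 231]))  # ([0, 3, 233], [0, 0, 1])
```

The outputs carry leading zero bytes, which a real implementation would strip. For production sizes you would replace the binary search with Algorithm D's quotient estimation, and eventually switch to the divide-and-conquer methods mentioned above.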
If you are determined to get the fastest possible algorithm, you can look into what is known as the SRT algorithm.
All of this and more is covered, by the way, in the Wikipedia article on division algorithms.
I am just a beginner in computer science. I have learned something about running time, but I can't be sure that what I understood is right. So please help me.
So integer factorization is currently not known to be a polynomial-time problem, but primality testing is. Assume the number to be checked is n. If we run a program that just checks, for every number from 1 to sqrt(n), whether it divides n, and stores any divisor it finds, I think this program runs in polynomial time, doesn't it?
One way I might be wrong is that a factorization program should find all prime factors, not just the first one discovered. Maybe that is the reason.
However, in public-key cryptography, finding a prime factor of a large number is essential to attacking the cryptosystem. Since the large number (the public key) is usually just the product of two primes, finding one prime factor means finding the other. That should be polynomial time. So why is it difficult or impossible to attack?
Casual descriptions of complexity like "polynomial factoring algorithm" generally refer to the complexity with respect to the size of the input, not the interpretation of the input. So when people say "no known polynomial factoring algorithm", they mean there is no known algorithm for factoring N-bit natural numbers that runs in time polynomial with respect to N. Not polynomial with respect to the number itself, which can be up to 2^N.
The difficulty of factorization is one of those beautiful mathematical problems that's simple to understand and takes you immediately to the edge of human knowledge. To summarize (today's) knowledge on the subject: we don't know why it's hard, not with any degree of proof, and the best methods we have run in more than polynomial time (but also significantly less than exponential time). The result that primality testing is even in P is pretty recent; see the linked Wikipedia page.
The best heuristic explanation I know for the difficulty is that primes are randomly distributed. One of the easier-to-understand results is Dirichlet's theorem. This theorem says that every arithmetic progression a + nd with a and d coprime contains infinitely many primes; in other words, you can think of primes as being dense with respect to progressions, meaning you can't avoid running into them. This is the simplest of a rather large collection of such results; in all of them, primes appear in ways very much analogous to random numbers.
The difficulty of factoring is thus analogous to the impossibility of reversing a one-time pad. In a one-time pad, there's a bit we don't know XORed with another bit we don't know; knowing the result of the XOR gives us zero information about either individual bit. Replace "bit" with "prime" and XOR with multiplication, and you have the factoring problem. It's as if you've multiplied two random numbers together, and you get very little information from the product (instead of zero information).
If we run a program that just checks, for every number from 1 to sqrt(n), whether it divides n, and stores any divisor it finds...
Even ignoring that each divisibility test takes longer for bigger numbers, this approach takes about 1.4 times (a factor of √2) as long if you just add a single binary digit to n. (And it takes exactly twice as long if you add two digits.)
I think that is the hallmark of exponential runtime: make n a fixed number of bits longer, and the running time grows by a constant factor.
But note that this observation applies only to the algorithm you proposed. It is still unknown if integer factorization is polynomial or not. The cryptographers sure hope that it is not, but there are also alternative algorithms that do not depend on prime factorization being hard (such as elliptic curve cryptography), just in case...
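A quick experiment makes that growth visible. This Python sketch (helper names are mine) counts the divisibility tests naive trial division performs on primes of increasing bit length; the count roughly doubles for every two extra bits:

```python
import math

def divisions_until_factor(n):
    """Count divisibility tests naive trial division performs on n."""
    count = 0
    for d in range(2, math.isqrt(n) + 1):
        count += 1
        if n % d == 0:
            return count, False          # composite: stopped early
    return count, True                   # no divisor found: n is prime

def next_prime(n):
    """First prime >= n, found by trial division (fine for a demo)."""
    while not divisions_until_factor(n)[1]:
        n += 1
    return n

# For prime n the loop runs ~sqrt(n) times, so the count doubles with
# every 2 extra bits: exponential in the bit length of the input.
for bits in (16, 18, 20, 22):
    p = next_prime(1 << bits)
    print(f"{bits} bits: {divisions_until_factor(p)[0]} divisions")
```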
I've read about algorithm run-time in some algorithm books, where it's expressed as O(n). For example, the given code would run in O(n) time in the best case and O(n^3) in the worst case. What does it mean, and how does one calculate it for one's own code? Is O(n) linear time, and does each predefined library function have its own run-time that should be kept in mind before calling it? Thanks...
A Beginner's Guide to Big O Notation might be a good place to start:
http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
Also take a look at Wikipedia:
http://en.wikipedia.org/wiki/Big_O_notation
There are also several related questions with good answers on Stack Overflow:
What is a plain English explanation of "Big O" notation?
and
Big-O for Eight Year Olds?
Shouldn't this be in math?
If you are sorting an already-sorted array with bubble sort, you can check whether a pass over the array swapped anything. If not, we are done.
So in the best case you will have O(n) comparisons (n-1, to be exact); in the worst case (a reversed array) you will have O(n^2) comparisons (n(n-1)/2, to be exact).
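One standard way to write that early-exit version in Python:

```python
def bubble_sort(a):
    """In-place bubble sort; O(n) best case thanks to the early exit."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:         # a full pass changed nothing: sorted
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]
```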
A more complicated example: let's find the maximum element of an array.
Obviously, you will always do n-1 comparisons, but how many assignments on average?
Complicated math gives the answer: H_n - 1, where H_n is the n-th harmonic number.
Usually it is easy to work out the best and worst scenarios, but the average case requires a lot of math.
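You can check the H_n - 1 figure empirically; this sketch counts reassignments of the running maximum over random permutations:

```python
import random

def max_assignments(a):
    """Scan for the maximum, counting reassignments after the first."""
    best, count = a[0], 0
    for x in a[1:]:
        if x > best:
            best, count = x, count + 1
    return count

n, trials = 100, 10_000
total = sum(max_assignments(random.sample(range(n), n)) for _ in range(trials))
harmonic = sum(1 / k for k in range(1, n + 1))
print(total / trials)       # empirical average, close to...
print(harmonic - 1)         # H_n - 1, about 4.19 for n = 100
```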
I would suggest you read Knuth, Volume 1. But who would not?
And the formal definition:
f(n) ∈ O(g(n)) means there exist c > 0 and n₀ ∈ ℕ such that |f(m)| ≤ c·g(m) for all m > n₀.
In fact, you should read about O-notation on Wikipedia.
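To make the definition concrete, here is a finite spot-check of one instance, with explicit witnesses:

```python
# Witnesses for f(m) = 3m + 5 being in O(m): c = 4 and n0 = 5,
# since 3m + 5 <= 4m exactly when m >= 5. (A finite spot-check,
# of course -- the definition itself quantifies over all m > n0.)
f = lambda m: 3 * m + 5
c, n0 = 4, 5
assert all(f(m) <= c * m for m in range(n0 + 1, 10**6))
```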
The big-O notation is one kind of asymptotic notation. Asymptotic notation is an idea from mathematics, which describes the behavior of functions "in the limit" - as you approach infinity.
The problem with defining how long an algorithm takes to run is that you usually can't give an answer in milliseconds because it depends on the machine, and you can't give an answer in clock cycles or as an operation count because that would be too specific to particular data to be useful.
The simple way of looking at asymptotic notation is that it discards all the constant factors in a function. Basically, a·n^2 will always be bigger than b·n if n is sufficiently large (assuming everything is positive). Changing the constant factors a and b doesn't change that - it changes the specific value of n where a·n^2 becomes bigger, but doesn't change that it happens. So we say that O(n^2) is bigger than O(n), and forget about those constants that we probably can't know anyway.
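For example, even a tiny quadratic constant loses eventually; this snippet finds the exact crossover for one (illustrative) pair of constants:

```python
# With a = 1 and b = 1_000_000, a*n**2 first overtakes b*n at
# n = 1_000_001 -- late, but inevitably, whatever a and b are.
a, b = 1, 1_000_000
n = next(n for n in range(1, 10**7) if a * n * n > b * n)
print(n)   # 1000001
```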
That's useful because the problems with bigger n are usually the ones where things slow down enough that we really care. If n is small enough, the time taken is small, and the gains available from choosing different algorithms are small. When n gets big, choosing a different algorithm can make a huge difference. Also, the problems that happen in the real world are often much bigger than the ones we can easily test against - and often they keep growing over time (e.g. as databases accumulate more data).
It's a useful mathematical model that abstracts away enough awkward-to-handle detail that useful results can be found, but it's not a perfect system. We don't deal with infinite problems in the real world, and there are plenty of times when problems are small enough that those constants are relevant for real-world performance, and sometimes you just have to time things with a clock.
The MIT OCW Introduction to Algorithms course is very good for this kind of thing. The videos and other materials are available for free, and the course book (not free) is among the best books available for algorithms.
From Wikipedia:
"Anindya De, Chandan Saha, Piyush Kurur and Ramprasad Saptharishi[11] gave a similar algorithm using modular arithmetic in 2008 achieving the same running time. However, these latter algorithms are only faster than Schönhage–Strassen for impractically large inputs."
I would be very interested in the size of such impractically large integers.
Maybe someone has implemented both algorithms and could run some benchmarks?
Thanks
Fürer's algorithm and its modular equivalent (DKSS) are very deep research topics and, for now, remain of academic interest only. Nobody actually knows how big the cross-over point is. And in all likelihood it doesn't matter, because that cross-over point is likely to be well beyond 64-bit computing limits.
I've implemented Schönhage-Strassen before and I understand how Fürer's algorithm works. So I'm quite familiar with both of them. I can say it's very possible that the cross-over point between Schönhage-Strassen and Fürer's algorithm is so high that a computer capable of holding the parameters will be larger than the size of the observable universe.
That's the problem when you have complexities that differ by less than a logarithm. It takes exponentially large input sizes to compensate even for small differences in the Big-O constant.
In this case, Fürer's algorithm is known to have a very very very large Big-O constant.
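A back-of-the-envelope simplification of that last point: the log log n factor that separates Schönhage-Strassen from Fürer's algorithm grows so slowly that offsetting even a modest constant-factor ratio K requires n on the order of 2^(2^K):

```python
# If Fürer's hidden constant factor is K times Schönhage-Strassen's,
# the log2(log2(n)) it saves must exceed K first, i.e. n >= 2**(2**K).
for k in (6, 10, 20, 64):
    print(f"factor {k}: crossover only for n >= 2^{2**k}")
```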