What is the most efficient way to implement n choose k using C++? - performance

I am trying to implement the RSA cryptosystem, using some equations to get a better decryption time.
The problem I have is the huge numbers in the function that calculates "n choose k": computing the factorial of huge numbers takes a lot of time.
When I started writing the code I used the naive calculation, but now I see that the program's running time is very long, even compared to the original RSA.
I also use the GMP library for the big numbers, but I hope that is not what is causing the problem.

Using Pascal's triangle is a fast method for calculating n choose k. You can refer to the answer here for more info.
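For illustration, here is a minimal sketch of that idea (the function name binomial is mine, not from the linked answer), assuming GMP's C++ interface (mpz_class from gmpxx.h) since the question already uses GMP. It keeps only one row of the triangle in memory:

```cpp
#include <gmpxx.h>
#include <algorithm>
#include <iostream>
#include <vector>

// C(n,k) via Pascal's rule C(i,j) = C(i-1,j) + C(i-1,j-1),
// keeping a single row of the triangle: O(n*k) big-integer additions.
// Build with: g++ binom.cpp -lgmpxx -lgmp
mpz_class binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    k = std::min(k, n - k);                 // symmetry: C(n,k) = C(n,n-k)
    std::vector<mpz_class> row(k + 1);      // mpz_class default-initializes to 0
    row[0] = 1;
    for (unsigned i = 1; i <= n; ++i)
        for (unsigned j = std::min(i, k); j >= 1; --j)
            row[j] += row[j - 1];           // update the row in place, right to left
    return row[k];
}

int main() {
    std::cout << binomial(100, 50) << '\n'; // 100891344545564193334812497256
}
```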
The fastest method I know of would be to make use of the results from "On the Complexity of Calculating Factorials". Just calculate all three factorials, then perform the two division operations, each with complexity M(n log n).
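As a hedged sketch of that approach with GMP's C interface (mpz_fac_ui and mpz_divexact are existing GMP calls; note that GMP also provides mpz_bin_uiui, which computes n choose k directly):

```cpp
#include <cstdio>   // include before gmp.h so gmp_printf is declared
#include <gmp.h>

// "Three factorials, two exact divisions": C(n,k) = n! / k! / (n-k)!.
// mpz_divexact is fast when the quotient is known to be exact, as here.
int main() {
    unsigned long n = 1000, k = 400;
    mpz_t result, tmp;
    mpz_init(result);
    mpz_init(tmp);
    mpz_fac_ui(result, n);              // n!
    mpz_fac_ui(tmp, k);                 // k!
    mpz_divexact(result, result, tmp);  // n! / k!
    mpz_fac_ui(tmp, n - k);             // (n-k)!
    mpz_divexact(result, result, tmp);  // C(n,k)
    gmp_printf("%Zd\n", result);
    mpz_clear(result);
    mpz_clear(tmp);
}
```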

Related

How to apply computational complexity theory

I have the basics of computational complexity theory down. I can understand why I might prefer one algorithm's scaling over another's. Now that I'm there, how do I actually determine the complexity of a function I've created? How do I understand which functions to use, and which one will scale better? How, for example, will I know that binary search through a telephone book runs in O(log n), or that computing the Fibonacci sequence takes O(n^2), other than by trial and error? How do I determine the complexity of a function in, for example, scikit-learn?
How do I actually apply this stuff?
The tasks performed in scikit-learn are computationally heavy; that is why it is recommended to get a good GPU for running ML/DS-related tasks. All of these tasks run in parallel across cores/threads.
So it is hard to determine what the actual complexities of these functions are, but what we can do is a test run: check how much time it takes for a given input size.
Refer here for a better understanding.
scikit-learn is heavily (but not only: a kd-tree, for example, is probably a more computer-science-like algorithm) based on numerical optimization, so a classic computer-science-focused treatment of computational complexity is surely needed, but not enough.
For something like an interior-point solver or a coordinate-descent-based SVM solver (two examples of the concepts behind ML algorithms), both iterative methods like nearly everything in numerical optimization, it's not important to know how fast you can do binary search; it's more important to know how many iterations your algorithm will need, or more specifically, how your algorithm moves through the optimization space. This is pretty tough, depends on the data (e.g. the eigenvalues of the Hessian of the cost function), and the proofs/analyses are heavily math-based, e.g. metric theory.
Things are even worse when heuristics are in play (and those are very common).
So basically: you won't be able to do this for reasonably complex algorithms.
What you should do:
- check the docs/sources for which algorithm is used, and find the underlying research paper and its analysis to obtain something like "cubic in sample size"
- do an empirical analysis with your data, as sketched below
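As a sketch of the empirical route (the idea is language-agnostic; here std::sort stands in for the algorithm under test): time a few input sizes and estimate the exponent k in t(n) ≈ c·n^k from the slope of log t against log n.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Time one run of the algorithm under test at input size n.
double time_sort(std::size_t n) {
    std::mt19937 rng(42);
    std::vector<int> v(n);
    for (auto& x : v) x = static_cast<int>(rng());
    auto t0 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    std::size_t prev_n = 1 << 16;
    double prev_t = time_sort(prev_n);
    for (std::size_t n = prev_n * 2; n <= (1 << 22); n *= 2) {
        double t = time_sort(n);
        // slope of the log-log curve between consecutive sizes;
        // for an O(n log n) sort this hovers slightly above 1
        double k = std::log(t / prev_t) / std::log(double(n) / double(prev_n));
        std::cout << "n=" << n << "  t=" << t << "s  estimated exponent=" << k << '\n';
        prev_n = n;
        prev_t = t;
    }
}
```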
Basically, you need to count the number of operations the algorithm needs as a function of the input size. The Big O is then just the highest-order term of that expression, with constant factors ignored.
One does not care about the kind of operations (comparison, assignment, ...) as long as the time for the operation is constant.
For complex algorithms, that analysis can be difficult.
For binary search: with each search step, the number of values to be searched is cut in half. So doubling the input requires one more search step (operation):
t(2n) = 1 + t(n). This results in t(n) = c·log2(n) = O(log n), at least for powers of two. For other n the expression is more complex, but the highest-order term is still c·log2(n).
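A small self-contained illustration of this (the names are mine): count the halving steps and watch them grow by about two each time the input quadruples.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Binary search with a step counter: each step halves the range,
// so the worst-case step count is about log2(n).
bool contains(const std::vector<int>& sorted, int key, int& steps) {
    std::size_t lo = 0, hi = sorted.size();
    steps = 0;
    while (lo < hi) {
        ++steps;
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] == key) return true;
        if (sorted[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return false;
}

int main() {
    for (std::size_t n = 1024; n <= (1 << 20); n *= 4) {
        std::vector<int> v(n);
        for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<int>(i);
        int steps = 0;
        contains(v, -1, steps);   // absent key: worst case
        std::cout << "n=" << n << "  steps=" << steps << '\n';
    }
}
```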
For Fibonacci: the naive, recursive implementation requires you to calculate fib(n-1) and fib(n-2) in order to calculate fib(n). Hence you calculate fib(n-2) twice, fib(n-3) three times, fib(n-4) five times, and so on (following the Fibonacci series itself). So the number of calculations to do is 1 + 1 + 2 + 3 + 5 + ... + fib(n - 1) = fib(n) - 1. Since we are interested in the asymptotic behavior (for big n), we can apply the asymptotic approximation formula:
fib(n) ≈ a^n / sqrt(5), where a = (1 + sqrt(5)) / 2 ≈ 1.618 is the golden ratio.
This means the naive, recursive Fibonacci is O(a^n), i.e. it has exponential complexity.
A better algorithm starts from the beginning of the Fibonacci series and calculates each number once. That's obviously O(n), as it takes n (or n - 2) equal steps.
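A sketch contrasting the two versions; the call counter makes the exponential blow-up of the naive recursion visible.

```cpp
#include <cstdint>
#include <iostream>

std::uint64_t calls = 0;

// Naive recursion: the number of calls itself grows like fib(n),
// i.e. exponentially in n.
std::uint64_t fib_naive(unsigned n) {
    ++calls;
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

// Iterative version: each value is computed exactly once, O(n) steps.
std::uint64_t fib_iter(unsigned n) {
    std::uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}

int main() {
    for (unsigned n = 10; n <= 30; n += 10) {
        calls = 0;
        std::uint64_t v = fib_naive(n);
        std::cout << "fib(" << n << ") = " << v
                  << "  naive calls = " << calls
                  << "  iterative = " << fib_iter(n) << '\n';
    }
}
```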

How should I properly represent multi-variable complexity with Google benchmark?

The Google microbenchmark library supports estimating the complexity of an algorithm, but everything is expressed by telling the framework what your N is. I'm curious what the best way is to represent O(M+N) algorithms in this framework. In my test I iterate over a Cartesian product of M and N values.
Do I call SetComplexityN with M+N (and, for O(MN) algorithms, I assume SetComplexityN similarly gets M*N)? If I wanted to hard-code the complexity of the algorithm (vs. doing a best fit), does benchmark::oN then map to M+N, and does benchmark::oNSquared map to O(MN)?
It's not something we've yet considered in the library.
If you set the complexity to M+N and use oN then the fitting curve used for the minimal least square calculation will be linear in M+N.
However, if you set the complexity to M*N and use oNSquared then we'll try to fit to pow(M*N, 2) which is likely not what you want, so I think still using oN would be appropriate.
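For concreteness, a sketch along those lines. BM_Merge is a hypothetical O(M+N) workload of my own, and ArgsProduct assumes a reasonably recent Google Benchmark release:

```cpp
#include <benchmark/benchmark.h>
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical O(M+N) workload: merging two sorted ranges. The point is
// how M+N is reported through SetComplexityN and fitted with oN.
static void BM_Merge(benchmark::State& state) {
    const auto m = static_cast<std::size_t>(state.range(0));
    const auto n = static_cast<std::size_t>(state.range(1));
    std::vector<int> a(m, 1), b(n, 2), out;
    out.reserve(m + n);
    for (auto _ : state) {
        out.clear();
        std::merge(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
        benchmark::DoNotOptimize(out.data());
    }
    state.SetComplexityN(state.range(0) + state.range(1));  // report M+N as "N"
}
BENCHMARK(BM_Merge)
    ->ArgsProduct({{1 << 10, 1 << 13, 1 << 16},
                   {1 << 10, 1 << 13, 1 << 16}})  // Cartesian product of M and N
    ->Complexity(benchmark::oN);                  // fit is then linear in M+N

BENCHMARK_MAIN();
```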

Which algorithm to choose for a huge integer multiplication, depending on N size

In my free time I'm preparing for interview questions like: implement multiplication of numbers represented as arrays of digits. Obviously I'm forced to write it from scratch in a language like Python or Java, so an answer like "use GMP" is not acceptable (as mentioned here: Understanding Schönhage-Strassen algorithm (huge integer multiplication)).
For exactly which ranges of sizes of those two numbers (i.e. numbers of digits) should I choose
School grade algorithm
Karatsuba algorithm
Toom-Cook
Schönhage–Strassen algorithm?
Is Schönhage–Strassen, at O(n log n log log n), always a good solution? Wikipedia mentions that Schönhage–Strassen is advisable for numbers beyond 2^2^15 to 2^2^17. What should I do when one number is ridiculously huge (e.g. 10,000 to 40,000 decimal digits) but the other consists of just a couple of digits?
Do all four of those algorithms parallelize easily?
You can browse the GNU Multiple Precision Arithmetic Library's source and see their thresholds for switching between algorithms.
More pragmatically, you should just profile your implementation of the algorithms. GMP puts a lot of effort into optimizing, so their algorithms will have different constant factors than yours. The difference could easily move the thresholds around by an order of magnitude. Find out where the times cross as input size increases for your code, and set the thresholds correspondingly.
I think all of these algorithms are amenable to parallelization, since they're mostly made up of divide-and-conquer passes. But keep in mind that parallelizing is another thing that will move the thresholds around quite a lot.
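To make the threshold idea concrete, here is a hedged digit-array sketch (little-endian decimal digits; all names are mine): a schoolbook base case under a single Karatsuba layer, where the CUTOFF of 32 digits is an arbitrary placeholder you would tune by profiling, as described above.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

using Digits = std::vector<int>;  // little-endian decimal digits

// Schoolbook O(m*n) multiplication; also the base case below the cutoff.
Digits school(const Digits& a, const Digits& b) {
    Digits r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            r[i + j] += a[i] * b[j];
    for (std::size_t i = 0; i + 1 < r.size(); ++i) {  // propagate carries
        r[i + 1] += r[i] / 10;
        r[i] %= 10;
    }
    while (r.size() > 1 && r.back() == 0) r.pop_back();
    return r;
}

Digits add(Digits a, const Digits& b) {
    a.resize(std::max(a.size(), b.size()) + 1, 0);
    for (std::size_t i = 0; i < b.size(); ++i) a[i] += b[i];
    for (std::size_t i = 0; i + 1 < a.size(); ++i) { a[i + 1] += a[i] / 10; a[i] %= 10; }
    while (a.size() > 1 && a.back() == 0) a.pop_back();
    return a;
}

Digits sub(Digits a, const Digits& b) {  // assumes a >= b
    for (std::size_t i = 0; i < b.size(); ++i) a[i] -= b[i];
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        if (a[i] < 0) { a[i] += 10; --a[i + 1]; }  // borrow
    while (a.size() > 1 && a.back() == 0) a.pop_back();
    return a;
}

Digits shifted(Digits a, std::size_t k) {  // multiply by 10^k
    if (a.size() == 1 && a[0] == 0) return a;
    a.insert(a.begin(), k, 0);
    return a;
}

const std::size_t CUTOFF = 32;  // crossover point: tune by profiling

Digits karatsuba(const Digits& a, const Digits& b) {
    if (a.size() < CUTOFF || b.size() < CUTOFF) return school(a, b);
    std::size_t h = std::max(a.size(), b.size()) / 2;
    auto lo = [h](const Digits& x) {
        return Digits(x.begin(), x.begin() + std::min(h, x.size()));
    };
    auto hi = [h](const Digits& x) {
        return x.size() > h ? Digits(x.begin() + h, x.end()) : Digits{0};
    };
    Digits z0 = karatsuba(lo(a), lo(b));               // low halves
    Digits z2 = karatsuba(hi(a), hi(b));               // high halves
    Digits z1 = sub(sub(karatsuba(add(lo(a), hi(a)),   // cross terms via
                                  add(lo(b), hi(b))),  // (a0+a1)(b0+b1)-z0-z2
                        z0), z2);
    return add(add(shifted(z2, 2 * h), shifted(z1, h)), z0);
}

int main() {
    Digits a(50, 9), b(40, 9);  // 10^50 - 1 and 10^40 - 1
    Digits p = karatsuba(a, b);
    for (auto it = p.rbegin(); it != p.rend(); ++it) std::cout << *it;
    std::cout << '\n';
}
```

Timing school() against karatsuba() around the cutoff on your own machine is exactly the profiling exercise recommended above.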

How can you tell if one function is faster than another function according to O-notation?

I have trouble determining whether one function is faster or slower than another. If the professor uses an example of O(1) and O(n), I know O(1) is faster, but I really only know that from memorizing the running-time order of the simple functions... When more complex examples are given, I don't understand how to find the faster function.
For example, let's say I want to compare n^(log n), n^((log n)^2), and n^(sqrt(n)). How can I compare these functions and tell which has the fastest and which has the slowest running time (in big-O terms)? Is there a step-by-step process I can follow each time when comparing functions' running times?
Here's my thinking about the above example. I know n^2 is faster than n^3, so I want to compare the exponent of each function. If I plug n = 1000000 into each, log n has the smallest value, (log n)^2 the second, and sqrt(n) the biggest. Does this mean that the function with the smallest exponent (n^(log n)) will be the fastest and the one with the biggest exponent (n^(sqrt(n))) will be the slowest?
1. n^(log n) (fastest)
2. n^((log n)^2)
3. n^(sqrt(n)) (slowest)
Usually Big O is written as a function of N (except in the case of a constant, O(1)).
So it is simply a matter of plugging a few values of N (3 or 4 values, or preferably enough values to see the curve) into both functions you are comparing, and computing. Graph them if you can.
But you shouldn't need to do that; you should have a basic understanding of the classes of functions in Big O. Even if you can't calculate it, you should still know that O(log N) grows faster than O(1), etc. O notation is about the worst case, so the comparisons are usually easy if you are familiar with the most common functions.
Does this mean that the smallest value (n^(log n)) will be the fastest and the biggest value (n^(sqrt(n))) will be the slowest?
1. n^(log n) (fastest)
2. n^((log n)^2)
3. n^(sqrt(n)) (slowest)
For the purpose of your comparison, yes. O notation is used to compare the worst case, complexity, or class of an algorithm, so you just assume the worst case for all candidates in the comparison. You can't tell from O notation what the best, typical, or average performance will be.
Comparing O notations is basically a matter of comparing the curves. I recommend you draw the curves; that will help your understanding.
If you use Python, I recommend trying matplotlib.pyplot. It's very convenient.
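If you'd rather not plot, another way to compare the three functions is to compare their logarithms, which turns the exponents into ordinary factors; a short program can tabulate them (a sketch; any language works):

```cpp
#include <cmath>
#include <cstdio>

// Compare growth via logarithms:
//   log(n^(log n))     = (log n)^2
//   log(n^((log n)^2)) = (log n)^3
//   log(n^(sqrt n))    = sqrt(n) * log n
// Whichever logarithm grows fastest, the function itself grows fastest.
int main() {
    for (double n = 1e2; n <= 1e6; n *= 100) {
        double L = std::log2(n);
        std::printf("n=%.0e  (log n)^2=%8.1f  (log n)^3=%10.1f  sqrt(n)*log n=%12.1f\n",
                    n, L * L, L * L * L, std::sqrt(n) * L);
    }
}
```

For every tabulated n the last column dominates, and the gap widens as n grows, matching the ordering in the answer above.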

why is integer factorization a non-polynomial time?

I am just a beginner in computer science. I have learned something about running time but I can't be sure that what I understood is right. So please help me.
So integer factorization is currently not a polynomial-time problem, but primality testing is. Assume the number to be checked is n. If we run a program just to decide whether every number from 1 to sqrt(n) can divide n, and if the answer is yes, then store the number — isn't this program polynomial time?
One way I might be wrong is that a factorization program should find all the prime factors, not just the first one discovered. Maybe that is the reason.
However, in public-key cryptography, finding a prime factor of a large number is essential to attacking the cryptosystem. Since the large number (the public key) is usually the product of just two primes, finding one prime means finding the other. This should be polynomial time. So why is it difficult or impossible to attack?
Casual descriptions of complexity like "polynomial factoring algorithm" generally refer to the complexity with respect to the size of the input, not the interpretation of the input. So when people say "no known polynomial factoring algorithm", they mean there is no known algorithm for factoring N-bit natural numbers that runs in time polynomial with respect to N. Not polynomial with respect to the number itself, which can be up to 2^N.
The difficulty of factorization is one of those beautiful mathematical problems that is simple to understand and takes you immediately to the edge of human knowledge. To summarize (today's) knowledge on the subject: we don't know why it's hard, not with any degree of proof, and the best methods we have run in more than polynomial time (but also in significantly less than exponential time). The result that primality testing is even in P is pretty recent; see the linked Wikipedia page.
The best heuristic explanation I know for the difficulty is that primes are randomly distributed. One of the easier-to-understand results is Dirichlet's theorem. It says that every arithmetic progression a, a + b, a + 2b, ... with gcd(a, b) = 1 contains infinitely many primes. In other words, you can think of the primes as being dense with respect to progressions, meaning you can't avoid running into them. This is the simplest of a rather large collection of such results; in all of them, primes appear in ways very much analogous to random numbers.
The difficulty of factoring is thus analogous to the impossibility of reversing a one-time pad. In a one-time pad, there's a bit we don't know XORed with another bit we don't know. We get zero information about an individual bit from knowing the result of the XOR. Replace "bit" with "prime" and XOR with multiplication, and you have the factoring problem. It's as if you've multiplied two random numbers together, and you get very little information from the product (instead of zero information).
If we run a program just to decide whether every number from 1 to sqrt(n) can divide n, and if the answer is yes, then store the number.
Even ignoring that the divisibility test itself takes longer for bigger numbers, this approach takes about 1.4 times (sqrt(2)) as long if you just add a single (binary) digit to n, and exactly twice as long if you add two digits, because the number of candidate divisors up to sqrt(n) doubles with every two extra bits.
That is the hallmark of exponential runtime: making n a fixed number of bits longer multiplies the running time by a constant factor; here the time grows like 2^(N/2), where N is the bit length of n.
But note that this observation applies only to the algorithm you proposed. It is still unknown whether integer factorization is polynomial or not. Cryptographers certainly hope that it is not, but there are also alternative cryptosystems that do not depend on prime factorization being hard (such as elliptic-curve cryptography), just in case...
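A small sketch of the trial-division approach discussed above, with a step counter: the steps grow like sqrt(n), i.e. they double with every two extra bits of n, which is exponential in the bit length.

```cpp
#include <cstdint>
#include <iostream>

// Trial division up to sqrt(n): polynomial in the *value* of n,
// but exponential in the bit length N, since sqrt(n) ~ 2^(N/2).
std::uint64_t smallest_factor(std::uint64_t n, std::uint64_t& steps) {
    steps = 0;
    for (std::uint64_t d = 2; d * d <= n; ++d) {
        ++steps;
        if (n % d == 0) return d;
    }
    return n;  // no divisor found: n is prime
}

int main() {
    // Primes are the worst case: the loop runs all the way to sqrt(n).
    // Each factor of 10 in n multiplies the steps by about sqrt(10) = 3.16.
    for (std::uint64_t p : {9973ULL, 99991ULL, 999983ULL}) {
        std::uint64_t steps = 0;
        smallest_factor(p, steps);
        std::cout << "n=" << p << "  steps=" << steps << '\n';
    }
}
```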
