How should I properly represent multi-variable complexity with Google benchmark? - google-benchmark

The Google microbenchmark library supports estimating complexity of an algorithm but everything is expressed by telling the framework what the size of your N is. I'm curious what the best way to represent M+N algorithms in this framework is. In my test I go over a cartesian product of M & N values in my test.
Do I call SetComplexityN with M+N (& for O(MN) algorithms I assumed SetComplexityN is similarly M*N)? If I wanted to hard-code the complexity of the algorithm (vs doing best fit) does benchmark::oN then map to M+N and benchmark::oNSquared maps to O(MN)?

It's not something we've yet considered in the library.
If you set the complexity to M+N and use oN then the fitting curve used for the minimal least square calculation will be linear in M+N.
However, if you set the complexity to M*N and use oNSquared then we'll try to fit to pow(M*N, 2) which is likely not what you want, so I think still using oN would be appropriate.

Related

What the most efficient way to implement n choose k using c++

I am trying to implement RSA crypto system using some equations to get better time for decryption.
The problem that I have is a huge numbers in the function that calculate "n choose k", the factorial for huge numbers take a lot of time.
When I start wrote the code I wrote it with naive calculation, but now I am seeing that the program running time very long, even if I compare to the original RSA.
Also I use in GMP library for big numbers, but it doesn't affecting on the problem, I hope.
Using Pascal's triangle is a fast method for calculating n choose k. You can refer to the answer here for more info.
The fastest method I know of would be to make use of the results from "On the Complexity of Calculating Factorials". Just calculate all 3 factorials, then perform the two division operations, each with complexity M(n logn).

Big Theta of Factorial Multiplied by a Coefficient

For a function with a run time of (cn)! where c is a coefficient >= 0 and c != n, would the tight bound of the run be Θ(n!) or Θ((cn)!)? Right now, I believe it would be Θ((cn)!) since they would differ by a coefficent >= n since cn != n.
Thanks!
Edit: A more specific example to clarify what I'm asking:
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
You can use Stirling's approximation to get that if c>1 then (cn)! is asymptotically larger than pow(c,n)*n!, which is not O(n!) since the quotient diverges. As a more elementary approach consider this example for c=2: (2n)!=(2n)(2n-1)...(n+1)n!>n!n! and (n!n!)/n!=n! diverges, so (2n)! is NOT O(n!).
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
I think there are two answers to your question.
The shorter one is from the purely theoretical point of view. Of those 3 only the n! lies in the class of Θ(n!). The second lies in the O(n!) (note big-O instead of big-Theta) and (7n)! is slower than Θ(n!), it lies in Θ((7n)!)
There is also a longer but more practical answer. And to get to it we first need to understand what is the big deal with this whole big-O and big-Theta business in the first place?
The thing is that for many practical tasks there are many algorithms and not all of them are equally or even similarly efficient. So the practical question is: can we somehow capture this difference in performance in an easy to understand and compare way? And this is the problem that big-O/big-Theta are trying to solve. The idea behind this method is that if we look at some algorithm with some complicated real formula for the exact time, there is only 1 term that grows faster than all others and thus dominates the time as the problem gets bigger. So let's compress this big formula to that dominant term. Then we can compare those terms and if they are different, we can easily say which is the better algorithm (7*n^2 is clearly better than 2*n^3).
Another idea is that the term "operation" is usually not that well defined at the level people usually think about algorithms. Which "operation" actually maps to a single CPU instruction and which to a few depends on many factors such as particular hardware. Also the instructions themselves can take different time to execute. Moreover sometimes the algorithm's working time is dominated by memory access than CPU instructions and those components are not easily additive. The morale of this story is that if two algorithms are different only in a scalar coefficient, you can't really compare those algorithms just theoretically. You need to compare some implementations in some particular environment. This is why algorithms complexity measure typically boils down to something like O(n^k) where k is a constant.
There is one more consideration: practicality. If the algorithm is some polynomial, there is a huge practical difference between cases a=3 and a=4 in O(n^a). But if it is something like O(2^(n^a)), then there is not much difference what exactly the a as along as a>1. This is because 2^n grows fast enough to make it impractical for almost any realistic n irrespective of a. So in practical terms it is often good enough approximation to put all such algorithms into a single "exponential algorithms" bucket and say they are all impractical even despite the fact there is a huge difference between them. This is where some mathematically unconventional notations like 2^O(n) come from.
From this last practical perspective the difference between Θ(n!) and Θ((7n)!) is also very little: both are totally impractical because both lie beyond even the exponential bucket of 2^O(n) (see Stirling's formula that shows that n! grows a bit faster than (n/e)^n). So it makes sense to put all such algorithms in another bucket of "factorial complexity" and mark them as impractical as well.

How does one actually implement polynomial multiplications using FFT?

I have been studying this topic in an Algorithms textbook.
The clever usage of the complex roots of unity seems to be mathematically working. However, I do not understand how one could actually represent this in a computer.
I can think of two things:
Use the real/imaginary decomposition to represent the complex numbers. But this means using floats, which means I open up my algorithm to numerical error and I would loose out precision even if I want to multiply two polynomials with integer coefficients.
Represent exp(i 2pi/n) as n. So, I'd eventually get a tuple in omega, and if I have to keep it in this form, I'd essentially be doing polynomial multiplication in omega again, taking us back to square one.
I'd really like to see an implementation of this algorithm in a familiar programming language.
Indeed as you identify, the roots of unity are typically not nice numbers that can be stored well in a computer. Since the numerical error is small, if you know the output should be integers, rounding usually produces the right result.
If you don't want to (or cannot) rely on that, an exact option is the Number Theoretic Transform. It substitutes the roots of unity in the complex plane with roots of unity in a finite field ℤ/pℤ where p is a suitable prime. p has to be large enough for all the necessary roots to exist, and the efficiency is affected by properties of p. If you choose a Fermat prime then the roots of unity have convenient forms and there is a trick to do reduction modulo p more efficiently than usual. That is all exact integer arithmetic and the values stay small, so there is no problem implementing it in a computer.
That technique is used in the Schönhage–Strassen algorithm so you can look up the specifics there.

Is there an algorithm better than O(N²) to determine if matrix is symmetric?

Algorithm requirements
Input is an arbitrary square matrix M of size N×N, which just fits in memory.
The algorithm's output must be true if M[i,j] = M[j,i] for all j≠i, false otherwise.
Obvious solutions
Check if the transpose equals the matrix itself (MT=M). Easiest to program in many environments, but (usually) consumes twice the memory and requires N² comparisons worst case. Therefore, this is O(N²) and has high peak memory.
Check if the lower triangular part equals the upper triangular part. Of course, the algorithm returns on the first inequality found. This would make the worst case (worst case being, the matrix is indeed symmetric) require N²/2 - N comparisons, since the diagonal does not need to be checked. So although it is better than option 1, this is still O(N²).
Question
Although it's hard to see how it would be possible (the N² elements will all have to be compared somehow), is there an algorithm doing this check that is better than O(N²)?
Or, provided there is a proof of non-existence of such an algorithm: how to implement this most efficiently for a multi-core CPU (Intel or AMD) taking into account things like cache-friendliness, optimal branch prediction, other compiler-specific specializations, etc.?
This question stems mostly from academic interest, although I imagine a practical use could be to determine what solver to use if the matrix describes a linear system AX=b...
Since you will have to examine all the elements except the diagonal, the complexity IMO can't be better than O (n^2).
For a dense matrix, the answer is a definite "no", because any uninspected (non-diagonal) elements could be different from their transposed counterparts.
For standard representations of a sparse matrix, the same reasoning indicates that you can't generally do better than the input size.
However, the same reasoning doesn't apply to arbitrary matrix representations. For example, you could store sparse representations of the symmetric and antisymmetric components of your matrix, which can easily be checked for symmetry in O(1) time by checking if antisymmetric element has any components at all...
I think you can take a probabilistic approach here.
I think it's not a chance/coincidence that x randomly picked lower coordinate elements will match to their upper triangular counter part. The chance is very high that the matrix is indeed symmetric.
So instead of going through all the ½n² - n elements you can check p random coordinates and tell if the matrix is symmetric with confidence:
p / (½n² - n)
you can then decide a threshold above which you believe that the matrix must be a symmetric matrix.

Complexity of arbitrary matrix multiplications

I've got a simple question concerning the implementation of matrix multiplications. I know that there are algorithms for matrices of equal size (n x n) that have a complexity of O(n^2.xxx). But if I have two matrices A and B of different sizes (p x q, q x r), what would be the minimal complexity of the implementation to date? I would guess it is O(pqr) since I would implement a multiplication with 3 nested loops with p, q and r iterations. In particular, does anyone now how the Eigen library implements a multiplication?
A common technique is to pad matrices with size (p*q, q*r), so that their sizes becomes (n*n). Then, you can apply Strassen's algorithm.
You're correct about it being O(pqr) for exactly the reasons you stated.
I'm not sure how Eigen implements it, but there are many ways to optimize matrix multiplication algorithms, such as optimizing cache performance by tiling and being aware of whether the language you're using is row major or column major (you want the inner loops to access memory in as small steps as possible to prevent cache misses). Some other optimization techniques are detailed here.
As Yu-Had Lyu mentioned you can pad them with zeroes, but unless p,q and r are close the complexity degenerates (time to perform the padding).
To answer your other question about how Eigen implements it:
The way numerics packages implement matrix multiplication is normally by using the typical O(pqr) algorithm, but heavily optimised in `non-mathematical' ways: blocking for better cache locality, using special processor instructions (SIMD etc.)
Some packages (MATLAB, Octave, ublas) use two libraries called BLAS and LAPACK which provide linear algebra primitives (like matrix multiplication) heavily optimised in this way (sometimes using hardware-specific optimisations).
AFAIK, Eigen simply uses blocking and SIMD instructions.
Few common numeric libraries (Eigen included) use Strassen Algorithm. The reason for this is actually very interesting: while the complexity is better (O(n^(log2 7))) the hidden constants behind the big Oh are very large due to all the additions performed - in other words the algorithm is only useful in practice for very large matrices.
Note: There is an even more efficient (in terms of asymptotic complexity) algorithm than Strassen's algorithm: the Coppersmith–Winograd algorithm with O(n^(2.3727)), but for which the constants are so large, that it is unlikely that it will ever be used in practice. It is in fact believed that there exists an algorithm that runs in O(n^2) (which is the trivial lower-bound, since any algorithm needs to at least read the n^2 elements of the matrices).

Resources